Abstract
We introduce the Scanning Single Shot Detector (ScanSSD) for detecting both embedded and displayed math expressions in document images. It uses a single-stage network that requires no page layout, font, or character information. ScanSSD renders large document pages at 600 dpi, uses sliding windows to generate sub-images from them, and applies a Single Shot Detector (SSD) to each sub-image. Detection results from the sub-images are pooled to generate page-level results. For this pooling step, we introduce new methods based on the confidence scores and the density of detections. ScanSSD has a modular architecture that can be easily applied to detecting other objects in document images.
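The sliding-window step can be illustrated with a minimal sketch. It tiles a page image into overlapping square windows and maps a box detected inside a window back to page coordinates, so that detections from all windows share one frame before pooling. The window size, stride, and the helper names `window_origins` and `to_page_coords` are hypothetical and chosen only for illustration; they are not ScanSSD's actual settings or code. The confidence- and density-based pooling itself is not shown.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def window_origins(page_w: int, page_h: int, win: int, stride: int) -> List[Tuple[int, int]]:
    """Top-left corners of square win x win windows covering the whole page.

    Windows advance by `stride` pixels; if the last regular window stops short
    of the page edge, one extra window is clamped flush to that edge.
    """
    def starts(length: int) -> List[int]:
        if length <= win:
            return [0]
        s = list(range(0, length - win + 1, stride))
        if s[-1] + win < length:
            s.append(length - win)  # cover the remaining strip at the edge
        return s

    return [(x, y) for y in starts(page_h) for x in starts(page_w)]


def to_page_coords(box: Box, origin: Tuple[int, int]) -> Box:
    """Shift a box from sub-image coordinates to page coordinates."""
    x0, y0 = origin
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)


if __name__ == "__main__":
    # A US Letter page rendered at 600 dpi is 5100 x 6600 pixels.
    origins = window_origins(5100, 6600, win=1200, stride=600)
    print(len(origins), origins[:3])
```

Overlapping windows mean that an expression cut by one window's border usually appears whole in a neighbouring window. The cost is that many page-level detections are duplicates, which is why a pooling step is needed afterwards.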
For the math expression detection task, we created a new dataset, TFD-ICDAR 2019, from the existing GTDB datasets. It contains 569 training pages with 26,396 math expressions and 236 test pages with 11,885 math expressions. On the TFD-ICDAR 2019 test set, ScanSSD achieves an F-score of 80.19% at IOU50 and 72.96% at IOU75. An earlier version of ScanSSD placed 2nd in the ICDAR 2019 competition on Typeset Formula Detection (TFD). Our data and code are publicly available at https://github.com/MaliParag/TFD-ICDAR2019 and https://github.com/MaliParag/ScanSSD, respectively.
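The IOU50 and IOU75 scores count a detection as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5 or 0.75, respectively. The sketch below shows one way to compute an F-score at a given threshold. It assumes greedy one-to-one matching in detection order, which may differ from the official ICDAR 2019 TFD evaluation script. The function names `iou` and `f_score` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f_score(detections: List[Box], ground_truth: List[Box], threshold: float) -> float:
    """F-score where each ground-truth box can be matched by at most one detection.

    Assumption: each detection takes the unmatched ground-truth box with the
    highest IoU at or above `threshold` (greedy, in detection order).
    """
    matched = set()
    true_positives = 0
    for d in detections:
        best, best_iou = None, threshold
        for i, g in enumerate(ground_truth):
            if i in matched:
                continue
            v = iou(d, g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, suppose there are two ground-truth boxes and three detections. One detection overlaps a ground-truth box with IoU 0.8, one matches the other box exactly, and one is spurious. This gives an F-score of 0.8 at both IoU 0.5 and IoU 0.75. If the threshold is raised to 0.85, the 0.8 overlap no longer counts as a match. The stricter threshold therefore penalizes imprecise box boundaries, which is why the IOU75 score is lower.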
Library of Congress Subject Headings
Optical character recognition; Optical pattern recognition; Image processing--Digital techniques; Mathematical notation
Publication Date
8-2019
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Richard Zanibbi
Advisor/Committee Member
Zack Butler
Advisor/Committee Member
Joe Geigel
Recommended Citation
Mali, Parag Shrikrishna, "Scanning Single Shot Detector for Math in Document Images" (2019). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/10210
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS