Abstract
We demonstrate that recognition of scanned typeset mathematical expression images can be done by extracting maximum spanning trees from line of sight graphs weighted using geometric and visual density features. The approach used is hierarchical contextual parsing (HCP): Hierarchical in terms of starting with connected components and building to the symbol level using visual, spatial, and contextual features of connected components. Once connected components have been segmented into symbols, a new set of spatial, visual, and contextual features are extracted. One set of visual features is used for symbol classification, and another for parsing. The features are used in parsing to assign classifications and confidences to edges in a line of sight symbol graph. Layout trees describe expression structure in terms of spatial relations between symbols, such as horizontal, subscript, and superscript. From the weighted graph Edmonds' algorithm is used to extract a maximum spanning tree. Segmentation and parsing are done without using symbol classification information, and symbol classification is done independently of expression structure recognition. The commonality between the recognition processes is the type of features they use, the visual densities. These visual densities are used for shape, spatial, and contextual information. The contextual information is shown to help in segmentation, parsing, and symbol recognition.
The hierarchical contextual parsing has been implemented in the Python and Graph-based Online/Offline Recognizer for Math (Pythagor^m) system and tested on the InftyMCCDB-2 dataset. We created InftyMCCDB-2 from InftyCDB-2 as a open source dataset for scanned typeset math expression recognition. In building InftyMCCDB-2 modified formula structure representations were used to better capture the spatial positioning of symbols in the expression structures. Namely, baseline punctuation and symbol accents were moved out of horizontal baselines as their positions are not horizontally aligned with symbols on a writing line. With the transformed spatial layouts and HCP, 95.97% of expressions were parsed correctly when given symbols and 93.95% correctly parsed when requiring symbol segmentation from connected components. Overall HCP reached 90.83% expression recognition rate from connected components.
Library of Congress Subject Headings
Image reconstruction; Image processing--Digital techniques; Classification--Data processing; Computer algorithms
Publication Date
8-2017
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Richard Zanibbi
Advisor/Committee Member
Zack Butler
Advisor/Committee Member
Ivona Bezakova
Recommended Citation
Condon, Michael Patrick Erickson, "Applying Hierarchical Contextual Parsing with Visual Density and Geometric Features to Typeset Formula Recognition" (2017). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9502
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS
Comments
Physical copy available from RIT's Wallace Library at TA1637 .C66 2017