Abstract
We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds' Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules and the language model is learned from the sets of symbols/relationships and the statistics over them in the training set.
We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art
RNN-based formula parsers. The unlabeled structure detection of QDGGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.
Library of Congress Subject Headings
Computer vision; Machine learning; Neural networks (Computer science); Optical pattern recognition; Mathematical symbols (Typefaces)--Classification; Graph theory
Publication Date
8-7-2020
Document Type
Dissertation
Student Type
Graduate
Degree Name
Imaging Science (Ph.D.)
Department, Program, or Center
Chester F. Carlson Center for Imaging Science (COS)
Advisor
Richard Zanibbi
Advisor/Committee Member
Daniel Phillips
Advisor/Committee Member
Carl Salvaggio
Recommended Citation
Mahdavi, Mahshad, "Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas" (2020). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10543
Campus
RIT – Main Campus
Plan Codes
IMGS-PHD