Theses

Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

Abstract

We present a new visual parsing method based on standard Convolutional Neural Networks (CNNs) for handwritten and typeset mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parser employs multi-task learning, using a single feature representation for locating, classifying, and relating symbols. QD-GGA parses formulas by first constructing a Line-Of-Sight (LOS) graph over the input primitives (e.g handwritten strokes or connected components in images). Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. This allows end-to-end structure learning using a joint loss over primitive node and edge class distributions. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph using Edmonds' Arborescence Algorithm. The model may be run recurrently over the input graph, updating attention to focus on symbols detected in the previous iteration. QD-GGA does not require additional grammar rules and the language model is learned from the sets of symbols/relationships and the statistics over them in the training set.

We benchmark our system against both handwritten and typeset state-of-the-art math recognition systems. Our preliminary results show that this is a promising new approach for visual parsing of math formulas. Using recurrent execution, symbol detection is near perfect for both handwritten and typeset formulas: we obtain a symbol f-measure of over 99.4% for both the CROHME (handwritten) and INFTYMCCDB-2 (typeset formula image) datasets. Our method is also much faster in both training and execution than state-of-the-art

RNN-based formula parsers. The unlabeled structure detection of QDGGA is competitive with encoder-decoder models, but QD-GGA symbol and relationship classification is weaker. We believe this may be addressed through increased use of spatial features and global context.

Library of Congress Subject Headings

Computer vision; Machine learning; Neural networks (Computer science); Optical pattern recognition; Mathematical symbols (Typefaces)--Classification; Graph theory

Publication Date

8-7-2020

Document Type

Dissertation

Student Type

Graduate

Degree Name

Imaging Science (Ph.D.)

Department, Program, or Center

Chester F. Carlson Center for Imaging Science (COS)

Advisor

Richard Zanibbi

Advisor/Committee Member

Daniel Phillips

Advisor/Committee Member

Carl Salvaggio

Recommended Citation

Mahdavi, Mahshad, "Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas" (2020). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10543

Campus

RIT – Main Campus

Plan Codes

IMGS-PHD

Download

COinS

Theses

Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

Abstract

Library of Congress Subject Headings

Publication Date

Document Type

Student Type

Degree Name

Department, Program, or Center

Advisor

Advisor/Committee Member

Advisor/Committee Member

Recommended Citation

Campus

Plan Codes

Search

Browse

Author Corner

RIT Links

Theses

Query-Driven Global Graph Attention Model for Visual Parsing: Recognizing Handwritten and Typeset Math Formulas

Author

Abstract

Library of Congress Subject Headings

Publication Date

Document Type

Student Type

Degree Name

Department, Program, or Center

Advisor

Advisor/Committee Member

Advisor/Committee Member

Recommended Citation

Campus

Plan Codes

Share

Search

Browse

Author Corner

RIT Links