Abstract
Large data collections containing millions of math formulae in different formats are available online, and retrieving math expressions from these collections is challenging. We propose a framework for retrieving mathematical notation in text documents using symbol pairs extracted from visual and semantic representations of mathematical expressions in the symbolic domain. We further adapt our model to retrieve mathematical notation in images and lecture videos. Each modality uses a graph-based representation to describe math formulas. For symbolic formula retrieval, where the structure is known, we use symbol layout trees and operator trees. For image-based formula retrieval, where the structure is unknown, we use a more general Line of Sight graph representation. Paths in these graphs define symbol pair tuples, which serve as the entries of our inverted index of mathematical notation. Our retrieval framework uses a three-stage approach: fast selection of candidates in the first stage, a more detailed matching algorithm with similarity metric computation in the second stage, and, when relevance assessments are available, an optional third stage that uses linear regression to estimate relevance from multiple similarity scores for final re-ranking. Our model has been evaluated using large document collections, and preliminary results are presented for videos and cross-modal search. The proposed framework can be adapted to other domains, such as chemistry or technical diagrams, where two visually similar elements from a collection are usually related to each other.
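The idea of indexing formulas by symbol pairs taken from graph paths, followed by fast candidate selection, can be illustrated with a small sketch. It assumes a simplified symbol layout tree whose edges carry single-character spatial relation labels ("n" for next, "a" for above/superscript), a hypothetical `max_len` limit on path length, and a first stage that simply counts shared pairs. The names `Node`, `symbol_pairs`, `build_index` and `select_candidates` are illustrative only; the dissertation's actual tuple format, operator tree and Line of Sight handling, and scoring may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """A symbol in a simplified symbol layout tree (hypothetical representation)."""
    symbol: str
    # Children keyed by a one-character spatial relation label (assumed labels).
    children: dict = field(default_factory=dict)


def symbol_pairs(root, max_len=2):
    """Return (ancestor, descendant, relation path) tuples for paths of up to max_len edges."""
    pairs = []

    def descend(start, node, path):
        # Walk down from `start`, recording one pair for every descendant reached.
        for rel, child in node.children.items():
            new_path = path + rel
            pairs.append((start.symbol, child.symbol, new_path))
            if len(new_path) < max_len:
                descend(start, child, new_path)

    def visit(node):
        # Every node acts as the start of a path.
        descend(node, node, "")
        for child in node.children.values():
            visit(child)

    visit(root)
    return pairs


def build_index(formulas, max_len=2):
    """Inverted index: symbol pair tuple -> set of formula ids containing it."""
    index = defaultdict(set)
    for fid, root in formulas.items():
        for pair in symbol_pairs(root, max_len):
            index[pair].add(fid)
    return index


def select_candidates(index, query_root, k=10, max_len=2):
    """Assumed first stage: rank formulas by the number of distinct query pairs they share."""
    counts = defaultdict(int)
    for pair in set(symbol_pairs(query_root, max_len)):
        for fid in index.get(pair, ()):
            counts[fid] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Usage: index x^2+y, x^2 and y^2, then query with x^2+z.
formulas = {
    "f1": Node("x", {"a": Node("2"), "n": Node("+", {"n": Node("y")})}),
    "f2": Node("x", {"a": Node("2")}),
    "f3": Node("y", {"a": Node("2")}),
}
index = build_index(formulas)
query = Node("x", {"a": Node("2"), "n": Node("+", {"n": Node("z")})})
print(select_candidates(index, query))  # → [('f1', 2), ('f2', 1)]
```

Counting shared pairs is only a cheap filter; in the framework described above, the surviving candidates would then go through the more detailed structural matching of the second stage.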
Library of Congress Subject Headings
Mathematics--Formulae; Information retrieval; Coding theory
Publication Date
7-2017
Document Type
Dissertation
Student Type
Graduate
Degree Name
Computing and Information Sciences (Ph.D.)
Advisor
Richard Zanibbi
Advisor/Committee Member
Stephanie Ludi
Advisor/Committee Member
Daniel B. Phillips
Recommended Citation
Davila Castellanos, Kenny, "Symbolic and Visual Retrieval of Mathematical Notation using Formula Graph Symbol Pair Matching and Structural Alignment" (2017). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/9588
Campus
RIT – Main Campus
Plan Codes
COMPIS-PHD
Comments
Physical copy available from RIT's Wallace Library at QA41 .D38 2017