Abstract
We have developed a layout-based math retrieval system that indexes pairs of symbols in mathematical expressions. Existing approaches to layout-based retrieval include tree-edit-distance matching on MathML trees (Kamali and Tompa, 2013) and longest-common-subsequence matching on LaTeX strings (Kumar et al., 2012). We compare our new layout-based retrieval method with a math retrieval system built on the conventional text retrieval engine Lucene (Zanibbi and Yuan, 2011), since such systems are commonly used for math search. We show that participants in a user study scored the search results returned by our system as significantly more similar to the query than those of the comparison system, and that our system is fast enough to be used in real time.
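To make the idea of indexing on symbol pairs concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a simplified layout tree in which each symbol has at most one child per spatial relation (`next`, `sup`, `sub`). The names `node`, `symbol_pairs`, `build_index`, and `search`, the triple format, and the count-based ranking are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical layout tree node: (symbol, {relation: child}).
# Relations assumed here: "next" (right-adjacent), "sup", "sub".
def node(symbol, **children):
    return (symbol, children)

def symbol_pairs(tree):
    """Return every (ancestor, descendant, relative path) triple in a tree."""
    pairs = []

    def walk(current, ancestors):
        symbol, children = current
        # Pair the current symbol with each symbol above it in the tree.
        for ancestor_symbol, path in ancestors:
            pairs.append((ancestor_symbol, symbol, path))
        for relation, child in children.items():
            # Extend every ancestor's path by one step, then add this symbol.
            extended = [(a, p + (relation,)) for a, p in ancestors]
            extended.append((symbol, (relation,)))
            walk(child, extended)

    walk(tree, [])
    return pairs

def build_index(expressions):
    """Inverted index: symbol-pair triple -> set of expression ids."""
    index = defaultdict(set)
    for expr_id, tree in expressions.items():
        for pair in symbol_pairs(tree):
            index[pair].add(expr_id)
    return index

def search(index, query_tree):
    """Rank expressions by the number of distinct query pairs they share."""
    counts = defaultdict(int)
    for pair in set(symbol_pairs(query_tree)):
        for expr_id in index.get(pair, ()):
            counts[expr_id] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Toy corpus (made up for this sketch): x^2 + y, x^2 + z, a + y.
corpus = {
    "e1": node("x", sup=node("2"), next=node("+", next=node("y"))),
    "e2": node("x", sup=node("2"), next=node("+", next=node("z"))),
    "e3": node("a", next=node("+", next=node("y"))),
}
index = build_index(corpus)
results = search(index, corpus["e1"])
```

In this toy corpus, querying with x^2 + y ranks the identical expression first, followed by the expression that shares the x^2 + prefix, and then the one that shares only the + y suffix. A raw count of shared pairs is the simplest possible score; a real system would likely normalize by expression size.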
Library of Congress Subject Headings
Mathematical symbols (Typefaces)--Classification; Information retrieval; Layout (Printing)
Publication Date
8-1-2013
Document Type
Thesis
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Zanibbi, Richard
Advisor/Committee Member
Yuan, Bo
Advisor/Committee Member
Butler, Zack
Recommended Citation
Stalnaker, David, "Math expression retrieval using symbol pairs in layout trees" (2013). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/5533
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS
Comments
Note: imported from RIT's Digital Media Library running on DSpace to RIT Scholar Works in December 2013. Physical copy available from RIT's Wallace Library at Z250.6.M3 S73 2013.