We have developed a layout-based math retrieval system that indexes pairs of symbols in mathematical expressions. Existing approaches to layout-based retrieval include tree edit distance matching on MathML trees (Kamali and Tompa, 2013) and longest common subsequence matching on LaTeX strings (Kumar et al., 2012). We compare our new layout-based retrieval method with a math retrieval system built on the conventional text-based retrieval system Lucene (Zanibbi and Yuan, 2011), as such systems are commonly used for math search. We show that participants in a study scored the search results returned by our system as significantly more similar to the query than those of the comparison system, and that our system is fast enough to be used in real time.
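As a rough illustration of what indexing on pairs of symbols can look like, the following minimal Python sketch builds an inverted index from symbol pairs to expression identifiers and ranks candidates by the number of pairs they share with a query. It is a simplification under stated assumptions, not the system described in the thesis: expressions are treated as flat left-to-right symbol lists rather than full layouts, and the function names (`symbol_pairs`, `build_index`, `search`) and the match-count scoring are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def symbol_pairs(symbols):
    """Yield (left, right, distance) for every ordered pair of symbols.

    Assumption: the expression is a flat list of symbols on one baseline,
    so "layout" reduces to the horizontal distance between two symbols.
    """
    for (i, left), (j, right) in combinations(enumerate(symbols), 2):
        yield (left, right, j - i)


def build_index(expressions):
    """Map each symbol pair to the set of expression ids that contain it."""
    index = defaultdict(set)
    for expr_id, symbols in expressions.items():
        for pair in symbol_pairs(symbols):
            index[pair].add(expr_id)
    return index


def search(index, query_symbols):
    """Rank expressions by how many distinct query pairs they contain."""
    hits = defaultdict(int)
    for pair in set(symbol_pairs(query_symbols)):
        for expr_id in index.get(pair, ()):
            hits[expr_id] += 1
    # Highest match count first; ties broken by expression id.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))
```

With a toy corpus such as `{"e1": ["x", "+", "y"], "e2": ["x", "-", "y"]}`, a query of `["x", "+", "y"]` ranks `e1` above `e2`, because `e2` shares only the pair of `x` and `y` at distance 2. Each lookup touches only the postings for pairs that occur in the query, which is one reason a pair-based index can answer queries quickly.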

Library of Congress Subject Headings

Mathematical symbols (Typefaces)--Classification; Information retrieval; Layout (Printing)

Publication Date


Document Type


Department, Program, or Center

Computer Science (GCCIS)


Advisor

Zanibbi, Richard

Advisor/Committee Member

Yuan, Bo

Advisor/Committee Member

Butler, Zack


Note: Imported from RIT's Digital Media Library running on DSpace to RIT Scholar Works in December 2013. Physical copy available from RIT's Wallace Library at Z250.6.M3 S73 2013.


RIT – Main Campus

Plan Codes