Abstract

We have developed a layout-based math retrieval system that indexes pairs of symbols in mathematical expressions. Existing approaches to layout-based retrieval include tree edit distance matching on MathML trees (Kamali and Tompa, 2013) and longest common subsequence matching on LaTeX strings (Kumar et al., 2012). We compare our layout-based retrieval method with a math retrieval system built on the conventional text-based search engine Lucene (Zanibbi and Yuan, 2011), since systems of this kind are commonly used for math search. Participants in a user study rated the search results returned by our system as significantly more similar to the query than those of the comparison system, and our system is fast enough to be used in real time.
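The abstract does not spell out how symbol pairs are extracted or scored, so the following is only a minimal Python sketch of pair-based indexing under stated assumptions. It assumes expressions arrive as symbol layout trees whose edges carry spatial relations (the labels `next` and `sup` are hypothetical). It also assumes each pair records an ancestor symbol, a descendant symbol, and the relation path between them, and that candidates are ranked by an F-measure over matched pairs. `Node`, `symbol_pairs`, and `PairIndex` are illustrative names, not the thesis's implementation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """One symbol in a (hypothetical) symbol layout tree."""
    symbol: str
    # Spatial relation label -> child node. Labels such as "next" (adjacent
    # to the right) and "sup" (superscript) are assumptions for illustration.
    children: dict = field(default_factory=dict)


def symbol_pairs(root):
    """Count (ancestor, descendant, relation path) pairs in a layout tree."""
    pairs = Counter()

    def visit(node, ancestors):
        # ancestors holds (ancestor symbol, relation path from it to node).
        for anc_symbol, path in ancestors:
            pairs[(anc_symbol, node.symbol, path)] += 1
        for rel, child in node.children.items():
            below = [(s, p + (rel,)) for s, p in ancestors]
            below.append((node.symbol, (rel,)))
            visit(child, below)

    visit(root, [])
    return pairs


class PairIndex:
    """Inverted index from symbol pairs to the expressions containing them."""

    def __init__(self):
        self.postings = defaultdict(dict)  # pair -> {expr_id: count}
        self.sizes = {}  # expr_id -> total number of pairs in the expression

    def add(self, expr_id, root):
        pairs = symbol_pairs(root)
        self.sizes[expr_id] = sum(pairs.values())
        for pair, count in pairs.items():
            self.postings[pair][expr_id] = count

    def search(self, query_root, k=10):
        """Rank indexed expressions by F-measure over matched pairs (assumed metric)."""
        query = symbol_pairs(query_root)
        query_size = sum(query.values())
        matched = Counter()
        for pair, q_count in query.items():
            for expr_id, count in self.postings.get(pair, {}).items():
                matched[expr_id] += min(q_count, count)
        results = []
        for expr_id, m in matched.items():
            precision = m / self.sizes[expr_id]
            recall = m / query_size
            results.append((2 * precision * recall / (precision + recall), expr_id))
        # Highest score first; ties broken by expression id for determinism.
        results.sort(key=lambda r: (-r[0], r[1]))
        return results[:k]


# Usage: index three small expressions and query with x^2 + y.
index = PairIndex()
index.add("x^2", Node("x", {"sup": Node("2")}))
index.add("x+y", Node("x", {"next": Node("+", {"next": Node("y")})}))
index.add("x^2+y", Node("x", {"sup": Node("2"),
                              "next": Node("+", {"next": Node("y")})}))
query = Node("x", {"sup": Node("2"), "next": Node("+", {"next": Node("y")})})
results = index.search(query)
for score, expr_id in results:
    print(f"{score:.3f}  {expr_id}")
```

Under these assumptions the query x^2 + y ranks the exact match first (score 1.0), then x + y (about 0.857), then x^2 (0.4). Because each pair keeps its relation path, the sketch distinguishes x^2 from x_2 or x2 even though all three contain the same symbols.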

Library of Congress Subject Headings

Mathematical symbols (Typefaces)--Classification; Information retrieval; Layout (Printing)

Publication Date

8-1-2013

Document Type

Thesis

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Zanibbi, Richard

Advisor/Committee Member

Yuan, Bo

Advisor/Committee Member

Butler, Zack

Comments

Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in December 2013. Physical copy available from RIT's Wallace Library at Z250.6.M3 S73 2013

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS