Abstract
Math formula search engines have recently become a useful tool for novice users learning a new topic. Existing formula retrieval systems rely on prefix matching and typed query entry. This can be an obstacle for novice users who are not proficient with languages used to express formulas, such as LaTeX, who do not remember the left end of a formula, or who wish to match formulas at multiple locations (e.g., using `$\int \quad\quad dx$' as a query). We generalize the Pyramidal Histogram of Characters (PHOC), a one-dimensional spatial encoding for word spotting in handwritten document images, to obtain the two-dimensional XY-PHOC. XY-PHOC provides robust spatial embeddings with modest storage requirements and avoids the costly operations used to generate graphs. The spatial representation captures the relative positions of symbols without storing explicit edges between them, and it can match queries that are disjoint subgraphs within indexed formulas. Existing graph- and tree-based formula retrieval models are not designed to handle disjoint graphs, and relationships that do not exist in the final formula may be added to a query, making the query less similar to that formula for matching.
XY-PHOC embeddings are a simple spatial representation that gives competitive results in formula similarity search and autocompletion. They support queries composed of symbols laid out in two dimensions, without the need to form a connected graph for search.
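To make the idea of a pyramidal spatial encoding concrete, the following is a minimal sketch and not the thesis implementation. It assumes each symbol is reduced to a label and a center point with coordinates normalized to [0, 1], that a symbol belongs to the single region containing its center, and that similarity is measured with cosine similarity. The names `xy_phoc` and `cosine`, the `levels` parameter, and the sample coordinates are all hypothetical.

```python
import math
from typing import List, Tuple

# (label, x, y) with coordinates normalized to [0, 1]; an assumed input format.
Symbol = Tuple[str, float, float]


def xy_phoc(symbols: List[Symbol], vocab: List[str], levels: int = 3) -> List[int]:
    """Build a binary pyramidal histogram along the x axis, then the y axis.

    At level k an axis is split into k equal regions. Each region contributes
    one bit per vocabulary entry, set when a symbol with that label has its
    center inside the region.
    """
    index = {label: i for i, label in enumerate(vocab)}
    vector: List[int] = []
    for axis in (1, 2):  # tuple position of x, then y
        for level in range(1, levels + 1):
            for region in range(level):
                lo, hi = region / level, (region + 1) / level
                last = region == level - 1
                bits = [0] * len(vocab)
                for sym in symbols:
                    pos = sym[axis]
                    # Regions are half-open; the last one also includes 1.0.
                    inside = lo <= pos < hi or (last and pos == 1.0)
                    if inside and sym[0] in index:
                        bits[index[sym[0]]] = 1
                vector.extend(bits)
    return vector


def cosine(a: List[int], b: List[int]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocab = ["\\int", "x", "d", "2"]
# A query with a gap in the middle, in the spirit of `\int ... dx`.
query = [("\\int", 0.05, 0.5), ("d", 0.85, 0.5), ("x", 0.95, 0.5)]
# A hypothetical layout for \int x^2 dx.
formula = [("\\int", 0.05, 0.5), ("x", 0.35, 0.5), ("2", 0.45, 0.3),
           ("d", 0.85, 0.5), ("x", 0.95, 0.5)]

similarity = cosine(xy_phoc(query, vocab), xy_phoc(formula, vocab))
```

Because the query symbols only need positions, not edges, the two disconnected parts of the query are encoded without inventing a relationship between them. The vector length is fixed at 2 × (1 + 2 + ... + `levels`) × |`vocab`| regardless of formula size.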
Library of Congress Subject Headings
Mathematics--Formulae--Data processing; Information retrieval; Optical pattern recognition; Writing--Data processing; Search engines--Programming
Publication Date
5-2021
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Richard Zanibbi
Advisor/Committee Member
Zack Butler
Advisor/Committee Member
Edith Hemaspaandra
Recommended Citation
Avenoso, Robin, "Spatial vs. Graph-Based Formula Retrieval" (2021). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10784
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS