Abstract
I applied clustering analysis to the problem of creating tagged training data for optical character recognition (OCR). Creating labeled character data by hand is slow and cumbersome. My belief is that clustering methods can be applied to character data before it is tagged, allowing the operator to label entire groups of characters at once and greatly reducing the time needed to generate tagged character data. This thesis provides a proof of concept as a basis for more in-depth research and, eventually, the creation of a sophisticated application that uses these techniques to generate labeled training data for OCR systems.
Library of Congress Subject Headings
Cluster analysis; Genetic algorithms; Optical character recognition devices
Publication Date
1997
Document Type
Thesis
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Anderson, Peter
Advisor/Committee Member
Kazemian, Fereydoun
Advisor/Committee Member
Radziszowski, Stanislaw
Recommended Citation
Greenwald, Jennifer, "Optical character categorization: Clustering as it applies to OCR" (1997). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/67
Campus
RIT – Main Campus
Comments
Note: imported from RIT's Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT's Wallace Library at: QA278.65 .G743 1997