I applied cluster analysis to the problem of creating tagged training data for optical character recognition (OCR). Labeling character data by hand is a slow and cumbersome process. I believe that clustering methods can be applied to character data before it is tagged, allowing the operator to label entire groups of characters at once and greatly reducing the time needed to generate tagged character data. This thesis provides a proof of concept as a basis for more in-depth research and, eventually, the creation of a sophisticated application that uses these techniques to generate labeled training data for OCR systems.

Library of Congress Subject Headings

Cluster analysis; Genetic algorithms; Optical character recognition devices

Publication Date


Document Type


Department, Program, or Center

Computer Science (GCCIS)


Anderson, Peter

Advisor/Committee Member

Kazemian, Fereydoun

Advisor/Committee Member

Radziszowski, Stanislaw


Note: imported from RIT's Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT's Wallace Library at: QA278.65 .G743 1997


RIT – Main Campus