Abstract

I applied cluster analysis to the problem of creating tagged training data for optical character recognition (OCR). Creating labeled character data by hand is slow and cumbersome. I believe that clustering methods can be applied to character data before it is tagged, allowing the operator to label entire groups of characters at once and greatly reducing the time needed to generate tagged character data. This thesis provides a proof of concept as a basis for more in-depth research and, eventually, the creation of a sophisticated application that uses these techniques to generate labeled training data for OCR systems.

Library of Congress Subject Headings

Cluster analysis; Genetic algorithms; Optical character recognition devices

Publication Date

1997

Document Type

Thesis

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Anderson, Peter

Advisor/Committee Member

Kazemian, Fereydoun

Advisor/Committee Member

Radziszowski, Stanislaw

Comments

Note: imported from RIT's Digital Media Library running on DSpace to RIT Scholar Works. A physical copy is available through RIT's Wallace Library at: QA278.65 .G743 1997

Campus

RIT – Main Campus
