Abstract
This thesis introduces a unified, automated document intelligence system that addresses identity authentication and data extraction in-house through two complementary components: facial biometrics and multilingual optical character recognition (OCR). Manual document processing is slow, error-prone, and costly; the system targets these inefficiencies with a single pipeline that verifies a user's identity by facial comparison and digitizes the text contained in documents. The first module implements a face verification pipeline. It begins with an image pre-processing step that improves image quality through denoising, contrast enhancement, sharpening, and gamma adjustment. The processed images are then analysed by the RetinaFace model, which detects faces with high accuracy and extracts facial landmarks. These landmarks form a geometric embedding that is compared using cosine similarity to produce MATCH/NOT MATCH decisions, which are validated on several datasets, including Iranian and Egyptian ID cards. The second module extracts text in multiple languages using the EasyOCR engine. It handles documents in three writing systems: English invoices with tabular layouts, Chinese student IDs with logographic characters, and Arabic documents with right-to-left text. The module returns the extracted text with confidence scores, along with visual annotations with bounding boxes, which make the results transparent and verifiable. By integrating identity verification with data digitization in a single workflow, the combined system has practical applications in financial services, border control, and human resources.
The system is built on open-source technologies and has a modular architecture, so it can be extended in the future with more accurate facial recognition models, additional language support, and layout analysis for more complex documents such as passports and driver’s licenses. This study demonstrates how combining state-of-the-art computer vision and OCR technologies can address urgent challenges in real-world document processing.
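The face-matching step described in the abstract (landmarks, then a geometric embedding, then cosine similarity) can be illustrated with a minimal sketch. The abstract does not specify how the embedding is built or which threshold is used, so everything below is an assumption for illustration: the embedding is taken to be the pairwise distances between five landmarks normalised by the inter-ocular distance, and `landmark_embedding`, `verify`, and `MATCH_THRESHOLD` are hypothetical names, not the thesis's implementation. The landmark coordinates are supplied by hand rather than obtained from RetinaFace.

```python
import math
from itertools import combinations

# Hypothetical decision threshold; the abstract does not state the value used.
MATCH_THRESHOLD = 0.99


def landmark_embedding(landmarks):
    """Build a scale-invariant vector from five (x, y) facial landmarks.

    Assumed landmark order: left eye, right eye, nose,
    left mouth corner, right mouth corner.
    The embedding is the list of all pairwise distances divided by
    the inter-ocular distance (an illustrative choice, not the thesis's).
    """
    inter_ocular = math.dist(landmarks[0], landmarks[1])
    if inter_ocular == 0:
        raise ValueError("eye landmarks coincide; cannot normalise")
    return [math.dist(a, b) / inter_ocular for a, b in combinations(landmarks, 2)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def verify(landmarks_a, landmarks_b, threshold=MATCH_THRESHOLD):
    """Return a ("MATCH" | "NOT MATCH", score) pair for two landmark sets."""
    score = cosine_similarity(
        landmark_embedding(landmarks_a), landmark_embedding(landmarks_b)
    )
    return ("MATCH" if score >= threshold else "NOT MATCH"), score


# Usage: the same face geometry captured at twice the resolution.
id_photo = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (35.0, 80.0), (65.0, 80.0)]
selfie = [(2 * x, 2 * y) for x, y in id_photo]
decision, score = verify(id_photo, selfie)  # decision is "MATCH"
```

Because every component of a distance-based embedding is non-negative, cosine scores between different faces also tend to sit close to 1, so a threshold for this kind of embedding would need to be tuned on labelled pairs rather than chosen by intuition.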
Library of Congress Subject Headings
Identification cards--Data processing; Optical character recognition; Authentication--Data processing; Natural language processing (Computer science); Machine learning
Publication Date
12-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Advisor
Sanjay Modak
Advisor/Committee Member
Ioannis Karamitsos
Recommended Citation
Rashed, Khalifa, "Multilingual Identity Document Information Extraction via Dynamic Templates and Hybrid OCR" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12449
Campus
RIT Dubai
Plan Codes
PROFST-MS
