Abstract

This thesis introduces a unified, automated document intelligence system that addresses, in-house, significant problems in identity authentication and document text extraction through two complementary components: facial biometrics and multilingual optical character recognition (OCR). The system targets the slow speed, human error, and high operating cost of manual document processing by offering a single pipeline that verifies a user's identity via facial comparison and digitizes the text contained in documents. The first module implements a face verification pipeline. It begins with an image pre-processing step that improves document image quality through denoising, contrast enhancement, sharpening, and gamma adjustment. The processed images are then analysed by the RetinaFace model, which detects faces with high accuracy and extracts facial landmarks. These landmarks form a geometric embedding that is compared using cosine similarity to produce MATCH/NOT MATCH decisions, which are validated on a variety of datasets, including Iranian and Egyptian ID cards. The second module provides multilingual text extraction using the EasyOCR engine. It handles documents written in three different script systems: English invoices with tabular layouts, Chinese student IDs with logographic characters, and Arabic documents with right-to-left text. The module returns the extracted text with confidence scores, together with visual annotations in the form of bounding boxes, which make the results transparent and verifiable. The combined system has practical applications in financial services, border control, and human resources, because it integrates identity verification and data digitization in a single workflow.
The system is built on open-source technologies and has a modular architecture, so it can be extended in the future with more accurate facial recognition models, additional language support, and layout analysis for more complex documents such as passports and driver’s licenses. This study demonstrates how combining state-of-the-art computer vision and OCR technologies can address urgent challenges in real-world document processing.
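
The landmark-matching step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the normalisation scheme (centring on the centroid and scaling to unit length), the function names, and the 0.95 threshold are all assumptions made for illustration, and the landmark coordinates are taken to be already available as (x, y) pairs, for example the five points a RetinaFace-style detector returns.

```python
import math


def landmark_embedding(landmarks):
    """Turn (x, y) landmarks into a translation- and scale-normalised vector.

    Hypothetical embedding: the abstract does not specify how the geometric
    embedding is built, so this centres the points on their centroid and
    scales them to unit length.
    """
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    centered = [(x - cx, y - cy) for x, y in landmarks]
    scale = math.sqrt(sum(dx * dx + dy * dy for dx, dy in centered))
    if scale == 0:
        raise ValueError("degenerate landmarks: all points coincide")
    return [v / scale for dx, dy in centered for v in (dx, dy)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def verify(landmarks_a, landmarks_b, threshold=0.95):
    """Return a (decision, score) pair; the threshold is an assumed value."""
    score = cosine_similarity(
        landmark_embedding(landmarks_a), landmark_embedding(landmarks_b)
    )
    return ("MATCH" if score >= threshold else "NOT MATCH"), score
```

Because the embedding is centred and scaled, the same landmark layout photographed at a different size or position in the frame yields a similarity close to 1, while a different layout scores lower. In practice the threshold would need to be tuned on labelled pairs.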

Library of Congress Subject Headings

Identification cards--Data processing; Optical character recognition; Authentication--Data processing; Natural language processing (Computer science); Machine learning

Publication Date

12-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Advisor

Sanjay Modak

Advisor/Committee Member

Ioannis Karamitsos

Campus

RIT Dubai

Plan Codes

PROFST-MS
