Communication plays a key role in building relationships: it lets people share feelings, pass on information, and connect with others. For deaf and hearing-impaired people, communication remains a significant obstacle in today’s society because their means of communication requires an interpreter at every moment. Researchers agree that successful communication requires the involvement of every participant in a conversation; deaf and hearing-impaired people therefore need precise and welcoming communication to support their working and learning relationships. Sign Language Recognition (SLR) is a critical and promising approach to promoting communication among hearing-impaired people. Sign languages are well suited to machine learning-based translation techniques because they are genuine natural languages with their own grammars and lexicons. Likewise, because sign language is visual-spatial, it can benefit from computer vision approaches to encoding. In recent years, machine learning techniques have driven significant advances in computer vision and natural language processing, motivating researchers to extend these techniques to the understanding of sign language. However, sign language interpretation remains a significant challenge because it involves a continuous visual-spatial modality in which meaning depends on context. This project seeks to leverage emerging technologies to build an effective and reliable system for recognizing gestures. It aims to design a machine learning-based model for sign language recognition that automatically transcribes sign language videos into text. The work proposes a novel model that uses video sequences comprising spatial and temporal features. The study uses different machine learning algorithms, trained on features extracted from the given datasets. 
An input video is passed to the machine learning models, which detect the sign displayed in the video and convert it to text. The deep learning approach achieved an accuracy of 80.01% on the testing set and 86.94% on the validation set. With K=29, our k-nearest neighbor model achieved an accuracy of 88.02%. On the validation set, the decision tree classifier achieved an accuracy of 82.50%. The boosted decision tree approach achieved an accuracy of 76.86%, and the support vector classifier achieved 85.00%.
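
To illustrate the k-nearest neighbor result reported above, the following is a minimal sketch of majority-vote classification with K=29. It is not the project's implementation: the two-dimensional feature vectors, the class labels "A" and "B", and the helper names (`knn_predict`, `accuracy`, `make_toy_data`) are assumptions made for illustration, and the synthetic data stands in for features that would in practice be extracted from sign language video frames.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=29):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training samples by Euclidean distance to the query and keep k.
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=29):
    """Fraction of test samples whose predicted label matches the true one."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return correct / len(test_x)


def make_toy_data(n_per_class, seed):
    """Synthetic 2-D features for two hypothetical sign classes (assumption)."""
    rng = random.Random(seed)
    centers = {"A": (0.0, 0.0), "B": (4.0, 4.0)}
    xs, ys = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            xs.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            ys.append(label)
    return xs, ys


train_x, train_y = make_toy_data(100, seed=0)
test_x, test_y = make_toy_data(30, seed=1)
score = accuracy(train_x, train_y, test_x, test_y, k=29)
print(f"k=29 accuracy on toy data: {score:.2%}")
```

An odd K such as 29 avoids tied votes when there are two classes; with more classes, as in a full sign alphabet, ties remain possible and a tie-breaking rule would be needed.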

Publication Date


Document Type

Master's Project

Student Type


Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research (Dubai)


Sanjay Modak

Advisor/Committee Member

Khalil Al Hussaeni


RIT Dubai