In computer vision and image processing, object detection algorithms are used to detect semantic objects of certain classes in images and videos. Object detection algorithms use deep learning networks to classify detected regions. Unprecedented advancements in Convolutional Neural Networks (CNNs) have led to new possibilities and implementations for object detectors. An object detector that uses a deep learning algorithm detects objects through proposed regions and then classifies each region using a CNN. Object detectors are computationally efficient, unlike a typical CNN, which is computationally complex and expensive. Object detectors are widely used for face detection, recognition, and object tracking. In this thesis, deep learning based object detection algorithms are implemented to classify facially expressed emotions captured in real time through a webcam. A typical CNN classifies images without specifying regions within an image, which can be considered a limitation for understanding how network performance depends on different training options. It is also more difficult to verify whether a network has converged and is able to generalize, that is, to classify unseen data that was not part of the training set. The Fast Region-based Convolutional Neural Network (Fast R-CNN), an object detection algorithm, is used to detect facially expressed emotions in real time by classifying proposed regions. The Fast R-CNN is trained using a high-quality video database consisting of 24 actors facially expressing eight different emotions, with training images extracted from 60 videos per actor. An object detector’s performance is measured using various metrics. However well an object detector performs with respect to average precision or miss rate, doing well on such metrics does not necessarily mean that the network is correctly classifying regions. This may result from the network model having been over-trained.
In our work, we showed that an object detection algorithm such as Fast R-CNN performed surprisingly well in classifying facially expressed emotions in real time, performing better than a CNN.

Library of Congress Subject Headings

Human face recognition (Computer science); Emotion recognition; Computer vision; Neural networks (Computer science); Convolutions (Mathematics)

Publication Date


Document Type


Student Type


Degree Name

Electrical Engineering (MS)

Department, Program, or Center

Electrical Engineering (KGCOE)


Abdulla Ismail

Advisor/Committee Member

Boutheina Tlili

Advisor/Committee Member

Jinane Mounsef


RIT Dubai

Plan Codes