Gati Vashi


The continuous growth of video technology has led to increased research into the semantic analysis of video. The multimodal nature of video makes this task complex. The objective of this thesis was to research, implement, and examine the underlying methods and concepts of semantic video analysis, and to improve on the state of the art in automated emotion recognition by using semantic knowledge in the form of Bayesian inference. The main domain of analysis is emotion recognition from video of a person's face, drawing on both visual cues (facial gestures) and vocal cues. The goal is to determine whether the expression on a person's face in a sequence of video frames is happy, sad, angry, fearful, or disgusted. A Bayesian network classification algorithm was designed and used to identify and interpret facial expressions in video. A Bayesian network is an attractive choice because it provides a probabilistic framework that encodes knowledge about the domain and quantifies the uncertainty of each classification. This research contributes to current knowledge in two ways: by providing a novel algorithm that uses edge differences to extract keyframes from video and facial features from each keyframe, and by testing the hypothesis that combining two modalities (vision and speech) yields a better classification result (a low false positive rate and a high true positive rate) than either modality used alone.
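The idea of combining vision and speech through Bayesian inference can be illustrated with a minimal naive Bayes sketch. It assumes, purely for illustration, that the two modalities are conditionally independent given the emotion and that each modality supplies a likelihood P(observation | emotion); the thesis's actual network structure, features, and edge-difference keyframe algorithm are not reproduced here. The function names `posterior`, `classify`, and `tpr_fpr`, and every probability value below, are hypothetical.

```python
"""Hypothetical sketch: fusing vision and speech evidence with Bayes' rule.

Assumption (not from the thesis): the modalities are conditionally
independent given the emotion, i.e. a naive Bayes structure.
"""

EMOTIONS = ("happy", "sad", "angry", "fearful", "disgusted")


def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {k: v / total for k, v in scores.items()}


def posterior(prior, *likelihoods):
    """Return P(emotion | evidence) given one likelihood dict per modality.

    prior:       dict emotion -> P(emotion)
    likelihoods: dicts emotion -> P(observation | emotion)
    """
    unnormalized = {}
    for emotion in EMOTIONS:
        p = prior[emotion]
        for lik in likelihoods:
            p *= lik[emotion]  # independence assumption: multiply evidence
        unnormalized[emotion] = p
    return normalize(unnormalized)


def classify(prior, *likelihoods):
    """Return the maximum a posteriori (MAP) emotion and the full posterior."""
    post = posterior(prior, *likelihoods)
    return max(post, key=post.get), post


def tpr_fpr(y_true, y_pred, positive):
    """One-vs-rest true positive rate and false positive rate for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


if __name__ == "__main__":
    uniform = {e: 1 / len(EMOTIONS) for e in EMOTIONS}
    # Made-up likelihoods: vision confuses angry with disgusted.
    vision = {"happy": 0.05, "sad": 0.10, "angry": 0.35,
              "fearful": 0.05, "disgusted": 0.45}
    speech = {"happy": 0.10, "sad": 0.10, "angry": 0.50,
              "fearful": 0.20, "disgusted": 0.10}
    print("vision only:", classify(uniform, vision)[0])             # disgusted
    print("vision + speech:", classify(uniform, vision, speech)[0])  # angry
```

In this made-up example, vision alone slightly favors "disgusted", while the speech likelihoods shift the fused posterior toward "angry". Whether fusion actually lowers the false positive rate and raises the true positive rate is an empirical question, which is what the thesis hypothesis tests; `tpr_fpr` shows how those two rates can be computed one emotion class at a time.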

Library of Congress Subject Headings

Human face recognition (Computer science); Computer vision; Pattern recognition systems; Facial expression; Bayesian statistical decision theory

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)


Roxanne Canosa

Advisor/Committee Member

Leon Reznik

Advisor/Committee Member

Zack Butler


Physical copy available from RIT's Wallace Library at TA1650 .V37 2011


RIT – Main Campus
