Abstract
Induced affect is the emotional effect of an object on an individual. It can be quantified through two metrics: valence and arousal. Valence quantifies how positive or negative something is, while arousal quantifies its intensity, from calm to exciting. These metrics enable researchers to study how people respond emotionally to various stimuli. Affective content analysis of visual media is a challenging problem because perceived reactions differ from person to person. Industry-standard machine learning classifiers such as support vector machines can be used to help determine user affect. Affect-annotated video datasets are typically analyzed by feeding large numbers of visual and audio features through machine learning algorithms, with the goal of maximizing accuracy in the hope that each feature contributes useful information.
We depart from this approach to quantify how different modalities, such as visual, audio, and text-description information, aid in understanding affect. To that end, we train independent models for the visual, audio, and text-description modalities. Each is a convolutional neural network paired with a support vector machine that classifies valence and arousal. We also train several ensemble models that combine the multimodal information, with the hope that the independent modalities complement one another.
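As a rough, hypothetical sketch of such a per-modality pipeline (not the thesis code), a pretrained CNN can serve as a fixed feature extractor whose clip-level features are then classified by an SVM. The ResNet-18 backbone, the preprocessing, the frame-averaging step, and the binary valence labels below are all illustrative assumptions:

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from sklearn.svm import SVC

    # Pretrained CNN used as a fixed feature extractor:
    # replace the final classification layer with identity.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])

    def clip_features(frames):
        # Average per-frame CNN features into one clip-level descriptor.
        # `frames` is a list of PIL images sampled from a video clip.
        with torch.no_grad():
            batch = torch.stack([preprocess(f) for f in frames])
            return backbone(batch).mean(dim=0).numpy()

    # X: one 512-d feature vector per clip; y: valence labels
    # (e.g., 0 = negative, 1 = positive -- an assumed encoding).
    # svm = SVC(kernel="rbf").fit(X_train, y_train)
    # y_pred = svm.predict(X_test)

An analogous extractor-plus-SVM stage could stand in for the audio and text branches, with an ensemble combining the per-modality predictions.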
We find that our visual network alone achieves state-of-the-art valence classification accuracy and that our audio network, when paired with the visual network, achieves competitive results on arousal classification. Each network is much stronger on one metric than on the other, which may motivate more sophisticated multimodal approaches to accurately identifying affect in video data. This work also contributes to induced-emotion classification by augmenting existing sizable media datasets and providing a robust framework for performing that classification.
Library of Congress Subject Headings
Video recordings--Psychological aspects; Video recordings--Data processing; Phenomenology--Research; Support vector machines; Machine learning
Publication Date
7-2017
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Advisor
Raymond Ptucha
Advisor/Committee Member
Emily Prud'hommeaux
Advisor/Committee Member
Andres Kwasinski
Recommended Citation
Thomas, Titus Pallithottathu, "The Emotional Impact of Audio-Visual Stimuli" (2017). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9557
Campus
RIT – Main Campus
Plan Codes
CMPE-MS
Comments
Physical copy available from RIT's Wallace Library at PN1992.95 .T46 2017