Abstract

This study explores the growing field of Multimodal Sentiment Analysis (MSA), focusing on how advanced fusion techniques can improve sentiment prediction in social media contexts. As platforms such as X and TikTok continue to expand and facilitate the sharing of sentiment through digital media, there is an increasing need for neural network architectures that can accurately interpret sentiment across modalities. We implement a model that uses BERT for textual features and ResNet for visual features, with a cross-attention fusion module that aligns the two modalities into a joint representation. We conduct experiments on the MVSA-Single and MVSA-Multiple datasets, which contain over 5,000 and 17,000 labeled text-image pairs, respectively. Our research explores the interactions between modalities and proposes a sentiment classifier that builds upon and outperforms current baselines, while quantifying the contribution of each modality through an intramodality utilization analysis.
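To illustrate the kind of architecture the abstract describes, the sketch below shows a minimal cross-attention fusion module in PyTorch. It is not the thesis implementation: the feature dimension (768), the number of attention heads, the mean-pooling of attended features, and the three-class output head are illustrative assumptions, and the random tensors merely stand in for BERT token embeddings and projected ResNet feature-map patches.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention fusion of text and image features.

    Text features (e.g., BERT token embeddings) attend over visual
    features (e.g., projected ResNet patches) and vice versa; the
    pooled joint representation feeds a small sentiment classifier.
    Dimensions and class count are assumptions for this sketch.
    """

    def __init__(self, dim=768, num_heads=8, num_classes=3):
        super().__init__()
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(2 * dim),
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, num_classes),  # e.g., negative / neutral / positive
        )

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, text_len,    dim) -- e.g., BERT last hidden states
        # image_feats: (batch, num_patches, dim) -- e.g., projected ResNet features
        t2i, _ = self.text_to_image(text_feats, image_feats, image_feats)
        i2t, _ = self.image_to_text(image_feats, text_feats, text_feats)
        joint = torch.cat([t2i.mean(dim=1), i2t.mean(dim=1)], dim=-1)
        return self.classifier(joint)

# Random features standing in for encoder outputs (batch of 2 text-image pairs).
model = CrossAttentionFusion()
logits = model(torch.randn(2, 32, 768), torch.randn(2, 49, 768))
print(logits.shape)  # torch.Size([2, 3])
```

In this sketch, each modality queries the other so the joint representation captures text-conditioned visual cues and image-conditioned textual cues before classification; the actual fusion design, pooling, and classifier head in the thesis may differ.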

Library of Congress Subject Headings

Sentiment analysis; Social media--Data processing; Natural language processing (Computer science); Neural networks (Computer science)

Publication Date

5-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Artificial Intelligence (MS)

Department, Program, or Center

Electrical and Microelectronic Engineering, Department of

College

Golisano College of Computing and Information Sciences

Advisor

Jamison Heard

Advisor/Committee Member

Irina Mikhalevich

Advisor/Committee Member

Zhe Yu

Campus

RIT – Main Campus

Plan Codes

AI-MS
