Abstract
This study explores the growing field of Multimodal Sentiment Analysis (MSA), focusing on how advanced fusion techniques can improve sentiment prediction in social media contexts. As platforms such as X and TikTok continue to expand and to facilitate the sharing of sentiment through digital media, there is an increasing need for neural network architectures that can accurately interpret sentiment across modalities. We implement a model that extracts textual features with BERT and visual features with ResNet; a cross-attention fusion module aligns the two modalities into a joint representation. We conduct experiments on the MVSA-Single and MVSA-Multiple datasets, which contain over 5,000 and 17,000 labeled text-image pairs, respectively. Our research examines the interactions between modalities and proposes a sentiment classifier that builds upon and outperforms current baselines, while a modality utilization analysis quantifies the contribution of each modality.
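To make the described architecture concrete, the sketch below shows one plausible PyTorch realization of the pipeline: BERT token embeddings act as attention queries over projected ResNet region features, and the fused representation at the [CLS] position feeds a three-way sentiment classifier (the MVSA labels are positive, neutral, and negative). This is a minimal illustration under stated assumptions, not the thesis implementation; the backbone choices (bert-base-uncased, ResNet-50), the single fusion layer, and all dimensions are assumptions.

```python
# Minimal sketch (not the thesis code): BERT-ResNet cross-attention fusion.
# Backbones, dimensions, and the single fusion layer are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel
from torchvision.models import resnet50, ResNet50_Weights

class CrossAttentionFusionClassifier(nn.Module):
    def __init__(self, num_classes=3, d_model=768, n_heads=8):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        cnn = resnet50(weights=ResNet50_Weights.DEFAULT)
        # Keep the spatial feature map by dropping avgpool + fc:
        # a 224x224 image yields features of shape (B, 2048, 7, 7).
        self.cnn = nn.Sequential(*list(cnn.children())[:-2])
        self.img_proj = nn.Linear(2048, d_model)  # project visual dim to text dim
        # Text tokens attend over image regions (text = queries, image = keys/values)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids, attention_mask, pixel_values):
        text = self.bert(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state  # (B, L, 768)
        feat = self.cnn(pixel_values)                 # (B, 2048, 7, 7)
        img = feat.flatten(2).transpose(1, 2)         # (B, 49, 2048) region sequence
        img = self.img_proj(img)                      # (B, 49, 768)
        fused, _ = self.cross_attn(query=text, key=img, value=img)
        fused = self.norm(text + fused)               # residual connection + LayerNorm
        pooled = fused[:, 0]                          # representation at [CLS]
        return self.classifier(pooled)
```

In such a setup, input_ids and attention_mask would come from a standard BERT tokenizer and pixel_values from the usual ImageNet resize-and-normalize preprocessing; using the text tokens as queries lets each word gather evidence from the image regions most relevant to it before classification.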
Library of Congress Subject Headings
Sentiment analysis; Social media--Data processing; Natural language processing (Computer science); Neural networks (Computer science)
Publication Date
5-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Artificial Intelligence (MS)
Department, Program, or Center
Electrical and Microelectronic Engineering, Department of
College
Golisano College of Computing and Information Sciences
Advisor
Jamison Heard
Advisor/Committee Member
Irina Mikhalevich
Advisor/Committee Member
Zhe Yu
Recommended Citation
Gold, Ronen G., "A BERT-ResNet Cross-Attention Fusion Network and Modality Utilization Assessment for Multimodal Sentiment Classification" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12090
Campus
RIT – Main Campus
Plan Codes
AI-MS