Abstract
In the past decade, usage of social media platforms has increased significantly. People use these platforms to connect with friends and family and to share information, news, and opinions. Platforms such as Facebook and Twitter are often used to propagate offensive and hateful content online. The open nature and anonymity of the internet fuel aggressive and inflamed conversations. Companies and federal institutions are striving to make social media cleaner, more welcoming, and unbiased. In this study, we first explore the underlying topics in popular offensive language datasets using statistical and neural topic modeling. Current state-of-the-art models for aggression detection present only a toxicity score for the entire post, so content moderators often have to deal with lengthy texts without any word-level indicators. We propose a neural transformer approach for detecting the tokens that make a particular post aggressive. The pre-trained BERT model has achieved state-of-the-art results in various natural language processing tasks. However, the model is trained on general-purpose corpora and lacks the linguistic features of aggressive social media text. We propose fBERT, a BERT model retrained on over 1.4 million offensive tweets from the SOLID dataset. We demonstrate the effectiveness and portability of fBERT over BERT in various shared offensive language detection tasks. We further propose a new multi-task aggression detection (MAD) framework for post-level and token-level aggression detection using neural transformers. The experiments confirm the effectiveness of the multi-task learning model over individual models, particularly when the amount of training data is limited.
Library of Congress Subject Headings
Natural language processing (Computer science); Hate speech--Data processing; Aggressiveness--Data processing
Publication Date
5-2021
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Marcos Zampieri
Advisor/Committee Member
Alexander G. Ororbia II
Advisor/Committee Member
Christopher Homan
Recommended Citation
Sarkar, Diptanu, "An Empirical Study of Offensive Language in Online Interactions" (2021). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/10792
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS