Abstract
Artificial intelligence (AI) systems frequently need to learn from human decision makers, which helps them scale to mass use. We see this in automated financial decisions, manufacturing processes, self-driving cars, and content moderation on social networks. These systems are built on human values and trained to mimic human decision-making through human annotation. However, different groups of human annotators, including domain experts, annotate content differently, leading to disagreements. Some practitioners treat dissenting opinions as "label noise," and a common way to resolve disagreement during model training is to take a majority vote, which effectively conceals the disagreements from the trained model and masks diverse perspectives. Even with balanced datasets, imbalanced perspectives within the data can introduce inherent biases into the model. Such biases are hard to identify because they surface only in specific instances, such as an AI model treating minority demographic groups unfairly. In this dissertation, we introduce two models for learning and predicting human disagreements: CrowdOpinion and DisCo. CrowdOpinion is a semi-supervised learning approach that models disagreement across the entire annotator population by pooling similar data items. DisCo is an encoder-decoder-based model designed to capture individual annotators' characteristics and their disagreements during data annotation. Finally, to further emphasize the need for human involvement in building AI models for content moderation, we conduct a noise audit of state-of-the-art offensive and hate speech classification models. As part of this audit, we conduct a human annotation study focusing on annotators' ability to identify content that is offensive to themselves and their capacity to identify content that is offensive to others through vicarious labeling. Our findings provide compelling evidence that modeling human disagreement is crucial for AI systems to effectively classify offensive and harmful content. We conclude by summarizing the scope of our research and outlining promising avenues for future exploration in this domain.
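To make concrete how majority-vote aggregation discards disagreement, the following is a minimal, illustrative Python sketch, not code from the dissertation: it contrasts a single hard majority label with the empirical label distribution over the same annotations. The function names and the toy annotations are hypothetical.

from collections import Counter

def majority_label(annotations):
    # Collapse per-item annotations into one hard label (ties broken arbitrarily).
    return Counter(annotations).most_common(1)[0][0]

def label_distribution(annotations):
    # Keep the full empirical distribution over labels, preserving disagreement.
    counts = Counter(annotations)
    total = len(annotations)
    return {label: count / total for label, count in counts.items()}

# Toy example: five annotators label one social media post.
annotations = ["offensive", "not_offensive", "offensive", "not_offensive", "not_offensive"]

print(majority_label(annotations))      # "not_offensive" -- the 2/5 dissenting view disappears
print(label_distribution(annotations))  # {"offensive": 0.4, "not_offensive": 0.6} -- disagreement retained

Training on the retained distribution (for example, with a soft-label loss) rather than the collapsed majority label is one general way a model can learn from disagreement instead of discarding it.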
Publication Date
4-2024
Document Type
Dissertation
Student Type
Graduate
Degree Name
Computing and Information Sciences (Ph.D.)
Department, Program, or Center
Computing and Information Sciences Ph.D, Department of
College
Golisano College of Computing and Information Sciences
Advisor
Christopher M. Homan
Advisor/Committee Member
Alexander G. Ororbia II
Advisor/Committee Member
Ashique KhudaBukhsh
Recommended Citation
Weerasooriya, Tharindu Cyril, "Learning from Disagreement in Human-Annotated Datasets" (2024). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/11899
Campus
RIT – Main Campus