Abstract

This thesis makes three contributions. First, using a substantial corpus of 1,419,047 comments posted on 3,161 YouTube news videos from major US cable news outlets, we analyze how users engage with LGBTQ+ news content. Our analyses focus on both positive and negative content. In particular, we construct a fine-grained hope speech classifier that detects positive hope speech as well as negative, neutral, and irrelevant content. Second, in consultation with a public health expert specializing in LGBTQ+ health, we conduct an annotation study with balanced and diverse political representation and release a dataset of 3,750 instances with fine-grained labels and detailed annotator demographic information. Finally, beyond providing a vital resource for the LGBTQ+ community, our annotation study and subsequent in-the-wild assessments reveal (1) a strong association between raters' political beliefs and how they rate content relevant to a marginalized community; (2) that models trained on the labels of raters with a given political belief exhibit considerable in-the-wild disagreement; and (3) that zero-shot large language models (LLMs) align more closely with liberal raters.

Publication Date

11-8-2024

Document Type

Thesis

Student Type

Graduate

Degree Name

Software Engineering (MS)

Department, Program, or Center

Software Engineering (GCCIS)

College

Golisano College of Computing and Information Sciences

Advisor

Ashique Khudabukhsh

Advisor/Committee Member

Naveen Sharma

Advisor/Committee Member

Christian D. Newman

Campus

RIT – Main Campus
