Abstract
Community resource inequities and undetected infrastructure vulnerabilities cost municipalities billions annually, with disproportionate impacts on marginalized communities. Current computational approaches to community needs assessment suffer from two critical limitations: they rely on aggregate-level analysis that obscures granular community expressions, and they implement sophisticated computational tools that remain inaccessible to stakeholders without technical expertise. This creates what we term the Community-Computational Gap: a persistent divide where domain experts with vital contextual knowledge cannot access the analytical tools they need, while computational experts develop models without sufficient community context. This dissertation addresses these challenges through a novel methodological framework for fine-grained utterance-level classification of community discourse. We demonstrate that individual units of communication contain sufficient linguistic patterns to accurately detect both present and future community needs without relying on metadata or aggregated approaches. Unlike existing methods that operate at neighborhood or census-tract levels, our approach preserves the contextual integrity of individual expressions while enabling more precise signal extraction from social media noise. Our technical implementation achieves unprecedented accuracy (94% for existing community needs and assets and 82% for future infrastructural issues) through two specialized computational models validated on novel annotated datasets: 3,511 Reddit conversations for community needs assessment and 2,662 social web instances related to future infrastructure concerns. Our linguistic analysis reveals distinctive discourse patterns where needs-related conversations exhibit specific pain points and calls to action, while asset-related discussions emphasize local resources and community capabilities.
To bridge the Community-Computational Gap, we develop two complementary platforms: Citizenly, a mobile application enabling community-driven data collection and visualization, and CommuniDI, an integrated framework leveraging large language models that enables non-technical stakeholders to create sophisticated classifiers without programming requirements. Through mixed-methods evaluation, we demonstrate that this democratization, the systematic transfer of analytical capabilities to local stakeholders, maintains technical rigor (with F1 scores consistently exceeding 85% across diverse datasets) while significantly lowering barriers to computational social science research. This work represents a fundamental shift from computational tools for communities to computational tools with communities, enabling more equitable and effective approaches to addressing local challenges.
Library of Congress Subject Headings
Social sciences--Data processing; Artificial intelligence--Social aspects; Natural language processing (Computer science); Machine learning
Publication Date
9-2025
Document Type
Dissertation
Student Type
Graduate
Degree Name
Computing and Information Sciences (Ph.D.)
Department, Program, or Center
Computing and Information Sciences Ph.D, Department of
College
Golisano College of Computing and Information Sciences
Advisor
Naveen Sharma
Advisor/Committee Member
Carlos R. Rivero
Advisor/Committee Member
Ashique KhudaBukhsh
Recommended Citation
Chowdhury, Md Towhidul Absar, "Democratizing Community Discourse Analysis in Computational Social Science" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12325
Campus
RIT – Main Campus
Plan Codes
COMPIS-PHD
