Abstract
Detecting sexism and gender bias in text is a central problem in NLP, yet most existing work focuses on short, informal content such as social media posts. Much less is known about how sexism operates in long-form, formal discourse, where it is often expressed indirectly through rhetorical framing, appeals to tradition, or references to gender roles. Parliamentary debates offer a particularly rich setting for studying such language, but their scale, linguistic complexity, archaic English, and distributional shift across centuries make systematic analysis difficult and, without automation, historically infeasible. In this work, we analyse 6,531 speeches on women’s political rights from over 200 years of UK parliamentary debate (Hansard, 1803–2005), using large language models to classify both stance and types of sexism, validated against expert annotation. We also release a structured version of the Hansard Corpus optimized for computational social science, comprising 6.7 million speeches across 1.2 million debates, with 89% gender coverage for MPs in the House of Commons. Our analysis reveals that 88% of speeches opposing women’s representation contain sexist content, compared to only 11% of speeches in support. However, the two sides deploy fundamentally different forms of sexism: anti-suffrage rhetoric is overwhelmingly hostile and proscriptive, while pro-suffrage sexism is predominantly benevolent and paternalistic. Female MPs support women’s political rights at a rate of 95%, compared to 80% for male MPs, with this gap narrowing only after enfranchisement. Our findings provide large-scale, naturalistic evidence that hostile and benevolent sexism serve distinct rhetorical functions in political discourse.
By combining LLM-based classification with established social-psychological theory, we demonstrate how computational methods can uncover the structure of reasoning in historical text, enabling the study of bias not just in what is said, but in how arguments are made across centuries of institutional debate.
Publication Date
4-29-2026
Document Type
Thesis
Student Type
Graduate
Degree Name
Artificial Intelligence (MS)
Department, Program, or Center
Information and Computing Studies
College
Golisano College of Computing and Information Sciences
Advisor
Ashique KhudaBukhsh
Advisor/Committee Member
Christopher Homan
Advisor/Committee Member
Evan Selinger
Recommended Citation
Sawkar, Mandira, "Two Centuries of Sexism in British Parliament: A Computational Analysis of Women’s Representation in the Hansard Corpus" (2026). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12556
Campus
RIT – Main Campus
