Abstract
As artificial intelligence and machine learning models become increasingly embedded in decision-making systems across industries, questions about the ethical sourcing of data have grown more pressing. Model performance relies heavily on vast, diverse datasets, many of which are harvested without consent, transparency, or equitable representation. This study compares analytics outcomes (accuracy, bias, fairness, and explainability) between ethically sourced and unethically sourced datasets. Following the CRISP-DM methodology, the study will develop matched classification models on both dataset types, evaluate their performance and fairness using open-source tools such as Fairlearn and AIF360, and assess the broader implications for trust, accountability, and regulatory compliance. Drawing on more than 30 peer-reviewed sources, policy frameworks (e.g., GDPR, NIST AI RMF), and real-world case studies (e.g., Amazon’s hiring algorithm, MS-Celeb-1M), this research bridges data ethics and analytics performance to guide future practices in responsible AI.
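The comparison described in the abstract can be illustrated with a minimal sketch that scores two sets of predictions on accuracy and on one common group-fairness measure, the demographic parity difference (the gap in positive-prediction rates between groups). Everything here is an assumption made for illustration: the function names, the toy labels, the group codes, and the two prediction lists are hypothetical and are not taken from the thesis or its results. The sketch uses only the Python standard library; Fairlearn offers a metric with the same name, but this is not that implementation.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def selection_rates(y_pred, groups):
    """Share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for p, g in zip(y_pred, groups):
        totals[g] += 1
        positives[g] += p
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data, NOT results from the thesis.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
pred_model_a = [1, 0, 1, 0, 1, 0, 1, 0]  # stand-in for one matched model
pred_model_b = [1, 1, 1, 0, 0, 0, 1, 0]  # stand-in for the other matched model

for name, y_pred in [("model_a", pred_model_a), ("model_b", pred_model_b)]:
    print(
        name,
        "accuracy:", accuracy(y_true, y_pred),
        "demographic parity difference:",
        demographic_parity_difference(y_pred, groups),
    )
```

Reporting both numbers side by side for each model mirrors the matched-model design: a model can be compared on predictive quality and on group disparity at the same time, rather than on accuracy alone.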
Publication Date
5-2026
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research
Advisor
Ioannis Karamitsos
Recommended Citation
Gebara, Lamar, "The Efficacy of Ethical Data: An Analytical Study by" (2026). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12599
Campus
RIT Dubai
