Abstract

As artificial intelligence and machine learning models become increasingly embedded in decision-making systems across industries, questions about the ethical sourcing of data have grown more pressing. Model performance relies heavily on vast, diverse datasets, many of which are harvested without consent, transparency, or equitable representation. This study aims to compare analytics outcomes (accuracy, bias, fairness, and explainability) between ethically sourced and unethically sourced datasets. Following the CRISP-DM methodology, the study will develop matched classification models on both dataset types, evaluate their performance and fairness with open-source tools such as Fairlearn and AIF360, and assess the broader implications for trust, accountability, and regulatory compliance. Drawing on more than 30 peer-reviewed sources, policy frameworks (e.g., GDPR, NIST AI RMF), and real-world case studies (e.g., Amazon’s hiring algorithm, MS-Celeb-1M), this research connects data ethics with analytics performance to guide future practice in responsible AI.

Publication Date

5-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Ioannis Karamitsos

Campus

RIT Dubai
