Abstract
Student dropout remains a persistent challenge in higher education, undermining institutional performance, reducing workforce preparedness, and limiting students’ academic and economic opportunities. Accurately identifying students at risk of attrition is difficult because academic, financial, and behavioral factors interact. This thesis addresses this challenge by applying a combined machine learning framework, integrating both unsupervised and supervised techniques, to predict student dropout using structured, first-year academic and financial data. The study uses a dataset of 4,424 undergraduate student records from a European higher education institution, covering ten academic years and comprising 35 variables related to academic performance, enrollment behavior, and financial engagement. Clustering techniques were employed to group students by engagement profiles, while classification models (Random Forest, XGBoost, and a soft voting ensemble) were trained to predict final academic outcomes: Dropout, Enrolled, or Graduate. Feature engineering was conducted in two phases, with both semester-averaged metrics and advanced derived indicators used to enhance model performance. Findings show that academic approvals, grades, and tuition fee status are the most influential predictors of student outcomes. Unsupervised clustering revealed behaviorally distinct groups with statistically significant differences in dropout risk, though these clusters did not translate effectively into predictive labels. Supervised models, particularly tuned XGBoost and ensemble classifiers, achieved high performance in binary classification tasks (balanced accuracy > 0.91, AUC > 0.95), confirming that dropout risk can be reliably predicted from early academic records. However, multiclass classification performance declined, especially for the transitional “Enrolled” category, highlighting the limitations of static early-year data in capturing more ambiguous student states.
This research contributes to the literature by demonstrating the strengths and constraints of interpretable machine learning in modeling student success. It also offers actionable insights for academic institutions, such as prioritizing interventions for students with early signs of disengagement and financial instability. Methodologically, the study highlights opportunities for future work to explore hybrid clustering-classification models, apply soft clustering techniques, and evaluate deep learning models for benchmarking purposes. While complex models may lack interpretability, they can serve as useful baselines to understand performance ceilings within structured educational datasets.
Library of Congress Subject Headings
College dropouts--Forecasting--Data processing; College dropouts--Prevention; Machine learning; Predictive analytics
Publication Date
5-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research
Advisor
Sanjay Modak
Advisor/Committee Member
Ioannis Karamitsos
Recommended Citation
Alameri, Fatma, "Predicting Student Dropout Risk using Machine Learning" (2025). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/12172
Campus
RIT Dubai
Plan Codes
PROFST-MS