Abstract

Universities need to identify students at risk of dropping out early enough to intervene, and that risk information should be easy to access and act on. This thesis evaluated interpretable machine-learning models (MLMs) for predicting three final student outcomes (Dropout, Enrolled and Graduate) using only demographic, administrative/financial and first-semester academic variables. The study used the publicly accessible Portuguese student retention dataset from the UCI repository (4,424 records) and restricted features to an end-of-Semester-1 window to prevent look-ahead leakage and to align with actual advising cycles. Models were trained and compared using stratified validation, explicit class-imbalance handling and decision-threshold optimization, with dropout detection treated as the top priority. A modified Random Forest performed best overall. With the validation-selected operating threshold (τ* = 0.38) applied to the test set, it delivered solid out-of-sample results (Macro-F1 ≈ 0.68; Dropout recall ≈ 0.70). Explainability analyses indicated that first-semester academic momentum (allowed credits, mean grade and assessment counts) was the primary source of risk, with tuition-fee status a useful secondary indicator. The thesis concludes with an advisor-centric deployment strategy that combines a prioritized risk list with concise, factor-driven explanations to support prompt, targeted interventions at the end of the first semester.
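
The decision-threshold optimization mentioned above can be illustrated with a minimal sketch. The abstract does not state which criterion was used to select τ*, so the sketch assumes the threshold is chosen to maximize F1 for the Dropout class on a held-out validation set. The function names (`dropout_f1`, `select_threshold`), the threshold grid and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Tuple


def dropout_f1(y_true: List[int], p_dropout: List[float], tau: float) -> float:
    """F1 for the Dropout class when every student with p >= tau is flagged."""
    pairs = list(zip(y_true, p_dropout))
    tp = sum(1 for y, p in pairs if p >= tau and y == 1)
    fp = sum(1 for y, p in pairs if p >= tau and y == 0)
    fn = sum(1 for y, p in pairs if p < tau and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(
    y_true: List[int],
    p_dropout: List[float],
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Sweep candidate thresholds and return (tau, f1) with the best validation F1.

    Assumption: the selection criterion is Dropout-class F1; ties go to the
    lowest threshold, which favours recall.
    """
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # 0.05, 0.06, ..., 0.95
    best_tau, best_f1 = grid[0], -1.0
    for tau in grid:
        f1 = dropout_f1(y_true, p_dropout, tau)
        if f1 > best_f1:
            best_tau, best_f1 = tau, f1
    return best_tau, best_f1


# Toy validation data, invented for illustration (1 = Dropout, 0 = other outcome).
y_val = [1, 1, 1, 0, 0, 0, 0, 1]
p_val = [0.90, 0.45, 0.40, 0.35, 0.20, 0.10, 0.55, 0.30]

tau_star, f1_star = select_threshold(y_val, p_val)
print(f"selected threshold: {tau_star:.2f}, validation F1: {f1_star:.2f}")
```

The threshold is selected on validation data only and then held fixed for the test set, which mirrors the validation-selected operating threshold described in the abstract.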

Publication Date

2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Ehsan Warriach

Campus

RIT Dubai
