Abstract
Credit risk assessment remains a core function of financial institutions, particularly within the rapidly evolving digital lending environment. This research examines the effectiveness of five machine learning models (Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and Support Vector Machine) in predicting loan default using both financial indicators and categorical borrower attributes. The study is motivated by the growing availability of structured borrower data and the need to evaluate whether advanced machine learning algorithms can be adopted in place of conventional credit scoring methods. The study uses the publicly available L&T Vehicle Loan Default Prediction dataset, comprising 233,154 borrower records and 41 structured attributes, and follows the CRISP-DM framework, covering data preparation, feature engineering, class imbalance handling using SMOTE, model training, and evaluation. Model performance was assessed using Accuracy, Precision, Recall, F1-score, ROC–AUC, and confusion matrix analysis to ensure robust evaluation under imbalanced class conditions. The findings indicate that financial leverage and credit behavior variables, namely loan-to-value ratio (LTV), disbursed amount, credit bureau score (PERFORM_CNS.SCORE), and prior sanctioned amounts, are the most influential predictors of loan default. Among the evaluated models, Gradient Boosting achieved the highest ROC–AUC (0.641), followed by Random Forest (0.637); however, Gradient Boosting showed weak minority-class detection. Random Forest delivered the most balanced classification performance because it produced the fewest false positives when detecting defaulters. Logistic Regression and Decision Tree achieved stronger recall for defaulters but had higher false-positive rates, while Support Vector Machine was the worst-performing model, even after SMOTE was applied to handle the class imbalance.
Categorical borrower attributes were not more important than core financial indicators for detecting defaulters. To enhance transparency, SHAP and LIME were applied for global and local interpretability. Both techniques consistently confirmed the dominance of leverage and credit history variables and indicated that geography also plays a role in detecting defaulters.
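As an illustration of why the abstract reports Precision, Recall, F1-score, and ROC–AUC alongside Accuracy, the following is a minimal sketch in plain Python. It is not the thesis code: the function names, the 0.5 decision threshold, and the ten toy labels and scores are all hypothetical and are not drawn from the L&T dataset or the reported results.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count confusion-matrix cells, treating 1 (default) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive (default) class."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance a random defaulter outscores a random non-defaulter.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a toy example but not for a dataset of realistic size.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 non-defaulters followed by 2 defaulters.
y_true = [0] * 8 + [1] * 2
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.2, 0.9, 0.35]

# A model that never predicts default looks accurate but finds no defaulters.
always_no_default = classification_metrics(y_true, [0] * len(y_true))

# Thresholding the scores at an assumed cutoff of 0.5.
thresholded = classification_metrics(y_true, [1 if s >= 0.5 else 0 for s in scores])
auc = roc_auc(y_true, scores)

print(always_no_default)
print(thresholded)
print(auc)
```

On this toy data, the classifier that never predicts default still scores well on accuracy while its recall for defaulters is zero, which is the failure mode that the additional metrics are meant to expose under class imbalance.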
Publication Date
5-2026
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research
Advisor
Khalil Al Hussaeni
Recommended Citation
Vinson Ukken, Joe, "AI in Action: Redefining Loan Default Prediction for the Digital Lending Era" (2026). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12592
Campus
RIT Dubai
