Abstract

Credit risk assessment remains a critical function of financial institutions, particularly in the rapidly evolving digital lending environment. This research explores the effectiveness of five machine learning models (Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and Support Vector Machine) in predicting loan default using both financial indicators and categorical borrower attributes. The study is motivated by the growing availability of structured borrower data and by the need to evaluate whether advanced machine learning algorithms can improve on conventional credit scoring methods. Using the publicly available L&T Vehicle Loan Default Prediction dataset, which comprises 233,154 borrower records and 41 structured attributes, the study followed the CRISP-DM framework, covering data preparation, feature engineering, class imbalance handling with SMOTE, model training, and evaluation. Model performance was assessed using Accuracy, Precision, Recall, F1-score, ROC–AUC, and confusion matrix analysis to ensure robust evaluation under imbalanced class conditions. The findings indicate that financial leverage and credit behavior variables, namely loan-to-value ratio (LTV), disbursed amount, credit bureau score (PERFORM_CNS.SCORE), and prior sanctioned amounts, are the most influential predictors of loan default. Among the evaluated models, Gradient Boosting achieved the highest ROC–AUC (0.641), followed by Random Forest (0.637). However, Gradient Boosting showed weak minority-class detection. Random Forest delivered the most balanced classification performance, producing the fewest false positives when detecting defaulters. Logistic Regression and Decision Tree achieved stronger recall for defaulters but at the cost of higher false-positive rates, while Support Vector Machine was the worst-performing model, even after SMOTE was applied to handle the class imbalance.
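The imbalance-aware evaluation described above can be illustrated with a minimal pure-Python sketch. It computes the threshold-based metrics from confusion-matrix counts, and ROC–AUC as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The ten labels and scores are hypothetical toy values, not data from the L&T dataset, and the 0.5 and 0.3 thresholds are arbitrary choices made only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 (defaulter) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1 for the defaulter class (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 non-defaulters (0) and 2 defaulters (1), i.e. imbalanced classes.
TOY_Y_TRUE = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
TOY_SCORES = [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.40, 0.45, 0.48, 0.35]

for threshold in (0.5, 0.3):
    y_pred = [1 if s >= threshold else 0 for s in TOY_SCORES]
    print(threshold, confusion_counts(TOY_Y_TRUE, y_pred),
          classification_metrics(TOY_Y_TRUE, y_pred))
print("ROC-AUC:", roc_auc(TOY_Y_TRUE, TOY_SCORES))
```

On these toy values, the 0.5 threshold gives 80% accuracy but zero recall, even though ROC–AUC is 0.875; lowering the threshold to 0.3 catches both defaulters at the cost of three false positives. This shows one way a model can rank borrowers reasonably well yet detect few defaulters at a fixed threshold, which is why accuracy alone is misleading under class imbalance. For a dataset of this size, a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic pairwise loop.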
Categorical borrower attributes did not outweigh core financial indicators in detecting defaulters. To enhance transparency, SHAP and LIME were applied for global and local interpretability. Both techniques consistently confirmed the dominance of leverage and credit history variables and indicated that geography also plays a role in detecting defaulters.
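SHAP attributions are grounded in Shapley values. The sketch below is an illustration, not the SHAP library and not the thesis's code: it computes exact Shapley values for a tiny model by enumerating every feature coalition, with features outside a coalition set to baseline values. The three inputs stand in for LTV, disbursed amount, and PERFORM_CNS.SCORE, but `toy_risk_score`, its weights, and the borrower and baseline values are all hypothetical. LIME, which instead fits a local surrogate model around one prediction, is not shown.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Enumerate every coalition S of the other features (sizes 0 .. n-1).
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical scoring function; weights are illustrative, not fitted to any data.
def toy_risk_score(features):
    ltv, disbursed_amount, cns_score = features
    return 0.02 * ltv + 0.5 * disbursed_amount - 0.001 * cns_score + 0.01 * ltv * disbursed_amount


FEATURES = ["ltv", "disbursed_amount", "cns_score"]  # hypothetical names
BORROWER = [90.0, 5.0, 300.0]   # hypothetical high-LTV, low-score borrower
BASELINE = [70.0, 4.0, 600.0]   # hypothetical "average borrower" reference point

contributions = shapley_values(toy_risk_score, BORROWER, BASELINE)
for name, value in zip(FEATURES, contributions):
    print(f"{name}: {value:+.3f}")
print("sum:", round(sum(contributions), 6),
      "score gap:", round(toy_risk_score(BORROWER) - toy_risk_score(BASELINE), 6))
```

The contributions always sum to the difference between the borrower's score and the baseline score (the efficiency property). Exact enumeration grows exponentially with the number of features, so it is only practical for a handful of inputs; SHAP implementations rely on approximations or model-specific algorithms when a model has dozens of attributes.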

Library of Congress Subject Headings

Default (Finance)--Forecasting--Data processing; Credit ratings--Data processing; Machine learning; Credit scoring systems; Credit analysis; Financial services industry--Technological innovations

Publication Date

5-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Khalil Al Hussaeni

Recommended Citation

Vinson Ukken, Joe, "AI in Action: Redefining Loan Default Prediction for the Digital Lending Era" (2026). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/12592

Campus

RIT Dubai

Plan Codes

PROFST-MS


AI in Action: Redefining Loan Default Prediction for the Digital Lending Era
