Abstract
Customer retention has become a critical focus for businesses seeking to sustain growth and profitability in an increasingly competitive market. This thesis uses advanced machine learning techniques to develop a data-driven churn prediction model for Majid Al Futtaim's customer base. Grounded in customer relationship management theory, the research identifies the key transactional and behavioral factors that influence customer churn in an e-commerce environment. The study is set against a backdrop of increased digital consumer engagement and high customer acquisition costs, which make retaining existing customers more cost-effective than acquiring new ones. The primary research questions concern which machine learning models are most effective for churn prediction, which factors contribute most to churn, and how targeted retention strategies can be developed. A structured methodology was employed, beginning with extensive data preprocessing, exploratory data analysis (EDA), and feature engineering based on customer RFM (recency, frequency, monetary) metrics, payment behavior, and delivery satisfaction indicators. An existing dataset was sourced and refined to produce meaningful inputs. In SPSS Modeler, Random Trees and the Feature Selection node were used for feature selection and importance ranking. Three algorithms were evaluated for model development: Logistic Regression, Linear Support Vector Machine (LSVM), and Artificial Neural Networks. The dataset was partitioned into training and testing subsets in a 70:30 ratio, with class imbalance taken into account. Accuracy, Precision, Recall, F1-score, and AUC-ROC were used to evaluate model performance. Of the models tested, Logistic Regression achieved the highest accuracy, AUC, and F1-score.
Logistic Regression outperformed Neural Networks and LSVM, and its transparency and ease of explanation make it more practical for real-world business applications. A predictive importance analysis indicates that order frequency, payment methods, and approval time are among the top contributors to churn risk. The findings suggest that a well-structured, interpretable machine learning model can significantly help businesses identify customers at risk of churn and implement proactive retention strategies. The study also finds that marketing interventions are more precise when RFM-based segmentation is combined with predictive analytics. By applying machine learning to data-driven customer analytics, the study demonstrates its usefulness for reducing churn and improving customer lifetime value. Future work could explore hybrid models, larger datasets, and the inclusion of psychological and contextual variables for deeper personalization. For churn mitigation, businesses are advised to adopt explainable models such as logistic regression.
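As an illustration of the evaluation metrics named in the abstract, the following is a minimal Python sketch of how accuracy, precision, recall, and F1-score can be computed from binary churn labels. It is not the thesis's implementation, which was built in SPSS Modeler; the function name `churn_metrics` and the sample labels are hypothetical and chosen only for the example.

```python
from typing import Dict, Sequence


def churn_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # retained customers correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # retained customers wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed by the model

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight customers (illustrative only, not thesis data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = churn_metrics(y_true, y_pred)
```

On an imbalanced churn dataset, accuracy alone can look strong even when few churners are caught, which is one reason precision, recall, and F1-score are commonly reported alongside it.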
Library of Congress Subject Headings
Electronic commerce--Management--Data processing; Customer relations--Management--Data processing; Turnover (Business); Market segmentation; Predictive analytics; Machine learning
Publication Date
5-20-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research
Advisor
Sanjay Modak
Advisor/Committee Member
Hammou Messatfa
Recommended Citation
Kazim, Hind Tawfiq, "Predicting Customer Churn in E-commerce" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12162
Campus
RIT Dubai
Plan Codes
PROFST-MS