Abstract

Accurately predicting learner performance is a significant challenge in education because diverse and interrelated factors contribute to academic success. High school grades and standardized test scores remain the primary indicators used for university admissions. However, these indicators fail to capture the complexities surrounding student learning: various cognitive, cultural, socioeconomic, and environmental influences shape academic outcomes. Educators need to adopt more comprehensive predictive models that account for differences among learners. Reliable academic performance prediction can help schools optimize admissions decisions, allocate resources effectively, and implement targeted interventions to support at-risk students. This study adopted a quantitative primary research approach with a survey design. R was used to analyze the dataset and identify meaningful relationships within it. Statistical techniques including descriptive statistics, correlation analysis, and predictive modeling were applied to interpret the data. A Spearman correlation analysis indicated a weak positive relationship between high school scores and CGPA (r = 0.39). This finding suggests that students with higher high school scores tend to have higher CGPAs. However, the weak correlation implies that factors beyond high school performance substantially influence university academic success. A multiple linear regression model confirmed the predictive significance of high school scores (β = 0.2263, p < 0.05). EmSAT and IELTS scores were also significant predictors of CGPA (β = 0.0886 and β = 0.00026, respectively; p < 0.05), whereas TOEFL and SAT scores were not statistically significant (p > 0.05). The model also found that demographic factors affected academic performance. For instance, nationality had a significant effect, with non-Emirati students achieving higher CGPAs than Emirati students (β = 0.2218, p < 0.05).
Gender differences were also notable: male students had lower CGPAs than female students (β = -0.2266, p < 0.05). These findings highlight the importance of considering demographic and socioeconomic factors when predicting student success. The study further evaluated machine learning models for academic performance prediction. The results showed that ensemble-based machine learning techniques, particularly Random Forest, outperformed deep learning approaches such as Artificial Neural Networks (ANN). The superior performance of ensemble methods suggests that future research and practical applications should prioritize models such as Gradient Boosting Machines (GBM) or XGBoost to improve accuracy. Lastly, the decision tree model outperformed both Random Forest and ANN, with a correlation between the actual and predicted values of r = 0.39. This study had limitations, including missing values and skewed data, which affected the accuracy and generalizability of the findings. Future research should address these challenges by collecting a larger dataset, employing advanced techniques for handling missing values, and refining data preprocessing methods to minimize the impact of outliers. By leveraging more robust machine learning techniques and incorporating a broader range of predictive variables, future studies can improve the accuracy and reliability of academic performance prediction models.

Library of Congress Subject Headings

Academic achievement--Forecasting; College students--Economic conditions; College students--Social conditions; Machine learning; Neural networks (Computer science)

Publication Date

7-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Ehsan Warriach

Campus

RIT Dubai

Plan Codes

PROFST-MS
