Abstract

In rapidly urbanizing cities, public transport systems must be efficient and reliable, supported by sound demand forecasting, to ensure customer satisfaction. This paper explores how evidence-based machine learning tools can be used to forecast weekly bus ridership in Dubai, using a dataset of more than 10 million trip-level boarding observations recorded between June 2013 and July 2014. The data were aggregated to 25,195 route-week observations and preprocessed with temporal feature engineering, lag variables, and rolling statistics. Three predictive models were tested: Linear Regression (as a baseline) and two ensemble methods, Random Forest and XGBoost. Model performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), the Coefficient of Determination (R²), and Mean Absolute Percentage Error (MAPE). The ensemble models outperformed the baseline in all cases: Random Forest achieved the lowest MAE, while XGBoost achieved the most competitive MAPE. Feature importance analysis and SHAP values showed that short-term lags and rolling averages dominated the ridership forecasts. The results indicate that machine learning models can serve as valuable decision-support tools for the Roads and Transport Authority in Dubai, helping it plan routes in advance, allocate resources efficiently, and deliver services more effectively. The paper concludes with recommendations to expand data coverage, incorporate exogenous variables, and consider more sophisticated modelling methods to further increase predictive power.
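
As an illustration of the lag and rolling features and the four evaluation metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the thesis code: the function names `add_lag_features` and `evaluate`, the lag set, the window length, and the sample numbers are all assumptions made for this example.

```python
import math


def add_lag_features(weekly_counts, lags=(1, 2), window=3):
    """Build lag and rolling-mean features for one route's weekly series.

    Hypothetical helper: only weeks with a full history are returned, and
    the rolling mean uses strictly earlier weeks so the target week never
    leaks into its own features.
    """
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(weekly_counts)):
        row = {f"lag_{k}": weekly_counts[t - k] for k in lags}
        row[f"roll_mean_{window}"] = sum(weekly_counts[t - window:t]) / window
        row["target"] = weekly_counts[t]
        rows.append(row)
    return rows


def evaluate(y_true, y_pred):
    """Return MAE, RMSE, R2 and MAPE (in percent) for paired values.

    Assumes y_true has no zeros (MAPE divides by the actual value) and is
    not constant (R2 divides by the total sum of squares).
    """
    n = len(y_true)
    errors = [p - a for a, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, y_true)) / n,
    }


# Illustrative weekly boardings for a single route (made-up numbers).
rows = add_lag_features([100, 120, 110, 130, 125])
scores = evaluate([100, 200], [110, 190])
```

MAE and MAPE can rank models differently because MAPE weights errors relative to each route's ridership level, which is one plausible reason two models could each lead on one of these metrics.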

Publication Date

2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Sanjay Modak

Advisor/Committee Member

Parthasarathi Gopal

Campus

RIT Dubai
