Abstract

This study applies machine learning to the spatio-temporal analysis and prediction of urban air quality, relating the Air Quality Index (AQI) to geolocation data across a global dataset. It is motivated by growing concern about the effects of air pollution on human health and the environment, with particular attention to understanding the determinants of AQI in different regions. Four research questions guide the study: which key factors influence AQI, how effective machine learning models are at predicting it, what impact PM2.5 has, and how model performance compares when dominant pollutants are included versus excluded. A quantitative research design grounded in a positivist philosophy was followed, using secondary data comprising 16,695 international observations. Data analysis comprised exploratory data analysis (EDA), correlation analysis, outlier handling, and feature engineering, followed by the development of Linear Regression and Random Forest models. The models were evaluated using R², RMSE, and MAE under two conditions: PM2.5 included and PM2.5 excluded. The results identify PM2.5 as the most important determinant of AQI, with a strong correlation (0.93) and the highest feature importance. Although models that included PM2.5 achieved very high accuracy (R² of nearly 1), this reflects feature dependency, because AQI is itself computed from pollutant values. By contrast, models that exclude PM2.5 have lower predictive power but are more realistic, which makes feature independence an important consideration. The study concludes that machine learning models can predict AQI successfully, but that their performance depends on the structure of the data and the choice of features.
The study highlights the importance of critical interpretation of results in order to avoid misleading conclusions caused by data leakage.
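The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the data are synthetic, the coefficients and the `other` feature are invented for illustration, and a single-feature least-squares line stands in for the Linear Regression and Random Forest models. It only shows how R², RMSE, and MAE behave when the target is largely derived from one feature (here PM2.5) compared with when that feature is withheld.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(feature, target, n_train):
    """Fit on the first n_train rows, score on the remaining rows."""
    slope, intercept = fit_line(feature[:n_train], target[:n_train])
    preds = [slope * v + intercept for v in feature[n_train:]]
    truth = target[n_train:]
    return {"r2": r2_score(truth, preds), "rmse": rmse(truth, preds), "mae": mae(truth, preds)}


# Synthetic data (assumption): AQI is constructed mostly from PM2.5,
# plus a small contribution from a hypothetical second feature and noise.
random.seed(0)
n = 200
pm25 = [random.uniform(5, 150) for _ in range(n)]
other = [random.uniform(0, 50) for _ in range(n)]  # hypothetical non-dominant feature
aqi = [2.0 * p + 0.2 * o + random.gauss(0, 2) for p, o in zip(pm25, other)]

with_pm25 = evaluate(pm25, aqi, n_train=150)      # target is nearly a function of the feature
without_pm25 = evaluate(other, aqi, n_train=150)  # dominant pollutant withheld

print("PM2.5 included:", with_pm25)
print("PM2.5 excluded:", without_pm25)
```

In this toy setup the near-perfect score with PM2.5 comes from how the target was constructed rather than from any modelling skill, which is the kind of feature dependency the abstract cautions about.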

Publication Date

5-28-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Ehsan Warriach

Comments

This thesis has been embargoed. The full text will be available on or around 11/12/2026.

Campus

RIT Dubai
