Abstract

This thesis investigates the use of machine learning to predict flight cancellations, aiming to reduce operational disruption and improve airline decision-making. The research is motivated by the need for more proactive strategies in aviation, where flight cancellations often result in financial losses and customer dissatisfaction. The study is framed within the CRISP-DM methodology and demonstrates how historical flight data can be transformed into actionable insights using structured analytics. The research addresses three core questions: how effectively machine learning can predict cancellations, which evaluation metrics are most suitable in an imbalanced context, and how model outputs can support airline operations. A dataset of U.S. domestic flights from 2019 to 2023 was used, containing features such as delays, carriers, routes, and cancellation indicators. Through data preprocessing, irrelevant variables were removed, categorical features encoded, and outliers retained to preserve meaningful variation. Stratified sampling was applied to handle class imbalance and ensure fair evaluation. Several classification models were trained using stratified cross-validation and tested on a holdout set. Evaluation metrics such as precision, recall, F1-score, and AUC-ROC were used to compare models. Emphasis was placed on recall to reduce the chance of missing true cancellations. The results show that machine learning models, when carefully developed, can predict cancellations with strong reliability and practical relevance. The study concludes that predictive analytics has strong potential to enhance airline disruption management. The structured approach used in this research, which includes business understanding, data preparation, and performance evaluation, provides a replicable framework for practical implementation. Practical implications include integrating predictive tools into airline scheduling systems and training operational staff to interpret model outputs. Future research should consider incorporating real-time data and explainable AI to improve model responsiveness and transparency.

Library of Congress Subject Headings

Flight delays--Forecasting--Automation; Airlines--Management--Automation; Predictive analytics; Machine learning

Publication Date

5-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Sanjay Modak

Advisor/Committee Member

Ioannis Karamitsos

Campus

RIT Dubai

Plan Codes

PROFST-MS
