Abstract

This research investigates the growing challenge of accurate flight fare prediction for travel agencies operating in the post-pandemic aviation market. Fluctuating prices, unstable demand, and increasing competitive pressure affect agencies' decision-making and profitability. Traditional ticket pricing methods are often ineffective at capturing the complex, non-linear relationships among the factors that drive ticket fare dynamics, which highlights the need for more adaptive pricing and data-driven predictive machine learning models. In response to this challenge, the research examines the effectiveness of machine learning techniques for improving flight fare prediction accuracy and supporting pricing strategies.

The main objectives are to develop robust machine learning models for fare prediction, to compare Random Forest performance against Linear Regression, XGBoost, and Support Vector Regression, and to identify the key factors influencing flight pricing dynamics. The research is guided by questions centered on model accuracy, comparative performance, and the factors that most significantly influence fare prediction across different markets. The study uses three large-scale flight fare datasets totaling 771,615 flight records, drawn from the GitHub Airlines Ticket Pricing Dataset, the Air India Dataset, and the Kaggle Flight Dataset, and covering both business and economy classes.

Results demonstrate Random Forest's superior performance across all datasets, achieving R-squared values ranging from 0.969 to approximately 0.99 and RMSE values between 33.74 and 3,263.22 depending on dataset characteristics. XGBoost consistently delivered the lowest RMSE across datasets (33.74 for GitHub, 342.78 for Air India, and 358.01 for Kaggle), while Random Forest provided the best balance between accuracy and interpretability, with variance explained ranging from 96.82% to 99.01%.

Feature importance analysis reveals that flight class, total duration, route popularity, price per hour, and airline selection are the primary determinants of fare variation. Seasonal patterns, advance booking periods, and departure timing significantly influence pricing strategies, with business class fares averaging 5–9 times higher than economy class fares on similar routes.

The findings provide actionable insights for travel agencies' pricing strategies, demonstrating that machine learning models can predict flight fares with high accuracy. The research recommends implementing Random Forest models for strategic decision-making while using XGBoost for operational forecasting due to its superior precision. Future research should incorporate real-time market data, external economic indicators, and dynamic pricing mechanisms to enhance prediction accuracy and business applicability.

Publication Date

5-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Khalil Al Hussaeni

Campus

RIT Dubai
