Abstract
This study systematically evaluates machine learning models for forecasting influenza outbreaks, with the aim of improving public health preparedness through accurate predictions. Laboratory-confirmed influenza case data from the New York State Department of Health (2009–2023) were used to train and test five regression models: XGBoost, a Neural Network (MLP regressor), K-Nearest Neighbors (KNN), Random Forest, and Support Vector Regression (SVR). Preprocessing techniques included differencing and phasing. Performance was assessed with a leave-one-season-out cross-validation framework and the Symmetric Mean Absolute Percentage Error (SMAPE). The Neural Network (MLP regressor) outperformed the other models, achieving the lowest SMAPE score (23.7). Differencing degraded accuracy for every model, suggesting that removing trends obscured essential temporal patterns and that the raw temporal data carry information the models need. SVR models performed poorly, highlighting limitations of linear kernel-based methods in capturing nonlinear epidemiological dynamics. Hyperparameter analysis revealed distinct temporal dependencies: KNN performed best on short-term fluctuations, XGBoost on medium-term trends, and Random Forest on long-term structural shifts. These results can inform public health decision-making by supporting timely resource allocation and outbreak mitigation. Future work should extend forecast horizons, integrate external datasets (e.g., weather, social media), and explore more granular region- or strain-specific models. The study concludes that machine learning, particularly neural networks, offers a robust way to turn surveillance data into predictive insights, addressing gaps in infectious disease management and preparedness.
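As an illustration of the evaluation setup described in the abstract, the following is a minimal sketch of SMAPE and a leave-one-season-out split. It is not the thesis code. The abstract does not state which SMAPE variant was used, so the 0 to 200 form with the mean of absolute values in the denominator is an assumption, as are the function names and the season labels.

```python
import numpy as np


def smape(actual, forecast):
    """SMAPE in percent (assumed variant, range 0 to 200)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    diff = np.abs(forecast - actual)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    # Assumption: a term counts as 0 when actual and forecast are both 0.
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * ratio.mean()


def leave_one_season_out(seasons):
    """Yield (held_out_season, train_indices, test_indices) per season."""
    seasons = np.asarray(seasons)
    for held_out in np.unique(seasons):
        test_mask = seasons == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical season labels, one per weekly observation.
example_seasons = ["2009-10", "2009-10", "2010-11", "2010-11", "2011-12"]
example_splits = list(leave_one_season_out(example_seasons))
```

Each split holds out one full season for testing and trains on all the others, so a model is never scored on weeks from a season it saw during training.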
Library of Congress Subject Headings
Influenza--Forecasting; Regression analysis; Epidemiology--Data processing; Machine learning
Publication Date
2-10-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Bioinformatics (MS)
Department, Program, or Center
Thomas H. Gosnell School of Life Sciences
College
College of Science
Advisor
Gary Skuse
Advisor/Committee Member
Maureen Ferran
Advisor/Committee Member
Gregory Babbitt
Recommended Citation
Kandwal, Mukul, "Systematic Evaluation of Regression Models for the Prediction of Influenza Cases" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12076
Campus
RIT – Main Campus
Plan Codes
BIOINFO-MS