Abstract

This study systematically evaluates machine learning models for forecasting influenza outbreaks, aiming to enhance public health preparedness through accurate predictions. Laboratory-confirmed influenza case data from the New York State Department of Health (2009–2023) were used to train and test five regression models: XGBoost, Neural Network (MLP regressor), K-Nearest Neighbors (KNN), Random Forest, and Support Vector Regression (SVR). Preprocessing techniques included differencing and phasing. A leave-one-season-out cross-validation framework and Symmetric Mean Absolute Percentage Error (SMAPE) were employed to assess performance. The Neural Network (MLP regressor) outperformed the other models, achieving the lowest SMAPE score (23.7). Differencing degraded accuracy for every model, suggesting that removing trends obscured essential temporal patterns and underscoring the value of raw temporal data. SVR models exhibited poor performance, highlighting limitations of linear kernel-based methods in capturing nonlinear epidemiological dynamics. Hyperparameter analysis revealed distinct temporal dependencies: KNN excelled at short-term fluctuations, XGBoost at medium-term trends, and Random Forest at long-term structural shifts. These results can inform public health decision-making by supporting timely resource allocation and outbreak mitigation. Future work should extend forecast horizons, integrate external datasets (e.g., weather, social media), and explore granular region- or strain-specific models. This study concludes that machine learning, particularly neural networks, offers a robust pathway to transform surveillance data into predictive insights, bridging critical gaps in infectious disease management and preparedness.
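
As a rough illustration of the evaluation setup named in the abstract, the sketch below computes SMAPE and generates leave-one-season-out splits. The abstract does not state which SMAPE variant the thesis uses, so the 0 to 200 percent form shown here is an assumption. The function names `smape` and `leave_one_season_out`, and the season-label format, are hypothetical and chosen only for this example.

```python
import numpy as np


def smape(actual, forecast):
    """Symmetric MAPE on a 0-200 scale (one common definition, assumed here)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    diff = np.abs(forecast - actual)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    # Treat a 0/0 term (actual and forecast both zero) as zero error.
    terms = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * terms.mean()


def leave_one_season_out(seasons):
    """Yield (held_out_season, train_indices, test_indices) for each season.

    `seasons` holds one label per observation, e.g. "2009-10" (hypothetical format).
    """
    seasons = np.asarray(seasons)
    for held_out in np.unique(seasons):
        test_mask = seasons == held_out
        yield held_out, np.where(~test_mask)[0], np.where(test_mask)[0]
```

In use, a model would be refit on the training indices of each fold and scored with `smape` on the held-out season, and the per-season scores would then be averaged. Holding out whole seasons, rather than random weeks, keeps each test set a complete epidemic curve that the model has never seen.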

Library of Congress Subject Headings

Influenza--Forecasting; Regression analysis; Epidemiology--Data processing; Machine learning

Publication Date

2-10-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences

College

College of Science

Advisor

Gary Skuse

Advisor/Committee Member

Maureen Ferran

Advisor/Committee Member

Gregory Babbitt

Campus

RIT – Main Campus

Plan Codes

BIOINFO-MS
