Abstract
Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is a complex chronic illness characterized by debilitating, heterogeneous symptoms. Its long-term outcomes are highly variable, posing significant challenges for patient management, research, and clinical practice. Tools for predicting individual patient trajectories are currently lacking, and this prognostic gap is compounded by a research focus that has favoured diagnosis over prognosis. This study addresses that gap by investigating whether multidimensional baseline patient-reported outcome measures (PROMs), analysed using advanced, explainable machine learning methods, can meaningfully predict heterogeneous 12-month outcomes in ME/CFS, and by examining what the resulting patterns of predictability reveal about the structure of outcome. For applied health data science and machine learning, this is a fundamental theoretical and practical challenge, because predictive models must operate under multidimensional symptom burden, outcome discordance, class imbalance, missingness, and non-linear disease behaviour. A secondary analysis was conducted on a longitudinal UK specialist-service cohort, restricted to adults with complete baseline and 12-month follow-up data (n = 438). Baseline measures comprised multidimensional PROMs capturing fatigue, functional impairment, mood, pain, sleep, activity limitation, and cognition. Follow-up PROMs and Clinical Global Impression scales were used to derive continuous fatigue change scores, binary improved-versus-worsened trajectories, and multi-pattern outcome classifications. Supervised machine learning models, including linear and logistic regression, regularised regression, random forest, gradient boosting, and support vector machines, were developed within a reproducible train-test split and cross-validation framework.
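The outcome derivation described above can be illustrated with a minimal Python sketch. It assumes a single fatigue instrument on which higher scores mean worse fatigue, and a Clinical Global Impression-Improvement (CGI-I) rating on the standard 1 (very much improved) to 7 (very much worse) scale. The `Patient` record, the function names, and the cut-offs (1 to 3 = improved, 5 to 7 = worsened, 4 excluded) are hypothetical choices for illustration and are not taken from the thesis; the multi-pattern classification is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical record; field names are illustrative, not the study's variables.
    baseline_fatigue: Optional[float]  # fatigue score at baseline (higher = worse, assumed)
    followup_fatigue: Optional[float]  # same instrument at 12 months
    cgi_improvement: Optional[int]     # CGI-I: 1 = very much improved ... 7 = very much worse


def complete_cases(cohort: List[Patient]) -> List[Patient]:
    """Keep only patients with no missing baseline or follow-up values."""
    return [
        p for p in cohort
        if p.baseline_fatigue is not None
        and p.followup_fatigue is not None
        and p.cgi_improvement is not None
    ]


def fatigue_change(p: Patient) -> float:
    """Continuous change score: follow-up minus baseline (negative = less fatigue)."""
    return p.followup_fatigue - p.baseline_fatigue


def trajectory(p: Patient) -> Optional[str]:
    """Binary trajectory from CGI-I using assumed cut-offs; 'no change' (4) returns None."""
    if p.cgi_improvement <= 3:
        return "improved"
    if p.cgi_improvement >= 5:
        return "worsened"
    return None


# Toy cohort (invented values, not study data).
cohort = [
    Patient(30.0, 24.0, 2),
    Patient(25.0, 31.0, 6),
    Patient(28.0, 28.0, 4),
    Patient(27.0, None, 3),  # missing follow-up: dropped by complete_cases
]
analysed = complete_cases(cohort)
print([fatigue_change(p) for p in analysed])  # [-6.0, 6.0, 0.0]
print([trajectory(p) for p in analysed])      # ['improved', 'worsened', None]
```

Excluding the CGI-I "no change" category keeps the binary task focused on clear improvement versus worsening at the cost of discarding some patients; the abstract does not state how the thesis handled this category.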
Model performance was evaluated using error metrics, discrimination, calibration, and decision curve analysis. Unsupervised methods and explainability techniques, including principal component analysis, clustering, partial dependence, and SHAP values, were applied to characterize predictor structure and model behaviour. The models explained a meaningful proportion of variance in continuous fatigue change and achieved modest but clinically relevant discrimination between improved and worsened trajectories. Across all modelling approaches, baseline functional disability, daytime sleepiness, pain intensity, and fatigue severity consistently emerged as the strongest predictors of subsequent deterioration. In contrast, improvement remained intrinsically difficult to forecast, with weak and unstable predictor profiles across models. Clustering analyses identified interpretable baseline severity subgroups but did not reliably distinguish long-term prognostic classes. Among the approaches compared, random forest models showed the most favourable balance of discrimination, probability calibration, and net clinical benefit, particularly for early identification of patients at elevated risk of deterioration. Overall, the explainable machine learning analyses of longitudinal PROMs show that deterioration in ME/CFS can be identified with clinically meaningful reliability, whereas recovery consistently resists prediction. This asymmetry suggests that outcome fragmentation is a structural property of the illness rather than a modelling limitation. Collectively, the findings support a conceptual reframing of ME/CFS outcome as a fragmented, multisystem construct in which deterioration behaves as a coherent, machine-detectable state, whereas recovery is plural, weakly structured, and poorly predictable from PROMs alone. The asymmetry also shows that recovery is not the inverse of worsening and that outcome domains decouple across functional systems over time.
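The discrimination, calibration, and decision-curve measures mentioned above can be sketched in plain Python for a binary deterioration outcome (1 = worsened, 0 = improved). This is an illustration under stated assumptions, not the thesis's evaluation code: AUC is computed as the rank-based (Mann-Whitney) probability that a worsened patient receives a higher predicted risk than an improved one, calibration is summarised by the Brier score, and net benefit follows the standard decision-curve formula TP/n - FP/n * pt/(1 - pt). The function names and toy data are hypothetical.

```python
from typing import Sequence


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Rank-based AUC: share of (worsened, improved) pairs ranked correctly; ties count 0.5."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every patient regardless of the model."""
    return net_benefit(y, [1.0] * len(y), threshold)


# Toy outcomes and predicted risks (invented, not study results).
y = [1, 0, 1, 0, 0]
p = [0.8, 0.3, 0.6, 0.7, 0.1]
print(round(roc_auc(y, p), 3))                  # 0.833
print(round(brier_score(y, p), 3))              # 0.158
print(round(net_benefit(y, p, 0.5), 3))         # 0.2
print(round(net_benefit_treat_all(y, 0.5), 3))  # -0.2
```

In a decision curve, net benefit is plotted across a range of thresholds and compared with the treat-all strategy and the treat-none strategy (net benefit 0); a model adds clinical value at thresholds where its curve lies above both.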
Practically, the results indicate that PROM-based machine learning holds promise as an early-warning tool for deterioration but should not be interpreted as a confirmatory prognostic instrument for recovery. Future research should prioritize external validation, multimodal data integration, and the development of prognostic frameworks explicitly designed to accommodate outcome fragmentation and heterogeneous illness trajectories.
Publication Date
12-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research
Advisor
Sanjay Modak
Advisor/Committee Member
Ioannis Karamitsos
Recommended Citation
Malik, Neha, "Predictive Modelling of Long-term Outcomes in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome using Machine Learning" (2025). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/12493
Campus
RIT Dubai

Comments
This thesis has been embargoed. The full text will be available on or around 1/22/2027.