Dementia is a growing problem for the aging population and incurs high medical costs, in part due to the lack of available treatment options. Accordingly, early detection is critical, both to potentially postpone symptoms and to prepare healthcare providers and families for a patient's management needs. Current detection methods are typically costly or unreliable, and could benefit greatly from improved recognition of early dementia markers. Identification of such markers may be possible through computational analysis of patients' electronic clinical records. Prior work has focused on structured data (e.g., test results), but these records often also contain natural language (text) data in the form of patient histories, visit summaries, or other notes, which may be valuable for disease prediction. This thesis has three main goals: to incorporate analysis of these electronic medical texts into predictive models of dementia development; to explore topic modeling as a form of interpretable dimensionality reduction, both to improve prediction and to characterize the texts; and to integrate these text-based models with models that use structured data. This kind of computational modeling could be used in an automated screening system to identify and flag patients who may be at risk for assessment by clinicians. The results support the potential of unstructured clinical text data, both as a standalone predictor of dementia status when structured data are missing and as a complement to structured data.

Library of Congress Subject Headings

Dementia--Diagnosis; Data mining; Medical statistics

Publication Date


Document Type


Student Type


Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)


Cecilia Ovesdotter Alm

Advisor/Committee Member

Xumin Liu

Advisor/Committee Member

Qi Yu


Physical copy available from RIT's Wallace Library at RC521 .B85 2015


RIT – Main Campus

Plan Codes