Abstract

Deep learning has emerged as a powerful tool in medical imaging, assisting healthcare professionals with decision-making tasks such as disease diagnosis, surgical intervention, and treatment planning. Supervised deep learning methods typically require large amounts of high-quality labeled data for training. However, acquiring such data in the medical domain is challenging due to the high cost of expert annotation and the presence of label noise, which often arises from inherent inter-annotator variability. Models trained on limited or noisy labeled data overfit and perform poorly, reducing their generalizability and trustworthiness for medical applications. In recent years, several methods have been proposed to tackle the challenge of learning with limited and noisy labeled data in general machine learning. However, the complexity of medical data, including subtle distinguishing features, imbalanced classes, and diverse imaging modalities, makes this challenge even more acute in the medical domain. This dissertation explores multiple approaches to learning with limited and noisy labeled data for robust medical imaging applications. We first investigated the impact of class-dependent label noise on medical image classifiers, particularly when the noisy and clean target classes are visually similar. We then introduced a framework that enhances robustness to noisy labels through self-supervised pretraining. Because multiple factors influence learning with noisy labels in medical image classification, including the number of classes, dataset complexity, the learning-with-noisy-labels method, the noise type, and the self-supervised pretraining approach, we conducted an in-depth study of how self-supervised pretraining improves robustness to label noise across various datasets while accounting for these factors. Next, to address the scarcity of labeled medical data, we proposed an active learning pipeline that leverages multimodal information to learn from limited labeled data and reduce annotation cost. Furthermore, we developed a robust framework for training with noisy labels in imbalanced medical image classification that separates noisy from clean labels and gradually relabels critical mislabeled samples selected using active learning techniques. Our final contribution targets robust multimodal learning by addressing hallucination in Vision-Language Models (VLMs). For this application, we created a vision-language medical dataset with hallucination-aware annotations and established initial benchmarks for VLMs, laying the groundwork for future research in medical applications.

Library of Congress Subject Headings

Diagnostic imaging--Data processing; Deep learning (Machine learning); Computer vision; Natural language processing (Computer science); Electronic noise

Publication Date

4-2025

Document Type

Dissertation

Student Type

Graduate

Degree Name

Imaging Science (Ph.D.)

Department, Program, or Center

Chester F. Carlson Center for Imaging Science

College

College of Science

Advisor

Cristian Linte

Advisor/Committee Member

Binod Bhattarai

Advisor/Committee Member

Bishesh Khanal

Campus

RIT – Main Campus

Plan Codes

IMGS-PHD
