Abstract
Deep neural networks achieve state-of-the-art performance across many domains, yet their deployment in high-stakes settings is constrained by two challenges: opaque decision-making and vulnerability to adversarial manipulation. This thesis investigates explainability and interpretability as principled mechanisms for improving the reliability and trustworthiness of deep learning models. First, we develop new post-hoc explanation methods that improve feature attribution and concept-based explanations. These methods provide faithful decision cues by modeling meaningful feature interactions and extracting coherent concepts, enabling a more reliable understanding of why a model predicts a given label. Second, we show that explanation quality is not solely a property of the explanation algorithm, but is strongly shaped by model design and training procedures. We present approaches that (i) guide model attention using vision–language–derived supervision, (ii) characterize how magnitude pruning reshapes post-hoc explanations, and (iii) establish a theoretical and empirical link between model sensitivity and explanation quality. These techniques show that appropriately designed training objectives can produce models whose explanations are inherently sparse, stable, and faithful. Finally, we show how explainability can be leveraged beyond interpretation to improve model security. We propose explanation-driven approaches for detecting perturbation- and patch-based adversarial attacks, demonstrating that explanations provide effective signals for identifying malicious inputs. Collectively, this thesis advances the view of explainability and interpretability as design principles for building reliable and trustworthy deep learning systems.
Library of Congress Subject Headings
Deep learning (Machine learning); Machine learning--Security measures; Neural networks (Computer science); Explanation
Publication Date
4-2026
Document Type
Dissertation
Student Type
Graduate
Degree Name
Computing and Information Sciences (Ph.D.)
Department, Program, or Center
Computing and Information Sciences Ph.D., Department of
College
Golisano College of Computing and Information Sciences
Advisor
Nidhi Rastogi
Advisor/Committee Member
Sara Rampazzi
Advisor/Committee Member
Linwei Wang
Recommended Citation
Bhusal, Dipkamal, "Towards Reliable and Trustworthy Deep Learning through Explainability and Interpretability" (2026). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12522
Campus
RIT – Main Campus
Plan Codes
COMPIS-PHD
