Abstract
Deep neural networks achieve state-of-the-art performance across many domains, yet their deployment in high-stakes settings is constrained by two challenges: opaque decision-making and vulnerability to adversarial manipulation. This thesis investigates explainability and interpretability as principled mechanisms for improving the reliability and trustworthiness of deep learning models. First, we develop new post-hoc explanation methods that improve feature attribution and concept-based explanations. These methods provide faithful decision cues by modeling meaningful feature interactions and extracting coherent concepts, enabling a more reliable understanding of why a model predicts a given label. Second, we show that explanation quality is not solely a property of the explanation algorithm, but is strongly shaped by model design and training procedures. We present approaches that (i) guide model attention using vision–language–derived supervision, (ii) characterize how magnitude pruning reshapes post-hoc explanations, and (iii) establish a theoretical and empirical link between model sensitivity and explanation quality. These techniques show that appropriately designed training objectives can produce models whose explanations are inherently sparse, stable, and faithful. Finally, we show how explainability can be leveraged beyond interpretation to improve model security. We propose explanation-driven approaches for detecting perturbation- and patch-based adversarial attacks, demonstrating that explanations provide effective signals for identifying malicious inputs. Collectively, this thesis advances the view of explainability and interpretability as design principles for building reliable and trustworthy deep learning systems.
Library of Congress Subject Headings
Deep learning (Machine learning); Machine learning--Security measures; Neural networks (Computer science); Explanation
Publication Date
4-2026
Document Type
Dissertation
Student Type
Graduate
Degree Name
Computing and Information Sciences (Ph.D.)
Department, Program, or Center
Computing and Information Sciences Ph.D., Department of
College
Golisano College of Computing and Information Sciences
Advisor
Nidhi Rastogi
Advisor/Committee Member
Sara Rampazzi
Advisor/Committee Member
Linwei Wang
Recommended Citation
Bhusal, Dipkamal, "Towards Reliable and Trustworthy Deep Learning through Explainability and Interpretability" (2026). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12522
Campus
RIT – Main Campus
Plan Codes
COMPIS-PHD
