Abstract
In our study, we investigate the robustness of Machine Learning (ML) applications in ML Integrated with Network (MLIN) systems. We consider MLIN as an integration of three major components and the interrelations between them: data sources, network facilities, and the ML application. Conventional approaches focus on each component separately. In contrast, we concentrate on their interrelationships and consider them from a system integration perspective. Our primary goal is to develop methods and tools that assure ML application performance under Data Quality (DQ) variations in MLIN by improving ML application robustness. We examine prior work on ML robustness definition, evaluation, and enhancement, and discuss the existing challenges. As our first major contribution, we propose a novel approach to defining and evaluating ML robustness towards DQ variations in MLIN that addresses these challenges. We develop an ML robustness calculus based on the relationship between the quality of the input data and the ML performance demonstrated over this input. As our second major contribution, we examine and develop methods and tools that ensure ML performance by improving ML robustness in MLIN. With the integrated MLIN architecture in mind, our third major contribution is a reactive MLIN feedback mechanism that provides MLIN system restructuring recommendations to improve ML performance in the presence of DQ variations. In our fourth contribution, we extend robustness from the ML execution phase to the ML training phase. We investigate the feasibility of proactive strategies applied at the ML training phase, such as Transfer and Federated Learning. These strategies aim to enhance ML performance under DQ variations during ML execution, as well as the security of MLIN systems.
We address the security vulnerabilities that these strategies introduce when applied in MLIN. We develop Reputation- and Trust-based techniques that enhance security and, in turn, improve ML robustness. We investigate multiple real-world use cases to verify the developed solutions in practice. Our practical studies cover diverse data modalities, including images, sounds, voice recordings, videos, and conventional qualitative and quantitative data in tabular form. We examine various industrial open-source and commercial ML tools designed for processing data in such areas as computer vision, sound classification, voice recognition and transcription, and video object detection and classification.
Library of Congress Subject Headings
Machine learning--Development; Machine learning--Evaluation; Systems integration
Publication Date
4-2024
Document Type
Dissertation
Student Type
Graduate
Degree Name
Computing and Information Sciences (Ph.D.)
Department, Program, or Center
Computing and Information Sciences Ph.D, Department of
College
Golisano College of Computing and Information Sciences
Advisor
Leon Reznik
Advisor/Committee Member
Igor Khokhlov
Advisor/Committee Member
Stanislaw Radziszowski
Recommended Citation
Chuprov, Sergei, "Robust Machine Learning Under Vulnerable Cyberinfrastructure and Varying Data Quality" (2024). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/11821
Campus
RIT – Main Campus
Plan Codes
COMPIS-PHD