Abstract
This dissertation explores the integration of deep learning (DL) techniques in remote sensing (RS) and computer vision (CV), with a focus on optimizing convolutional neural networks (CNNs) for enhanced detection performance and efficient computational deployment.

The first part of the research addresses the underperformance of conventional object detection methods when applied to RS data. Traditional techniques often struggle due to the small size of targets, limited training data, and the diverse modalities involved. To overcome these challenges, we introduce YOLOrs, a novel CNN designed specifically for real-time object detection in multimodal RS imagery. YOLOrs detects objects across multiple scales, predicts target orientations, and incorporates a mid-level fusion architecture that effectively handles multimodal data.

Building on the concept of multimodal data fusion, we further propose a two-phase multi-stream fusion approach that mitigates the difficulty of collecting paired multimodal data, which is often expensive and complex due to the disparate nature of sensing technologies. Our approach first trains the unimodal streams independently, followed by a joint training phase for a common multimodal decision layer. This method has been shown to outperform traditional fusion techniques in empirical tests, demonstrating its effectiveness in practical scenarios.

The second part of the dissertation addresses over-parameterization in CNNs, which often leads to excessive computational demands and storage overhead, as well as overfitting. Here, we introduce YOLOrs-lite, an adaptation of YOLOrs that stores convolutional kernels in the Tensor-Train (TT) format, significantly reducing the network’s parameters while maintaining high detection performance. This approach not only improves model efficiency but also enables real-time inference suitable for edge deployment. Additionally, we extend the TT compression technique to convolutional auto-encoders (CAEs), creating CAE-TT, which adjusts the number of parameters without altering the network architecture and is effective in both batch and online learning settings.

Finally, we explore a novel CNN compression technique based on dynamic parameter rank pruning. Using low-rank matrix approximations and novel regularization strategies, this method dynamically adjusts the ranks during training, achieving substantial reductions in model size with improved or maintained performance on several benchmark datasets.

Collectively, this research advances the field by developing innovative methods that refine DL applications in RS and CV, ensuring both high performance and efficiency in processing and deployment across diverse platforms.
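To make the mid-level fusion idea concrete, the sketch below shows the general pattern of fusing two modality branches at an intermediate feature map by channel concatenation, followed by shared layers. This is a minimal illustration under assumed layer sizes; the class name, channel counts, and branch architectures are hypothetical, and YOLOrs' actual backbone and fusion placement differ.

```python
import torch
import torch.nn as nn

class MidLevelFusion(nn.Module):
    """Two modality branches fused mid-network by channel concatenation
    (a minimal sketch; not the actual YOLOrs architecture)."""
    def __init__(self):
        super().__init__()
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.ir_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        # Shared layers operate on the fused (concatenated) feature maps
        self.shared = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())

    def forward(self, rgb, ir):
        fused = torch.cat([self.rgb_branch(rgb), self.ir_branch(ir)], dim=1)
        return self.shared(fused)

model = MidLevelFusion()
out = model(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
```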
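The two-phase multi-stream approach can likewise be sketched as independently trained unimodal streams joined by a common decision layer. The sketch below assumes a simple feature extractor per modality and freezes the streams during the joint phase; all names, dimensions, and the freezing choice are illustrative assumptions, not the dissertation's exact training protocol.

```python
import torch
import torch.nn as nn

class UnimodalStream(nn.Module):
    """Per-modality feature extractor (hypothetical architecture)."""
    def __init__(self, in_channels, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim))
    def forward(self, x):
        return self.backbone(x)

class MultimodalFusion(nn.Module):
    """Common decision layer over concatenated unimodal features."""
    def __init__(self, streams, feat_dim=128, num_classes=10):
        super().__init__()
        self.streams = nn.ModuleList(streams)
        self.decision = nn.Linear(feat_dim * len(streams), num_classes)
    def forward(self, inputs):  # inputs: one tensor per modality
        feats = [s(x) for s, x in zip(self.streams, inputs)]
        return self.decision(torch.cat(feats, dim=1))

# Phase 1 (assumed): each stream is trained on its own modality with a
# temporary per-modality head, which is then discarded.
# Phase 2: freeze the streams and jointly train the decision layer.
model = MultimodalFusion([UnimodalStream(3), UnimodalStream(1)])
for s in model.streams:
    for p in s.parameters():
        p.requires_grad = False
```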
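For the TT compression used in YOLOrs-lite and CAE-TT, the sketch below stores a (C_out, C_in, K, K) convolutional kernel as four small Tensor-Train cores and contracts them back into a full kernel at forward time. The mode ordering, rank values, and initialization are assumptions for illustration; the dissertation's exact TT factorization may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TTConv2d(nn.Module):
    """Conv layer whose kernel is stored as Tensor-Train cores and
    reconstructed on the fly (illustrative factorization)."""
    def __init__(self, c_in, c_out, k, ranks=(8, 8, 4), padding=1):
        super().__init__()
        r1, r2, r3 = ranks
        self.padding = padding
        # TT cores: boundary ranks are 1, internal ranks are (r1, r2, r3)
        self.g1 = nn.Parameter(torch.randn(1, c_out, r1) * 0.1)
        self.g2 = nn.Parameter(torch.randn(r1, c_in, r2) * 0.1)
        self.g3 = nn.Parameter(torch.randn(r2, k, r3) * 0.1)
        self.g4 = nn.Parameter(torch.randn(r3, k, 1) * 0.1)

    def forward(self, x):
        # Contract the cores into the full kernel W[o, i, h, w]
        w = torch.einsum('aob,bic,chd,dwe->oihw',
                         self.g1, self.g2, self.g3, self.g4)
        return F.conv2d(x, w, padding=self.padding)

layer = TTConv2d(c_in=64, c_out=128, k=3)
y = layer(torch.randn(1, 64, 32, 32))
tt_params = sum(p.numel() for p in layer.parameters())
print(tt_params, 128 * 64 * 3 * 3)  # cores vs. dense kernel size
```

With the assumed ranks, the cores hold 5,228 parameters versus 73,728 for the dense kernel, which is the kind of reduction that makes edge deployment practical.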
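Finally, dynamic rank pruning can be illustrated with a weight factored as U diag(s) V^T, where a sparsity-inducing penalty on the gate vector s drives rank components toward zero so they can be pruned during training. The L1 regularizer, threshold, and layer shown here are illustrative assumptions, not necessarily the dissertation's exact regularization strategy.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear layer factored as U @ diag(s) @ V^T; an L1 penalty on s
    shrinks rank gates so low-magnitude ranks can be pruned."""
    def __init__(self, in_f, out_f, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_f, rank) * 0.1)
        self.s = nn.Parameter(torch.ones(rank))   # per-rank gates
        self.V = nn.Parameter(torch.randn(in_f, rank) * 0.1)

    def forward(self, x):
        return x @ (self.V * self.s) @ self.U.t()

    def rank_penalty(self):
        return self.s.abs().sum()  # encourages a low effective rank

    @torch.no_grad()
    def prune(self, tol=1e-3):
        keep = self.s.abs() > tol  # drop near-zero rank components
        self.U = nn.Parameter(self.U[:, keep])
        self.s = nn.Parameter(self.s[keep])
        self.V = nn.Parameter(self.V[:, keep])

layer = LowRankLinear(256, 256, rank=64)
loss = layer(torch.randn(8, 256)).pow(2).mean() + 1e-3 * layer.rank_penalty()
loss.backward()  # gates receive gradient; prune() shrinks the rank later
```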
Publication Date
6-2024
Document Type
Dissertation
Student Type
Graduate
Degree Name
Imaging Science (Ph.D.)
Department, Program, or Center
Chester F. Carlson Center for Imaging Science
College
College of Science
Advisor
Eli Saber
Advisor/Committee Member
John Kerekes
Advisor/Committee Member
Panos P. Markopoulos
Recommended Citation
Sharma, Manish, "Multimodal Data Fusion and Model Compression Methods for Computer Vision" (2024). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/11879
Campus
RIT – Main Campus