Abstract

This dissertation explores the integration of deep learning (DL) techniques into remote sensing (RS) and computer vision (CV), with a focus on optimizing convolutional neural networks (CNNs) for improved detection performance and efficient computational deployment. The first part of the research addresses the underperformance of conventional object detection methods on RS data, where traditional techniques often struggle with small targets, limited training data, and the diverse modalities involved. To overcome these challenges, we introduce YOLOrs, a novel CNN designed specifically for real-time object detection in multimodal RS imagery. YOLOrs detects objects at multiple scales, predicts target orientations, and incorporates a novel mid-level fusion architecture that effectively handles multimodal data. Building on the concept of multimodal data fusion, we further propose a two-phase multi-stream fusion approach that mitigates the difficulty of collecting paired multimodal data, which is often expensive and complex owing to the disparate nature of sensing technologies. In this approach, unimodal streams are first trained independently, and a common multimodal decision layer is then trained jointly. Empirical tests show that this method outperforms traditional fusion techniques, demonstrating its effectiveness in practical scenarios.

The second part of the dissertation addresses over-parameterization in CNNs, which often leads to excessive computational and storage demands as well as overfitting. Here, we introduce YOLOrs-lite, an adaptation of YOLOrs that stores convolutional kernels in the Tensor-Train (TT) format, significantly reducing the network’s parameters while maintaining high detection performance. This approach not only enhances model efficiency but also facilitates real-time inference suitable for edge deployment. Additionally, we extend TT compression to convolutional auto-encoders (CAEs), creating CAE-TT, which adjusts the number of parameters without altering the network architecture and proves effective in both batch and online learning settings. Finally, we explore a novel CNN compression technique based on dynamic parameter rank pruning. Using low-rank matrix approximations and novel regularization strategies, this method dynamically adjusts ranks during training, achieving substantial reductions in model size while maintaining or improving performance on several benchmark datasets. Collectively, this research advances the field by developing methods that refine DL applications in RS and CV, delivering both high performance and efficiency in processing and deployment across diverse platforms.
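The abstract summarizes the TT-compression idea behind YOLOrs-lite and CAE-TT without implementation details. As a minimal sketch of how storing a convolutional kernel in Tensor-Train format reduces parameters, the PyTorch snippet below keeps three small TT cores in place of a dense kernel and contracts them on the fly during the forward pass. The class name TTConv2d, the chosen core layout, the ranks, and the initialization are illustrative assumptions, not the dissertation's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TTConv2d(nn.Module):
    """Illustrative Conv2d layer whose kernel is stored in Tensor-Train (TT)
    format rather than as a dense (out_ch, in_ch, k, k) tensor. The kernel is
    viewed as a 3-mode tensor (k*k, in_ch, out_ch) and held as three small TT
    cores; the dense kernel is reconstructed only for the forward pass.
    Core shapes, ranks, and initialization are assumptions for this sketch."""

    def __init__(self, in_ch, out_ch, k, ranks=(8, 8), stride=1, padding=0):
        super().__init__()
        r1, r2 = ranks
        self.out_ch, self.in_ch, self.k = out_ch, in_ch, k
        self.stride, self.padding = stride, padding
        # TT cores: G1 (k*k, r1), G2 (r1, in_ch, r2), G3 (r2, out_ch)
        self.g1 = nn.Parameter(0.1 * torch.randn(k * k, r1))
        self.g2 = nn.Parameter(0.1 * torch.randn(r1, in_ch, r2))
        self.g3 = nn.Parameter(0.1 * torch.randn(r2, out_ch))

    def full_kernel(self):
        # Contract the TT cores back into a dense convolution kernel.
        t = torch.einsum('sr,rcb->scb', self.g1, self.g2)   # (k*k, in_ch, r2)
        w = torch.einsum('scb,bo->sco', t, self.g3)         # (k*k, in_ch, out_ch)
        return w.permute(2, 1, 0).reshape(self.out_ch, self.in_ch, self.k, self.k)

    def forward(self, x):
        return F.conv2d(x, self.full_kernel(), stride=self.stride,
                        padding=self.padding)


# Parameter count: a dense 128x64x3x3 kernel has 73,728 weights, while the
# TT cores below hold 3*3*8 + 8*64*8 + 8*128 = 5,192.
layer = TTConv2d(in_ch=64, out_ch=128, k=3, ranks=(8, 8), padding=1)
y = layer(torch.randn(1, 64, 32, 32))   # -> shape (1, 128, 32, 32)
```

The TT ranks control the accuracy/compression trade-off: larger ranks recover more of the dense kernel's expressiveness at the cost of more parameters.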

Publication Date

6-2024

Document Type

Dissertation

Student Type

Graduate

Degree Name

Imaging Science (Ph.D.)

Department, Program, or Center

Chester F. Carlson Center for Imaging Science

College

College of Science

Advisor

Eli Saber

Advisor/Committee Member

John Kerekes

Advisor/Committee Member

Panos P. Markopoulos

Campus

RIT – Main Campus
