Abstract
Teaching computers to recognize people and objects from visual cues in images and videos is an interesting challenge. The computer vision and pattern recognition communities have already demonstrated that intelligent algorithms can detect and classify objects under difficult conditions such as varying pose, occlusion, and poor image fidelity. Recent deep learning approaches in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) are built using very large and deep convolutional neural network architectures. In 2015, such architectures surpassed human performance (94.9% human vs. 95.06% machine) in top-5 validation accuracy on the ImageNet dataset, and earlier this year deep learning approaches demonstrated a remarkable 96.43% accuracy. These successes have been made possible by deep architectures such as VGG, GoogLeNet, and, most recently, deep residual models with as many as 152 weight layers. Training these deep models is difficult because learning millions of parameters is compute intensive. Because so many parameters are unavoidable, very small 3x3 filters are used in the convolutional layers to reduce the parameter count of very deep networks. On the other hand, deep networks generalize well to other datasets and perform well even on complex datasets with fewer features or images.
This thesis proposes a robust approach for large-scale visual recognition by introducing a framework that automatically analyzes the similarity between classes in the dataset and configures a family of smaller networks that replaces a single larger network. Classes that are similar are grouped together and learned by a smaller network. This allows one to divide and conquer the large classification problem by identifying the class category from its coarse label to its fine label, deploying two or more stages of networks. In this way, the proposed framework learns the natural hierarchy and uses it effectively for the classification problem. A comprehensive analysis of the proposed methods shows that hierarchical models outperform traditional models in accuracy, require fewer computations, and expand the ability to learn large-scale visual information effectively.
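The coarse-to-fine idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that class similarity is estimated from a confusion matrix and that classes are grouped by a simple threshold on the symmetric confusion rate. The function names `group_similar_classes` and `predict_hierarchical`, the threshold value, and the toy numbers are all hypothetical.

```python
import numpy as np


def group_similar_classes(confusion, threshold):
    """Group classes whose symmetric confusion rate exceeds `threshold`.

    Assumption for illustration: similarity is derived from a confusion
    matrix, and groups are the connected components of the thresholded
    similarity graph. Returns a sorted list of class-index lists.
    """
    confusion = np.asarray(confusion, dtype=float)
    # Row-normalize so entry (i, j) is the rate at which class i is predicted as j.
    rates = confusion / confusion.sum(axis=1, keepdims=True)
    similarity = (rates + rates.T) / 2.0
    n = similarity.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i, j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def predict_hierarchical(coarse_scores, fine_scores_by_group, groups):
    """Choose the coarse group first, then the fine class inside that group."""
    g = int(np.argmax(coarse_scores))
    local = int(np.argmax(fine_scores_by_group[g]))
    return groups[g][local]


# Toy confusion matrix (hypothetical): classes 0/1 and 2/3 are often confused.
confusion = np.array([
    [80, 15, 3, 2],
    [20, 75, 2, 3],
    [1, 4, 70, 25],
    [2, 3, 30, 65],
])
groups = group_similar_classes(confusion, threshold=0.1)

# Hypothetical network outputs: one coarse score per group, and one fine
# score per class within each group.
predicted = predict_hierarchical(
    coarse_scores=[0.2, 0.8],
    fine_scores_by_group=[[0.6, 0.4], [0.3, 0.7]],
    groups=groups,
)
```

In this sketch each fine-stage network only has to separate the classes inside its own group, which is the sense in which a family of smaller networks could stand in for one large network. A real system would replace the hard-coded scores with the outputs of trained networks.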
Library of Congress Subject Headings
Computer vision; Optical pattern recognition; Machine learning; Neural networks (Computer science)
Publication Date
12-2016
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Advisor
Raymond Ptucha
Advisor/Committee Member
Christopher Kanan
Advisor/Committee Member
Dhireesha Kudithipudi
Recommended Citation
Chennupati, Sumanth, "Hierarchical Decomposition of Large Deep Networks" (2016). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9288
Campus
RIT – Main Campus
Plan Codes
CMPE-MS
Comments
Physical copy available from RIT's Wallace Library at TA1634 .C43 2016