Abstract
The exponential growth of deep learning has helped solve problems across many fields of study. Convolutional neural networks have become a go-to tool for extracting features from images. Similarly, recurrent neural network variants such as Long Short-Term Memory and Gated Recurrent Unit architectures extract useful information from temporal data such as text and time series. Although these networks are effective at extracting features for a single modality, learning features across multiple modalities remains a challenging task. In this work, we develop a generative common vector space model in which similar concepts from different modalities are brought closer together in a common latent space representation while dissimilar concepts are pushed farther apart in that same space. The model not only addresses the cross-modal retrieval problem but also uses the vectors generated by the common vector space model to synthesize realistic-looking data. This work focuses mainly on the image and text modalities, though the approach can be extended to other modalities as well. We train the model and evaluate its performance on the Caltech-UCSD Birds (CUB) and Oxford-102 Flowers datasets.
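To make the abstract's core idea concrete, the sketch below shows one common way to realize such a common vector space: two projection heads map pre-extracted image (CNN) and text (LSTM/GRU) features into a shared space, trained with a hinge-style contrastive objective that pulls matching pairs together and pushes mismatched pairs apart. This is an illustrative assumption, not the thesis's exact architecture; all layer sizes, names, and the margin value are placeholders.

# Minimal PyTorch sketch of a contrastive common vector space
# (illustrative only; dimensions and margin are assumed, not from the thesis).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonSpaceModel(nn.Module):
    def __init__(self, img_feat_dim=2048, txt_feat_dim=300, common_dim=512):
        super().__init__()
        # Project pre-extracted CNN image features into the common space.
        self.img_proj = nn.Linear(img_feat_dim, common_dim)
        # Project pre-extracted RNN (e.g., LSTM/GRU) text features likewise.
        self.txt_proj = nn.Linear(txt_feat_dim, common_dim)

    def forward(self, img_feats, txt_feats):
        # L2-normalize so dot products act as cosine similarities.
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, margin=0.2):
    sims = z_img @ z_txt.t()                   # (B, B) similarity matrix
    pos = sims.diag().unsqueeze(1)             # matching-pair scores
    # Hinge: mismatched pairs should score at least `margin` below matches.
    cost = (margin + sims - pos).clamp(min=0)
    mask = 1.0 - torch.eye(len(sims), device=sims.device)
    return (cost * mask).mean()                # exclude the positive pairs

# Usage with random stand-in features:
model = CommonSpaceModel()
img = torch.randn(8, 2048)   # e.g., CNN image features
txt = torch.randn(8, 300)    # e.g., LSTM sentence embeddings
z_i, z_t = model(img, txt)
loss = contrastive_loss(z_i, z_t)
loss.backward()

Once trained, retrieval in either direction reduces to a nearest-neighbor search in the common space, and the same common vectors can condition a generator for data synthesis as the abstract describes.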
Library of Congress Subject Headings
Machine learning; Neural networks (Computer science); Convolutions (Mathematics); Information retrieval; Data mining
Publication Date
2-2020
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Advisor
Raymond Ptucha
Advisor/Committee Member
Alexander Loui
Advisor/Committee Member
Andres Kwasinski
Recommended Citation
Udaiyar, Premkumar, "Cross-modal data retrieval and generation using deep neural networks" (2020). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10562
Campus
RIT – Main Campus
Plan Codes
CMPE-MS