Abstract
Deep learning has enabled great advances in natural language processing, computer vision, and pattern recognition in general. Deep learning frameworks have been very successful at classification, object detection, segmentation, and translation. Before an object can be processed, a vector representation of that object needs to be created. For example, sentences and images can be encoded with sent2vec and image2vec functions, respectively, in preparation for input to a machine learning framework. Neural networks are able to learn efficient vector representations of images, text, audio, videos, and 3D point clouds. However, transferring knowledge from one modality to another is a challenging task. In this work, we develop vector spaces that can handle data belonging to multiple modalities at the same time. In these spaces, similar objects are tightly clustered and dissimilar objects are far apart, irrespective of their modality. Such a vector space can be used for retrieval, search, and generation tasks. For example, given a picture of a person surfing, one can retrieve sentences or audio clips of a person surfing. We build a Multi-stage Common Vector Space (M-CVS) and a Reference Vector Space (RVS) that can handle image, text, audio, video, and 3D point cloud data. Both the M-CVS and the RVS can accommodate a new modality without changes to the existing transforms or architecture. Our model is evaluated by performing cross-modal retrieval on multiple benchmark datasets.
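The retrieval idea described above can be illustrated with a minimal sketch. It is not the M-CVS or RVS model from the thesis; it only assumes that embeddings from two modalities have already been mapped into one shared space, and it ranks candidates by cosine similarity. The function names (`l2_normalize`, `retrieve`), the 16-dimensional space, and the synthetic "image" and "text" vectors are all hypothetical and chosen for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve(query, gallery, k=3):
    """Return indices and scores of the k gallery vectors closest to query.

    Closeness is cosine similarity, computed as a dot product of
    unit-length vectors.
    """
    q = l2_normalize(np.asarray(query, dtype=float))
    g = l2_normalize(np.asarray(gallery, dtype=float))
    scores = g @ q
    top = np.argsort(-scores)[:k]  # highest similarity first
    return top, scores[top]


# Synthetic stand-in data (assumption): four underlying concepts, each
# observed once as an "image" vector and once as a "text" vector that sit
# close together in the shared space.
rng = np.random.default_rng(0)
concepts = rng.normal(size=(4, 16))
image_vecs = concepts + 0.05 * rng.normal(size=concepts.shape)
text_vecs = concepts + 0.05 * rng.normal(size=concepts.shape)

# Cross-modal query: use the image of concept 2 to search the text vectors.
top_idx, top_scores = retrieve(image_vecs[2], text_vecs, k=2)
print(top_idx, top_scores)
```

Because similar objects are assumed to lie close together regardless of modality, the text vector for the same concept should rank first. In a real system the synthetic vectors would be replaced by the outputs of trained per-modality encoders.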
Library of Congress Subject Headings
Vector spaces; Machine learning; Neural networks (Computer science)
Publication Date
10-2019
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Advisor
Raymond Ptucha
Advisor/Committee Member
Alexander Loui
Advisor/Committee Member
Qi Yu
Recommended Citation
Gopalakrishnan, Sabarish, "Vector Spaces for Multiple Modal Embeddings" (2019). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/10283
Campus
RIT – Main Campus
Plan Codes
CMPE-MS