Abstract

Video understanding has become increasingly important as surveillance, social, and informational videos weave themselves into our everyday lives. Video captioning offers a simple way to summarize, index, and search the data. Most video captioning models utilize a video encoder and captioning decoder framework. Hierarchical encoders can abstractly capture clip level temporal features to represent a video, but the clips are at fixed time steps. This thesis research introduces two models: a hierarchical model with steered captioning, and a Multi-stream Hierarchical Boundary model. The steered captioning model is the first attention model to smartly guide an attention model to appropriate locations in a video by using visual attributes. The Multi-stream Hierarchical Boundary model combines a fixed hierarchy recurrent architecture with a soft hierarchy layer by using intrinsic feature boundary cuts within a video to define clips. This thesis also introduces a novel parametric Gaussian attention which removes the restriction of soft attention techniques which require fixed length video streams. By carefully incorporating Gaussian attention in designated layers, the proposed models demonstrate state-of-the-art video captioning results on recent datasets.

Library of Congress Subject Headings

Neural networks (Computer science); Video recordings for the hearing impaired--Data processing; Video recordings--Data processing

Publication Date

4-2017

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Engineering (MS)

Department, Program, or Center

Computer Engineering (KGCOE)

Advisor

Raymond Ptucha

Advisor/Committee Member

Nathan Cahill

Advisor/Committee Member

Dhireesha Kudithipudi

Comments

Physical copy available from RIT's Wallace Library at QA76.87 .N48 2017

Recommended Citation

Nguyen, Thang Huy, "Automatic Video Captioning using Deep Neural Network" (2017). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9516

Campus

RIT – Main Campus

Plan Codes

CMPE-MS

Download

COinS

Theses

Automatic Video Captioning using Deep Neural Network

Abstract

Library of Congress Subject Headings

Publication Date

Document Type

Student Type

Degree Name

Department, Program, or Center

Advisor

Advisor/Committee Member

Advisor/Committee Member

Comments

Recommended Citation

Campus

Plan Codes

Search

Browse

Author Corner

RIT Links

Theses

Automatic Video Captioning using Deep Neural Network

Author

Abstract

Library of Congress Subject Headings

Publication Date

Document Type

Student Type

Degree Name

Department, Program, or Center

Advisor

Advisor/Committee Member

Advisor/Committee Member

Comments

Recommended Citation

Campus

Plan Codes

Share

Search

Browse

Author Corner

RIT Links