Abstract

Live speech transcription and captioning are important accessibility tools for deaf and hard-of-hearing individuals, especially in situations where no ASL interpreter is present. When live captioning is available at all, it is typically rendered in the style of closed captions on a separate display, such as a phone screen or TV, away from the actual conversation. This can divide the viewer's focus and detract from the experience. This thesis proposes an alternative, augmented reality (AR) driven approach to displaying these captions, using deep neural networks to compute, track, and associate deep visual and speech descriptors in order to maintain captions as "speech bubbles" above the speaker.
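
To make the association step concrete, the short sketch below (not the thesis implementation) illustrates one way caption placement could work: a speech descriptor for the current utterance is matched against per-face visual descriptors, and the caption is anchored above the best-matching face. It assumes the visual and speech descriptors live in a shared embedding space and that the descriptor networks, face tracker, and speech recognizer all exist upstream; the function and variable names here are illustrative only.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two descriptor vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def place_caption(face_boxes, face_descriptors, speech_descriptor, caption_text):
    # face_boxes:        list of (x, y, w, h) boxes from a face tracker
    # face_descriptors:  one visual descriptor (np.ndarray) per box
    # speech_descriptor: descriptor of the current utterance
    # Returns the caption text and a 2-D anchor point above the best-matching face.
    scores = [cosine_similarity(d, speech_descriptor) for d in face_descriptors]
    best = int(np.argmax(scores))
    x, y, w, h = face_boxes[best]
    anchor = (x + w / 2.0, y - 0.2 * h)  # "speech bubble" sits just above the face
    return caption_text, anchor

# Example with dummy descriptors:
# text, anchor = place_caption([(100, 80, 60, 60)], [np.ones(128)], np.ones(128), "Hello!")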

Library of Congress Subject Headings

Real-time closed captioning--Technological innovations; Augmented reality; Neural networks (Computer science)

Publication Date

5-2020

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Joe Geigel

Advisor/Committee Member

Zack Butler

Advisor/Committee Member

Thomas Kinsman

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS
