Abstract

In this thesis, we explore several novel data augmentation methods for improving the performance of automatic speech recognition (ASR) on low-resource languages. Using a 100-hour subset of English LibriSpeech to simulate a low-resource setting, we compare the well-known SpecAugment augmentation approach to these new methods, along with several other competitive baselines. We then apply the most promising combinations of models and augmentation methods to three genuinely under-resourced languages, using the 40-hour Gujarati, Tamil, and Telugu datasets from the 2021 Interspeech Low Resource Automatic Speech Recognition Challenge for Indian Languages. Our data augmentation approaches, coupled with state-of-the-art acoustic model architectures and language models, yield reductions in word error rate over SpecAugment and other competitive baselines on the LibriSpeech-100 dataset, with a particular advantage over prior models on the more challenging "other" dev and test sets. Extending this work to the low-resource Indian languages, we see large improvements over the baseline models and results comparable to those of large multilingual models.

Library of Congress Subject Headings

Automatic speech recognition--Technological innovations; Neural networks (Computer science); Machine learning; Pattern recognition systems; Grammar, Comparative and general--Morphology

Publication Date

10-2021

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Christopher M. Homan

Advisor/Committee Member

Raymond Ptucha

Advisor/Committee Member

Emily Prud'hommeaux

Recommended Citation

Damania, Ronit, "Data augmentation for automatic speech recognition for low resource languages" (2021). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/10968

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS
