Abstract
In this thesis, we explore several novel data augmentation methods for improving the performance of automatic speech recognition (ASR) on low-resource languages. Using a 100-hour subset of English LibriSpeech to simulate a low-resource setting, we compare the well-known SpecAugment augmentation approach to these new methods, along with several other competitive baselines. We then apply the most promising combinations of models and augmentation methods to three genuinely under-resourced languages, using the 40-hour Gujarati, Tamil, and Telugu datasets from the 2021 Interspeech Low Resource Automatic Speech Recognition Challenge for Indian Languages. Our data augmentation approaches, coupled with state-of-the-art acoustic model architectures and language models, yield reductions in word error rate over SpecAugment and other competitive baselines on the LibriSpeech-100 dataset, showing a particular advantage over prior models on the more challenging "other" dev and test sets. Extending this work to the low-resource Indian languages, we see large improvements over the baseline models and results comparable to those of large multilingual models.
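For context, the SpecAugment baseline referenced above operates directly on log-mel spectrogram features by masking random frequency bands and time spans. The sketch below is a minimal illustrative implementation of those two masking steps only; the function name, parameter names, and default mask widths are assumptions for illustration and do not reflect the specific configuration used in the thesis.

```python
import numpy as np

def spec_augment(log_mel, num_freq_masks=2, freq_mask_width=27,
                 num_time_masks=2, time_mask_width=100, rng=None):
    """Apply SpecAugment-style frequency and time masking to a
    log-mel spectrogram of shape (num_mel_bins, num_frames)."""
    rng = rng or np.random.default_rng()
    spec = log_mel.copy()
    num_bins, num_frames = spec.shape

    # Frequency masking: zero out a randomly placed band of mel channels.
    for _ in range(num_freq_masks):
        f = rng.integers(0, freq_mask_width + 1)
        f0 = rng.integers(0, max(1, num_bins - f))
        spec[f0:f0 + f, :] = 0.0

    # Time masking: zero out a randomly placed span of frames.
    for _ in range(num_time_masks):
        t = rng.integers(0, min(time_mask_width, num_frames) + 1)
        t0 = rng.integers(0, max(1, num_frames - t))
        spec[:, t0:t0 + t] = 0.0

    return spec
```

Because the masks are drawn independently for every training example, the model never sees the same corrupted view of an utterance twice, which is what makes this family of on-the-fly augmentations attractive in low-resource settings.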
Library of Congress Subject Headings
Automatic speech recognition--Technological innovations; Neural networks (Computer science); Machine learning; Pattern recognition systems; Grammar, Comparative and general--Morphology
Publication Date
10-2021
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Christopher M. Homan
Advisor/Committee Member
Raymond Ptucha
Advisor/Committee Member
Emily Prud'hommeaux
Recommended Citation
Damania, Ronit, "Data augmentation for automatic speech recognition for low resource languages" (2021). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10968
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS