Abstract
The application of deep neural networks to acoustic modeling for automatic speech recognition (ASR) has dramatically decreased word error rates, enabling the use of this technology in smartphones and personal home assistants for high-resource languages. Developing ASR models of this caliber, however, requires hundreds or thousands of hours of transcribed speech recordings, which presents challenges for most of the world’s languages. In this work, we investigate the applicability of three distinct architectures that have previously been used for ASR in languages with limited training resources. We tested these architectures using publicly available ASR datasets for several typologically and orthographically diverse languages, whose data was produced under a variety of conditions using different speech collection strategies, practices, and equipment. Additionally, we performed data augmentation on this audio so that the amount of data could increase nearly tenfold, synthetically creating a higher-resource training set. We modified the architectures and their individual components and explored their parameters to find the combination of features and modeling schemas best suited to a specific language's morphology. Our results point to the importance of considering language-specific and corpus-specific factors and experimenting with multiple approaches when developing ASR systems for resource-constrained languages.
Library of Congress Subject Headings
Automatic speech recognition--Technological innovations; Machine learning; Neural networks (Computer science); Pattern recognition systems; Grammar, Comparative and general--Morphology
Publication Date
4-2021
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Engineering (MS)
Department, Program, or Center
Computer Engineering (KGCOE)
Advisor
Emily Prud'hommeaux
Advisor/Committee Member
Alexander Loui
Advisor/Committee Member
Andreas Savakis
Recommended Citation
Morris, Ethan, "Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages" (2021). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10758
Campus
RIT – Main Campus
Plan Codes
CMPE-MS