Abstract
Spoken language identification (LID) in telephone speech signals is an important and difficult classification task. Language identification modules can be used as front-end signal routers for multilanguage speech recognition or transcription devices. Gaussian Mixture Models (GMMs) can effectively model the distribution of feature vectors present in speech signals for classification. Common feature vectors used for speech processing include Linear Prediction (LP-CC), Mel-Frequency (MF-CC), and Perceptual Linear Prediction (PLP-CC) derived cepstral coefficients. This thesis examines a recently proposed type of feature vector, the Shifted Delta Cepstral (SDC) coefficients, whose use has been shown to improve language identification performance. It explores and compares different types of SDC feature vectors for spoken language identification of telephone speech, using a simple GMM-based classifier on a 3-language task. The OGI Multi-language Telephone Speech Corpus is used to evaluate the system.
Library of Congress Subject Headings
Speech processing systems; Automatic speech recognition; Speech--Data processing; Computational linguistics; Sound--Classification; Gaussian processes
Publication Date
2006
Document Type
Thesis
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Gaborski, Roger
Recommended Citation
Lareau, Jonathan, "Application of shifted delta cepstral features for GMM language identification" (2006). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/257
Campus
RIT – Main Campus
Comments
Note: imported from RIT's Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT's Wallace Library at: TK7882.S65 L37 2006