Abstract
This thesis poses the problem of classifying speakers by the phonetic features of their accents. It aims to construct an automatic accent recognition machine that can accomplish this classification task with reasonable accuracy. The machine consists of two crucial steps, feature extraction and pattern recognition, and we review and explore multiple techniques for each step in detail. For feature extraction, we explore principal component analysis and cepstral analysis; for pattern recognition, we explore discriminant functions, support vector machines, and k-nearest neighbors. Because signal data are usually High Dimension Low Sample Size (HDLSS), reducing the dimensionality is crucial in the automatic accent recognition task.
Two studies are conducted in which speech signals are collected and classified as American English accent or non-American English accent. In the first study, a total of 330 noise-free speech signals with an average dimensionality of 44050 are classified into the two categories. In the time domain, the dimensionality is reduced to 250 using principal component analysis. Although the in-sample prediction shows an optimistic accuracy of over 90%, the out-of-sample prediction accuracy estimated by cross-validation is as low as 60%. Alternatively, cepstral analysis, a feature extraction technique in the frequency domain, is applied in place of principal component analysis: features called mel-frequency cepstral coefficients are extracted, and the dimensionality is reduced to a value between 12 and 39. The out-of-sample prediction accuracy can then reach around 95%. Although cepstral analysis proves to be a powerful tool for accent recognition, a second study shows that it may quickly fail when the signal contains an appreciable amount of noise. The prediction accuracy drops to 80% or lower, depending on the amplitude of the noise and the length of the signals.
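The gap between in-sample and cross-validated accuracy reported for the first study can be illustrated with a minimal sketch. It is not the thesis's code or data: the features are synthetic noise, the labels are random, and the helper names (`knn_predict`, `in_sample_accuracy`, `loo_accuracy`) are hypothetical. It assumes a plain k-nearest-neighbors classifier with Euclidean distance and leave-one-out cross-validation, chosen only because they need nothing beyond the Python standard library.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training points nearest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def in_sample_accuracy(x, y, k=1):
    """Accuracy when every point is predicted from the full training set."""
    hits = sum(knn_predict(x, y, x[i], k) == y[i] for i in range(len(x)))
    return hits / len(x)


def loo_accuracy(x, y, k=1):
    """Leave-one-out accuracy: each point is predicted from all the others."""
    hits = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_x, rest_y, x[i], k) == y[i]
    return hits / len(x)


# Synthetic high-dimension, low-sample-size data (an assumption for
# illustration): 40 samples, 500 pure-noise features, random binary labels.
rng = random.Random(0)
x = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(40)]
y = [rng.randint(0, 1) for _ in range(40)]

in_sample = in_sample_accuracy(x, y, k=1)  # 1.0: each point is its own nearest neighbor
out_of_sample = loo_accuracy(x, y, k=1)    # held-out estimate on the same data

print(f"in-sample accuracy:     {in_sample:.2f}")
print(f"leave-one-out accuracy: {out_of_sample:.2f}")
```

With k = 1 the in-sample accuracy is 1.0 by construction, even though the labels carry no signal, while the leave-one-out estimate has no such guarantee. This is the same kind of optimism that in-sample accuracy shows when the number of features far exceeds the number of samples.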
Library of Congress Subject Headings
Pattern recognition systems; Accents and accentuation--Data processing; Speech--Data processing; Sound--Classification
Publication Date
12-2014
Document Type
Thesis
Student Type
Graduate
Degree Name
Applied Statistics (MS)
Department, Program, or Center
The John D. Hromi Center for Quality and Applied Statistics (KGCOE)
Advisor
Ernest Fokoue
Advisor/Committee Member
Joseph Voelkel
Advisor/Committee Member
Steven Lalonde
Recommended Citation
Ma, Zichen, "Statistical Methods for Signal Processing with Application to Automatic Accent Recognition" (2014). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/8508
Campus
RIT – Main Campus
Plan Codes
APPSTAT-MS
Comments
Physical copy available from RIT's Wallace Library at TK7882.P3 M3 2014