Author

Zichen Ma

Abstract

This thesis poses the problem of classifying speakers by the phonetic features of their accents. It aims to construct an automatic accent recognition machine that performs this classification with reasonable accuracy. The machine consists of two crucial steps: feature extraction and pattern recognition. We review and explore multiple techniques for both steps in detail. For feature extraction, we explore principal component analysis and cepstral analysis; for pattern recognition, we explore discriminant functions, support vector machines, and k-nearest neighbors. Because signal data are usually high-dimension, low-sample-size (HDLSS), reducing the dimensionality is crucial in the automatic accent recognition task.

Two studies are conducted in which speech signals are collected and classified as American English accent or non-American English accent. In the first study, a total of 330 noise-free speech signals with an average dimensionality of 44050 are classified into the two categories. In the time domain, the dimensionality is reduced to 250 using principal component analysis. Although the in-sample prediction accuracy is an optimistic 90% or more, the out-of-sample prediction accuracy under cross-validation is as low as 60%. As an alternative, cepstral analysis, a feature extraction technique in the frequency domain, is used in place of principal component analysis: mel-frequency cepstral coefficients are extracted, reducing the dimensionality to between 12 and 39. The out-of-sample prediction accuracy then reaches around 95%. Although cepstral analysis proves to be a powerful tool for accent recognition, a second study shows that it may quickly fail when the signal contains an evident amount of noise: prediction accuracy drops to 80% or lower, depending on the amplitude of the noise and the length of the signals.
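The gap between in-sample and cross-validated accuracy reported in the abstract can be illustrated with a minimal sketch. It is not the thesis's code or data: the feature vectors are synthetic Gaussian noise, the labels are deliberately unrelated to the features, and the 1-nearest-neighbor classifier and the helper names (`knn_predict`, `in_sample_accuracy`, `loo_accuracy`) are assumptions chosen for illustration. With many dimensions and few samples, in-sample accuracy looks perfect while leave-one-out cross-validation shows there is nothing to learn.

```python
import random


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def in_sample_accuracy(x, y, k=1):
    """Accuracy when every point is predicted from a training set that contains it."""
    hits = sum(knn_predict(x, y, x[i], k) == y[i] for i in range(len(x)))
    return hits / len(x)


def loo_accuracy(x, y, k=1):
    """Leave-one-out cross-validation: each point is predicted from all the others."""
    hits = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_x, rest_y, x[i], k) == y[i]
    return hits / len(x)


# Synthetic high-dimension, low-sample-size data (hypothetical, not speech signals).
rng = random.Random(0)
n_samples, n_dims = 40, 200
x = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]  # labels carry no information about x

train_acc = in_sample_accuracy(x, y)
cv_acc = loo_accuracy(x, y)
print(f"in-sample accuracy: {train_acc:.2f}")
print(f"leave-one-out accuracy: {cv_acc:.2f}")
```

With 1-nearest-neighbor, each point is its own nearest neighbor in the in-sample check, so that score is trivially perfect; only the held-out estimate says anything about new speakers.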

Library of Congress Subject Headings

Pattern recognition systems; Accents and accentuation--Data processing; Speech--Data processing; Sound--Classification

Publication Date

12-2014

Document Type

Thesis

Student Type

Graduate

Degree Name

Applied Statistics (MS)

Department, Program, or Center

The John D. Hromi Center for Quality and Applied Statistics (KGCOE)

Advisor

Ernest Fokoue

Advisor/Committee Member

Joseph Voelkel

Advisor/Committee Member

Steven Lalonde

Comments

Physical copy available from RIT's Wallace Library at TK7882.P3 M3 2014

Campus

RIT – Main Campus

Plan Codes

APPSTAT-MS
