Abstract

English has six stop consonants, /b, d, g, p, t, k/, which account for over 17% of all phonemic occurrences. In continuous speech, phonetic recognition of stop consonants requires an explicit characterization of the acoustic signal. Prior work has shown that discrete syllables and words can be classified with high accuracy by characterizing the shape of the spectrally transformed acoustic signal. This thesis extends that approach to a multispeaker continuous-speech database and uses the statistical moments of a distribution to characterize shape. A multivariate maximum likelihood classifier was used to discriminate among classes. To reduce the number of features used by the discriminant model, a dynamic programming scheme was employed to optimize subset combinations. The top six moments were the mean, variance, and skewness in both frequency and energy. The classifier achieved 85% classification accuracy on the full database of 952 utterances. Accuracy improved to 97% when the discriminant model was trained separately for male and female talkers.
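As an illustration of what characterizing spectral shape by statistical moments can look like, the following minimal sketch treats a magnitude spectrum as a probability distribution over frequency and computes its mean, variance, and skewness. The function name `spectral_moments`, its inputs, and the normalization are assumptions made for this example; the abstract does not give the thesis's exact definitions, so this should not be read as the author's implementation.

```python
def spectral_moments(freqs, magnitudes):
    """Return (mean, variance, skewness) of a spectrum viewed as a
    distribution over frequency.

    Hypothetical helper for illustration only: `freqs` are bin center
    frequencies and `magnitudes` are non-negative spectral values.
    """
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum must have positive total magnitude")

    # Normalize so the magnitudes sum to 1 and act as probabilities.
    weights = [m / total for m in magnitudes]

    # First moment: the spectral centroid.
    mean = sum(w * f for w, f in zip(weights, freqs))

    # Second central moment: the spread around the centroid.
    variance = sum(w * (f - mean) ** 2 for w, f in zip(weights, freqs))

    # Standardized third central moment: asymmetry of the spectrum.
    if variance == 0:
        skewness = 0.0
    else:
        third = sum(w * (f - mean) ** 3 for w, f in zip(weights, freqs))
        skewness = third / variance ** 1.5

    return mean, variance, skewness
```

Under these assumptions, a spectrum that is symmetric about its peak yields zero skewness, while one with most of its energy at low frequencies and a tail toward high frequencies yields positive skewness. Features of this kind could then be passed to a classifier such as the multivariate maximum likelihood model mentioned in the abstract.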

Library of Congress Subject Headings

Speech processing systems; Automatic speech recognition

Publication Date

1991

Document Type

Thesis

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Not Listed

Comments

Note: imported from RIT's Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available from RIT's Wallace Library at: TK7882.S65 C35 1991

Campus

RIT – Main Campus
