Intelligibility Rating with Automatic Speech Recognition, Prosodic, and Cepstral Evaluation
Speech intelligibility is an important criterion in voice rehabilitation. Automatic speech recognition combined with prosodic analysis has previously been shown to evaluate intelligibility successfully. This paper extends that method with measures based on the Cepstral Peak Prominence (CPP). 73 hoarse patients (aged 48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Five speech therapists and physicians rated their intelligibility perceptually on a 5-point scale. Support Vector Regression (SVR) yielded a feature set with a human-machine correlation of r = 0.85, consisting of the word accuracy, the smoothed CPP computed from a speech section, and three prosodic features (normalized energy of word-pause-word intervals, F0 value at voice offset in a word, and standard deviation of jitter). The average human-human correlation was r = 0.82. Hence, the automatic method can serve as a meaningful objective support for perceptual analysis.
Keywords: Support Vector Regression, Automatic Speech Recognition, Speech Recognition System, Automatic Evaluation, Noisy Speech
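The abstract relies on two quantities that are easy to state in code: the word accuracy of a recognizer and the Pearson correlation r between human and machine ratings. The following is a minimal sketch, not the authors' implementation. It assumes the common definition of word accuracy, 100 · (1 − (substitutions + deletions + insertions) / reference words), computed here from a word-level edit distance. The function names and all values in the usage lines are hypothetical toy data, not results from the study.

```python
from math import sqrt


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent, assuming WA = 100 * (1 - (S + D + I) / N).

    S + D + I is taken as the word-level Levenshtein distance between the
    reference text and the recognizer output; N is the reference length.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = cur
    return 100.0 * (1.0 - prev[-1] / len(ref))


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy usage (hypothetical values): one dropped word out of six.
wa = word_accuracy("the north wind and the sun", "the north wind and sun")

# Toy usage (hypothetical values): mean rater scores vs. model predictions.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
machine = [1.2, 1.9, 3.4, 3.8, 4.9]
r = pearson_r(human, machine)
```

Note that word accuracy can become negative when the recognizer inserts many spurious words, which is why it is not the same as the percentage of correctly recognized words.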