Intelligibility Rating with Automatic Speech Recognition, Prosodic, and Cepstral Evaluation

  • Tino Haderlein
  • Cornelia Moers
  • Bernd Möbius
  • Frank Rosanowski
  • Elmar Nöth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6836)

Abstract

Speech intelligibility is an important criterion in voice rehabilitation. Automatic evaluation of intelligibility has previously been shown to work well when automatic speech recognition is combined with prosodic analysis. In this paper, that approach is extended with measures based on the Cepstral Peak Prominence (CPP). A total of 73 hoarse patients (48.3 ± 16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Five speech therapists and physicians rated their intelligibility perceptually on a 5-point scale. Support Vector Regression (SVR) revealed a feature set with a human-machine correlation of r = 0.85, consisting of the word accuracy, the smoothed CPP computed from a speech section, and three prosodic features (normalized energy of word-pause-word intervals, F0 value at voice offset in a word, and standard deviation of jitter). The average human-human correlation was r = 0.82. Hence, the automatic method can provide meaningful objective support for perceptual analysis.
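
The two correlation figures in the abstract can be illustrated with a small sketch. The abstract does not say how either value was computed, so the following is only one plausible reading: it assumes Pearson's r, assumes the machine score is compared against the per-speaker mean of all raters, and assumes human-human agreement is measured by correlating each rater with the mean of the remaining raters. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def human_machine_r(ratings, machine_scores):
    """Correlate machine scores with the mean rating per speaker.

    ratings: one list per rater, each holding one score per speaker.
    Assumption: the reference is the average over all raters.
    """
    reference = [mean(per_speaker) for per_speaker in zip(*ratings)]
    return pearson_r(reference, machine_scores)


def mean_human_human_r(ratings):
    """Average correlation of each rater with the mean of the other raters.

    Assumption: this leave-one-rater-out scheme is illustrative only.
    """
    correlations = []
    for i, rater in enumerate(ratings):
        others = [r for j, r in enumerate(ratings) if j != i]
        others_mean = [mean(per_speaker) for per_speaker in zip(*others)]
        correlations.append(pearson_r(rater, others_mean))
    return mean(correlations)


# Made-up toy data: 3 raters, 6 speakers, 5-point scale.
toy_ratings = [
    [1, 2, 3, 4, 5, 3],
    [1, 3, 3, 4, 4, 2],
    [2, 2, 4, 5, 5, 3],
]
# Made-up machine predictions on the same scale.
toy_machine = [1.2, 2.4, 3.1, 4.3, 4.6, 2.9]

print(round(human_machine_r(toy_ratings, toy_machine), 3))
print(round(mean_human_human_r(toy_ratings), 3))
```

Under these assumptions, a human-machine value at or above the mean human-human value indicates that the automatic score agrees with the rater panel about as well as a single rater does.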

Keywords

Support Vector Regression; Automatic Speech Recognition; Speech Recognition System; Automatic Evaluation; Noisy Speech
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tino Haderlein (1, 2)
  • Cornelia Moers (3)
  • Bernd Möbius (4)
  • Frank Rosanowski (2)
  • Elmar Nöth (1)

  1. Pattern Recognition Lab (Informatik 5), University of Erlangen-Nuremberg, Erlangen, Germany
  2. Department of Phoniatrics and Pedaudiology, University of Erlangen-Nuremberg, Erlangen, Germany
  3. Department of Speech and Communication, University of Bonn, Bonn, Germany
  4. Department of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, Germany
