Subtext Word Accuracy and Prosodic Features for Automatic Intelligibility Assessment

  • Tino Haderlein
  • Anne Schützenberger
  • Michael Döllinger
  • Elmar Nöth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

Speech intelligibility in voice rehabilitation can be evaluated successfully by automatic prosodic analysis. This paper examines how reading errors and the restriction to certain words (nouns only, nouns and verbs, the beginning of each sentence, the beginnings of sentences and subclauses) affect the computation of the word accuracy (WA) and of prosodic features. Seventy-three hoarse patients read the German version of the text “The North Wind and the Sun”. Five trained experts rated their intelligibility perceptually on a 5-point scale. Combining prosodic features and WA by Support Vector Regression yielded human-machine correlations of up to \(r=0.86\). The correlations drop for files with few reading errors, but adjusting the feature set can largely compensate for this. WA should be computed on the whole text, but for some prosodic features a subset of words may be sufficient.
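As a rough illustration of the two quantities named in the abstract, the following Python sketch computes a word accuracy and a Pearson correlation. It assumes the common definition WA = (N − S − D − I) / N · 100, where N is the number of reference words and S, D, I are the substituted, deleted, and inserted words in a minimum-edit-distance alignment; the abstract does not spell out the formula, so this definition is an assumption. The function names `word_accuracy` and `pearson_r` and the example strings are hypothetical, and the sketch covers neither the prosodic features nor the Support Vector Regression step.

```python
from math import sqrt


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent, assuming WA = (N - S - D - I) / N * 100.

    S + D + I is taken as the word-level Levenshtein distance between the
    reference text and the recognized word sequence (an assumption; the
    source does not give the formula).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr

    errors = prev[-1]
    return 100.0 * (len(ref) - errors) / len(ref)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy strings for illustration only, not data from the study.
    wa = word_accuracy("the north wind and the sun",
                       "the north wind and a sun")
    print(f"WA = {wa:.1f} %")
```

Because insertions count as errors, this definition of WA can become negative when the recognizer outputs many more words than the reference contains. In a setting like the one described, `pearson_r` would be applied to the automatic scores and the averaged expert ratings across all speakers.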

Keywords

Intelligibility, Automatic assessment, Prosody, Reading errors

Notes

Acknowledgments

Dr. Döllinger’s contribution was supported by the German Research Foundation (DFG), grant no. DO1247/8-1 (no. 323308998).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Lehrstuhl für Informatik 5 (Mustererkennung), Erlangen, Germany
  2. Universitätsklinikum Erlangen, Phoniatrische und Pädaudiologische Abteilung in der HNO-Klinik, Erlangen, Germany
