Abstract
Substitute speech after laryngectomy is characterized by restricted aero-acoustic properties compared with laryngeal speech and therefore has lower intelligibility. Until now, no objective means of determining and quantifying intelligibility has existed, although intelligibility can serve as a global outcome parameter of voice restoration after laryngectomy. An automatic speech recognition system was applied to recordings of a standard text read by 18 German male laryngectomees with tracheoesophageal substitute speech. The system was trained on normal laryngeal speakers and was not adapted to severely disturbed voices. Substitute speech was compared with the laryngeal speech of a control group. Intelligibility was also rated subjectively by a panel of five experts, and these ratings were compared with the automatic evaluation. Substitute speech showed a lower speaking rate (syllables/s) and lower word accuracy than laryngeal speech. For substitute speech, automatic speech recognition yielded word accuracies between 10.0 and 50.0% (28.7 ± 12.1%) with sufficient discrimination. The automatic evaluation agreed with the experts' subjective evaluations of intelligibility: the multi-rater kappa of the experts alone did not differ from the multi-rater kappa of the experts and the recognizer together. Automatic speech recognition is therefore a good means of objectifying and quantifying the global speech outcome of laryngectomees. For clinical use, the speech recognition system will be adapted to disturbed voices; it can also be applied to other languages.
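The abstract reports two measures without defining them: word accuracy and a multi-rater kappa. The Python sketch below shows one standard way to compute each, under stated assumptions. Word accuracy is assumed to follow the common definition WA = 100 · (N − S − D − I) / N, where S, D and I are the substitutions, deletions and insertions found by a minimum-edit-distance alignment of the recognized words against the N words of the reference text; because insertions count as errors, WA can fall below 0%. The multi-rater kappa is assumed to be Fleiss' kappa for a fixed number of raters per recording. The function names `word_accuracy` and `fleiss_kappa` are hypothetical and do not describe the authors' recognizer or statistics software.

```python
from collections import Counter
from typing import Hashable, Sequence


def word_accuracy(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word accuracy in percent, assuming WA = 100 * (N - S - D - I) / N.

    With unit costs, the minimum edit distance between the two word
    sequences equals S + D + I of the best alignment.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference text must contain at least one word")
    # dist[i][j]: fewest edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # i deletions
    for j in range(m + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
            )
    return 100.0 * (n - dist[n][m]) / n


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]],
                 categories: Sequence[Hashable]) -> float:
    """Fleiss' kappa; ratings[s] lists the category each rater gave subject s."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    # counts[s][k]: number of raters who put subject s into category k
    counts = [Counter(row) for row in ratings]
    # Mean observed agreement over subjects
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_subjects
    # Chance agreement from the overall category proportions
    p_e = sum(
        (sum(c[k] for c in counts) / (n_subjects * n_raters)) ** 2
        for k in categories
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

To mirror the comparison described in the abstract, one would compute `fleiss_kappa` once on the experts' ratings alone and once with the recognizer added as an extra rater. That second step presupposes mapping each speaker's word accuracy onto the experts' rating scale; the abstract does not say how this was done, so any such mapping is an assumption.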
Acknowledgments
This work was partially supported by the EU in the project PF-Star under grant IST-2001-37599, by the DFG (Deutsche Forschungsgemeinschaft, German Research Council), SFB 603, subproject B5, and by the Deutsche Krebshilfe (registration no. 106266). The authors are responsible for the content of this article. We thank PD Dr. A. Pfahlberg, Institute for Medical Informatics, Biometry and Epidemiology, University of Erlangen, for helpful suggestions, and Prof. Dr. H. Iro, Department of ENT, University of Erlangen, for supplying the data of the control group.
Cite this article
Schuster, M., Haderlein, T., Nöth, E. et al. Intelligibility of laryngectomees’ substitute speech: automatic speech recognition and subjective rating. Eur Arch Otorhinolaryngol 263, 188–193 (2006). https://doi.org/10.1007/s00405-005-0974-6