Summary
Background
This study describes an objective method for measuring intelligibility with the postlaryngectomy telephone test (PLTT) by means of automatic speech recognition.
Materials and methods
Thirty-one speakers with tracheoesophageal substitute voice (25 men and 6 women; 63.4±8.7 years) were first rated by 11 naïve listeners. The degree of intelligibility determined by the speech recognition system is reported as the percentage of correctly recognized words in a word sequence, i.e., as word accuracy or word correctness, and was compared with the subjectively obtained PLTT scores.
Results
The mean overall PLTT intelligibility across the 11 naïve listeners was 47%; the automatically determined word accuracy and word correctness were considerably lower (approximately 0% and approximately 15%, respectively). Nevertheless, the correlation between human and automatic ratings exceeded 0.9 in some cases.
Conclusion
Automatic speech recognition allows an equivalent measure of the overall PLTT intelligibility score to be computed objectively and efficiently.
Abstract
Objective
In this study, an objective version of the postlaryngectomy telephone test (PLTT) for measuring speech intelligibility based on automatic speech recognition is presented.
Methods
Thirty-one patients with tracheoesophageal substitute voice (25 men and 6 women; aged 63.4±8.7 years) were evaluated by 11 naïve listeners. Speech intelligibility was measured automatically as word accuracy and word recognition rate, i.e., as percentages of correctly recognized words in a word sequence. These automatic measures were compared with the subjectively obtained PLTT values.
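The abstract does not spell out how the two automatic measures are computed. The following is a minimal sketch assuming the definitions commonly used in speech recognition: with N reference words and S substitutions, D deletions, and I insertions in a minimum-edit alignment of the recognized word sequence against the reference, word accuracy is 100·(N − S − D − I)/N and word recognition rate (word correctness) is 100·(N − S − D)/N. The function names and the example words are hypothetical and are not taken from the study.

```python
def align_counts(reference, hypothesis):
    """Count (substitutions, deletions, insertions) in a minimum-edit
    alignment of the recognized words (hypothesis) against the reference."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j]: minimum number of edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # only deletions
    for j in range(1, m + 1):
        cost[0][j] = j  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != hypothesis[j - 1]
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,  # match or substitution
                             cost[i - 1][j] + 1,             # deletion
                             cost[i][j - 1] + 1)             # insertion
    # Trace one optimal path back and count the error types along it.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = reference[i - 1] != hypothesis[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def word_accuracy(reference, hypothesis):
    """Word accuracy in percent; insertions are penalized, so it can be negative."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = align_counts(reference, hypothesis)
    return 100.0 * (len(reference) - subs - dels - ins) / len(reference)


def word_correctness(reference, hypothesis):
    """Word recognition rate (word correctness) in percent; ignores insertions."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    subs, dels, _ = align_counts(reference, hypothesis)
    return 100.0 * (len(reference) - subs - dels) / len(reference)


# Hypothetical example: one substitution and two inserted words.
ref = ["a", "b", "c", "d"]
hyp = ["a", "x", "c", "d", "e", "f"]
print(word_accuracy(ref, hyp))     # 25.0
print(word_correctness(ref, hyp))  # 75.0
```

Because inserted words count against word accuracy but not against word correctness, word accuracy is never higher than word correctness and can fall to zero or below when the recognizer outputs many spurious words. Under these assumed definitions, this is consistent with word accuracy being the lower of the two automatic measures.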
Results
The average PLTT intelligibility of the 11 naïve listeners was 47%; the automatically obtained word accuracy and word recognition rates were much lower (approximately 0% and 15%, respectively). The correlation between subjective and automatic evaluation, however, reached more than 0.9 in some of the examined cases.
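A high correlation can coexist with very different absolute levels, because correlation measures how closely the two ratings rise and fall together across speakers, not whether they agree in value. The sketch below assumes Pearson's correlation coefficient; the abstract does not state which coefficient was used. The per-speaker numbers are invented purely for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented illustrative values (NOT study data): one entry per speaker.
human_pltt = [20.0, 35.0, 50.0, 65.0, 80.0]      # listener-based score in %
automatic_wa = [-30.0, -12.0, 2.0, 14.0, 31.0]   # word accuracy in %

r = pearson_r(human_pltt, automatic_wa)
print(round(r, 3))
```

In this invented example the automatic values are far below the human scores for every speaker, yet the two series rank and space the speakers almost identically, so the coefficient is close to 1.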
Conclusion
Automatic speech recognition provides an efficient, objective measure that is equivalent to the overall PLTT intelligibility value.
Acknowledgments
This work was funded by Deutsche Krebshilfe (German Cancer Aid; grant no. 106266).
Conflict of interest
The author PD Dr.-Ing. Elmar Nöth is a shareholder of Sympalog Voice Solutions, Erlangen.
Cite this article
Haderlein, T., Riedhammer, K., Maier, A. et al. Automatisierung des Postlaryngektomie-Telefontests. HNO 57, 51–56 (2009). https://doi.org/10.1007/s00106-008-1698-x
DOI: https://doi.org/10.1007/s00106-008-1698-x
Keywords
- Automated pattern recognition
- Speech recognition software
- Speech intelligibility
- Evaluation methods
- Correlation of data