Abstract
This contribution describes the challenges encountered and the progress made in Verbmobil on robust speech recognition under adverse conditions such as channel distortion, environmental noise, and varying speakers and speaking conditions. For the channel and noise problem, classical approaches such as cepstral bias normalization and spectral subtraction have been improved, as have newer methods such as parallel model combination. One major result is that an intelligent combination of several methods achieves the best results. Considerable progress has also been made in research on unsupervised speaker adaptation. Several approaches are presented for improving robustness against variations in speaking rate, speaking style and speaker characteristics. The methods described include a new way of estimating the parameters for vocal tract length normalization, feature and codebook transformation methods using ML algorithms, and pronunciation adaptation of the words in the lexicon.
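The two classical front-end techniques named in the abstract can be illustrated with a minimal sketch. It shows only the textbook baselines: per-utterance cepstral mean subtraction (a stationary linear channel adds a constant offset in the cepstral domain, so removing the mean removes that offset) and power spectral subtraction with an over-subtraction factor and a spectral floor. It is not the improved variants developed in Verbmobil, which the abstract does not specify. The function names and the default values of `alpha` and `beta` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def cepstral_mean_normalize(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean from every cepstral coefficient.

    A constant channel offset added to all frames cancels out.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / n for k in range(dim)]
    return [[frame[k] - mean[k] for k in range(dim)] for frame in frames]


def spectral_subtract(power: Sequence[float],
                      noise: Sequence[float],
                      alpha: float = 2.0,   # over-subtraction factor (assumed value)
                      beta: float = 0.01    # spectral floor factor (assumed value)
                      ) -> List[float]:
    """Subtract a scaled noise power estimate from one frame's power spectrum.

    Each bin is floored at beta * noise so that no bin becomes negative.
    """
    return [max(p - alpha * n, beta * n) for p, n in zip(power, noise)]


if __name__ == "__main__":
    clean = [[1.0, 2.0], [3.0, 6.0]]
    # Simulate a channel as a constant cepstral offset on every frame.
    distorted = [[c0 + 0.5, c1 - 0.25] for c0, c1 in clean]
    print(cepstral_mean_normalize(clean) == cepstral_mean_normalize(distorted))
    print(spectral_subtract([10.0, 1.0], [2.0, 1.0]))
```

In practice the noise estimate would come from non-speech frames, and the mean would be computed over speech frames of one utterance or a sliding window; both choices are left out of this sketch.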
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Haiber, U., Mangold, H., Pfau, T., Regel-Brietzmann, P., Ruske, G., Schleß, V. (2000). Robust Recognition of Spontaneous Speech. In: Wahlster, W. (eds) Verbmobil: Foundations of Speech-to-Speech Translation. Artificial Intelligence. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04230-4_4
DOI: https://doi.org/10.1007/978-3-662-04230-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-08730-1
Online ISBN: 978-3-662-04230-4
eBook Packages: Springer Book Archive