TSD 1999: Text, Speech and Dialogue, pp. 193–198
Fast and Robust Features for Prosodic Classification
Abstract
In our previous research, we showed that prosody can dramatically improve the performance of the automatic speech translation system Verbmobil [5][7][8]. In Verbmobil, prosodic information is made available to the different modules of the system by annotating the output of a word recognizer with prosodic markers. These markers are determined in a classification process. Previously, the computation of the prosodic features used for classification was based on a time alignment of the phoneme sequence of the recognized words; this phoneme segmentation was needed to normalize the duration and energy features. The time alignment was very expensive in terms of computational effort and memory. In our new approach, normalization is done at the word level with precomputed duration and energy statistics, so the phoneme segmentation can be avoided. With the new set of prosodic features, better classification results are achieved, feature extraction is sped up by 64%, and memory requirements are reduced by 92%.
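The abstract does not give the normalization formula, so the following is only a minimal sketch of what word-level normalization with precomputed statistics could look like. It assumes a simple z-score against a per-word mean and standard deviation; the names `WordStats`, `zscore`, and `normalize_word`, the table layout, and the fallback for unseen words are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class WordStats:
    """Precomputed statistics for one word (hypothetical layout)."""
    mean_duration: float  # seconds
    std_duration: float   # seconds
    mean_energy: float    # arbitrary energy units
    std_energy: float     # arbitrary energy units


def zscore(value: float, mean: float, std: float) -> float:
    """Standardize a value; return 0.0 when the spread is degenerate."""
    if std <= 0.0:
        return 0.0
    return (value - mean) / std


def normalize_word(
    word: str,
    duration: float,
    energy: float,
    stats: Dict[str, WordStats],
    fallback: WordStats,
) -> Tuple[float, float]:
    """Normalize one word's duration and energy with a table lookup.

    Assumption: z-score normalization. No phoneme alignment is needed,
    only the word identity and its measured duration and energy.
    Words missing from the table use the fallback statistics.
    """
    s = stats.get(word, fallback)
    return (
        zscore(duration, s.mean_duration, s.std_duration),
        zscore(energy, s.mean_energy, s.std_energy),
    )


if __name__ == "__main__":
    # Illustrative numbers only; they are not from the paper.
    table = {"ja": WordStats(0.20, 0.05, 60.0, 5.0)}
    default = WordStats(0.30, 0.10, 58.0, 6.0)
    print(normalize_word("ja", 0.30, 55.0, table, default))
```

Because the statistics are looked up per word, the cost per recognized word is a single dictionary access and two arithmetic operations, which is consistent in spirit with the speed and memory savings the abstract reports, though the actual feature set in the paper is richer than this sketch.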
Keywords
Memory Requirement, Multi Layer Perceptron, Word Level, Pitch Contour, Prosodic Feature
References
- 1. A. Batliner, A. Kießling, R. Kompe, H. Niemann, and E. Nöth. Tempo and its Change in Spontaneous Speech. In Proc. European Conf. on Speech Communication and Technology, volume 2, pages 763–766, Rhodes, 1997.
- 2. M. Beckman. Stress and Non-stress Accent. Foris Publications, Dordrecht, 1986.
- 3. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, NY, 1995.
- 4. I.N. Bronstein and K.A. Semendjajew. Taschenbuch der Mathematik. Verlag Harri Deutsch, Thun und Frankfurt/Main, 24th edition, 1989.
- 5. T. Bub and J. Schwinn. Verbmobil: The Evolution of a Complex Large Speech-to-Speech Translation System. In Int. Conf. on Spoken Language Processing, volume 4, pages 1026–1029, Philadelphia, 1996.
- 6. A. Kießling. Extraktion und Klassifikation prosodischer Merkmale in der automatischen Sprachverarbeitung. Berichte aus der Informatik. Shaker Verlag, Aachen, 1997.
- 7. R. Kompe, A. Kießling, H. Niemann, E. Nöth, A. Batliner, S. Schachtl, T. Ruland, and H.U. Block. Improving Parsing of Spontaneous Speech with the Help of Prosodic Boundaries. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, volume 2, pages 811–814, München, 1997.
- 8. R. Kompe. Prosody in Speech Understanding Systems. Lecture Notes for Artificial Intelligence. Springer-Verlag, Berlin, 1997.