TSD 1999: Text, Speech and Dialogue, pp 193-198

Fast and Robust Features for Prosodic Classification

  • Jan Buckow
  • Volker Warnke
  • Richard Huber
  • Anton Batliner
  • Elmar Nöth
  • Heinrich Niemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1692)

Abstract

In our previous research, we have shown that prosody can be used to dramatically improve the performance of the automatic speech translation system Verbmobil [5][7][8]. In Verbmobil, prosodic information is made available to the different modules of the system by annotating the output of a word recognizer with prosodic markers. These markers are determined in a classification process. The computation of the prosodic features used for classification was previously based on a time alignment of the phoneme sequence of the recognized words. The phoneme segmentation was needed for the normalization of duration and energy features. This time alignment was very expensive in terms of computational effort and memory requirements. In our new approach, the normalization is done on the word level with precomputed duration and energy statistics, so the phoneme segmentation can be avoided. With the new set of prosodic features, better classification results can be achieved, feature extraction is sped up by 64%, and the memory requirements are reduced by 92%.
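
The word-level normalization described above might look roughly like the following sketch. It is not the authors' implementation: the abstract does not give the normalization formula, so a plain z-score against per-word statistics is assumed here, and the names `WordStats`, `normalize`, and `word_level_features` as well as the table layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordStats:
    """Precomputed statistics for one word (hypothetical layout)."""
    mean_duration: float  # seconds
    std_duration: float   # seconds
    mean_energy: float    # arbitrary energy units
    std_energy: float     # arbitrary energy units


def normalize(value: float, mean: float, std: float) -> float:
    """Assumed z-score normalization; returns 0.0 if the spread is unusable."""
    if std <= 0.0:
        return 0.0
    return (value - mean) / std


def word_level_features(word: str, duration: float, energy: float,
                        stats: dict[str, WordStats]) -> tuple[float, float]:
    """Normalize a word's duration and energy with a table lookup.

    Only the word boundaries from the recognizer output are needed,
    so no phoneme-level time alignment is computed.
    """
    s = stats.get(word)
    if s is None:
        # Assumption: words without statistics get neutral features.
        return (0.0, 0.0)
    return (normalize(duration, s.mean_duration, s.std_duration),
            normalize(energy, s.mean_energy, s.std_energy))


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    table = {"morgen": WordStats(0.40, 0.10, 60.0, 5.0)}
    print(word_level_features("morgen", 0.55, 55.0, table))
```

In this sketch each word costs one dictionary lookup and two arithmetic operations, which illustrates why replacing the phoneme alignment with precomputed statistics can save both time and memory.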

Keywords

Memory Requirement; Multi Layer Perceptron; Word Level; Pitch Contour; Prosodic Feature
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. A. Batliner, A. Kießling, R. Kompe, H. Niemann, and E. Nöth. Tempo and its Change in Spontaneous Speech. In Proc. European Conf. on Speech Communication and Technology, volume 2, pages 763–766, Rhodes, 1997.
  2. M. Beckman. Stress and Non-stress Accent. Foris Publications, Dordrecht, 1986.
  3. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, NY, 1995.
  4. I.N. Bronstein and K.A. Semendjajew. Taschenbuch der Mathematik. Verlag Harri Deutsch, Thun und Frankfurt/Main, 24th edition, 1989.
  5. T. Bub and J. Schwinn. Verbmobil: The Evolution of a Complex Large Speech-to-Speech Translation System. In Int. Conf. on Spoken Language Processing, volume 4, pages 1026–1029, Philadelphia, 1996.
  6. A. Kießling. Extraktion und Klassifikation prosodischer Merkmale in der automatischen Sprachverarbeitung. Berichte aus der Informatik. Shaker Verlag, Aachen, 1997.
  7. R. Kompe, A. Kießling, H. Niemann, E. Nöth, A. Batliner, S. Schachtl, T. Ruland, and H.U. Block. Improving Parsing of Spontaneous Speech with the Help of Prosodic Boundaries. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, volume 2, pages 811–814, München, 1997.
  8. R. Kompe. Prosody in Speech Understanding Systems. Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin, 1997.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Jan Buckow (1)
  • Volker Warnke (1)
  • Richard Huber (1)
  • Anton Batliner (1)
  • Elmar Nöth (1)
  • Heinrich Niemann (1)
  1. Chair for Pattern Recognition (Computer Science 5), University of Erlangen-Nuremberg, Erlangen, Germany
