
Summary

In multimodal dialogue systems, several input and output modalities are used for user interaction. The most important modality for human-computer interaction is speech. As in human-human interaction, the machine must recognize the sequence of spoken words in the user's utterance. For better communication with the user, it is also advantageous to recognize the user's internal emotional state: the dialogue strategy can then be adapted to the situation, for example to reduce the user's anger or uncertainty.
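The adaptation idea can be sketched as a mapping from a recognized user state to a dialogue strategy. The following is a hypothetical illustration, not the SmartKom implementation; the state labels and strategy names are assumptions chosen to mirror the states mentioned above (anger, uncertainty).

```python
def adapt_strategy(user_state: str) -> str:
    """Map a recognized user state to a dialogue strategy (illustrative only).

    A prosody-based classifier would supply user_state; the dialogue
    manager would then switch its behavior accordingly.
    """
    strategies = {
        "anger": "apologize_and_simplify",      # de-escalate, shorten prompts
        "uncertainty": "offer_detailed_help",   # reduce confusion with guidance
        "joy": "continue_normal_dialogue",
        "neutral": "continue_normal_dialogue",
    }
    # Fall back to the default strategy for unknown or unrecognized states.
    return strategies.get(user_state, "continue_normal_dialogue")
```

In a real system the mapping would of course be more fine-grained and could depend on classifier confidence and dialogue history.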

In the following sections we first describe the state of the art in emotion and user state recognition based on prosody. The next section describes the prosody module. After that we present the experiments and results for the recognition of user states, and we summarize our findings in the last section.




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Zeißler, V. et al. (2006). The Prosody Module. In: Wahlster, W. (ed.), SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36678-4_9


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23732-7

  • Online ISBN: 978-3-540-36678-2

