Summary
In multimodal dialogue systems, several input and output modalities are used for user interaction. The most important modality for human-computer interaction is speech. As in human-human interaction, the machine must recognize the sequence of spoken words in the user's utterance. For better communication with the user, it is also advantageous to recognize the user's internal emotional state: the dialogue strategy can then be adapted to the situation, for example, to reduce the user's anger or uncertainty.
In the following sections we first describe the state of the art in emotion and user state recognition based on prosody. The next section describes the prosody module itself. After that we present experiments and results on the recognition of user states; a sketch of the general approach follows below. The last section summarizes our results.
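The general approach, classifying a coarse user state from a few utterance-level prosodic features such as pitch (F0), energy, and pausing, can be illustrated with a minimal sketch. Everything below is invented for illustration: the feature set, the state labels, and the centroid values are hypothetical and do not reproduce the module described in the chapter.

```python
# Minimal illustrative sketch: nearest-centroid user state classification
# from utterance-level prosodic features. All numbers are made up.
import math

def prosodic_features(f0_hz, energy):
    """Reduce frame-level F0 (Hz, 0 = unvoiced) and energy tracks
    to a small utterance-level feature vector."""
    voiced = [f for f in f0_hz if f > 0]
    mean_f0 = sum(voiced) / len(voiced) if voiced else 0.0
    var_f0 = (sum((f - mean_f0) ** 2 for f in voiced) / len(voiced)
              if voiced else 0.0)
    mean_en = sum(energy) / len(energy)
    pause_ratio = 1.0 - len(voiced) / len(f0_hz)  # rough proxy for pausing
    return [mean_f0, math.sqrt(var_f0), mean_en, pause_ratio]

# Hypothetical class centroids in the same feature space; in practice
# these would be learned from labeled training data.
CENTROIDS = {
    "neutral":  [180.0, 20.0, 0.30, 0.35],
    "angry":    [230.0, 45.0, 0.55, 0.20],
    "helpless": [160.0, 15.0, 0.20, 0.55],
}

def classify_user_state(f0_hz, energy):
    """Assign the user state whose centroid is closest to the utterance."""
    x = prosodic_features(f0_hz, energy)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
    return min(CENTROIDS, key=lambda s: dist(CENTROIDS[s]))

# Example: a loud, high-pitched utterance maps to "angry", so the
# dialogue manager could switch to a calming dialogue strategy.
print(classify_user_state([220, 240, 0, 250, 260],
                          [0.5, 0.6, 0.1, 0.6, 0.7]))
```

A real prosody module would use a much richer feature set (e.g., durations, jitter, shimmer, and contour shapes) and a trained statistical classifier rather than fixed centroids; the sketch only shows how prosodic evidence can be mapped to a user state that the dialogue strategy can react to.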
© 2006 Springer-Verlag Berlin Heidelberg
Cite this chapter
Zeißler, V. et al. (2006). The Prosody Module. In: Wahlster, W. (ed.) SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36678-4_9
Print ISBN: 978-3-540-23732-7
Online ISBN: 978-3-540-36678-2