Summary
In multimodal dialogue systems, several input and output modalities are used for user interaction. The most important modality for human-computer interaction is speech. As in human-human interaction, the machine must recognize the sequence of spoken words in the user's utterance. For better communication with the user, it is also advantageous to recognize the user's internal emotional state: the dialogue strategy can then be adapted to the situation, for example, to reduce the user's anger or uncertainty.
In the following sections we first describe the state of the art in emotion and user state recognition based on prosody. The next section describes the prosody module itself. After that we present experiments and results on the recognition of user states; a sketch of the general approach follows below. The last section summarizes our results.
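The general approach, classifying a coarse user state from a few utterance-level prosodic features such as pitch (F0), energy, and pausing, can be illustrated with a minimal sketch. Everything below is invented for illustration: the feature set, the state labels, and the centroid values are hypothetical and do not reproduce the module described in the chapter.

```python
# Minimal illustrative sketch: nearest-centroid user state classification
# from utterance-level prosodic features. All numbers are made up.
import math

def prosodic_features(f0_hz, energy):
    """Reduce frame-level F0 (Hz, 0 = unvoiced) and energy tracks
    to a small utterance-level feature vector."""
    voiced = [f for f in f0_hz if f > 0]
    mean_f0 = sum(voiced) / len(voiced) if voiced else 0.0
    var_f0 = (sum((f - mean_f0) ** 2 for f in voiced) / len(voiced)
              if voiced else 0.0)
    mean_en = sum(energy) / len(energy)
    pause_ratio = 1.0 - len(voiced) / len(f0_hz)  # rough proxy for pausing
    return [mean_f0, math.sqrt(var_f0), mean_en, pause_ratio]

# Hypothetical class centroids in the same feature space; in practice
# these would be learned from labeled training data.
CENTROIDS = {
    "neutral":  [180.0, 20.0, 0.30, 0.35],
    "angry":    [230.0, 45.0, 0.55, 0.20],
    "helpless": [160.0, 15.0, 0.20, 0.55],
}

def classify_user_state(f0_hz, energy):
    """Assign the user state whose centroid is closest to the utterance."""
    x = prosodic_features(f0_hz, energy)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
    return min(CENTROIDS, key=lambda s: dist(CENTROIDS[s]))

# Example: a loud, high-pitched utterance maps to "angry", so the
# dialogue manager could switch to a calming dialogue strategy.
print(classify_user_state([220, 240, 0, 250, 260],
                          [0.5, 0.6, 0.1, 0.6, 0.7]))
```

A real prosody module would use a much richer feature set (e.g., durations, jitter, shimmer, and contour shapes) and a trained statistical classifier rather than fixed centroids; the sketch only shows how prosodic evidence can be mapped to a user state that the dialogue strategy can react to.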
© 2006 Springer-Verlag Berlin Heidelberg
Cite this chapter
Zeißler, V. et al. (2006). The Prosody Module. In: Wahlster, W. (ed.) SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36678-4_9
Print ISBN: 978-3-540-23732-7
Online ISBN: 978-3-540-36678-2