Abstract
Speech and co-speech gestures are integral parts of human communicative behaviour. How these modalities influence each other and, ultimately, reflect a speaker's dispositional state is an important research question in Human-Machine Interaction. So far, however, little is known from investigating both modalities simultaneously. The EmoGest corpus is a novel data set addressing how emotions or dispositions manifest themselves in co-speech gestures. Participants were primed to be happy, neutral, or sad and afterwards explained tangram figures to an experimenter. We employed this corpus to conduct disposition recognition from the speech data as an evaluation of the emotion priming. The classification is based on meaningful features that have already been applied successfully in emotion recognition, and with these features we achieved remarkable classification accuracy in disposition recognition from speech. These results provide the basis for detailed disposition-related analyses of gestural behaviour, also in combination with speech. More generally, they indicate the necessity of multimodal investigations of disposition, which should lead to an improvement of overall recognition performance.
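To make the kind of speech-based classification pipeline described above concrete, the following is a minimal sketch, assuming an MFCC front-end and one Gaussian HMM per priming condition, a standard setup in speech emotion recognition. It is not the exact feature set or classifier of this study; the librosa/hmmlearn toolchain, function names, and parameter values are illustrative assumptions.

```python
# Sketch: disposition classification from speech, assuming MFCC features
# and one Gaussian HMM per class (happy/neutral/sad priming conditions).
# File paths, label set, and all parameter values are hypothetical.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

DISPOSITIONS = ["happy", "neutral", "sad"]

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Return a (frames x coefficients) MFCC matrix for one utterance."""
    signal, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # hmmlearn expects one sample per row

def train_models(utterances_by_label, n_states=3):
    """Fit one HMM per disposition on that class's training utterances."""
    models = {}
    for label, paths in utterances_by_label.items():
        feats = [mfcc_features(p) for p in paths]
        lengths = [f.shape[0] for f in feats]  # utterance boundaries
        model = GaussianHMM(n_components=n_states, covariance_type="diag")
        model.fit(np.vstack(feats), lengths)
        models[label] = model
    return models

def classify(models, wav_path):
    """Assign the disposition whose HMM yields the highest log-likelihood."""
    feats = mfcc_features(wav_path)
    return max(models, key=lambda label: models[label].score(feats))
```

Given a dictionary mapping each disposition label to its training recordings, train_models fits the three class models, and classify labels an unseen utterance by maximum log-likelihood over the class models.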
Acknowledgement
We acknowledge continued support by the Transregional Collaborative Research Centre SFB/TRR 62 "Companion-Technology for Cognitive Technical Systems" and the Collaborative Research Centre SFB 673 "Alignment in Communication", both funded by the German Research Foundation (DFG). We also thank the DFG for financing the computing cluster used for parts of this work. Furthermore, we thank Sören Klett and Ingo Siegert for fruitful discussions and support.
Cite this paper
Böck, R., Bergmann, K., Jaecks, P. (2015). Disposition Recognition from Spontaneous Speech Towards a Combination with Co-speech Gestures. In: Böck, R., Bonin, F., Campbell, N., Poppe, R. (eds.) Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction. MA3HMI 2014. Lecture Notes in Computer Science, vol. 8757. Springer, Cham. https://doi.org/10.1007/978-3-319-15557-9_6