Disposition Recognition from Spontaneous Speech Towards a Combination with Co-speech Gestures

  • Ronald Böck
  • Kirsten Bergmann
  • Petra Jaecks
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8757)

Abstract

Speech and co-speech gestures are an integral part of human communicative behaviour. How these modalities influence each other and, ultimately, reflect a speaker’s dispositional state is an important aspect of research in Human-Machine Interaction. So far, however, little is known from investigations that consider both modalities simultaneously. The EmoGest corpus is a novel data set addressing how emotions or dispositions manifest themselves in co-speech gestures. Participants were primed to be happy, neutral, or sad and afterwards explained tangram figures to an experimenter. We employed this corpus to conduct disposition recognition from speech data as an evaluation of the emotion priming. The classification is based on meaningful features that have already been applied successfully in emotion recognition. In disposition recognition from speech, we achieved remarkable classification accuracy. These results provide the basis for a detailed disposition-related analysis of gestural behaviour, also in combination with speech. In general, they indicate the necessity of multimodal investigations of disposition, which will ultimately improve overall recognition performance.
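To illustrate the kind of pipeline the abstract describes, the sketch below shows disposition classification from acoustic features: frame-level MFCCs are extracted per utterance and one Gaussian mixture model is trained per disposition class (happy, neutral, sad), with an utterance assigned to the class whose model scores it highest. This is only a minimal illustration, not the authors' actual HTK/Praat-based setup; the use of librosa and scikit-learn, the function names, and the choice of 8 mixture components are assumptions made for the example.

```python
# Minimal sketch: frame-level MFCC features + one GMM per disposition class.
# NOTE: this is an illustrative stand-in, not the authors' HTK-based pipeline.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def mfcc_frames(wav_path, sr=16000, n_mfcc=13):
    """Load an audio file and return an (n_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T


def train_models(training_data, n_components=8):
    """training_data: dict mapping class label (e.g. 'happy') -> list of wav paths."""
    models = {}
    for label, paths in training_data.items():
        frames = np.vstack([mfcc_frames(p) for p in paths])
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        models[label] = gmm.fit(frames)
    return models


def classify(models, wav_path):
    """Return the disposition class whose GMM gives the highest mean log-likelihood."""
    frames = mfcc_frames(wav_path)
    return max(models, key=lambda label: models[label].score(frames))
```

In such a setup, speaker-independent evaluation (leave-one-speaker-out) is the usual way to verify that the classifier captures the priming condition rather than speaker identity.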

Keywords

Human-machine-interaction · Disposition recognition from speech · Co-speech gestures · Naturalistic human-machine-interaction

Acknowledgement

We acknowledge continued support by the Transregional Collaborative Research Centre SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems” and the Collaborative Research Centre SFB 673 “Alignment in Communication” both funded by the German Research Foundation (DFG). We also acknowledge the DFG for financing our computing cluster used for parts of this work. Furthermore, we thank Sören Klett and Ingo Siegert for fruitful discussions and support.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Electrical Engineering and Information Technology, Otto von Guericke University, Magdeburg, Germany
  2. Faculty of Technology, Bielefeld University, Bielefeld, Germany
  3. Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany
