EmoVoice — A Framework for Online Recognition of Emotions from Voice

  • Thurid Vogt
  • Elisabeth André
  • Nikolaus Bee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5078)


We present EmoVoice, a framework for creating emotional speech corpora and classifiers, and for both offline and real-time online speech emotion recognition. The framework is intended for non-experts and therefore comes with an interface for creating a personal or application-specific emotion recogniser. Furthermore, we describe applications and prototypes that already use our framework to track online emotional user states from voice information.
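The pipeline the abstract describes (record labelled speech, extract acoustic features, train a classifier, then recognise emotions from new input) can be sketched in miniature. This is an illustrative sketch only, not EmoVoice's actual API: the feature set (global statistics over a pitch contour) and the nearest-centroid classifier are simplified stand-ins for the richer features and SVM/Naive Bayes classifiers typically used in speech emotion recognition.

```python
# Illustrative sketch of a voice emotion recogniser (NOT EmoVoice's API):
# turn a pitch contour into global statistics, then classify each new
# utterance by its nearest class centroid in feature space.
import math

def contour_features(pitch):
    """Global statistics over a pitch contour (Hz): mean, std, range."""
    n = len(pitch)
    mean = sum(pitch) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pitch) / n)
    return (mean, std, max(pitch) - min(pitch))

def train_centroids(samples):
    """samples: {label: [contour, ...]} -> {label: mean feature vector}."""
    centroids = {}
    for label, contours in samples.items():
        feats = [contour_features(c) for c in contours]
        centroids[label] = tuple(sum(f[i] for f in feats) / len(feats)
                                 for i in range(3))
    return centroids

def classify(centroids, contour):
    """Assign the label whose centroid is closest (squared distance)."""
    feat = contour_features(contour)
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2
                                   for a, b in zip(feat, centroids[lbl])))

# Toy training data: aroused speech tends towards higher, more variable pitch.
train = {
    "aroused": [[220, 260, 300, 240], [230, 280, 310, 250]],
    "calm":    [[120, 125, 130, 122], [118, 124, 127, 121]],
}
model = train_centroids(train)
print(classify(model, [225, 270, 305, 245]))  # -> aroused
```

A real system would replace the toy contours with features computed from live audio frames, which is what enables the online recognition the paper targets.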


Recognition Rate, Emotion Recognition, Humanoid Robot, Spontaneous Speech, Virtual Agent





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Thurid Vogt (1)
  • Elisabeth André (1)
  • Nikolaus Bee (1)

  1. Multimedia Concepts and their Applications, University of Augsburg, Germany
