Language Resources and Evaluation, Volume 50, Issue 3, pp 497–521

The USC CreativeIT database of multimodal dyadic interactions: from speech and full body motion capture to continuous emotional annotations

  • Angeliki Metallinou
  • Zhaojun Yang
  • Chi-chun Lee
  • Carlos Busso
  • Sharon Carnicke
  • Shrikanth Narayanan
Original Paper


Improvised acting is a viable technique for studying expressive human communication and for shedding light on actors’ creativity. The USC CreativeIT database provides a novel, freely available multimodal resource for the study of theatrical improvisation and rich expressive human behavior (speech and body language) in dyadic interactions. The theoretical design of the database is based on the well-established improvisation technique of Active Analysis, chosen to provide naturally induced, affective and expressive, goal-driven interactions. The database contains dyadic theatrical improvisations performed by 16 actors, with detailed full body motion capture data and audio data for each participant in an interaction. The carefully engineered data collection, the improvisation design to elicit natural emotions and expressive speech and body language, and the well-developed annotation processes together provide a gateway to studying and modeling various aspects of theatrical performance, expressive behaviors, and human communication and interaction.


Multimodal database, Theatrical improvisations, Motion capture system, Continuous emotion



This material is based upon work supported by DARPA and Space and Naval Warfare Systems Center Pacific under Contract Number N66001-11-C-4006 and the NSF.



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  • Angeliki Metallinou (1)
  • Zhaojun Yang (2)
  • Chi-chun Lee (3)
  • Carlos Busso (4)
  • Sharon Carnicke (2)
  • Shrikanth Narayanan (2)

  1. Amazon Lab 126, Cupertino, USA
  2. University of Southern California, Los Angeles, USA
  3. National Tsing Hua University, Hsinchu, Taiwan
  4. University of Texas at Dallas, Richardson, USA
