KI - Künstliche Intelligenz, Volume 31, Issue 4, pp 357–361

Red Hen Lab: Dataset and Tools for Multimodal Human Communication Research

  • Jungseock Joo
  • Francis F. Steen
  • Mark Turner
Research Project


Researchers in the fields of AI and Communication both study human communication, but despite the opportunities for collaboration, they rarely interact. Red Hen Lab is dedicated to bringing them together for research on multimodal communication, using multidisciplinary teams working on vast, ecologically valid datasets. This article introduces Red Hen Lab and some possibilities for collaboration, demonstrating the utility of a variety of machine learning and AI-based tools and methods for fundamental research questions in multimodal human communication. Supplemental materials are at


Multimodal communication, Non-verbal communication, Face and gesture



Copyright information

© Springer-Verlag GmbH Deutschland 2017

Authors and Affiliations

  1. Communication, UCLA, Los Angeles, USA
  2. Cognitive Science, Case Western Reserve University, Cleveland, USA
