Red Hen Lab: Dataset and Tools for Multimodal Human Communication Research


Researchers in the fields of AI and Communication both study human communication, but despite the opportunities for collaboration, they rarely interact. Red Hen Lab is dedicated to bringing them together for research on multimodal communication, using multidisciplinary teams working on vast ecologically-valid datasets. This article introduces Red Hen Lab with some possibilities for collaboration, demonstrating the utility of a variety of machine learning and AI-based tools and methods to fundamental research questions in multimodal human communication. Supplemental materials are at

This is a preview of subscription content, log in to check access.

Fig. 1


  1. 1.

    Fong T, Nourbakhsh I, Dautenhahn K (2003) A survey of socially interactive robots. Robot Auton Syst 42(3):143–166

    Article  MATH  Google Scholar 

  2. 2.

    Jaimes A, Sebe N (2007) Multimodal human–computer interaction: a survey. Comput Vis Image Underst 108(1):116–134

    Article  Google Scholar 

  3. 3.

    Ende T, Haddadin S, Parusel S, Wüsthoff T, Hassenzahl M, Albu-Schäffer A (2011) A human-centered approach to robot gesture based communication within collaborative working processes. In: Intelligent robots and systems (IROS), 2011 IEEE/RSJ international conference on, pp 3367–3374. IEEE

  4. 4.

    Gleeson B, MacLean K, Haddadi A, Croft E, Alcazar J (2013) Gestures for industry: intuitive human–robot communication from human observation. In: Proceedings of the 8th ACM/IEEE international conference on Human-robot interaction, pp 349–356. IEEE Press

  5. 5.

    Yanik PM, Manganelli J, Merino J, Threatt AL, Brooks JO, Green KE, Walker ID (2014) A gesture learning interface for simulated robot path shaping with a human teacher. IEEE Trans Hum Mach Syst 44(1):41–54

    Article  Google Scholar 

  6. 6.

    Chen LS, Huang TS (2000) Emotional expressions in audiovisual human computer interaction. In: Multimedia and Expo, 2000. ICME 2000. 2000 IEEE international conference on, vol 1, pp 423–426. IEEE

  7. 7.

    Busso C, Deng Z, Yildirim S, Bulut M, Lee CM, Kazemzadeh A, Lee S, Neumann U, Narayanan S (2004) Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Proceedings of the 6th international conference on multimodal interfaces, pp 205–211. ACM

  8. 8.

    Pantic M, Sebe N, Cohn JF, Huang T (2005) Affective multimodal human–computer interaction. In: Proceedings of the 13th annual ACM international conference on multimedia, pp 669–676. ACM

  9. 9.

    Caridakis G, Castellano G, Kessous L, Raouzaiou A, Malatesta L, Asteriadis S, Karpouzis K (2007) Multimodal emotion recognition from expressive faces, body gestures and speech. Artificial intelligence and innovations 2007: from theory to applications, pp 375–388

  10. 10.

    Soleymani M, Pantic M, Pun T (2012) Multimodal emotion recognition in response to videos. IEEE Trans Affect Comput 3(2):211–223

    Article  Google Scholar 

  11. 11.

    Suchan J, Bhatt M (2016) Semantic question-answering with video and eye-tracking data: AI foundations for human visual perception driven cognitive film studies. IJCAI, pp 2633–2639

  12. 12.

    Suchan J, Bhatt M (2016) The geometry of a scene: on deep semantics for visual perception driven cognitive film studies. WACV, pp 1–9

  13. 13.

    Cassell J, Kopp S, Tepper P, Ferriman K, Striegnitz K (2007) Trading spaces: how humans and humanoids use speech and gesture to give directions. Conversational informatics, pp 133–160

  14. 14.

    Kopp S, Bergmann K, Wachsmuth I (2008) Multimodal communication from multimodal thinking towards an integrated model of speech and gesture production. Int J Semant Comput 2(01):115–136

    Article  Google Scholar 

  15. 15.

    Marsella S, Xu Y, Lhommet M, Feng A, Scherer S, Shapiro A (2013) Virtual character performance from speech. In: Proceedings of the 12th ACM SIGGRAPH/Eurographics symposium on computer animation, pp 25–35. ACM

  16. 16.

    Huang C-M, Mutlu B (2014) Learning-based modeling of multimodal behaviors for humanlike robots. In: Proceedings of the 2014 ACM/IEEE international conference on human–robot interaction, pp 57–64. ACM

  17. 17.

    Li W, Joo J, Qi H, Zhu S-C (2017) Joint image-text news topic detection and tracking by multimodal topic and-or graph. IEEE Trans Multimedia 19(2):367–381

    Article  Google Scholar 

  18. 18.

    Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Computer vision and pattern recognition

  19. 19.

    Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1725–1732

  20. 20.

    Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision, pp 740–755

  21. 21.

    Joo J, Steen FF, Zhu S-C (2015) Automated facial trait judgment and election outcome prediction: social dimensions of face. In: Proceedings of the IEEE international conference on computer vision, pp 3712–3720

  22. 22.

    Groeling T, Li W, Joo J, Steen FF (2016) Visualizing presidential elections. In: APSA annual meeting

  23. 23.

    Joo J, Li W, Steen FF, Zhu S-C (2014) Visual persuasion: inferring communicative intents of images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 216–223

  24. 24.

    Tore N, Anna E, Janda LA, Makarova A, Steen F, Turner M (2013) How “here” and “now” in Russian and English establish joint attention in TV news broadcasts. Russ Linguist 37(3):229–251

    Article  Google Scholar 

  25. 25.

    Steen FF, Turner M (2013) Multimodal construction grammar. In: Borkent M, Barbara D, Jennifer H (eds) Language and the creative mind. CSLI Publications, University of Chicago Press. Stanford, CA, pp 255–274

  26. 26.

    Turner M (2017) Multimodal form-meaning pairs for blended classic joint attention. Linguist Vanguard. doi:10.1515/lingvan-2016-0043

Download references

Author information



Corresponding author

Correspondence to Jungseock Joo.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Joo, J., Steen, F.F. & Turner, M. Red Hen Lab: Dataset and Tools for Multimodal Human Communication Research. Künstl Intell 31, 357–361 (2017).

Download citation


  • Multimodal communication
  • Non-verbal communication
  • Face and gesture