Towards Modelling Multimodal and Multiparty Interaction in Educational Settings

  • Maria Koutsombogera
  • Miltos Deligiannis
  • Maria Giagkou
  • Harris Papageorgiou
Part of the Intelligent Systems Reference Library book series (ISRL, volume 106)


This paper presents an experimental design and setup for studying the interaction among two children and their tutor during the question–answer session of a reading comprehension task. The multimodal aspects of the interactions are analysed in terms of the signals and strategies that speakers prefer when carrying out successful multiparty conversations. This analysis will form the basis for developing behavioural models that account for this specific context. We envisage integrating such models into intelligent, context-aware systems, specifically an embodied dialogue system that takes the role of a tutor and can carry out a discussion in a multiparty setting by drawing on the children's multimodal signals. Such a system will be able to discuss a text and address questions to the children, encouraging collaboration and equal participation in the discussion and assessing the answers they give. The paper focuses on the design of an appropriate setup, the data collection, and the analysis of the multimodal signals that are important for realizing such a system.


Keywords: Multiparty and multimodal interaction; Reading comprehension; Non-verbal signals; Turn-taking; Feedback; Embodied dialogue system



Research leading to these results has been funded in part by the Greek General Secretariat for Research and Technology, KRIPIS Action, under Grant No. 448306 (POLYTROPON). The authors would like to thank all the session subjects for their kind participation in the experiments. The authors would also like to express their appreciation to the reviewers for their valuable feedback and constructive comments.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Maria Koutsombogera (1)
  • Miltos Deligiannis (1)
  • Maria Giagkou (1)
  • Harris Papageorgiou (1)

  1. Institute for Language and Speech Processing - “Athena” R.I.C., Athens, Greece
