Tutoring Robots

Al Moubayed, Samer; Beskow, Jonas; Bollepalli, Bajibabu; Hussen-Abdelaziz, Ahmed; Johansson, Martin; Koutsombogera, Maria; Lopes, José David; Novikova, Jekaterina; Oertel, Catharine; Skantze, Gabriel; Stefanov, Kalin; Varol, Gül

doi:10.1007/978-3-642-55143-7_4

Multiparty Multimodal Social Dialogue with an Embodied Tutor

Conference paper

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 425)

Abstract

This project explores a novel experimental setup aimed at building a spoken, multimodally rich, human-like multiparty tutoring agent. We develop a setup and collect a corpus that support the development of a dialogue system platform for exploring verbal and nonverbal tutoring strategies in multiparty spoken interactions with embodied agents. The dialogue task centers on two participants who work together to solve a card-ordering game. A tutor sits with the participants, helps them perform the task, and organizes and balances their interaction. Multimodal signals, captured and automatically synchronized by several audio-visual capture technologies, were combined with manual annotations to build a situated model of the interaction. The model is based on the participants' personalities, their temporally changing state of attention, their conversational engagement and verbal dominance, and the way these correlate with the verbal and visual feedback, turn-management, and conversation-regulating actions generated by the tutor. At the end of this chapter we discuss the areas of research and development that this work opens up and some of the challenges that lie ahead.

Author information

Authors and Affiliations

KTH Speech, Music and Hearing, Sweden
Samer Al Moubayed, Jonas Beskow, Bajibabu Bollepalli, Martin Johansson, Catharine Oertel, Gabriel Skantze & Kalin Stefanov
Institute for Language and Speech Processing, “Athena” R.C., Greece
Maria Koutsombogera
Spoken Language Systems Laboratory, INESC ID Lisboa, Portugal
José David Lopes
Department of Computer Science, University of Bath, UK
Jekaterina Novikova
Institute of Communication Acoustics, Ruhr-Universität Bochum, Germany
Ahmed Hussen-Abdelaziz
Department of Computer Engineering, Boğaziçi University, Turkey
Gül Varol

Editor information

Editors and Affiliations

Departamento de Engenharia Electrotécnica, Universidade Nova de Lisboa, Quinta da Torre, 2829-516, Monte de Caparica, Portugal
Yves Rybarczyk, Tiago Cardoso, João Rosas & Luis M. Camarinha-Matos

Cite this paper

Al Moubayed, S. et al. (2014). Tutoring Robots. In: Rybarczyk, Y., Cardoso, T., Rosas, J., Camarinha-Matos, L.M. (eds) Innovative and Creative Developments in Multimodal Interaction Systems. eNTERFACE 2013. IFIP Advances in Information and Communication Technology, vol 425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55143-7_4

DOI: https://doi.org/10.1007/978-3-642-55143-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55142-0
Online ISBN: 978-3-642-55143-7
eBook Packages: Computer Science, Computer Science (R0)
