Abstract
This project explores a novel experimental setup towards building a spoken, multimodally rich, and human-like multiparty tutoring agent. A setup is developed and a corpus is collected that target the development of a dialogue system platform for exploring verbal and nonverbal tutoring strategies in multiparty spoken interactions with embodied agents. The dialogue task centers on two participants engaged in a dialogue aimed at solving a card-ordering game. A tutor sits with the participants, helps them perform the task, and organizes and balances their interaction. Different multimodal signals, captured and auto-synchronized by different audio-visual capture technologies, were coupled with manual annotations to build a situated model of the interaction based on the participants' personalities, their temporally changing state of attention, their conversational engagement and verbal dominance, and the way these correlate with the verbal and visual feedback, turn-management, and conversation-regulatory actions generated by the tutor. At the end of this chapter we discuss the potential areas of research and development that this work opens and some of the challenges that lie ahead.
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Al Moubayed, S. et al. (2014). Tutoring Robots. In: Rybarczyk, Y., Cardoso, T., Rosas, J., Camarinha-Matos, L.M. (eds) Innovative and Creative Developments in Multimodal Interaction Systems. eNTERFACE 2013. IFIP Advances in Information and Communication Technology, vol 425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55143-7_4
DOI: https://doi.org/10.1007/978-3-642-55143-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55142-0
Online ISBN: 978-3-642-55143-7
eBook Packages: Computer Science (R0)