Towards Reasoned Modality Selection in an Embodied Conversation Agent

  • Carla Ten-Ventura
  • Roberto Carlini
  • Stamatia Dasiopoulou
  • Gerard Llorach Tó
  • Leo Wanner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10498)


We present work in progress on modality selection (verbal, facial, and gestural) in an embodied multilingual and multicultural conversation agent. In contrast to most recent proposals, which treat non-verbal behavior as superimposed on or derived from the verbal modality, we argue for a holistic model that assigns modalities to individual content elements in accordance with semantic and contextual constraints as well as the cultural and personal characteristics of the addressee. Our model is thus in line with the SAIBA framework, although methodological differences become apparent at a more fine-grained level of realization.




Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Universitat Pompeu Fabra, Barcelona, Spain
  2. ICREA, Barcelona, Spain
