Multimodal Open-Domain Conversations with the Nao Robot

Conference paper


In this paper we discuss the design of human-robot interaction focussing especially on social robot communication and multimodal information presentation. As a starting point we use the WikiTalk application, an open-domain conversational system which has been previously developed using a robotics simulator. We describe how it can be implemented on the Nao robot platform, enabling Nao to make informative spoken contributions on a wide range of topics during conversation. Spoken interaction is further combined with gesturing in order to support Nao’s presentation by natural multimodal capabilities, and to enhance and explore natural communication between human users and robots.


  1. 1.
    Allwood, J.: Linguistic Communication as Action and Cooperation: A Study in Pragmatics. Gothenburg Monographs in Linguistics 2. University of Gothenburg, Gothenburg (1976)Google Scholar
  2. 2.
    Csapo, A., Gilmartin, E., Grizou, J., Han, J., Meena, R., Anastasiou, D., Jokinen, K., Wilcock, G.: Multimodal conversational interaction with a humanoid robot. In: Proceedings of 3rd IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2012). Kosice (2012)Google Scholar
  3. 3.
    Fong, T., Nourbaksh, I., Dautenhahn, K.: A survey of socially interactive robots. Robot. Auton. Syst. 42, 143–166 (2003)CrossRefMATHGoogle Scholar
  4. 4.
    Han, J., Campbell, N., Jokinen, K., Wilcock, G.: Investigating the use of non-verbal cues in human-robot interaction with a Nao robot. In: Proceedings of 3rd IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2012). Kosice (2012)Google Scholar
  5. 5.
    Jokinen, K.: Constructive Dialogue Modelling: Speech Interaction and Rational Agents. Wiley, Chichester (2009)CrossRefGoogle Scholar
  6. 6.
    Jokinen, K.: Pointing gestures and synchronous communication management. In:  Esposito, A.,  Campbell, N.,  Vogel, C.,  Hussein, A.,  Nijholt, A. (eds.) Development of Multimodal Interfaces: Active Listening and Synchrony, pp. 33–49. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  7. 7.
    Jokinen, K., Hurtig, T.: User expectations and real experience on a multimodal interactive system. In: Proceedings of 9th International Conference on Spoken Language Processing (Interspeech 2006). Pittsburgh, USA (2006)Google Scholar
  8. 8.
    Jokinen, K., Wilcock, G.: Emergent verbal behaviour in human-robot interaction. In: Proceedings of 2nd International Conference on Cognitive Infocommunications (CogInfoCom 2011). Budapest (2011)Google Scholar
  9. 9.
    Jokinen, K., Wilcock, G.: Constructive interaction for talking about interesting topics. In: Proceedings of Eighth International Conference on Language Resources and Evaluation (LREC 2012). Istanbul (2012)Google Scholar
  10. 10.
    Jokinen, K., Harada, K., Nishida, M., Yamamoto, S.: Turn-alignment using eye-gaze and speech in conversational interaction. In: Proceedings of 11th International Conference on Spoken Language Processing (Interspeech 2010). Makuhari, Japan (2010)Google Scholar
  11. 11.
    Kendon, A.: Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge (2004)Google Scholar
  12. 12.
    Levitski, A., Radun, J., Jokinen, K.: Visual interaction and conversational activity. In: Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Eye Gaze and Multimodality. Santa Monica, USA (2012)Google Scholar
  13. 13.
    McCoy, K.F., Cheng, J.: Focus of attention: Constraining what can be said next. In:  Paris, C.,  Swartout, W.,  Mann, W. (eds.) Natural Language Generation in Artificial Intelligence and Computational Linguistics, pp. 103–124. Kluwer Academic Publishers, Boston (1991)CrossRefGoogle Scholar
  14. 14.
    Meena, R., Jokinen, K., Wilcock, G.: Integration of gestures and speech in human-robot interaction. In: Proceedings of 3rd IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2012). Kosice (2012)Google Scholar
  15. 15.
    Quek, F.: Toward a vision-based hand gesture interface. In: Proceedings of the Virtual Reality System Technology Conference, pp. 17–29. Singapore (1994)Google Scholar
  16. 16.
    Swerts, M., Geluykens, R.: Prosody as a marker of information flow in spoken discourse. Lang. Speech 37, 21–43 (1994)Google Scholar
  17. 17.
    Wilcock, G.: WikiTalk: A spoken Wikipedia-based open-domain knowledge access system. In: Proceedings of the COLING 2012 Workshop on Question Answering for Complex Domains, pp. 57–69. Mumbai, India (2012)Google Scholar

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. 1.University of HelsinkiHelsinkiFinland

Personalised recommendations