Volume 24, Issue 3, pp 225–235

Toward a multi-culture adaptive virtual tour guide agent with a modular approach

  • Hung-Hsuan Huang (email author)
  • Aleksandra Cerekovic
  • Igor S. Pandzic
  • Yukiko Nakano
  • Toyoaki Nishida
Original Article


Embodied conversational agents (ECAs) are computer-generated, human-like characters that interact with human users in face-to-face conversations. ECAs are a powerful tool for representing cultural differences and are well suited to interactive training and edutainment systems. This article presents preliminary results from the development of a culture-adaptive virtual tour guide agent that serves Japanese, Croatian, and general Western users by displaying appropriate verbal and non-verbal behaviors. The agent is being implemented in the Generic ECA Framework, a modular framework for developing ECAs. Dividing the ECA functions into reusable and loosely coupled modules minimizes the effort required to implement additional behaviors and facilitates incremental scale-up of the system.
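The loose coupling described above can be illustrated with a small message-routing sketch. This is not the actual Generic ECA Framework implementation (which the article's framework paper describes in detail); it is a minimal, hypothetical publish/subscribe hub showing why modules that exchange typed messages, rather than calling each other directly, can be added or swapped (e.g., a culture-specific gesture selector) without modifying existing components. All module names and message types here are invented for illustration.

```python
# Illustrative sketch only -- NOT the actual GECA implementation.
# Modules subscribe to message types on a shared hub instead of
# calling each other directly, so components stay loosely coupled.
from collections import defaultdict
from typing import Callable


class MessageHub:
    """Routes typed messages between loosely coupled modules."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, msg_type: str, handler: Callable[[dict], None]):
        self._subscribers[msg_type].append(handler)

    def publish(self, msg_type: str, payload: dict):
        for handler in self._subscribers[msg_type]:
            handler(payload)


hub = MessageHub()
log = []


def dialogue_manager(msg):
    # Pick a reply and a culture-appropriate nonverbal behavior
    # (hypothetical rule: Japanese users get a bow, others a wave).
    reply = {
        "text": "Welcome to Dubrovnik!",
        "gesture": "bow" if msg["culture"] == "japanese" else "wave",
    }
    hub.publish("agent.output", reply)


def animation_player(msg):
    # A real player would drive the character; here we just record.
    log.append(f"say '{msg['text']}' with gesture '{msg['gesture']}'")


hub.subscribe("input.speech", dialogue_manager)
hub.subscribe("agent.output", animation_player)

# A speech-recognizer module would publish recognized utterances:
hub.publish("input.speech", {"text": "hello", "culture": "japanese"})
print(log[0])  # say 'Welcome to Dubrovnik!' with gesture 'bow'
```

Because the dialogue manager never references the animation player by name, a new culture profile or output module only needs to subscribe to the relevant message types, which mirrors the incremental scale-up the abstract claims for the modular design.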





Acknowledgments

We thank Kateryna Tarasenko, Vjekoslav Levacic, Goranka Zoric, and Margus Treumuth for their contributions to this project during the eNTERFACE’06 summer workshop, and Takuya Furukawa and Yuji Yamaoka for their contributions during the eNTERFACE’08 summer workshop. We also thank Tsuyoshi Masuda for his contribution to the application for experiencing cross-cultural differences in gestures.



Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  • Hung-Hsuan Huang (1, email author)
  • Aleksandra Cerekovic (2)
  • Igor S. Pandzic (2)
  • Yukiko Nakano (3)
  • Toyoaki Nishida (1)
  1. Graduate School of Informatics, Kyoto University, Kyoto, Japan
  2. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
  3. Faculty of Science and Technology, Seikei University, Tokyo, Japan
