Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents

Kawamoto, Shin-ichi; Shimodaira, Hiroshi; Nitta, Tsuneo; Nishimoto, Takuya; Nakamura, Satoshi; Itou, Katsunobu; Morishima, Shigeo; Yotsukura, Tatsuo; Kai, Atsuhiko; Lee, Akinobu; Yamashita, Yoichi; Kobayashi, Takao; Tokuda, Keiichi; Hirose, Keikichi; Minematsu, Nobuaki; Yamada, Atsushi; Den, Yasuharu; Utsuro, Takehito; Sagayama, Shigeki

doi:10.1007/978-3-662-08373-4_9


Part of the book series: Cognitive Technologies (COGTECH)

Summary

Galatea is a software toolkit for developing human-like spoken dialog agents. To make it easy to integrate modules with different characteristics, including a speech recognizer, a speech synthesizer, a facial animation synthesizer, and a dialog controller, each module is modeled as a virtual machine with a simple common interface, and the modules are connected to one another through a broker (communication manager). Galatea employs model-based speech and facial animation synthesizers whose model parameters can easily be adapted to an existing person, given training data from that person. The toolkit, which runs on both UNIX/Linux and Windows operating systems, will be publicly available in the middle of 2003 [7, 6].
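
The broker arrangement described in the summary can be illustrated with a minimal sketch. This is not Galatea's actual interface or protocol: the class names `Module` and `Broker`, the `receive` method, the message strings, and the stand-in handlers are all hypothetical, chosen only to show how modules with one common entry point might exchange messages through a single communication manager instead of calling each other directly.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# A handler takes one message and returns (destination, message) pairs.
Handler = Callable[[str], List[Tuple[str, str]]]


class Module:
    """Hypothetical wrapper: every module exposes the same receive() call."""

    def __init__(self, name: str, handler: Handler) -> None:
        self.name = name
        self._handler = handler

    def receive(self, message: str) -> List[Tuple[str, str]]:
        # Outgoing messages are returned to the broker, not sent directly.
        return self._handler(message)


class Broker:
    """Hypothetical communication manager that routes messages by name."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}
        self.log: List[Tuple[str, str, str]] = []

    def register(self, module: Module) -> None:
        self._modules[module.name] = module

    def send(self, sender: str, destination: str, message: str) -> None:
        # Deliver the message, then keep routing whatever the receivers emit.
        queue: Deque[Tuple[str, str, str]] = deque([(sender, destination, message)])
        while queue:
            src, dst, msg = queue.popleft()
            self.log.append((src, dst, msg))
            for next_dst, next_msg in self._modules[dst].receive(msg):
                queue.append((dst, next_dst, next_msg))


def run_demo() -> List[Tuple[str, str, str]]:
    """Wire up four stand-in modules and route one user utterance."""
    broker = Broker()
    # Stand-ins only: no real recognition, synthesis, or animation happens.
    broker.register(Module("recognizer", lambda m: [("dialog", f"recognized:{m}")]))
    broker.register(Module("dialog", lambda m: [("speech", "Hello!"), ("face", "smile")]))
    broker.register(Module("speech", lambda m: []))
    broker.register(Module("face", lambda m: []))
    broker.send("user", "recognizer", "hello")
    return broker.log


if __name__ == "__main__":
    for entry in run_demo():
        print(entry)
```

Because every module only returns addressed messages to the broker, a module in this sketch can be swapped out without touching the others, which is the integration benefit the summary attributes to the common interface.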

References

1. Adachi, H., Katsurada, K., Yamada, H., Nitta, T.: Development of a prototyping tool for MMI systems. In: Information Processing Society of Japan, Technical Report 2002-SLP-43 (in Japanese) (2002) pp 7–12
2. Cassell, J., Bickmore, T., Campbell, L., Chang, K., Vilhjálmsson, H., Yan, H.: Requirements for an architecture for embodied conversational characters. In: Proceedings of Computer Animation and Simulation ’99 (Eurographics Series), ed Thalmann, D., Thalmann, N. (1999) pp 109–122
3. DARPA Communicator Program. http://fofoca.mitre.org/ (1998)
4. Dohi, H., Ishizuka, M.: Visual Software Agent: A realistic face-to-face style interface connected with WWW/Netscape. In: IJCAI Workshop on Intelligent Multimodal Systems (1997) pp 17–22
5. Ekman, P., Friesen, W.V.: Facial Action Coding System (FACS): A technique for the measurement of facial action (Consulting Psychologists Press, Palo Alto, CA 1978)
6. Galatea Toolkit. http://hil.t.u-tokyo.ac.jp/~galatea/ (2002)
7. Galatea Toolkit. http://iipl.jaist.ac.jp/IPA/ (2002)
8. Gustafson, J., Lindberg, N., Lundeberg, M.: The August spoken dialogue system. In: EuroSpeech (1999) pp 1151–1154
9. HMM-Based Speech Synthesis Toolkit. http://hts.ics.nitech.ac.jp/ (2002)
10. Julia, L., Cheyer, A.: Is talking to virtual more realistic? In: EuroSpeech (1999) pp 1719–1722
11. Katsurada, K., Otani, Y., Nakamura, Y., Kobayashi, S., Yamada, H., Nitta, T.: A modality-independent MMI system architecture. In: Proceedings ICSLP (2002) pp 2549–2552
12. Kawahara, T., Kobayashi, T., Takeda, T., Minematsu, N., Itou, K., Yamamoto, M., Utsuro, T., Shikano, K.: Sharable software repository for Japanese large vocabulary continuous speech recognition. In: Proceedings ICSLP (1998) pp 3257–3260
13. MMI Description Language XISL. http://www.vox.tutkie.tut.ac.jp/XISL/XISLE.html (2002)
14. Morishima, S.: Face analysis and synthesis. IEEE Signal Processing Magazine 18 (3): 26–34 (2001)
15. Morphological Analyzer ChaSen. http://chasen.aist-nara.ac.jp/index.html.en (2000)
16. Nishimoto, T., Araki, M., Niimi, Y.: RadioDoc: A voice-accessible document system. In: Proceedings ICSLP (2002) pp 1485–1488
17. The Open Agent Architecture. http://www.ai.sri.com/~oaa/ (2001)
18. Sakamoto, K., Hinode, H., Watanuki, K., Seki, S., Kiyama, J., Togawa, F.: A response model for a CG character based on timing of interactions in a multimodal human interface. In: IUI-97 (1997) pp 257–260
19. Seneff, S., Hurley, E., Lau, R., Pao, C., Schmid, P., Zue, V.: GALAXY-II: A reference architecture for conversational system development. In: Proceedings ICSLP (1998) pp 931–934
20. Standard of symbols for Japanese text-to-speech synthesizer: JEIDA-62-2000 (2000)
21. Sutton, S., Cole, R.: Universal speech tools: The CSLU Toolkit. In: Proceedings ICSLP (1998) pp 3221–3224
22. Tamura, M., Masuko, T., Tokuda, K., Kobayashi, T.: Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR. In: ICASSP, Vol. 2 (2001) pp 805–808
23. Tamura, M., Masuko, T., Tokuda, K., Kobayashi, T.: Text-to-speech synthesis with arbitrary speaker’s voice from average voice. In: Proceedings of European Conference on Speech Communication and Technology, Vol. 1 (2001) pp 345–348
24. Ushida, H., Hirayama, Y., Nakajima, H.: Emotion model for life-like agent and its evaluation. In: AAAI-98 (1998) pp 62–69
25. Voice eXtensible Markup Language (VoiceXML), Version 1.0. http://www.voicexml.org (2000)
26. XSL Transformations (XSLT), Version 1.0. http://www.w3.org/TR/xslt (1999)
27. Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: EuroSpeech, Vol. 5 (1999) pp 2347–2350

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, School of Information Science, 1-1, Asahidai, Tatsunokuchi, Ishikawa, 923-1292, Japan
Shin-ichi Kawamoto & Hiroshi Shimodaira
The University of Tokyo, Japan
Takuya Nishimoto, Keikichi Hirose, Nobuaki Minematsu & Shigeki Sagayama
Toyohashi University of Technology, Japan
Tsuneo Nitta
Advanced Telecommunications Research Institute International, Kyoto, Japan
Satoshi Nakamura & Tatsuo Yotsukura
Nagoya University, Japan
Katsunobu Itou
Seikei University, Tokyo, Japan
Shigeo Morishima
Shizuoka University, Japan
Atsuhiko Kai
Nara Institute of Science and Technology, Japan
Akinobu Lee
Ritsumeikan University, Shiga, Japan
Yoichi Yamashita
Tokyo Institute of Technology, Japan
Takao Kobayashi
Nagoya Institute of Technology, Japan
Keiichi Tokuda
The Advanced Software Technology and Mechatronics Research Institute of Kyoto, Japan
Atsushi Yamada
Chiba University, Japan
Yasuharu Den
Kyoto University, Japan
Takehito Utsuro

Editor information

Editors and Affiliations

Dept. of Information and Communication Engineering, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, 113-8656, Tokyo, Japan
Helmut Prendinger & Mitsuru Ishizuka

About this chapter

Cite this chapter

Kawamoto, S. et al. (2004). Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents. In: Prendinger, H., Ishizuka, M. (eds) Life-Like Characters. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-08373-4_9

DOI: https://doi.org/10.1007/978-3-662-08373-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05655-0
Online ISBN: 978-3-662-08373-4
eBook Packages: Springer Book Archive
