A Spoken Dialog System for Chat-Like Conversations Considering Response Timing

  • Ryota Nishimura
  • Norihide Kitaoka
  • Seiichi Nakagawa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4629)


If a dialog system can respond to a user as naturally as a human, the interaction will be smoother. In this research, we aim to develop a dialog system that emulates human behavior in chat-like dialog. In this paper, we describe a dialog system that generates chat-like responses and their timing using a decision tree. The system can perform "collaborative completion," "aizuchi" (back-channeling), and so on. At every time segment, the decision tree uses the pitch and power contours of the user's utterance, recognition hypotheses, and the preparation status of the response generator as features to determine response timing.
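The timing-control idea described above can be illustrated with a minimal sketch: at each short time segment, prosodic features and system state are fed to a decision procedure that picks an action (keep listening, produce a back-channel, or take the turn). The feature names, thresholds, and hand-written rules below are illustrative assumptions, not the authors' trained tree.

```python
# Illustrative sketch of per-segment response-timing control with a
# hand-written decision tree. All thresholds are invented for the example.
from dataclasses import dataclass


@dataclass
class SegmentFeatures:
    pitch_slope: float       # F0 contour slope over the segment (falling < 0)
    power: float             # normalized mean power of the segment
    pause_ms: float          # silence duration since the last voiced frame
    hypothesis_final: bool   # does the recognition hypothesis end a clause?
    response_ready: bool     # has the response generator prepared a reply?


def decide_action(f: SegmentFeatures) -> str:
    """Return 'wait', 'aizuchi', or 'respond' for one time segment."""
    if f.pause_ms < 200:
        return "wait"                         # user is still speaking
    if f.response_ready and f.hypothesis_final:
        return "respond"                      # take the turn
    if f.pitch_slope < 0 and f.power < 0.5:
        return "aizuchi"                      # back-channel ("un", "hai")
    return "wait"
```

In the actual system such a tree would be learned from dialog data rather than hand-written, with the same kinds of features evaluated at every segment boundary.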


Keywords: Prosodic Feature · Pause Duration · Speech Recognizer · Dialog System · Speech Synthesizer




Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ryota Nishimura (1)
  • Norihide Kitaoka (2)
  • Seiichi Nakagawa (1)

  1. Department of Information and Computer Sciences, Toyohashi University of Technology, Japan
  2. Graduate School of Information Science, Nagoya University, Japan
