Evaluating User-System Interactional Chains for Naturalness-oriented Spoken Dialogue Systems

Conference paper


In evaluating spoken dialogue systems, especially those aimed at realizing natural conversation with users, it is necessary to assess whether the interactions between users and systems are chained appropriately. In this paper, we propose a response-evaluation coding scheme for user dialogue actions and use it to evaluate the interactional chains between users and systems from the viewpoint of the appropriateness of reactions. Two dialogue data sets of interactions between users and sightseeing guidance systems were evaluated with this scheme. By analyzing the coded data, we identified the differences between the two systems and the points that should be improved. A multiple regression analysis further suggested that the appropriateness ratings (AR) for user actions may contribute to predicting subjective judgments of the naturalness of system actions.
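The regression step mentioned in the abstract can be sketched as a simple ordinary-least-squares fit. The AR values and naturalness scores below are invented for illustration only; they are not the paper's data, and the two-feature setup (user-action AR, system-action AR) is an assumption about the model's inputs:

```python
import numpy as np

# Hypothetical example: predicting a subjective naturalness judgment for a
# dialogue from appropriateness ratings (AR). All numbers are invented.

# Each row: [AR for user actions, AR for system actions] of one dialogue
X = np.array([
    [0.60, 0.55],
    [0.72, 0.70],
    [0.80, 0.78],
    [0.65, 0.60],
    [0.90, 0.85],
])
# Mean subjective naturalness judgment per dialogue (e.g. 5-point scale)
y = np.array([2.8, 3.4, 3.9, 3.0, 4.5])

# Multiple regression via least squares, with an intercept column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

intercept, w_user, w_system = coef
pred = A @ coef  # fitted naturalness scores
print("intercept:", round(float(intercept), 3))
print("weights (user AR, system AR):",
      round(float(w_user), 3), round(float(w_system), 3))
```

In such a fit, a clearly positive weight on the user-action AR would support the abstract's suggestion that user-side appropriateness helps predict perceived naturalness of the system's actions.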


Keywords: Speech Recognition, Dialogue System, Word Error Rate, Dialogue Action, Speech Understanding





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

National Institute of Information and Communications Technology, Kyoto, Japan
