Can We Talk? Methods for Evaluation and Training of Spoken Dialogue Systems

Language Resources and Evaluation

Abstract

Evaluation is closely tied to methods for automatically training language processing systems: in general, the same resources and metrics are used both to train system components and to evaluate them. To date, dialogue systems research has not typically applied this methodology to the dialogue manager or the spoken language generator. However, any metric for evaluating system performance can also serve as a feedback function for automatically training the system. The paper motivates this approach with two examples: the application of reinforcement learning to dialogue manager optimization, and the use of boosting to train the spoken language generator.
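To make the idea of an evaluation metric doubling as a training signal concrete, here is a minimal sketch in Python. It is not the system or the experiments described in the article. Everything in it is an assumption made for illustration: the two-slot task, the strategies `ask_one` and `ask_all`, the toy user simulator, the metric (task success minus a per-turn cost), and the choice of tabular Q-learning with sample-average step sizes. The reward for each turn is simply the change in the metric that the turn causes, so the rewards of a dialogue sum to its evaluation score.

```python
import random

ACTIONS = ("ask_one", "ask_all")  # hypothetical dialogue strategies
N_SLOTS = 2                       # toy task: the dialogue ends when both slots are filled
TURN_COST = 0.1                   # assumed penalty per system turn
P_ASK_ALL = 0.8                   # assumed chance that an open prompt is fully understood


def metric(turns, slots_filled):
    """Assumed evaluation metric: task success minus a cost per turn."""
    success = 1.0 if slots_filled == N_SLOTS else 0.0
    return success - TURN_COST * turns


def simulate_turn(slots_filled, action, rng):
    """Toy user simulator: number of filled slots after one system turn."""
    if action == "ask_one":
        return slots_filled + 1   # a directed question always fills one slot
    if rng.random() < P_ASK_ALL:
        return N_SLOTS            # an open prompt fills all remaining slots
    return slots_filled           # or the answer is not understood


def train(episodes=5000, epsilon=0.2, seed=0):
    """Tabular Q-learning over (slots filled, strategy) pairs."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS) for a in ACTIONS}
    visits = dict.fromkeys(q, 0)
    for _ in range(episodes):
        state, turns = 0, 0
        while state < N_SLOTS:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt = simulate_turn(state, action, rng)
            # The reward is the change in the evaluation metric caused by this turn.
            reward = metric(turns + 1, nxt) - metric(turns, state)
            future = 0.0 if nxt == N_SLOTS else max(q[(nxt, a)] for a in ACTIONS)
            visits[(state, action)] += 1
            step = 1.0 / visits[(state, action)]  # sample-average step size
            q[(state, action)] += step * (reward + future - q[(state, action)])
            state, turns = nxt, turns + 1
    return q


def greedy_policy(q):
    """Best strategy for each state according to the learned values."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Under these assumed settings, an open prompt from an empty form takes 1.25 turns on average against 2 turns for directed questions, while a single directed question is the faster choice once one slot is filled, so the learned policy should switch strategy between the two states. Changing `metric` changes what the dialogue manager is trained to do, which is the point of the sketch.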

Author information

Correspondence to Marilyn A. Walker.

About this article

Cite this article

Walker, M.A. Can We Talk? Methods for Evaluation and Training of Spoken Dialogue Systems. Language Res Eval 39, 65–75 (2005). https://doi.org/10.1007/s10579-005-2696-1

  • DOI: https://doi.org/10.1007/s10579-005-2696-1
