Evaluation of Statistical POMDP-Based Dialogue Systems in Noisy Environments

  • Steve Young
  • Catherine Breslin
  • Milica Gašić
  • Matthew Henderson
  • Dongho Kim
  • Martin Szummer
  • Blaise Thomson
  • Pirros Tsiakoulis
  • Eli Tzirkel Hancock
Part of the Signals and Communication Technology book series (SCT)


Compared to conventional hand-crafted rule-based dialogue management systems, statistical POMDP-based dialogue managers offer the promise of increased robustness, reduced development and maintenance costs, and scalability to large open domains. As a consequence, there has been considerable research activity in statistical spoken dialogue systems over recent years. However, building and deploying a real-time spoken dialogue system is expensive, and even when one is operational, it is hard to recruit enough users to obtain statistically significant results. Instead, researchers have tended to evaluate using user simulators or by reprocessing existing corpora, both of which are unconvincing predictors of actual real-world performance. This paper describes the deployment of a real-world restaurant information system and its evaluation in a motor car by subjects recruited locally and by remote users recruited through Amazon Mechanical Turk. The paper explores three key questions: are statistical dialogue systems more robust than conventional hand-crafted systems; how does the performance of a system evaluated on a user simulator compare to its performance with real users; and can the performance of a system tested over the telephone network be used to predict performance in more hostile environments such as a motor car? The results show that the statistical approach is indeed more robust, but that results from a simulator significantly over-estimate performance in both absolute and relative terms. Finally, when word error rates (WER) are matched, performance results obtained over the telephone provide useful predictors of performance in noisier environments such as the motor car, but again they tend to over-estimate performance.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Steve Young (1)
  • Catherine Breslin (1)
  • Milica Gašić (1)
  • Matthew Henderson (1)
  • Dongho Kim (1)
  • Martin Szummer (1)
  • Blaise Thomson (1)
  • Pirros Tsiakoulis (1)
  • Eli Tzirkel Hancock (2)
  1. Cambridge University Engineering Department, Cambridge, UK
  2. General Motors Advanced Technical Center, Herzliya, Israel
