Evaluation of Statistical POMDP-Based Dialogue Systems in Noisy Environments

Young, Steve; Breslin, Catherine; Gašić, Milica; Henderson, Matthew; Kim, Dongho; Szummer, Martin; Thomson, Blaise; Tsiakoulis, Pirros; Hancock, Eli Tzirkel

doi:10.1007/978-3-319-21834-2_1

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Compared to conventional hand-crafted rule-based dialogue management systems, statistical POMDP-based dialogue managers offer the promise of increased robustness, reduced development and maintenance costs, and scalability to large open domains. As a consequence, there has been considerable research activity in approaches to statistical spoken dialogue systems over recent years. However, building and deploying a real-time spoken dialogue system is expensive, and even when one is operational, it is hard to recruit enough users to obtain statistically significant results. Instead, researchers have tended to evaluate using user simulators or by reprocessing existing corpora, both of which are unconvincing predictors of actual real-world performance. This paper describes the deployment of a real-world restaurant information system and its evaluation both in a motor car, using subjects recruited locally, and by remote users recruited through Amazon Mechanical Turk. The paper explores three key questions: are statistical dialogue systems more robust than conventional hand-crafted systems; how does the performance of a system evaluated on a user simulator compare to its performance with real users; and can the performance of a system tested over the telephone network be used to predict performance in more hostile environments such as a motor car? The results show that the statistical approach is indeed more robust, but that results from a simulator significantly over-estimate performance in both absolute and relative terms. Finally, when word error rates (WER) are matched, performance results obtained over the telephone can provide useful predictors of performance in noisier environments such as the motor car, but again they tend to over-estimate performance.

Notes

1. http://www.gumtree.com.
2. As well as being used to train the POMDP-based system, the user simulator was used to tune the rules in the conventional hand-crafted system.

Author information

Authors and Affiliations

Cambridge University Engineering Department, Cambridge, UK
Steve Young, Catherine Breslin, Milica Gašić, Matthew Henderson, Dongho Kim, Martin Szummer, Blaise Thomson & Pirros Tsiakoulis
General Motors Advanced Technical Center, Herzliya, Israel
Eli Tzirkel Hancock

Corresponding author

Correspondence to Steve Young.

Editor information

Editors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Alexander Rudnicky
Cupertino, California, USA
Antoine Raux
Silicon Valley, Carnegie Mellon University, Moffett Field, California, USA
Ian Lane
Mountain View, California, USA
Teruhisa Misu

About this chapter

Cite this chapter

Young, S. et al. (2016). Evaluation of Statistical POMDP-Based Dialogue Systems in Noisy Environments. In: Rudnicky, A., Raux, A., Lane, I., Misu, T. (eds) Situated Dialog in Speech-Based Human-Computer Interaction. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-21834-2_1

DOI: https://doi.org/10.1007/978-3-319-21834-2_1
Published: 21 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21833-5
Online ISBN: 978-3-319-21834-2
eBook Packages: Engineering, Engineering (R0)
