Dialogue POMDP components (part I): learning states and observations
Abstract
The partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework that represents uncertainty explicitly while remaining robust to noise. In this context, estimating the dialogue POMDP model components is a significant challenge, as they directly affect the optimized dialogue POMDP policy. To this end, we propose methods for learning dialogue POMDP model components from noisy and unannotated dialogues. Specifically, we introduce techniques to learn the set of possible user intentions from dialogues, use them as the dialogue POMDP states, and learn a maximum likelihood POMDP transition model from data. Since it is crucial to reduce the size of the observation space, we then propose two observation models: the keyword model and the intention model. Using these two models, the number of observations is reduced significantly while POMDP performance remains high, particularly for the intention POMDP. Learning the states and observations underlying a POMDP is covered in this first part (part I) and evaluated on dialogues collected by SmartWheeler (an intelligent wheelchair that aims to help persons with disabilities). Part II covers the reward model learning required by the POMDP.
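As an illustration of what a maximum likelihood transition model can look like, the sketch below estimates T(s' | s, a) by normalized counting. It is not taken from the paper: it assumes each dialogue turn has already been assigned a state (for example, the most probable learned intention), and the function name `ml_transition_model`, the `smoothing` parameter, and the toy states and actions are all hypothetical.

```python
from collections import Counter, defaultdict


def ml_transition_model(trajectories, states, smoothing=0.0):
    """Estimate T(s' | s, a) from (state, action, next_state) triples.

    Assumption: every turn is already labeled with a state. With
    smoothing=0.0 this is the plain maximum likelihood estimate
    (relative frequencies); a positive value adds additive smoothing.
    """
    counts = defaultdict(Counter)
    for dialogue in trajectories:
        for s, a, s_next in dialogue:
            counts[(s, a)][s_next] += 1

    model = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values()) + smoothing * len(states)
        model[key] = {
            s_next: (next_counts[s_next] + smoothing) / total
            for s_next in states
        }
    return model


# Hypothetical toy data: two short dialogues over two intentions.
states = ["move", "stop"]
dialogues = [
    [("move", "ask", "move"), ("move", "ask", "stop")],
    [("move", "ask", "move"), ("stop", "confirm", "stop")],
]
T = ml_transition_model(dialogues, states)
```

Only (state, action) pairs that occur in the data receive a distribution here; how unseen pairs should be handled is a design choice this sketch leaves open.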
Keywords
Partially observable Markov decision processes (POMDP); Unsupervised learning; Learning observations and states; Healthcare dialogue management