Towards Online Planning for Dialogue Management with Rich Domain Knowledge

  • Pierre Lison
Conference paper


Most approaches to dialogue management have so far concentrated on offline optimisation techniques, where a dialogue policy is precomputed for all possible situations and then plugged into the dialogue system. This development strategy, however, has limitations in terms of domain scalability and adaptivity, since such policies are essentially static and cannot readily accommodate runtime changes in the environment or task dynamics. In this paper, we follow an alternative approach based on online planning. To ensure that the planning algorithm remains tractable over longer horizons, the presented method relies on probabilistic models expressed via probabilistic rules, which capture the internal structure of the domain using high-level representations. We describe the generic planning algorithm, ongoing implementation efforts, and directions for future work.


Bayesian Network, Reinforcement Learning, Planning Algorithm, Dialogue System, Partially Observable Markov Decision Process
These keywords were added by machine and not by the authors.


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Informatics, Language Technology Group, University of Oslo, Oslo, Norway
