Personalizing influence diagrams: applying online learning strategies to dialogue management

  • David Maxwell Chickering
  • Tim Paek
Original Paper


We consider the problem of adapting the parameters of an influence diagram online for real-time personalization. This problem matters when the influence diagram is used repeatedly to make decisions and its parameters are uncertain. We describe learning algorithms that solve this problem. In particular, we show how to adapt various explore-versus-exploit strategies, which are known to work well for Markov decision processes, to the more general influence-diagram model. As an illustration, we describe how our techniques for online personalization allow a voice-enabled browser to adapt to a particular speaker for spoken dialogue management. We evaluate all of the explore-versus-exploit strategies in this domain.
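The abstract does not name the individual explore-versus-exploit strategies, so the following is only a minimal sketch of one well-known option, Thompson-style posterior sampling, applied to a toy influence diagram with a single decision node, a single binary chance node, and a utility node. The decision names (`execute`, `confirm`), the utility values, and the Beta(1, 1) priors are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
import random

# Hypothetical utility table U(decision, outcome); the values are illustrative only.
UTILITY = {
    ("execute", 1): 1.0, ("execute", 0): -1.0,
    ("confirm", 1): 0.6, ("confirm", 0): -0.2,
}
DECISIONS = ["execute", "confirm"]


def expected_utility(decision, p_success):
    """Expected utility of a decision when P(outcome = 1 | decision) = p_success."""
    return (p_success * UTILITY[(decision, 1)]
            + (1.0 - p_success) * UTILITY[(decision, 0)])


def choose_decision(counts, rng):
    """Sample each unknown parameter from its Beta posterior, then pick the
    decision that maximizes expected utility under the sampled parameters."""
    sampled = {d: rng.betavariate(counts[d][0], counts[d][1]) for d in DECISIONS}
    return max(DECISIONS, key=lambda d: expected_utility(d, sampled[d]))


def update(counts, decision, outcome):
    """Online parameter update: increment the Beta count for the observed outcome."""
    if outcome == 1:
        counts[decision][0] += 1.0
    else:
        counts[decision][1] += 1.0


def run(true_p, steps, seed=0):
    """Simulate repeated use of the diagram against a user whose true success
    probabilities are given by true_p (an assumption of this simulation)."""
    rng = random.Random(seed)
    counts = {d: [1.0, 1.0] for d in DECISIONS}  # Beta(1, 1) prior per decision
    total_utility = 0.0
    for _ in range(steps):
        decision = choose_decision(counts, rng)
        outcome = 1 if rng.random() < true_p[decision] else 0
        update(counts, decision, outcome)
        total_utility += UTILITY[(decision, outcome)]
    return counts, total_utility


if __name__ == "__main__":
    counts, total = run({"execute": 0.55, "confirm": 0.9}, steps=500)
    print(counts, total)
```

Because the sampled parameters vary more when few observations are available, rarely tried decisions are still chosen occasionally (exploration), while decisions with well-estimated high expected utility dominate over time (exploitation). A full influence diagram would replace `expected_utility` with inference over all chance nodes.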


Personalization · Influence diagrams · User-model adaptation · Planning · Dialogue management · Speech recognition





Copyright information

© Springer Science+Business Media B.V. 2006

Authors and Affiliations

  1. Microsoft Research, Redmond, USA
