Personalizing influence diagrams: applying online learning strategies to dialogue management

Chickering, David Maxwell; Paek, Tim

doi:10.1007/s11257-006-9020-7

Original Paper
Published: 13 December 2006

Volume 17, pages 71–91, (2007)
User Modeling and User-Adapted Interaction

Abstract

We consider the problem of adapting the parameters of an influence diagram in an online fashion for real-time personalization. This problem is important when we use the influence diagram repeatedly to make decisions and we are uncertain about its parameters. We describe learning algorithms to solve this problem. In particular, we show how to modify various explore-versus-exploit strategies that are known to work well for Markov decision processes to the more general influence-diagram model. As an illustration, we describe how our techniques for online personalization allow a voice-enabled browser to adapt to a particular speaker for spoken dialogue management. We evaluate all the explore-versus-exploit strategies in this domain.
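
As a rough illustration of the explore-versus-exploit trade-off the abstract refers to, the following is a minimal sketch of one randomized strategy (posterior sampling) for a single repeated decision with uncertain parameters. It is not the authors' algorithm and does not model an influence diagram: it assumes each action has an unknown Bernoulli success probability with a Beta prior, and the action names and probabilities in the usage note are hypothetical.

```python
import random


class BetaBernoulliAction:
    """Beta posterior over the unknown success probability of one action."""

    def __init__(self, alpha=1.0, beta=1.0):
        # alpha = 1, beta = 1 is a uniform prior (an assumption of this sketch).
        self.alpha = alpha
        self.beta = beta

    def sample(self, rng):
        # Draw one plausible success probability from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def update(self, success):
        # Conjugate update: count one more success or one more failure.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def choose_action(posteriors, rng):
    """Sample every posterior, then act greedily on the sampled values.

    Uncertain actions sometimes draw high samples, which causes exploration;
    well-estimated good actions usually win, which causes exploitation.
    """
    draws = {name: post.sample(rng) for name, post in posteriors.items()}
    return max(draws, key=draws.get)


def run(true_success, n_rounds, seed=0):
    """Simulate repeated decisions against hidden success probabilities."""
    rng = random.Random(seed)
    posteriors = {name: BetaBernoulliAction() for name in true_success}
    counts = {name: 0 for name in true_success}
    for _ in range(n_rounds):
        action = choose_action(posteriors, rng)
        success = rng.random() < true_success[action]
        posteriors[action].update(success)
        counts[action] += 1
    return posteriors, counts
```

Calling `run({"confirm": 0.9, "repeat": 0.2, "ignore": 0.1}, 2000)` with these made-up values returns the learned posteriors and how often each action was chosen; over many rounds the selection frequency tends to concentrate on the action with the highest true success probability.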

Author information

Authors and Affiliations

Microsoft Research, One Microsoft Way, Redmond, WA, 98052, USA
David Maxwell Chickering & Tim Paek

Corresponding author

Correspondence to David Maxwell Chickering.

About this article

Cite this article

Chickering, D.M., Paek, T. Personalizing influence diagrams: applying online learning strategies to dialogue management. User Model User-Adap Inter 17, 71–91 (2007). https://doi.org/10.1007/s11257-006-9020-7

Received: 01 November 2005
Accepted: 29 June 2006
Published: 13 December 2006
Issue Date: March 2007
DOI: https://doi.org/10.1007/s11257-006-9020-7
