Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2012: Machine Learning and Knowledge Discovery in Databases, pp. 99–115


Adaptive Planning for Markov Decision Processes with Uncertain Transition Models via Incremental Feature Dependency Discovery

N. Kemal Ure, Alborz Geramifard, Girish Chowdhary & Jonathan P. How

Conference paper
Part of the Lecture Notes in Computer Science book series (LNAI, volume 7524)

Abstract

Solving large-scale sequential decision-making problems without prior knowledge of the state transition model is a key problem in the planning literature. One approach is to learn the state transition model online from limited observed measurements. We present an adaptive function approximator, incremental Feature Dependency Discovery (iFDD), that grows the set of features online to approximately represent the transition model. The approach leverages existing feature dependencies to build a sparse representation of the state transition model. Theoretical analysis and numerical simulations in domains with state space sizes ranging from thousands to millions illustrate the benefit of using iFDD to incrementally build transition models within a planning framework.
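The core mechanism the abstract describes — growing the feature set online by promoting conjunctions of existing features once they have absorbed enough approximation error — can be sketched as follows. This is a minimal illustrative sketch only, not the authors' implementation: the class name, the scalar-error interface, and the fixed discovery threshold are all assumptions for the example.

```python
from itertools import combinations


class IFDDSketch:
    """Illustrative sketch of iFDD-style feature growth (not the paper's code).

    Features are frozensets of base-feature indices. Starting from
    single-index features, a pairwise conjunction of two active features
    becomes a new feature once the absolute error accumulated on it
    exceeds a discovery threshold.
    """

    def __init__(self, num_base_features, threshold):
        # Initial representation: one feature per base index.
        self.features = {frozenset([i]) for i in range(num_base_features)}
        # Candidate conjunctions -> accumulated absolute error.
        self.potentials = {}
        self.threshold = threshold

    def active_features(self, base_active):
        """Discovered features whose index sets are covered by the active base indices."""
        base_active = set(base_active)
        return [f for f in self.features if f <= base_active]

    def discover(self, base_active, error):
        """Accumulate |error| on conjunctions of currently active features;
        promote a candidate to a real feature once it crosses the threshold.
        Returns the list of features added on this call."""
        added = []
        for f, g in combinations(self.active_features(base_active), 2):
            cand = f | g
            if cand in self.features:
                continue  # already discovered
            self.potentials[cand] = self.potentials.get(cand, 0.0) + abs(error)
            if self.potentials[cand] >= self.threshold:
                self.features.add(cand)
                added.append(cand)
        return added
```

Under this sketch, repeated error on the same pair of active features eventually triggers discovery: with a threshold of 1.0, two observations each contributing an error of 0.6 on base features 0 and 1 would add the conjunction {0, 1}, keeping the representation sparse by only expanding where the current features fail to explain the data.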

Keywords

  • Transition Model
  • Markov Decision Process
  • State Transition Model
  • Adaptive Dynamic Program
  • Adaptive Planning

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Author information

Authors and Affiliations

  1. Laboratory for Information and Decision Systems, Massachusetts Institute of Technology (MIT), 77 Massachusetts Avenue, Cambridge, MA, USA

    N. Kemal Ure, Alborz Geramifard, Girish Chowdhary & Jonathan P. How


Editor information

Editors and Affiliations

  1. Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK

    Peter A. Flach

  2. Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK

    Tijl De Bie & Nello Cristianini


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ure, N.K., Geramifard, A., Chowdhary, G., How, J.P. (2012). Adaptive Planning for Markov Decision Processes with Uncertain Transition Models via Incremental Feature Dependency Discovery. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_7

  • DOI: https://doi.org/10.1007/978-3-642-33486-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33485-6

  • Online ISBN: 978-3-642-33486-3

  • eBook Packages: Computer Science (R0)
