Characterizing Markov Decision Processes

  • Bohdana Ratitch
  • Doina Precup
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2430)

Abstract

Problem characteristics often have a significant influence on the difficulty of solving optimization problems. In this paper, we propose attributes for characterizing Markov Decision Processes (MDPs), and discuss how they affect the performance of reinforcement learning algorithms that use function approximation. The attributes mainly measure the amount of randomness in the environment. Their values can be calculated from the MDP model or estimated on-line. We show empirically that two of the proposed attributes have a statistically significant effect on the quality of learning. We discuss how measurements of the proposed MDP attributes can be used to facilitate the design of reinforcement learning systems.
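
The abstract does not spell out the attribute definitions, so the sketch below is only an illustration of the general idea: it computes one plausible randomness attribute, the mean entropy of the next-state distribution over state-action pairs, either from a known tabular transition model or from sampled transitions, in the spirit of "calculated from the MDP model or estimated on-line". The function names transition_entropy and estimate_transition_entropy, and the choice of entropy as the attribute, are assumptions made for illustration and are not taken from the paper.

    import numpy as np

    def transition_entropy(P):
        """Average entropy (bits) of the next-state distribution over all
        (state, action) pairs of a tabular MDP.

        P: array of shape (S, A, S), where P[s, a, s'] is the probability
           of reaching s' after taking action a in state s.
        (Illustrative attribute; not necessarily the paper's definition.)
        """
        eps = 1e-12  # avoids log(0) for impossible transitions
        ent = -np.sum(P * np.log2(P + eps), axis=-1)  # entropy per (s, a) pair
        return ent.mean()

    def estimate_transition_entropy(transitions, num_states, num_actions):
        """On-line counterpart: estimate the same quantity from sampled
        (s, a, s') transitions via empirical next-state distributions."""
        counts = np.zeros((num_states, num_actions, num_states))
        for s, a, s_next in transitions:
            counts[s, a, s_next] += 1
        totals = counts.sum(axis=-1, keepdims=True)
        P_hat = np.divide(counts, totals,
                          out=np.zeros_like(counts), where=totals > 0)
        eps = 1e-12
        ent = -np.sum(P_hat * np.log2(P_hat + eps), axis=-1)
        visited = totals.squeeze(-1) > 0  # average only over visited pairs
        return ent[visited].mean()

    # Example: a 2-state MDP whose first action is deterministic and whose
    # second action is uniformly random has a mean entropy of about 0.5 bits.
    P = np.zeros((2, 2, 2))
    P[:, 0, :] = np.eye(2)  # deterministic action
    P[:, 1, :] = 0.5        # fully random action
    print(transition_entropy(P))  # ~0.5

Under this reading, a deterministic MDP scores 0 and the score grows as transitions become more stochastic, which matches the abstract's notion of measuring "the amount of randomness in the environment".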



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Bohdana Ratitch (1)
  • Doina Precup (1)
  1. McGill University, Montreal, Canada
