A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes

Kearns, Michael; Mansour, Yishay; Ng, Andrew Y.

doi:10.1023/A:1017932429737

Published: November 2002

Machine Learning, Volume 49, pages 193–208 (2002)

Abstract

A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states. The running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy). Our algorithm thus provides a different complexity trade-off than classical algorithms such as value iteration—rather than scaling linearly in both horizon time and state space size, our running time trades an exponential dependence on the former in exchange for no dependence on the latter.
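As a rough illustration of this trade-off, the Python sketch below counts generative-model calls for a look-ahead tree down to depth H. At every node of the tree, each of k actions is tried and C next states are sampled per action. The paper derives its own settings of H and C from γ, the desired accuracy, and a bound on the rewards. Those settings are not given here, so the sketch picks H with a standard tail bound instead: discounted rewards after step H add up to at most γ^H · R_max / (1 - γ). The names truncation_horizon, sparse_tree_calls, eps, r_max and width are illustrative assumptions, not the paper's notation.

```python
import math


def truncation_horizon(gamma: float, eps: float, r_max: float) -> int:
    """Smallest H >= 0 with gamma**H * r_max / (1 - gamma) <= eps.

    Rewards collected after step H add at most gamma**H * r_max / (1 - gamma)
    to a discounted return, so ignoring them costs at most eps.
    """
    ratio = eps * (1 - gamma) / r_max
    if ratio >= 1:
        return 0
    return math.ceil(math.log(ratio) / math.log(gamma))


def sparse_tree_calls(num_actions: int, width: int, horizon: int) -> int:
    """Simulator calls for a depth-`horizon` tree with `width` samples per action.

    Each internal node makes num_actions * width calls; the number of states
    of the MDP never appears in this count.
    """
    branching = num_actions * width
    return sum(branching ** d for d in range(1, horizon + 1))


if __name__ == "__main__":
    for gamma in (0.5, 0.9):
        h = truncation_horizon(gamma, eps=0.1, r_max=1.0)
        print(gamma, h, sparse_tree_calls(num_actions=2, width=5, horizon=h))
```

With k = 2 and C = 5, the loop reports H = 5 and 111110 calls at γ = 0.5, but H = 44 and more than 10^44 calls at γ = 0.9. The number of states enters neither figure, while the horizon sits in the exponent.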

Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs (Kearns, Mansour, & Ng, Neural Information Processing Systems 13, to appear).
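The sampled look-ahead tree can be sketched as a short recursive estimator. This is a minimal Python sketch under stated assumptions, not the authors' code. The simulator interface model(state, action, rng) -> (next_state, reward), the parameters width (samples per action, C) and depth (H), and the toy two-state MDP are all hypothetical. It uses one common way to write the recursion, and the paper's exact estimator may differ in details such as how rewards are handled. At depth 0 the value estimate is 0. Otherwise, each action's Q-value is the average, over width sampled transitions, of the reward plus γ times the largest estimated Q-value one level down.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical generative-model interface (an assumption, not the paper's API):
# model(state, action, rng) -> (next_state, reward), one fresh sample per call.
Model = Callable[[str, int, random.Random], Tuple[str, float]]


def estimate_q(model: Model, state: str, num_actions: int, depth: int,
               width: int, gamma: float, rng: random.Random) -> List[float]:
    """Sparse-sampling estimates of Q_depth(state, a) for a = 0..num_actions-1."""
    if depth == 0:
        return [0.0] * num_actions  # no look-ahead left, so V_0 = 0
    q = []
    for a in range(num_actions):
        total = 0.0
        for _ in range(width):  # `width` independent simulator calls per action
            next_state, reward = model(state, a, rng)
            # V_{depth-1}(s') is the max of the recursive Q estimates at s'
            v_next = max(estimate_q(model, next_state, num_actions,
                                    depth - 1, width, gamma, rng))
            total += reward + gamma * v_next
        q.append(total / width)  # sample average over the `width` children
    return q


def plan(model: Model, state: str, num_actions: int, depth: int,
         width: int, gamma: float, seed: int = 0) -> int:
    """Return the root action with the largest estimated Q_depth value."""
    q = estimate_q(model, state, num_actions, depth, width, gamma,
                   random.Random(seed))
    return max(range(num_actions), key=lambda a: q[a])


def toy_model(state: str, action: int, rng: random.Random) -> Tuple[str, float]:
    """Hypothetical two-state MDP used only to exercise the planner."""
    if state == "good":
        return "good", 1.0 + rng.uniform(-0.1, 0.1)  # noisy reward near 1
    if action == 0:
        return "start", 0.5  # greedy option: immediate 0.5, stay put
    return "good", 0.0  # patient option: no reward now, move to "good"


if __name__ == "__main__":
    print(plan(toy_model, "start", num_actions=2, depth=1, width=4, gamma=0.9))
    print(plan(toy_model, "start", num_actions=2, depth=3, width=4, gamma=0.9))
```

In the toy MDP, action 0 pays 0.5 immediately, while action 1 pays nothing but leads to a state worth about 1 per step. With depth=1 the planner picks the greedy action 0, and with depth=3 it picks action 1. Each call to estimate_q at a positive depth makes num_actions * width simulator calls and recurses on every sample. A tree of depth H therefore costs the sum of (num_actions * width)^d for d = 1..H calls, whatever the number of states.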

References

Aho, A. V., Hopcroft, J. E., & Ullman, J. D. (1974). The design and analysis of computer algorithms. Reading, MA: Addison-Wesley.
Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72, 81–138.
Boutilier, C., Dearden, R., & Goldszmidt, M. (1995). Exploiting structure in policy construction. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (pp. 1104–1111).
Boyen, X., & Koller, D. (1998). Tractable inference for complex stochastic processes. In Proceedings of the 1998 Conference on Uncertainty in Artificial Intelligence. San Mateo, CA: Morgan Kaufmann.
Bonet, B., Loerincs, G., & Geffner, H. (1997). A robust and fast action selection mechanism for planning. In Proceedings of the Fourteenth National Conference on Artificial Intelligence.
Dearden, R., & Boutilier, C. (1994). Integrating planning and execution in stochastic domains. In Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence.
Davies, S., Ng, A. Y., & Moore, A. (1998). Applying online-search to reinforcement learning. In Proceedings of AAAI-98 (pp. 753–760). Menlo Park, CA: AAAI Press.
Kearns, M., Mansour, Y., & Ng, A. Y. Approximate planning in large POMDPs via reusable trajectories. In Neural Information Processing Systems 13, to appear.
Korf, R. E. (1990). Real-time heuristic search. Artificial Intelligence, 42, 189–211.
Koller, D., & Parr, R. (1999). Computing factored value functions for policies in structured MDPs. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence.
Koenig, S., & Simmons, R. (1998). Solving robot navigation problems with initial pose uncertainty using realtime heuristic search. In Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems.
Kearns, M., & Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems 12. Cambridge, MA: MIT Press.
Meuleau, N., Hauskrecht, M., Kim, K.-E., Peshkin, L., Kaelbling, L. P., Dean, T., & Boutilier, C. (1998). Solving very large weakly coupled Markov decision processes. In Proceedings of AAAI (pp. 165–172).
McAllester, D., & Singh, S. (1999). Personal communication.
McAllester, D., & Singh, S. Approximate planning for factored POMDPs using belief state simplification. Preprint.
Russell, S., & Norvig, P. (1995). Artificial Intelligence: A modern approach. Englewood Cliffs, NJ: Prentice-Hall.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning. Cambridge, MA: MIT Press.
Singh, S., & Yee, R. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16, 227–233.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Pennsylvania, Moore School Building, 200 South 33rd Street, Philadelphia, PA, 19104-6389, USA
Michael Kearns
Department of Computer Science, Tel Aviv University, 69978, Tel Aviv, Israel
Yishay Mansour
Department of Computer Science, University of California, Berkeley, Berkeley, CA, 94704, USA
Andrew Y. Ng

Cite this article

Kearns, M., Mansour, Y. & Ng, A.Y. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. Machine Learning 49, 193–208 (2002). https://doi.org/10.1023/A:1017932429737
