New prioritized value iteration for Markov decision processes

Garcia-Hernandez, Ma. de Guadalupe; Ruiz-Pinales, Jose; Onaindia, Eva; Aviña-Cervantes, J. Gabriel; Ledesma-Orozco, Sergio; Alvarado-Mendez, Edgar; Reyes-Ballesteros, Alberto

doi:10.1007/s10462-011-9224-z

Published: 01 May 2011

Artificial Intelligence Review, volume 37, pages 157–167 (2012)

Abstract

Solving large Markov decision processes accurately and quickly is challenging. Because the computational effort involved is considerable, current research focuses on finding better acceleration techniques. For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. On the one hand, algorithms such as topological sorting can find good orderings, but their overhead is usually high. On the other hand, shortest-path methods such as Dijkstra’s algorithm, which is based on priority queues, have been applied successfully to deterministic shortest-path Markov decision processes. Here, we propose an improved value iteration algorithm based on Dijkstra’s algorithm for solving shortest-path Markov decision processes. Experimental results on a stochastic shortest-path problem show the feasibility of our approach.
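The abstract does not give the proposed algorithm itself, so the following is only a generic sketch of the underlying idea: ordering Bellman backups with a priority queue, in the spirit of prioritized sweeping. It is not the authors' method. The MDP encoding (`mdp[state][action]` as a list of `(probability, next_state, cost)` triples), the function names, the residual-based priority, and the two-state example are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def bellman_backup(state, mdp, values):
    """Minimum expected cost-to-go of `state` over its available actions."""
    return min(
        sum(p * (cost + values[nxt]) for p, nxt, cost in outcomes)
        for outcomes in mdp[state].values()
    )


def prioritized_value_iteration(mdp, goals, epsilon=1e-9, max_backups=100_000):
    """Back up states in order of decreasing Bellman residual (assumed priority)."""
    # Predecessor map: which states can reach `nxt` in one step.
    preds = defaultdict(set)
    for state, actions in mdp.items():
        for outcomes in actions.values():
            for p, nxt, _cost in outcomes:
                if p > 0:
                    preds[nxt].add(state)

    # Goal states have zero cost-to-go; every other value starts at 0.
    values = {state: 0.0 for state in mdp}
    values.update({g: 0.0 for g in goals})

    # heapq is a min-heap, so residuals are negated to pop the largest first.
    heap = []
    for state in mdp:
        if state not in goals:
            residual = abs(bellman_backup(state, mdp, values) - values[state])
            if residual > epsilon:
                heapq.heappush(heap, (-residual, state))

    backups = 0
    while heap and backups < max_backups:
        _, state = heapq.heappop(heap)
        new_value = bellman_backup(state, mdp, values)
        if abs(new_value - values[state]) <= epsilon:
            continue  # stale queue entry: this state is already consistent
        values[state] = new_value
        backups += 1
        # Only predecessors of the updated state can have a changed residual.
        for pred in preds[state]:
            if pred in goals:
                continue
            residual = abs(bellman_backup(pred, mdp, values) - values[pred])
            if residual > epsilon:
                heapq.heappush(heap, (-residual, pred))
    return values


# Hypothetical two-state stochastic shortest-path problem with unit step costs.
mdp = {
    "a": {"go": [(0.8, "b", 1.0), (0.2, "a", 1.0)]},
    "b": {"go": [(0.9, "goal", 1.0), (0.1, "b", 1.0)]},
}
values = prioritized_value_iteration(mdp, goals={"goal"})
```

In this sketch a state is re-queued only when a backup of one of its successors changes its residual, which is what lets a priority queue avoid the full sweeps of plain value iteration. In the deterministic case this ordering resembles the way Dijkstra’s algorithm settles nodes outward from the goal.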

Author information

Authors and Affiliations

University of Guanajuato, Comunidad de Palo Blanco s/n, Guanajuato, Salamanca, Mexico
Ma. de Guadalupe Garcia-Hernandez, Jose Ruiz-Pinales, J. Gabriel Aviña-Cervantes, Sergio Ledesma-Orozco & Edgar Alvarado-Mendez
Universitat Politècnica de València, DSIC, Camino de Vera s/n, 46022, Valencia, Spain
Eva Onaindia
Electrical Research Institute, Reforma 113, 62490, Morelos, Temixco, Mexico
Alberto Reyes-Ballesteros

Corresponding author

Correspondence to Ma. de Guadalupe Garcia-Hernandez.

About this article

Cite this article

Garcia-Hernandez, M.d.G., Ruiz-Pinales, J., Onaindia, E. et al. New prioritized value iteration for Markov decision processes. Artif Intell Rev 37, 157–167 (2012). https://doi.org/10.1007/s10462-011-9224-z

Issue Date: February 2012