Applied Intelligence, Volume 45, Issue 3, pp 662–672

Robust probabilistic planning with ILAO*

  • Daniel A. M. Moreira
  • Karina Valdivia Delgado
  • Leliane Nunes de Barros
Article

Abstract

In probabilistic planning problems, which are usually modeled as Markov Decision Processes (MDPs), it is often difficult, or even impossible, to obtain an accurate estimate of the state transition probabilities. This limitation can be overcome by modeling these problems as Markov Decision Processes with imprecise probabilities (MDP-IPs). Robust LAO* and Robust LRTDP are efficient algorithms for solving a special class of MDP-IPs in which the probabilities lie in given intervals, known as Bounded-Parameter Stochastic-Shortest-Path MDPs (BSSP-MDPs). However, they do not make clear what assumptions must be made to find a robust solution (the best policy under the worst model). In this paper, we propose a new efficient algorithm for BSSP-MDPs, called Robust ILAO*, which outperforms Robust LAO* and Robust LRTDP, considered the state of the art in robust probabilistic planning. We also define the assumptions required to ensure a robust solution and prove that the Robust ILAO* algorithm converges to the optimal values if the initial value of every state is admissible.
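
To illustrate the robust backup that algorithms such as Robust LAO*, Robust LRTDP and Robust ILAO* repeatedly apply, the sketch below shows a max-min Bellman update for a BSSP-MDP whose transition probabilities are constrained to intervals: the adversary ("nature") picks, within those intervals, the distribution that maximizes the expected cost-to-go, and the agent then picks the cheapest action. This is only a minimal illustration under assumed interfaces, not the paper's implementation; the names (worst_case_expectation, robust_backup, cost, succ) are hypothetical.

```python
# Minimal sketch of a robust (max-min) Bellman backup for an interval-based
# BSSP-MDP. Assumes minimization of expected cost-to-go and a valid interval
# set (sum of lower bounds <= 1 <= sum of upper bounds). Illustrative only.

from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound on P(s' | s, a)

def worst_case_expectation(values: List[float], intervals: List[Interval]) -> float:
    """Maximize sum_i p_i * values[i] s.t. l_i <= p_i <= u_i and sum_i p_i = 1."""
    probs = [lo for lo, _ in intervals]          # start every successor at its lower bound
    remaining = 1.0 - sum(probs)                 # probability mass still to distribute
    # adversarial nature pushes the leftover mass onto the costliest successors first
    for i in sorted(range(len(values)), key=lambda i: values[i], reverse=True):
        lo, hi = intervals[i]
        extra = min(hi - lo, remaining)
        probs[i] += extra
        remaining -= extra
    return sum(p * v for p, v in zip(probs, values))

def robust_backup(state, actions, cost, succ, V: Dict) -> float:
    """One robust Bellman update: best action under the worst transition model."""
    best = float("inf")
    for a in actions(state):
        successors = succ(state, a)              # list of (next_state, interval) pairs
        values = [V[s2] for s2, _ in successors]
        intervals = [iv for _, iv in successors]
        q = cost(state, a) + worst_case_expectation(values, intervals)
        best = min(best, q)
    return best
```

The inner maximization is solved greedily: each successor starts at its lower probability bound and the leftover mass is assigned to the successors with the highest current value, a standard way to evaluate the worst model when the transition probabilities are only known to lie in intervals.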

Keywords

Bounded-parameter Markov decision process · Probabilistic planning · Heuristic search

Notes

Acknowledgments

We thank the São Paulo Research Foundation for the financial support (FAPESP grant #2015/01587-0).

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Institute of Mathematics and Statistics, University of São Paulo, Butantã, Brazil
  2. School of Arts, Sciences and Humanities, University of São Paulo, Ermelino Matarazzo, Brazil
