Solving Relational MDPs with Exogenous Events and Additive Rewards

  • Saket Joshi
  • Roni Khardon
  • Prasad Tadepalli
  • Aswin Raghavan
  • Alan Fern
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8188)

Abstract

We formalize a simple but natural subclass of service domains for relational planning problems with object-centered, independent exogenous events and additive rewards, capturing, for example, problems in inventory control. Focusing on this subclass, we present a new symbolic planning algorithm, the first with explicit performance guarantees for relational MDPs with exogenous events. In particular, under some technical conditions, our planning algorithm provides a monotonic lower bound on the optimal value function. To support this algorithm we present novel evaluation and reduction techniques for generalized first-order decision diagrams, a knowledge representation for real-valued functions over relational world states. Our planning algorithm uses a set of focus states, which serves as a training set, to simplify and approximate the symbolic solution, and can thus be seen to perform learning for planning. A preliminary experimental evaluation demonstrates the validity of our approach.
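The model class described above (rewards that add up over objects, with independent object-centered exogenous events) can be made concrete with a toy inventory-control example. The sketch below is not the paper's symbolic GFODD-based algorithm; it is a flat value-iteration baseline over an enumerated state space, written only to illustrate the structure of such a service domain. All names and parameter values (N_SHOPS, P_DEMAND, and so on) are hypothetical.

    from itertools import product

    N_SHOPS   = 3      # number of shop objects
    MAX_STOCK = 2      # inventory levels 0..MAX_STOCK per shop
    P_DEMAND  = 0.4    # probability a customer arrives at a shop (independent per shop)
    GAMMA     = 0.9    # discount factor

    states  = list(product(range(MAX_STOCK + 1), repeat=N_SHOPS))
    actions = list(range(N_SHOPS + 1))   # restock shop i, or i == N_SHOPS for no-op

    def reward(state):
        # Additive reward: one unit for every shop that currently has stock.
        return sum(1 for stock in state if stock > 0)

    def transition(state, action):
        """Next-state distribution: the agent restocks one shop (or does nothing),
        then each shop independently loses one unit with probability P_DEMAND
        (the object-centered exogenous event)."""
        restocked = list(state)
        if action < N_SHOPS:
            restocked[action] = min(restocked[action] + 1, MAX_STOCK)
        dist = {tuple(restocked): 1.0}
        for shop in range(N_SHOPS):
            nxt = {}
            for s, p in dist.items():
                if s[shop] > 0:
                    sold = list(s)
                    sold[shop] -= 1
                    nxt[tuple(sold)] = nxt.get(tuple(sold), 0.0) + p * P_DEMAND
                    nxt[s] = nxt.get(s, 0.0) + p * (1.0 - P_DEMAND)
                else:
                    nxt[s] = nxt.get(s, 0.0) + p
            dist = nxt
        return dist

    # Flat value iteration over the enumerated state space.  The paper replaces
    # this enumeration with symbolic GFODD backups, and simplifies/evaluates the
    # diagrams on a small set of focus states rather than on all ground states.
    V = {s: 0.0 for s in states}
    for _ in range(100):
        V = {s: reward(s) + GAMMA * max(
                 sum(p * V[t] for t, p in transition(s, a).items())
                 for a in actions)
             for s in states}

    print(round(V[(0, 0, 0)], 2), round(V[(MAX_STOCK,) * N_SHOPS], 2))

Even at this tiny size the enumeration has (MAX_STOCK+1)^N_SHOPS = 27 states and grows exponentially with the number of objects; the appeal of the symbolic approach is that the same per-object structure can be exploited at the first-order level instead of after grounding.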

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Saket Joshi (1)
  • Roni Khardon (2)
  • Prasad Tadepalli (3)
  • Aswin Raghavan (3)
  • Alan Fern (3)
  1. Cycorp Inc., Austin, USA
  2. Tufts University, Medford, USA
  3. Oregon State University, Corvallis, USA
