Bounded Aggregation for Continuous Time Markov Decision Processes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10497)

Abstract

Markov decision processes suffer from two problems: the so-called state space explosion, which may lead to long computation times, and the memoryless property of states, which limits the modeling power with respect to real systems. In this paper we combine existing state aggregation and optimization methods into a new aggregation-based optimization method. More specifically, we compute reward bounds on an aggregated model by trading state space size for uncertainty. We propose an approach for continuous time Markov decision models with discounted or average reward measures.
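
For orientation, the two optimization criteria mentioned above are the usual ones for a CTMDP with reward rate function r under a policy pi; the notation below is not taken from the paper, it is just the standard textbook definition:

    G_{\beta}^{\pi} \;=\; \mathbb{E}^{\pi}\Bigl[\int_{0}^{\infty} e^{-\beta t}\, r\bigl(X_t,\pi(X_t)\bigr)\,dt\Bigr]
    \qquad \text{(discounted reward, discount rate } \beta > 0\text{)}

    G_{\mathrm{avg}}^{\pi} \;=\; \lim_{T\to\infty} \tfrac{1}{T}\,\mathbb{E}^{\pi}\Bigl[\int_{0}^{T} r\bigl(X_t,\pi(X_t)\bigr)\,dt\Bigr]
    \qquad \text{(average reward)}

where X_t denotes the state of the CTMDP at time t.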

The approach starts with a partitioned state space consisting of blocks that represent an abstract, high-level view of the state space. The sojourn time in each block can then be represented by a phase-type distribution (PHD). Using known properties of PHDs, we bound the sojourn time in each block and the reward accumulated during each sojourn; constraining the set of possible initial vectors yields tighter bounds for the sojourn times and, ultimately, for the average or discounted reward measures. Furthermore, given a fixed policy for the CTMDP, we can constrain the initial vector even further, which improves the reward bounds. The aggregation approach is illustrated on randomly generated models.
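
To make the initial-vector argument concrete, the following is a minimal numerical sketch, not the paper's algorithm; the subgenerator T and the reward rates r are made up for illustration. For a block whose sojourn follows a PHD with subgenerator T and unknown stochastic entry vector alpha, the expected sojourn time is alpha(-T)^{-1}1 and the expected reward accumulated per sojourn is alpha(-T)^{-1}r, so restricting the set of feasible entry vectors confines both quantities to intervals:

    import numpy as np

    # Hypothetical subgenerator of one block's phase-type distribution (PHD):
    # off-diagonal entries are rates between phases, the missing row mass is
    # the rate of leaving the block (absorption).
    T = np.array([[-3.0,  1.0,  0.0],
                  [ 0.0, -2.0,  1.5],
                  [ 0.5,  0.0, -1.0]])
    r = np.array([2.0, 0.5, 1.0])      # assumed reward rates per phase

    # Conditional means, given the phase in which the block is entered:
    # m[i] = expected sojourn time, c[i] = expected accumulated reward.
    m = np.linalg.solve(-T, np.ones(T.shape[0]))
    c = np.linalg.solve(-T, r)

    # For an arbitrary stochastic entry vector alpha, alpha @ m and alpha @ c
    # lie between the smallest and largest entries of m and c; constraining
    # alpha to the phases actually reachable on entry (or induced by a fixed
    # policy) shrinks these intervals.
    print("sojourn time in [%.3f, %.3f]" % (m.min(), m.max()))
    print("reward per sojourn in [%.3f, %.3f]" % (c.min(), c.max()))

Restricting alpha to a subset of phases simply restricts the minimum and maximum to the corresponding entries of m and c, which is where the tightening comes from.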

Keywords

Markov decision process · Aggregation · Discounted reward · Average reward · Bounds


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

Department of Computer Science, TU Dortmund, Dortmund, Germany
