Finite-State Approximation of Markov Decision Processes

Finite Approximations in Discrete-Time Stochastic Control

Part of the book series: Systems & Control: Foundations & Applications (SCFA)

Abstract

In this chapter we study the finite-state approximation problem for computing near-optimal policies for discrete-time MDPs with Borel state and action spaces, under the discounted and average cost criteria. Although the existence and structural properties of optimal policies for MDPs have been studied extensively in the literature, computing such policies is generally a challenging problem for systems with uncountable state spaces. This situation also arises in the fully observed reduction of a partially observed Markov decision process, even when the original system has finite state and action spaces. Here we show that one way to compute approximately optimal solutions for such MDPs is to construct a reduced model, with a new transition probability and one-stage cost function, obtained by quantizing the state space, i.e., by discretizing it on a finite grid. It is reasonable to expect that when the one-stage cost function and the transition probability of the original model have certain continuity properties, the cost of the optimal policy for the approximating finite model converges to the optimal cost of the original model as the discretization becomes finer. Moreover, under additional continuity conditions on the transition probability and the one-stage cost function, we also obtain bounds on the accuracy of the approximation in terms of the number of points used to discretize the state space, thereby quantifying the tradeoff between computational cost and performance loss. In particular, we study the following two problems.
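The construction described above can be illustrated with a minimal numerical sketch. The model below is a hypothetical one-dimensional example, not the chapter's general Borel-space setting: the state space [0, 1] is quantized into bin centers, an approximate finite transition matrix is obtained by discretizing an assumed Gaussian transition kernel on the grid, and the resulting finite MDP is solved by value iteration under the discounted cost criterion. The dynamics, cost function, and all parameters are illustrative assumptions.

```python
import numpy as np

def build_finite_mdp(n_states=50, actions=(-0.1, 0.0, 0.1), sigma=0.05):
    """Quantize the state space [0, 1] and build an approximate finite MDP.

    Illustrative model (an assumption, not the chapter's): next state is
    Gaussian around x + a, one-stage cost is distance-to-target plus a
    small control penalty.
    """
    grid = (np.arange(n_states) + 0.5) / n_states  # bin centers in [0, 1]
    P = np.zeros((len(actions), n_states, n_states))
    for ai, a in enumerate(actions):
        for si, x in enumerate(grid):
            # Discretize the transition kernel on the grid and renormalize,
            # so each row is a probability vector over the finite states.
            w = np.exp(-0.5 * ((grid - (x + a)) / sigma) ** 2)
            P[ai, si] = w / w.sum()
    # One-stage cost c(x, a): quadratic distance to the target state 0.5
    # plus a small quadratic control cost.
    c = np.array([[(x - 0.5) ** 2 + 0.01 * a ** 2 for x in grid]
                  for a in actions])
    return grid, P, c

def value_iteration(P, c, beta=0.9, tol=1e-8):
    """Solve the finite approximating model for the discounted criterion."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = c + beta * (P @ V)      # Q[a, s] = c(s, a) + beta * E[V(x')]
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=0)
        V = V_new

grid, P, c = build_finite_mdp()
V, policy = value_iteration(P, c)
```

The policy returned is optimal for the finite model; the chapter's results concern when (and at what rate, in the number of grid points) such a policy, extended back to the original state space, is near-optimal for the original MDP.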

Author information

Correspondence to Naci Saldi, Tamás Linder, or Serdar Yüksel.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Saldi, N., Linder, T., Yüksel, S. (2018). Finite-State Approximation of Markov Decision Processes. In: Finite Approximations in Discrete-Time Stochastic Control. Systems & Control: Foundations & Applications. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-79033-6_4
