Abstract
In this chapter we study the finite-state approximation problem of computing near-optimal policies for discrete-time MDPs with Borel state and action spaces, under discounted and average cost criteria. Even though the existence and structural properties of optimal policies for MDPs have been studied extensively in the literature, computing such policies is generally a challenging problem for systems with uncountable state spaces. This situation also arises in the fully observed reduction of a partially observed Markov decision process, even when the original system has finite state and action spaces. Here we show that one way to compute approximately optimal solutions for such MDPs is to construct a reduced model, with a new transition probability and one-stage cost function, by quantizing the state space, i.e., by discretizing it on a finite grid. It is reasonable to expect that when the one-stage cost function and the transition probability of the original model have certain continuity properties, the cost of the optimal policy for the approximating finite model converges to the optimal cost of the original model as the discretization becomes finer. Moreover, under additional continuity conditions on the transition probability and the one-stage cost function, we also obtain bounds on the accuracy of the approximation in terms of the number of points used to discretize the state space, thereby providing a tradeoff between the computation cost and the performance loss in the system. In particular, we study the following two problems.
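The quantization idea described above can be illustrated numerically. The following is a minimal sketch, not the chapter's construction: it assumes a toy one-dimensional MDP on the state space [0, 1] with two actions, additive Gaussian noise dynamics, a quadratic one-stage cost, and a discounted criterion, all of which are illustrative choices. The state space is discretized into uniform cells, the induced finite transition matrix is estimated by Monte Carlo from each cell's representative point, and value iteration is run on the resulting finite model.

```python
import numpy as np

def quantize_mdp(n_bins=50, n_actions=2, beta=0.9, n_samples=2000, seed=0):
    """Build and solve a finite-state approximation of a toy MDP on [0, 1]."""
    rng = np.random.default_rng(seed)
    # Representative point (midpoint) of each quantization cell.
    centers = (np.arange(n_bins) + 0.5) / n_bins

    # Toy dynamics: x' = clip(x + 0.1*(a - 0.5) + noise, 0, 1).
    # Estimate the finite transition matrix by sampling next states
    # from each representative point and histogramming them into cells.
    P = np.zeros((n_actions, n_bins, n_bins))
    for a in range(n_actions):
        for i, x in enumerate(centers):
            noise = 0.05 * rng.standard_normal(n_samples)
            x_next = np.clip(x + 0.1 * (a - 0.5) + noise, 0.0, 1.0 - 1e-12)
            idx = (x_next * n_bins).astype(int)  # cell index of each sample
            P[a, i] = np.bincount(idx, minlength=n_bins) / n_samples

    # One-stage cost evaluated at the representatives (toy quadratic cost).
    c = np.array([[(x - 0.5) ** 2 + 0.01 * a for x in centers]
                  for a in range(n_actions)])          # shape (A, n_bins)

    # Standard value iteration on the finite model.
    V = np.zeros(n_bins)
    for _ in range(1000):
        Q = c + beta * P @ V                           # shape (A, n_bins)
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < 1e-10:
            V = V_new
            break
        V = V_new
    policy = Q.argmin(axis=0)                          # greedy finite policy
    return centers, V, policy
```

Increasing `n_bins` refines the grid; the convergence and rate-of-convergence results discussed in the chapter concern how the optimal cost of such finite models approaches that of the original Borel-space model as the discretization becomes finer.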
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Saldi, N., Linder, T., Yüksel, S. (2018). Finite-State Approximation of Markov Decision Processes. In: Finite Approximations in Discrete-Time Stochastic Control. Systems & Control: Foundations & Applications. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-79033-6_4
Print ISBN: 978-3-319-79032-9
Online ISBN: 978-3-319-79033-6
eBook Packages: Mathematics and Statistics (R0)