Markov Decision Processes in Practice, pp 63-101

# Approximate Dynamic Programming by Practical Examples

## Abstract

Computing the exact solution of an MDP model is generally difficult and possibly intractable for realistically sized problem instances. A powerful technique for solving large-scale, discrete-time, multistage stochastic control processes is Approximate Dynamic Programming (ADP). Although ADP is used as an umbrella term for a broad spectrum of methods to approximate the optimal solution of MDPs, the common denominator is typically to combine optimization with simulation, use approximations of the optimal values of the Bellman equations, and use approximate policies. This chapter aims to present and illustrate the basics of these steps through a number of practical and instructive examples. We use three examples (1) to explain the basics of ADP, relying on value iteration with an approximation of the value functions, (2) to provide insight into implementation issues, and (3) to provide test cases for readers to validate their own ADP implementations.

## Key words

Dynamic programming; Approximate dynamic programming; Stochastic optimization; Monte Carlo simulation; Curse of dimensionality