Abstract
In this contribution we give a down-to-earth discussion of basic ideas for solving practical Markov decision problems. The emphasis is on the policy-improvement step for average-cost optimization, which provides a flexible method for improving a given policy. By appropriately designing the policy-improvement step for a specific application, tailor-made algorithms can be developed that generate the best control rule within a class of control rules characterized by a few parameters. Moreover, in decision problems with an intractable multi-dimensional state space, decomposition combined with a once-only application of the policy-improvement step may lead to a good heuristic rule. These useful features of the policy-improvement concept are illustrated with a queueing control problem with a variable service rate and with the dynamic routing of arrivals to parallel queues. In the final section, we discuss the one-stage-look-ahead rule in optimal stopping and give several applications.
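As a minimal illustration of the policy-improvement step for average-cost optimization, the sketch below evaluates a fixed policy of a small finite-state MDP via the average-cost evaluation equations (gain g and relative values h, normalized by h[0] = 0) and then performs one improvement step. The data layout and function names are hypothetical, not taken from the chapter; a unichain model with state 0 recurrent is assumed.

```python
import numpy as np

def policy_evaluation(P, c, policy):
    """Solve the average-cost evaluation equations for a fixed policy:
       g + h[i] = c[i, a_i] + sum_j P[a_i][i, j] * h[j],  with h[0] = 0.
    P[a] is the transition matrix under action a, c[i, a] the one-step cost.
    Assumes a unichain model in which state 0 is recurrent."""
    n = len(policy)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, a in enumerate(policy):
        A[i] = -P[a][i]        # row i of -(transition matrix under policy)
        A[i, i] += 1.0         # A = I - P_policy
        b[i] = c[i, a]
    # Normalize h[0] = 0 and let column 0 carry the coefficient of the gain g.
    A[:, 0] = 1.0
    x = np.linalg.solve(A, b)
    g = x[0]
    h = x.copy()
    h[0] = 0.0
    return g, h

def policy_improvement(P, c, h):
    """One policy-improvement step: in each state pick the action minimizing
    the one-step cost plus the expected relative value of the next state."""
    n, m = c.shape
    return [int(np.argmin([c[i, a] + P[a][i] @ h for a in range(m)]))
            for i in range(n)]
```

A once-only application of `policy_improvement` to the relative values of a reasonable starting policy often already yields a much better control rule, which is the idea behind the heuristics discussed in the chapter.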
© 2017 Springer International Publishing AG
Tijms, H. (2017). One-Step Improvement Ideas and Computational Aspects. In: Boucherie, R., van Dijk, N. (eds) Markov Decision Processes in Practice. International Series in Operations Research & Management Science, vol 248. Springer, Cham. https://doi.org/10.1007/978-3-319-47766-4_1
Print ISBN: 978-3-319-47764-0
Online ISBN: 978-3-319-47766-4