Abstract
In this contribution we give a down-to-earth discussion of basic ideas for solving practical Markov decision problems. The emphasis is on the policy-improvement step for average-cost optimization, which provides a flexible method for improving a given policy. By appropriately designing the policy-improvement step for a specific application, tailor-made algorithms can be developed that generate the best control rule within a class of control rules characterized by a few parameters. Moreover, in decision problems with an intractable multi-dimensional state space, decomposition combined with a once-only application of the policy-improvement step may lead to a good heuristic rule. These useful features of the policy-improvement concept are illustrated with a queueing control problem with a variable service rate and with the dynamic routing of arrivals to parallel queues. In the final section, we discuss the one-stage-look-ahead rule in optimal stopping and give several applications.
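As a minimal illustration of the policy-improvement step for average-cost optimization, the sketch below evaluates a fixed policy of a small finite-state MDP via the average-cost evaluation equations (gain g and relative values h, normalized by h[0] = 0) and then performs one improvement step. The data layout and function names are hypothetical, not taken from the chapter; a unichain model with state 0 recurrent is assumed.

```python
import numpy as np

def policy_evaluation(P, c, policy):
    """Solve the average-cost evaluation equations for a fixed policy:
       g + h[i] = c[i, a_i] + sum_j P[a_i][i, j] * h[j],  with h[0] = 0.
    P[a] is the transition matrix under action a, c[i, a] the one-step cost.
    Assumes a unichain model in which state 0 is recurrent."""
    n = len(policy)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, a in enumerate(policy):
        A[i] = -P[a][i]        # row i of -(transition matrix under policy)
        A[i, i] += 1.0         # A = I - P_policy
        b[i] = c[i, a]
    # Normalize h[0] = 0 and let column 0 carry the coefficient of the gain g.
    A[:, 0] = 1.0
    x = np.linalg.solve(A, b)
    g = x[0]
    h = x.copy()
    h[0] = 0.0
    return g, h

def policy_improvement(P, c, h):
    """One policy-improvement step: in each state pick the action minimizing
    the one-step cost plus the expected relative value of the next state."""
    n, m = c.shape
    return [int(np.argmin([c[i, a] + P[a][i] @ h for a in range(m)]))
            for i in range(n)]
```

A once-only application of `policy_improvement` to the relative values of a reasonable starting policy often already yields a much better control rule, which is the idea behind the heuristics discussed in the chapter.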
© 2017 Springer International Publishing AG
Tijms, H. (2017). One-Step Improvement Ideas and Computational Aspects. In: Boucherie, R., van Dijk, N. (eds) Markov Decision Processes in Practice. International Series in Operations Research & Management Science, vol 248. Springer, Cham. https://doi.org/10.1007/978-3-319-47766-4_1
Print ISBN: 978-3-319-47764-0
Online ISBN: 978-3-319-47766-4