Abstract
This chapter presents some recent results in the area of optimization and Markov decision processes (MDPs). The work starts with performance sensitivity analysis of Markov processes and continues two decades of research on the optimization of discrete-event dynamic systems, in particular the theory of perturbation analysis (PA). We approach the MDP problem from a sensitivity point of view. This new perspective leads to new insights and offers a clear and concise explanation of the basic concepts and results in MDPs.
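As a small illustration of the sensitivity point of view (a sketch for this page, not code from the chapter), consider an ergodic finite Markov chain with transition matrix P, reward vector f, and stationary distribution pi. The performance potentials g solve a Poisson-type equation, and the derivative of the average reward eta = pi f along a perturbation direction dP is pi dP g. All function names below are illustrative:

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of an ergodic chain: pi P = pi, pi e = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])  # pi (P - I) = 0 plus normalization
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def potentials(P, f):
    """Potentials g solving (I - P + e pi) g = f (one common normalization)."""
    n = P.shape[0]
    pi = stationary(P)
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)

def eta_derivative(P, dP, f):
    """Derivative of the average reward eta = pi f along direction dP."""
    return stationary(P) @ dP @ potentials(P, f)
```

For P + delta * dP (with each row of dP summing to zero), eta(delta) ≈ eta(0) + delta * eta_derivative(P, dP, f); adding a constant to g does not change the derivative, since the rows of dP sum to zero.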
Copyright information
© 2003 Springer-Verlag New York, Inc.
Cite this chapter
Cao, X. R. (2003). Performance Potential Based Optimization and MDPs. In: Stochastic Modeling and Optimization. Springer, New York, NY. https://doi.org/10.1007/978-0-387-21757-4_4
DOI: https://doi.org/10.1007/978-0-387-21757-4_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-3065-1
Online ISBN: 978-0-387-21757-4