Abstract
This paper gives an introductory discussion of an important concept, the performance potentials of Markov processes, and of its relations to perturbation analysis (PA), average-cost Markov decision processes (MDPs), Poisson equations, α-potentials, the fundamental matrix, and the group inverse of the transition matrix (or of the infinitesimal generator). Applications to performance sensitivity estimation and performance optimization based on a single sample path are also discussed. On-line algorithms for estimating performance sensitivities and on-line schemes for policy iteration are presented. The approach is closely related to reinforcement learning algorithms.
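To make the objects named in the abstract concrete, the following is a minimal sketch, not taken from the paper itself, of how performance potentials can be computed for a small ergodic discrete-time chain. It assumes the standard Poisson equation (I − P)g + ηe = f with the normalization πg = η, so that g = (I − P + eπ)⁻¹f, where (I − P + eπ)⁻¹ is the fundamental matrix; it also assumes the usual sensitivity formula dη/dδ = πQg for the perturbed matrix P + δQ. The function names (`solve`, `stationary`, `potentials`, `derivative`) and the two-state chain are hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        # Pick the row with the largest pivot in column c.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def stationary(P):
    """Stationary distribution pi of an ergodic chain: pi P = pi, sum(pi) = 1."""
    n = len(P)
    # Rows of (P^T - I); the last row is replaced by the normalization.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    A[n - 1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(A, b)


def potentials(P, f):
    """Return (pi, g, eta), where g solves (I - P + e pi) g = f.

    This is the Poisson equation (I - P) g + eta e = f under the
    assumed normalization pi g = eta, with eta = pi f.
    """
    n = len(P)
    pi = stationary(P)
    A = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    g = solve(A, f)
    eta = sum(p * x for p, x in zip(pi, f))
    return pi, g, eta


def derivative(pi, Q, g):
    """Sensitivity d(eta)/d(delta) = pi Q g for P(delta) = P + delta Q."""
    n = len(g)
    return sum(pi[i] * Q[i][j] * g[j] for i in range(n) for j in range(n))


# Hypothetical two-state example: transition matrix P, reward vector f,
# and a perturbation direction Q whose rows sum to zero.
P = [[0.5, 0.5], [0.2, 0.8]]
f = [1.0, 3.0]
Q = [[-0.1, 0.1], [0.0, 0.0]]

pi, g, eta = potentials(P, f)
print("pi =", pi, "eta =", eta)
print("g =", g)
print("d eta / d delta =", derivative(pi, Q, g))
```

Under the alternative normalization πg = 0, the same construction yields g = (I − P)#f, where the group inverse satisfies (I − P)# = (I − P + eπ)⁻¹ − eπ; the sensitivity πQg is unchanged because Qe = 0.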
Cite this article
Cao, XR. The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes. Discrete Event Dynamic Systems 8, 71–87 (1998). https://doi.org/10.1023/A:1008260528575
DOI: https://doi.org/10.1023/A:1008260528575