Abstract
Motivated by the needs of online optimization of real-world engineering systems, we study single-sample-path-based algorithms for Markov decision problems (MDPs). The sample path used by the algorithms can be obtained by observing the operation of a real system. A simple example illustrates the advantages of the sample-path-based approach over the traditional computation-based approach: no matrix inversion is required; some transition probabilities need not be known; storage space may be saved; and the actions can be updated for only a subset of the state space in each iteration. We study the effect of estimation errors and the convergence properties of the sample-path-based approach. Finally, we propose a fast algorithm that updates the policy whenever the system reaches a particular set of states, and we prove that, under some conditions, the algorithm converges with probability one to the true optimal policy. The sample-path-based approach may have important applications to the design and management of engineering systems, such as high-speed communication networks.
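The advantages listed above can be made concrete with a minimal sketch of the sample-path idea: estimating the average reward and the potentials (biases) of a Markov chain purely from one observed trajectory, with no matrix inversion. The three-state chain, rewards, reference-state convention, and all numerical values below are illustrative assumptions, not the paper's algorithm.

```python
import random

# Illustrative 3-state chain: P is the transition matrix under the current
# policy and r the per-state reward (all values are assumptions for the sketch).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
r = [1.0, 2.0, 0.0]

def simulate(P, n, x0=0, seed=1):
    """Generate a single sample path of length n (stands in for observing
    the operation of a real system)."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n - 1):
        u, cum = rng.random(), 0.0
        for j, p in enumerate(P[path[-1]]):
            cum += p
            if u < cum:
                path.append(j)
                break
        else:
            path.append(len(P) - 1)  # guard against floating-point round-off
    return path

def estimate_potentials(path, r, ref=0):
    """Estimate the average reward eta and the potentials g from one path.

    g(i) is estimated as the mean of the accumulated (r(X_t) - eta) from a
    visit to state i up to (but not including) the next visit to the
    reference state -- no transition matrix or matrix inversion is needed.
    """
    n = len(path)
    eta = sum(r[x] for x in path) / n
    vals = [None] * n   # partial sum ending just before the next ref visit
    acc = None
    for t in range(n - 2, -1, -1):      # single backward pass over the path
        if path[t + 1] == ref:
            acc = 0.0                   # a ref visit follows: restart the sum
        if acc is not None:
            acc += r[path[t]] - eta
            vals[t] = acc
    g = []
    for i in range(len(r)):
        samples = [v for t, v in enumerate(vals)
                   if v is not None and path[t] == i]
        g.append(sum(samples) / len(samples) if samples else 0.0)
    return eta, g

path = simulate(P, 20000)
eta, g = estimate_potentials(path, r)
# A policy-improvement step would then pick, in each state i, a candidate
# action a maximizing r(i, a) + sum_j p_a(i, j) * g[j]; only the transition
# probabilities of the candidate actions enter, not those of the whole chain.
```

Note that the potentials are determined only up to an additive constant, so their differences, which are all that policy improvement needs, are what the estimator recovers.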
Cao, X.R. Single Sample Path-Based Optimization of Markov Chains. Journal of Optimization Theory and Applications 100, 527–548 (1999). https://doi.org/10.1023/A:1022634422482