
Single Sample Path-Based Optimization of Markov Chains

Journal of Optimization Theory and Applications

Abstract

Motivated by the needs of on-line optimization of real-world engineering systems, we study single sample path-based algorithms for Markov decision problems (MDPs). The sample path used in the algorithms can be obtained by observing the operation of a real system. We give a simple example to illustrate the advantages of the sample path-based approach over the traditional computation-based approach: matrix inversion is not required; some transition probabilities need not be known; storage space may be saved; and the actions can be updated for only a subset of the state space in each iteration. The effect of estimation errors and the convergence properties of the sample path-based approach are studied. Finally, we propose a fast algorithm, which updates the policy whenever the system reaches a particular set of states, and we prove that the algorithm converges to the true optimal policy with probability one under some conditions. The sample path-based approach may have important applications to the design and management of engineering systems, such as high-speed communication networks.
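To make the idea concrete, the sketch below illustrates one flavor of sample path-based policy iteration on a hypothetical three-state, two-action MDP (the model, state space, and cost numbers are invented for illustration; they are not from the paper). The average cost and the potentials g(i) are estimated from a single simulated sample path via the regenerative form g(i) ≈ E[Σ_{t=0}^{T-1} (f(X_t) − η) | X_0 = i], with T the first hitting time of a reference state, followed by one policy-improvement step. Note this batch sketch differs from the paper's fast algorithm, which updates the policy on-line whenever the system reaches a particular set of states, and it uses the candidate transition probabilities in the improvement step, which the paper notes can be partially avoided.

```python
import random

# Hypothetical 3-state, 2-action MDP used only for illustration.
# P[a][i][j]: transition probability i -> j under action a; f[a][i]: one-step cost.
P = {
    0: [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],
    1: [[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.5, 0.2, 0.3]],
}
f = {0: [2.0, 1.0, 3.0], 1: [1.5, 2.5, 1.0]}

def simulate(policy, steps, seed=0):
    """Generate a single sample path X_0, X_1, ... under a fixed policy."""
    rng = random.Random(seed)
    path, x = [], 0
    for _ in range(steps):
        path.append(x)
        x = rng.choices([0, 1, 2], weights=P[policy[x]][x])[0]
    return path

def estimate_eta_and_g(path, policy, ref=0):
    """Estimate the average cost eta and potentials g(i) from one sample path.

    First-visit regenerative estimator: each time state i is first visited
    within a cycle, accumulate f(X_t) - eta until the path next hits ref."""
    costs = [f[policy[x]][x] for x in path]
    eta = sum(costs) / len(costs)
    sums = {i: 0.0 for i in range(3)}
    counts = {i: 0 for i in range(3)}
    open_sums = {}  # partial sums for visits awaiting the next ref hit
    for t, x in enumerate(path):
        if x == ref and open_sums:          # cycle ends: close open visits
            for i, s in open_sums.items():
                sums[i] += s
                counts[i] += 1
            open_sums = {}
        if x not in open_sums:              # first visit to x in this cycle
            open_sums[x] = 0.0
        for i in open_sums:
            open_sums[i] += costs[t] - eta
    g = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(3)]
    return eta, g

def improve(policy, g):
    """One policy-improvement step: in each state pick the action minimizing
    f(i, a) + sum_j P_a(i, j) g(j)."""
    return [min((0, 1), key=lambda a: f[a][i] + sum(P[a][i][j] * g[j]
                                                    for j in range(3)))
            for i in range(3)]

policy = [0, 0, 0]
path = simulate(policy, 20000)
eta, g = estimate_eta_and_g(path, policy)
new_policy = improve(policy, g)
```

The potentials are only determined up to an additive constant, which is harmless here because policy improvement compares actions within each state. In the paper's setting the same path would come from observing the real system rather than a simulator.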




Cite this article

Cao, X.R. Single Sample Path-Based Optimization of Markov Chains. Journal of Optimization Theory and Applications 100, 527–548 (1999). https://doi.org/10.1023/A:1022634422482
