Abstract
Motivated by the needs of online optimization of real-world engineering systems, we study single-sample-path-based algorithms for Markov decision problems (MDPs). The sample path used by the algorithms can be obtained by observing the operation of a real system. A simple example illustrates the advantages of the sample-path-based approach over the traditional computation-based approach: no matrix inversion is required; some transition probabilities need not be known; storage space may be saved; and the actions can be updated for only a subset of the state space in each iteration. We study the effect of estimation errors and the convergence properties of the sample-path-based approach. Finally, we propose a fast algorithm that updates the policy whenever the system reaches a particular set of states, and we prove that, under some conditions, the algorithm converges with probability one to the true optimal policy. The sample-path-based approach may have important applications to the design and management of engineering systems, such as high-speed communication networks.
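The advantages listed above can be made concrete with a minimal sketch of the sample-path idea: estimating the average reward and the potentials (biases) of a Markov chain purely from one observed trajectory, with no matrix inversion. The three-state chain, rewards, reference-state convention, and all numerical values below are illustrative assumptions, not the paper's algorithm.

```python
import random

# Illustrative 3-state chain: P is the transition matrix under the current
# policy and r the per-state reward (all values are assumptions for the sketch).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
r = [1.0, 2.0, 0.0]

def simulate(P, n, x0=0, seed=1):
    """Generate a single sample path of length n (stands in for observing
    the operation of a real system)."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n - 1):
        u, cum = rng.random(), 0.0
        for j, p in enumerate(P[path[-1]]):
            cum += p
            if u < cum:
                path.append(j)
                break
        else:
            path.append(len(P) - 1)  # guard against floating-point round-off
    return path

def estimate_potentials(path, r, ref=0):
    """Estimate the average reward eta and the potentials g from one path.

    g(i) is estimated as the mean of the accumulated (r(X_t) - eta) from a
    visit to state i up to (but not including) the next visit to the
    reference state -- no transition matrix or matrix inversion is needed.
    """
    n = len(path)
    eta = sum(r[x] for x in path) / n
    vals = [None] * n   # partial sum ending just before the next ref visit
    acc = None
    for t in range(n - 2, -1, -1):      # single backward pass over the path
        if path[t + 1] == ref:
            acc = 0.0                   # a ref visit follows: restart the sum
        if acc is not None:
            acc += r[path[t]] - eta
            vals[t] = acc
    g = []
    for i in range(len(r)):
        samples = [v for t, v in enumerate(vals)
                   if v is not None and path[t] == i]
        g.append(sum(samples) / len(samples) if samples else 0.0)
    return eta, g

path = simulate(P, 20000)
eta, g = estimate_potentials(path, r)
# A policy-improvement step would then pick, in each state i, a candidate
# action a maximizing r(i, a) + sum_j p_a(i, j) * g[j]; only the transition
# probabilities of the candidate actions enter, not those of the whole chain.
```

Note that the potentials are determined only up to an additive constant, so their differences, which are all that policy improvement needs, are what the estimator recovers.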
Cao, X.R. Single Sample Path-Based Optimization of Markov Chains. Journal of Optimization Theory and Applications 100, 527–548 (1999). https://doi.org/10.1023/A:1022634422482