Basic Ideas for Event-Based Optimization of Markov Systems

Abstract

The goal of this paper is twofold. First, we present a sensitivity point of view on the optimization of Markov systems. We show that Markov decision processes (MDPs) and the policy-gradient approach, or perturbation analysis (PA), can be derived easily from two fundamental sensitivity formulas, and that such formulas can be constructed flexibly, by first principles, with performance potentials as building blocks. Second, with this sensitivity view, we propose an event-based optimization approach, comprising event-based sensitivity analysis and event-based policy iteration. The approach exploits the special features of a system characterized by events; it shows how the potentials can be aggregated using these features and how the aggregated potentials can be used in policy iteration. Compared with the traditional MDP approach, the event-based approach has several advantages: the number of aggregated potentials may scale with the system size even though the number of states grows exponentially in it, which reduces the policy space and saves computation; the approach does not require actions at different states to be independent; and it exploits the special features of a system and does not need the exact transition probability matrix. The main ideas of the approach are illustrated by an admission control problem.
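To make the potential-based sensitivity view concrete, the following is a minimal Python sketch of the standard (non-event-based) ingredients the abstract builds on: performance potentials obtained from the Poisson equation, the performance-difference formula they yield, and a potential-based policy comparison. The three-state chains P, P2 and the reward vector f are illustrative assumptions, not taken from the paper, and the sketch does not perform the event-based aggregation itself.

```python
# Sketch of performance potentials and the performance-difference formula
# (illustrative example; not the paper's event-based algorithm).
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an ergodic chain."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def potentials(P, f):
    """Potentials g solving (I - P + 1 pi) g = f, i.e. the Poisson equation
    (I - P) g = f - eta 1 with average reward eta = pi f."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return g, pi, float(pi @ f)

# Illustrative three-state chain under two policies, with a state-only reward.
P  = np.array([[0.5, 0.3, 0.2],
               [0.2, 0.6, 0.2],
               [0.3, 0.3, 0.4]])
P2 = np.array([[0.4, 0.4, 0.2],
               [0.1, 0.7, 0.2],
               [0.2, 0.3, 0.5]])
f = np.array([1.0, 0.0, 2.0])

g, pi, eta = potentials(P, f)
_, pi2, eta2 = potentials(P2, f)

# Performance-difference formula (same reward under both policies):
#   eta2 - eta = pi2 (P2 - P) g
# the kind of sensitivity formula from which policy iteration and
# policy gradients can be derived.
print(eta2 - eta, pi2 @ (P2 - P) @ g)   # the two numbers agree

# Policy comparison with potentials: since the reward depends only on the
# state, P2's action improves on P's at state i whenever (P2 g)(i) > (P g)(i).
print("states where switching to P2 improves:", np.flatnonzero(P2 @ g > P @ g))
```

In the event-based approach described above, such per-state potentials would be aggregated over the states associated with an event, which is what keeps the number of quantities to estimate from growing with the exponentially large state space.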

Author information

Correspondence to Xi-Ren Cao.

Additional information

Supported in part by a grant from Hong Kong UGC.

About this article

Cite this article

Cao, XR. Basic Ideas for Event-Based Optimization of Markov Systems. Discrete Event Dyn Syst 15, 169–197 (2005). https://doi.org/10.1007/s10626-004-6211-4
