Abstract
In the previous chapters, we developed a sensitivity-based approach that provides a unified framework for learning and optimization. We showed that two performance sensitivity formulas form the basis for the learning and optimization of stochastic systems: the performance derivative formula leads to gradient-based optimization approaches, and the performance difference formula leads to the policy iteration approach for standard MDP-type problems.
Imagination is more important than knowledge.
Albert Einstein, German-born American physicist (1879–1955)
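For reference, the two formulas can be stated in their standard average-reward form. This is a minimal recap under the usual assumptions (an ergodic finite-state Markov chain); the notation follows common conventions and may differ in detail from the earlier chapters. Let $P$ be the transition matrix, $f$ the reward vector, $\pi$ the steady-state distribution (a row vector), $\eta = \pi f$ the average reward, $e$ the all-ones column vector, and $g$ the performance potential solving the Poisson equation $(I - P)g + \eta e = f$. Then, for two policies $(P, f)$ and $(P', f')$:

\[
\eta' - \eta = \pi' \left[ (P' - P)\, g + (f' - f) \right] \qquad \text{(performance difference)}
\]

\[
\left. \frac{d\eta_\delta}{d\delta} \right|_{\delta = 0} = \pi \left[ (P' - P)\, g + (f' - f) \right], \qquad P_\delta = P + \delta (P' - P), \quad f_\delta = f + \delta (f' - f) \qquad \text{(performance derivative)}
\]

The difference formula requires the steady-state distribution $\pi'$ of the other policy, while the derivative formula involves only quantities of the current policy; this asymmetry is what separates policy iteration from gradient-based methods.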
Copyright information
© 2007 Springer Science+Business Media, LLC
Cite this chapter
Cao, X.R. (2007). Event-Based Optimization of Markov Systems. In: Stochastic Learning and Optimization. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69082-7_8
DOI: https://doi.org/10.1007/978-0-387-69082-7_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-36787-3
Online ISBN: 978-0-387-69082-7
eBook Packages: Computer Science, Computer Science (R0)