Abstract
In the previous chapters, we developed a sensitivity-based approach that provides a unified framework for learning and optimization. We showed that two performance sensitivity formulas form the basis for the learning and optimization of stochastic systems: the performance derivative formula leads to gradient-based optimization approaches, and the performance difference formula leads to the policy iteration approach for standard MDP-type problems.
Imagination is more important than knowledge.
Albert Einstein, German-born American physicist (1879–1955)
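For reference, the two formulas can be stated in their standard average-reward form. This is a minimal recap under the usual assumptions (an ergodic finite-state Markov chain); the notation follows common conventions and may differ in detail from the earlier chapters. Let $P$ be the transition matrix, $f$ the reward vector, $\pi$ the steady-state distribution (a row vector), $\eta = \pi f$ the average reward, $e$ the all-ones column vector, and $g$ the performance potential solving the Poisson equation $(I - P)g + \eta e = f$. Then, for two policies $(P, f)$ and $(P', f')$:

\[
\eta' - \eta = \pi' \left[ (P' - P)\, g + (f' - f) \right] \qquad \text{(performance difference)}
\]

\[
\left. \frac{d\eta_\delta}{d\delta} \right|_{\delta = 0} = \pi \left[ (P' - P)\, g + (f' - f) \right], \qquad P_\delta = P + \delta (P' - P), \quad f_\delta = f + \delta (f' - f) \qquad \text{(performance derivative)}
\]

The difference formula requires the steady-state distribution $\pi'$ of the other policy, while the derivative formula involves only quantities of the current policy; this asymmetry is what separates policy iteration from gradient-based methods.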
Copyright information
© 2007 Springer Science+Business Media, LLC
Cite this chapter
Cao, X.R. (2007). Event-Based Optimization of Markov Systems. In: Stochastic Learning and Optimization. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69082-7_8
DOI: https://doi.org/10.1007/978-0-387-69082-7_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-36787-3
Online ISBN: 978-0-387-69082-7
eBook Packages: Computer Science, Computer Science (R0)