Event-Based Optimization of Markov Systems

  • Chapter

Abstract

In the previous chapters, we developed a sensitivity-based approach that provides a unified framework for learning and optimization. We have shown that the two performance sensitivity formulas are the bases for learning and optimization of stochastic systems: the performance derivative formula leads to the gradient-based optimization approach, and the performance difference formula leads to the policy iteration approach for standard MDP-type problems.
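
For reference, the two sensitivity formulas referred to above take the following form. This is a sketch in the notation assumed from the earlier chapters: π and π′ are the steady-state distributions under transition matrices P and P′, f and f′ the reward vectors, g the vector of performance potentials satisfying the Poisson equation (I − P)g + ηe = f, and η, η′ the long-run average rewards.

% Performance difference formula (underlies policy iteration):
\eta' - \eta = \pi' \left[ (P' - P)\, g + (f' - f) \right]

% Performance derivative formula (underlies gradient-based optimization),
% with P_\delta = P + \delta (P' - P) and f_\delta = f + \delta (f' - f):
\left. \frac{d \eta_\delta}{d \delta} \right|_{\delta = 0} = \pi \left[ (P' - P)\, g + (f' - f) \right]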

Imagination is more important than knowledge.

Albert Einstein, American (German-born) physicist (1879–1955)



Author information


Corresponding author

Correspondence to Xi-Ren Cao, PhD.


Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Cao, XR. (2007). Event-Based Optimization of Markov Systems. In: Stochastic Learning and Optimization. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69082-7_8


  • DOI: https://doi.org/10.1007/978-0-387-69082-7_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-36787-3

  • Online ISBN: 978-0-387-69082-7

  • eBook Packages: Computer Science (R0)
