On the significance of Markov decision processes

  • Richard S. Sutton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1327)

Abstract

Formulating the problem facing an intelligent agent as a Markov decision process (MDP) is increasingly common in artificial intelligence, reinforcement learning, artificial life, and artificial neural networks. In this short paper we examine some of the reasons for the appeal of this framework. Foremost among these are its generality, simplicity, and emphasis on goal-directed interaction between the agent and its environment. MDPs may be becoming a common focal point for different approaches to understanding the mind. Finally, we speculate that this focus may be an enduring one insofar as many of the efforts to extend the MDP framework end up bringing a wider class of problems back within it.
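
To make the framework concrete, the sketch below shows a minimal agent-environment interaction loop for a small MDP. The two states, two actions, transition probabilities, rewards, and the uniform-random policy are illustrative assumptions for this sketch only, not details taken from the paper.

```python
import random

# Transition model: P[(state, action)] -> list of (next_state, probability).
# A tiny two-state, two-action MDP used purely for illustration.
P = {
    ("s0", "a0"): [("s0", 0.7), ("s1", 0.3)],
    ("s0", "a1"): [("s1", 1.0)],
    ("s1", "a0"): [("s0", 0.4), ("s1", 0.6)],
    ("s1", "a1"): [("s1", 1.0)],
}

# Reward model: expected reward for taking `action` in `state`.
R = {
    ("s0", "a0"): 0.0,
    ("s0", "a1"): 1.0,
    ("s1", "a0"): 5.0,
    ("s1", "a1"): 0.0,
}

def step(state, action):
    """Sample a next state from the transition model and return it with the reward."""
    next_states, probs = zip(*P[(state, action)])
    next_state = random.choices(next_states, weights=probs)[0]
    return next_state, R[(state, action)]

def random_policy(state):
    """Placeholder policy: choose an action uniformly at random."""
    return random.choice(["a0", "a1"])

# The goal-directed interaction loop: at each step the agent observes the state,
# selects an action, and the environment returns a reward and the next state.
state = "s0"
total_reward = 0.0
for t in range(10):
    action = random_policy(state)
    state, reward = step(state, action)
    total_reward += reward
print("return over 10 steps:", total_reward)
```

Replacing the random policy with one learned from the observed rewards (for example, by dynamic programming or temporal-difference methods cited in the paper) is what turns this interaction loop into a reinforcement-learning agent.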

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Richard S. Sutton
  1. Department of Computer Science, University of Massachusetts, Amherst, USA