Modelling Intelligent Behaviour: The Markov Decision Process Approach

Conference paper in Progress in Artificial Intelligence — IBERAMIA 98 (IBERAMIA 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1484)

Abstract

The problem of selecting actions in environments that are dynamic and not completely predictable or observable is central to intelligent behavior. From an AI point of view, the problem is to design a mechanism that can select the best actions given the information provided by sensors and a suitable model of the actions and goals. We call this the problem of Planning, as it is a direct generalization of the problem considered in Planning research, where feedback is absent and the effect of actions is assumed to be predictable. In this paper we present an approach to Planning that combines ideas and methods from Operations Research and Artificial Intelligence. Basically, Planning problems are described in high-level action languages that are compiled into general mathematical models of sequential decision making known as Markov Decision Processes (MDPs) or Partially Observable Markov Decision Processes (POMDPs), which are then solved by suitable heuristic search algorithms. The results are controllers that map sequences of observations into actions and which, under certain conditions, can be shown to be optimal. We show how this approach applies to a number of concrete problems and discuss its relation to work in Reinforcement Learning.
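
To make the compiled models concrete, the following is a minimal Python sketch, not taken from the paper: a toy two-state MDP solved by value iteration, with the greedy controller read off the resulting value function, and the standard belief update that lets a controller of the same kind map observation sequences into actions in the POMDP case. The machine-repair domain, the function names, and the observation model O are illustrative assumptions, not the paper's formulation.

    # Minimal sketch (assumed example, not the paper's code): a two-state
    # "machine repair" MDP with costs, solved by value iteration.
    states = ["ok", "broken"]
    actions = ["wait", "repair"]
    # Transition model T[s][a] = {s': P(s' | s, a)} and cost model C[s][a].
    T = {
        "ok":     {"wait": {"ok": 0.9, "broken": 0.1}, "repair": {"ok": 1.0}},
        "broken": {"wait": {"broken": 1.0},            "repair": {"ok": 0.8, "broken": 0.2}},
    }
    C = {
        "ok":     {"wait": 0.0,  "repair": 5.0},
        "broken": {"wait": 10.0, "repair": 5.0},
    }
    GAMMA = 0.95  # discount factor

    def q_value(V, s, a):
        """Expected discounted cost of doing a in s, then acting on V."""
        return C[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a].items())

    def value_iteration(eps=1e-6):
        """Apply Bellman backups until the largest residual is below eps."""
        V = {s: 0.0 for s in states}
        while True:
            residual = 0.0
            for s in states:
                new_v = min(q_value(V, s, a) for a in actions)
                residual = max(residual, abs(new_v - V[s]))
                V[s] = new_v
            if residual < eps:
                return V

    def greedy_policy(V):
        """The controller: map each state to a cost-minimizing action."""
        return {s: min(actions, key=lambda a: q_value(V, s, a)) for s in states}

    V = value_iteration()
    print(greedy_policy(V))  # -> {'ok': 'wait', 'broken': 'repair'}

    # In the POMDP case the state is hidden, so the controller tracks a
    # belief b(s) instead, updated after each action a and observation o:
    #     b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) b(s)
    # The sensor model O below is a hypothetical addition for illustration.
    def belief_update(b, a, o, O):
        new_b = {}
        for s2 in states:
            pred = sum(T[s][a].get(s2, 0.0) * b[s] for s in states)
            new_b[s2] = O[s2][a].get(o, 0.0) * pred
        z = sum(new_b.values())  # normalizer: P(o | b, a)
        return {s: p / z for s, p in new_b.items()}

    # Noisy sensor: O[s'][a] = {o: P(o | s', a)}.
    O = {
        "ok":     {a: {"looks_ok": 0.9, "looks_broken": 0.1} for a in actions},
        "broken": {a: {"looks_ok": 0.2, "looks_broken": 0.8} for a in actions},
    }
    b = {"ok": 0.5, "broken": 0.5}
    print(belief_update(b, "wait", "looks_broken", O))  # mass shifts to 'broken'

In the paper's setting, the exhaustive value-iteration sweep above would be replaced by suitable heuristic search algorithms that only visit states reachable from the initial situation, but the Bellman backup and the greedy controller extraction are the same ingredients.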

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Geffner, H. (1998). Modelling Intelligent Behaviour: The Markov Decision Process Approach. In: Coelho, H. (eds) Progress in Artificial Intelligence — IBERAMIA 98. IBERAMIA 1998. Lecture Notes in Computer Science (LNAI), vol 1484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49795-1_1

  • DOI: https://doi.org/10.1007/3-540-49795-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64992-2

  • Online ISBN: 978-3-540-49795-0

  • eBook Packages: Springer Book Archive
