Modelling Intelligent Behaviour: The Markov Decision Process Approach

Conference paper in Progress in Artificial Intelligence — IBERAMIA 98 (IBERAMIA 1998)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1484)

Abstract

The problem of selecting actions in environments that are dynamic and not completely predictable or observable is central to intelligent behavior. From an AI point of view, the problem is to design a mechanism that can select the best actions given the information provided by sensors and a suitable model of the actions and goals. We call this the problem of Planning, as it is a direct generalization of the problem considered in Planning research, where feedback is absent and the effect of actions is assumed to be predictable. In this paper we present an approach to Planning that combines ideas and methods from Operations Research and Artificial Intelligence. Basically, Planning problems are described in high-level action languages that are compiled into general mathematical models of sequential decision making known as Markov Decision Processes (MDPs) or Partially Observable Markov Decision Processes (POMDPs), which are then solved by suitable heuristic search algorithms. The results are controllers that map sequences of observations into actions and which, under certain conditions, can be shown to be optimal. We show how this approach applies to a number of concrete problems and discuss its relation to work in Reinforcement Learning.
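
To make the compiled models concrete, the following is a minimal Python sketch, not taken from the paper: a toy two-state MDP solved by value iteration, with the greedy controller read off the resulting value function, and the standard belief update that lets a controller of the same kind map observation sequences into actions in the POMDP case. The machine-repair domain, the function names, and the observation model O are illustrative assumptions, not the paper's formulation.

    # Minimal sketch (assumed example, not the paper's code): a two-state
    # "machine repair" MDP with costs, solved by value iteration.
    states = ["ok", "broken"]
    actions = ["wait", "repair"]
    # Transition model T[s][a] = {s': P(s' | s, a)} and cost model C[s][a].
    T = {
        "ok":     {"wait": {"ok": 0.9, "broken": 0.1}, "repair": {"ok": 1.0}},
        "broken": {"wait": {"broken": 1.0},            "repair": {"ok": 0.8, "broken": 0.2}},
    }
    C = {
        "ok":     {"wait": 0.0,  "repair": 5.0},
        "broken": {"wait": 10.0, "repair": 5.0},
    }
    GAMMA = 0.95  # discount factor

    def q_value(V, s, a):
        """Expected discounted cost of doing a in s, then acting on V."""
        return C[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a].items())

    def value_iteration(eps=1e-6):
        """Apply Bellman backups until the largest residual is below eps."""
        V = {s: 0.0 for s in states}
        while True:
            residual = 0.0
            for s in states:
                new_v = min(q_value(V, s, a) for a in actions)
                residual = max(residual, abs(new_v - V[s]))
                V[s] = new_v
            if residual < eps:
                return V

    def greedy_policy(V):
        """The controller: map each state to a cost-minimizing action."""
        return {s: min(actions, key=lambda a: q_value(V, s, a)) for s in states}

    V = value_iteration()
    print(greedy_policy(V))  # -> {'ok': 'wait', 'broken': 'repair'}

    # In the POMDP case the state is hidden, so the controller tracks a
    # belief b(s) instead, updated after each action a and observation o:
    #     b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) b(s)
    # The sensor model O below is a hypothetical addition for illustration.
    def belief_update(b, a, o, O):
        new_b = {}
        for s2 in states:
            pred = sum(T[s][a].get(s2, 0.0) * b[s] for s in states)
            new_b[s2] = O[s2][a].get(o, 0.0) * pred
        z = sum(new_b.values())  # normalizer: P(o | b, a)
        return {s: p / z for s, p in new_b.items()}

    # Noisy sensor: O[s'][a] = {o: P(o | s', a)}.
    O = {
        "ok":     {a: {"looks_ok": 0.9, "looks_broken": 0.1} for a in actions},
        "broken": {a: {"looks_ok": 0.2, "looks_broken": 0.8} for a in actions},
    }
    b = {"ok": 0.5, "broken": 0.5}
    print(belief_update(b, "wait", "looks_broken", O))  # mass shifts to 'broken'

In the paper's setting, the exhaustive value-iteration sweep above would be replaced by suitable heuristic search algorithms that only visit states reachable from the initial situation, but the Bellman backup and the greedy controller extraction are the same ingredients.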

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Geffner, H. (1998). Modelling Intelligent Behaviour: The Markov Decision Process Approach. In: Coelho, H. (eds) Progress in Artificial Intelligence — IBERAMIA 98. IBERAMIA 1998. Lecture Notes in Computer Science (LNAI), vol 1484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49795-1_1

  • DOI: https://doi.org/10.1007/3-540-49795-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64992-2

  • Online ISBN: 978-3-540-49795-0

  • eBook Packages: Springer Book Archive
