Abstract
In this chapter, we present a non-exhaustive review of past work by the AI community on classical planning and planning under uncertainty. We first introduce the classical propositional STRIPS planning language, whose extensions, based on the problem description language PDDL, have become a standard in the community. We then briefly discuss the structural analysis of planning problems, which initiated the development of efficient planning algorithms and associated planners. Next, we describe the Markov Decision Process (MDP) framework, initially proposed in the Operations Research community before the AI community adopted it as a framework for planning under uncertainty. Finally, we describe innovative (approximate or exact) MDP solution algorithms, as well as recent progress in AI knowledge representation (logics, Bayesian networks) that has been used to increase the expressive power of the MDP framework.
Notes
1. International Planning Competition: http://ipc.icaps-conference.org.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sabbadin, R., Teichteil-Königsbuch, F., Vidal, V. (2020). Planning in Artificial Intelligence. In: Marquis, P., Papini, O., Prade, H. (eds) A Guided Tour of Artificial Intelligence Research. Springer, Cham. https://doi.org/10.1007/978-3-030-06167-8_10
DOI: https://doi.org/10.1007/978-3-030-06167-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06166-1
Online ISBN: 978-3-030-06167-8
eBook Packages: Intelligent Technologies and Robotics (R0)