Learning Feature-Based Heuristic Functions

Abstract

Planning is the process of constructing a sequence of actions that achieves a set of desired goals. Automated planning arguably plays a key role both in developing intelligent systems and in solving many practical industrial problems. A typical planning problem is characterized by a structured state space, a set of possible actions, a description of each action's effects, and an objective measure. In this chapter, we consider planning as an optimization problem, seeking plans that minimize the cost of reaching the goals or some other performance measure.
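The abstract's formulation of planning as cost minimization can be made concrete with a small sketch: A* search over a state space, guided by a feature-based heuristic of the kind this chapter studies, namely a weighted combination of state features. The toy domain, the feature `phi`, and the weights below are illustrative assumptions, not the chapter's method.

```python
import heapq

def astar(start, goal, successors, heuristic):
    """A* search: returns (cost, plan) minimizing total action cost.

    successors(state) yields (action, next_state, cost) triples;
    heuristic(state) estimates the cost-to-go (admissible and
    consistent here, so the first goal expansion is optimal).
    """
    frontier = [(heuristic(start), 0, start, [])]
    best = {start: 0}                       # cheapest known cost-to-reach
    while frontier:
        f, g, state, plan = heapq.heappop(frontier)
        if state == goal:
            return g, plan
        for action, nxt, cost in successors(state):
            g2 = g + cost
            if g2 < best.get(nxt, float("inf")):
                best[nxt] = g2
                heapq.heappush(frontier,
                               (g2 + heuristic(nxt), g2, nxt, plan + [action]))
    return None

# Hypothetical toy domain: walk along the integers from 0 to 5,
# actions move one step left or right, each with unit cost.
def successors(s):
    yield ("+1", s + 1, 1)
    yield ("-1", s - 1, 1)

# Feature-based heuristic: a linear combination w . phi(s) of state features.
weights = [1.0]
def h(s):
    phi = [abs(5 - s)]                      # single feature: distance to goal
    return sum(w * f for w, f in zip(weights, phi))

cost, plan = astar(0, 5, successors, h)     # cost 5, plan ["+1"] * 5
```

In this sketch the weights are fixed by hand; the learning problem the chapter addresses is choosing such weights from data so that the resulting heuristic is both informative and, ideally, admissible.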


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

Department of Computer Science, University of Massachusetts Amherst, Amherst, USA