
Behavioral Hierarchy: Exploration and Representation

  • Andrew G. Barto
  • George Konidaris
  • Christopher Vigorito
Chapter

Abstract

Behavioral modules are units of behavior providing reusable building blocks that can be composed sequentially and hierarchically to generate extensive ranges of behavior. Hierarchies of behavioral modules facilitate learning complex skills and planning at multiple levels of abstraction, and they enable agents to incrementally improve their competence as new challenges arise over extended periods of time. This chapter focuses on two features of behavioral hierarchy that appear to be less well recognized: its influence on exploratory behavior and the opportunity it affords to reduce the representational challenges of planning and learning in large, complex domains. Four computational examples are described that use methods of hierarchical reinforcement learning to illustrate the influence of behavioral hierarchy on exploration and representation. Beyond illustrating these features, the examples provide support for the central role of behavioral hierarchy in development and learning for both artificial and natural agents.
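The chapter's examples build on hierarchical reinforcement learning, where a behavioral module is commonly formalized as an option: an initiation set specifying where the module may start, an internal policy, and a termination condition. The sketch below is a minimal, illustrative rendering of that idea in Python; the Option class, the run_option helper, and the gym-style env.step interface are assumptions made for illustration, not code from the chapter.

# Minimal sketch of a behavioral module as an "option" in hierarchical
# reinforcement learning. All names here (Option, run_option, the assumed
# env.step interface) are illustrative, not taken from the chapter.
from dataclasses import dataclass
from typing import Any, Callable

State = Any
Action = Any

@dataclass
class Option:
    """A reusable behavioral module: where it may start, what it does, when it stops."""
    can_start: Callable[[State], bool]     # initiation set
    policy: Callable[[State], Action]      # internal (intra-option) policy
    should_stop: Callable[[State], bool]   # termination condition

def run_option(env, state: State, option: Option, max_steps: int = 100):
    """Execute one option to termination; the caller sees the whole run
    as a single temporally extended action."""
    assert option.can_start(state), "option invoked outside its initiation set"
    total_reward = 0.0
    for _ in range(max_steps):
        action = option.policy(state)
        state, reward, done = env.step(action)  # assumed gym-style environment
        total_reward += reward
        if done or option.should_stop(state):
            break
    return state, total_reward

A higher-level learner or planner can treat run_option as a single action choice, which is the sense in which modules compose sequentially and hierarchically to generate extended ranges of behavior.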

Keywords

Reward function · Behavioral module · Primitive action · Causal graph · Active learning scheme

Acknowledgments

The authors thank Sridhar Mahadevan, Rod Grupen, and current and former members of the Autonomous Learning Laboratory who have participated in discussing behavioral hierarchy: Bruno Castro da Silva, Will Dabney, Anders Jonsson, Scott Kuindersma, Scott Niekum, Özgür Şimşek, Andrew Stout, Phil Thomas, and Pippin Wolfe. This research has benefitted from Barto’s association with the European Community 7th Framework Programme (FP7/2007–2013), “Challenge 2—Cognitive Systems, Interaction, Robotics”, grant agreement No. ICT-IP-231722, project “IM-CLeVeR—Intrinsically Motivated Cumulative Learning Versatile Robots.” Some of the research described here was supported by the National Science Foundation under Grant No. IIS-0733581 and by the Air Force Office of Scientific Research under grant FA9550-08-1-0418. Any opinions, findings, conclusions, or recommendations expressed here are those of the authors and do not necessarily reflect the views of the sponsors.


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Andrew G. Barto (1)
  • George Konidaris (2)
  • Christopher Vigorito (1)

  1. School of Computer Science, University of Massachusetts Amherst, Amherst, USA
  2. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
