Abstraction and Knowledge Transfer in Reinforcement Learning

Qualitative Spatial Abstraction in Reinforcement Learning

Part of the book series: Cognitive Technologies (COGTECH)

Abstract

This chapter presents the state of the art in reinforcement learning research, focusing on abstraction and transfer learning. Section 3.1 works out two open questions, performance in large, continuous state spaces and knowledge transfer, as the central challenges for applying reinforcement learning in practice. Three approaches to tackling these problems are then investigated: value function approximation (Sect. 3.2), temporal abstraction (Sect. 3.3), and spatial abstraction (Sect. 3.4).
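
As an illustration of the first of these approaches, the sketch below shows semi-gradient Q-learning with a linear value function over a coarse one-hot state discretization, a minimal stand-in for tile coding. It is not taken from the chapter; the names (GridFeatures, q_learning_step), the 1-D state space, and all parameter values are illustrative assumptions.

```python
# Illustrative sketch only (not the chapter's method): semi-gradient Q-learning
# with a linear value function over a coarse one-hot state discretization,
# a minimal stand-in for tile coding. All names and parameters are assumptions.
import numpy as np

class GridFeatures:
    """Map a continuous 1-D state in [0, 1) to a one-hot vector over n_bins intervals."""
    def __init__(self, n_bins=10):
        self.n_bins = n_bins

    def __call__(self, state):
        phi = np.zeros(self.n_bins)
        phi[min(int(state * self.n_bins), self.n_bins - 1)] = 1.0
        return phi

def q_learning_step(w, features, state, action, reward, next_state, done,
                    alpha=0.1, gamma=0.95):
    """One semi-gradient Q-learning update on the weight matrix w (n_actions x n_features)."""
    phi = features(state)
    prediction = w[action] @ phi
    target = reward if done else reward + gamma * np.max(w @ features(next_state))
    w[action] += alpha * (target - prediction) * phi  # gradient of a linear Q is phi
    return w

# Hypothetical usage: 2 actions, 10 feature bins, one observed transition.
features = GridFeatures(n_bins=10)
w = np.zeros((2, features.n_bins))
w = q_learning_step(w, features, state=0.42, action=1, reward=1.0,
                    next_state=0.55, done=False)
```

Because the weights generalize across all states that share a feature bin, this kind of approximator trades off resolution against learning speed, which is exactly the tension the chapter's discussion of abstraction addresses.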


Author information

Corresponding author

Correspondence to Lutz Frommberger.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Frommberger, L. (2010). Abstraction and Knowledge Transfer in Reinforcement Learning. In: Qualitative Spatial Abstraction in Reinforcement Learning. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16590-0_3

  • DOI: https://doi.org/10.1007/978-3-642-16590-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16589-4

  • Online ISBN: 978-3-642-16590-0

  • eBook Packages: Computer Science, Computer Science (R0)
