Abstract
This chapter presents the state of the art in reinforcement learning research, focusing on abstraction and transfer learning. In particular, Sect. 3.1 identifies two open questions as central challenges for applying reinforcement learning in practice: performance in large, continuous state spaces and the transfer of knowledge between tasks. Three approaches to tackling these problems are then examined: value function approximation (Sect. 3.2), temporal abstraction (Sect. 3.3), and spatial abstraction (Sect. 3.4).
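To make the first of these approaches concrete, the following is a minimal sketch, not taken from the chapter itself, of linear semi-gradient Q-learning over a hypothetical one-dimensional continuous state in [0, 1). It uses a one-hot coarse coding as the feature map; all names, constants, and the toy setting are illustrative assumptions.

import numpy as np

N_BINS, N_ACTIONS, ALPHA, GAMMA = 10, 2, 0.1, 0.99
w = np.zeros((N_BINS, N_ACTIONS))  # one weight per (feature, action) pair

def features(s):
    """One-hot coarse coding of a continuous state s in [0, 1)."""
    phi = np.zeros(N_BINS)
    phi[min(int(s * N_BINS), N_BINS - 1)] = 1.0
    return phi

def q(s, a):
    """Approximate action value as a linear function of the features."""
    return features(s) @ w[:, a]

def update(s, a, r, s_next):
    """Semi-gradient Q-learning step: w += alpha * TD-error * gradient."""
    td_error = r + GAMMA * max(q(s_next, b) for b in range(N_ACTIONS)) - q(s, a)
    w[:, a] += ALPHA * td_error * features(s)

# Example step: in state 0.42 the agent took action 1, received reward 1.0,
# and landed in state 0.47.
update(0.42, 1, 1.0, 0.47)

Because nearby states share features, each update generalizes across a whole region of the continuous state space rather than touching a single tabular entry; this is the basic mechanism by which value function approximation addresses large state spaces.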