Abstract
This chapter presents the state of the art in reinforcement learning research, focusing on abstraction and transfer learning. In particular, Sect. 3.1 identifies two open questions as central challenges for applying reinforcement learning in practice: performance in large, continuous state spaces and the transfer of knowledge between tasks. Three approaches to tackling these problems are then examined: value function approximation (Sect. 3.2), temporal abstraction (Sect. 3.3), and spatial abstraction (Sect. 3.4).
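To make the first of these approaches concrete, the following is a minimal sketch, not taken from the chapter itself, of linear semi-gradient Q-learning over a hypothetical one-dimensional continuous state in [0, 1). It uses a one-hot coarse coding as the feature map; all names, constants, and the toy setting are illustrative assumptions.

import numpy as np

N_BINS, N_ACTIONS, ALPHA, GAMMA = 10, 2, 0.1, 0.99
w = np.zeros((N_BINS, N_ACTIONS))  # one weight per (feature, action) pair

def features(s):
    """One-hot coarse coding of a continuous state s in [0, 1)."""
    phi = np.zeros(N_BINS)
    phi[min(int(s * N_BINS), N_BINS - 1)] = 1.0
    return phi

def q(s, a):
    """Approximate action value as a linear function of the features."""
    return features(s) @ w[:, a]

def update(s, a, r, s_next):
    """Semi-gradient Q-learning step: w += alpha * TD-error * gradient."""
    td_error = r + GAMMA * max(q(s_next, b) for b in range(N_ACTIONS)) - q(s, a)
    w[:, a] += ALPHA * td_error * features(s)

# Example step: in state 0.42 the agent took action 1, received reward 1.0,
# and landed in state 0.47.
update(0.42, 1, 1.0, 0.47)

Because nearby states share features, each update generalizes across a whole region of the continuous state space rather than touching a single tabular entry; this is the basic mechanism by which value function approximation addresses large state spaces.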