Transfer in Reinforcement Learning: A Framework and a Survey

Part of the Adaptation, Learning, and Optimization book series (ALO, volume 12)

Abstract

Transfer in reinforcement learning is a novel research area that focuses on developing methods to transfer knowledge from a set of source tasks to a target task. Whenever the tasks are similar, a learning algorithm can exploit the transferred knowledge to solve the target task and significantly improve its performance (e.g., by reducing the number of samples needed to reach nearly optimal performance). In this chapter we formalize the general transfer problem, identify the main settings investigated so far, and review the most important approaches to transfer in reinforcement learning.
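To make the idea concrete, the following minimal sketch (not taken from the chapter; the chain-walk task, function names, and parameters are illustrative assumptions) warm-starts tabular Q-learning on a target task with the Q-values learned on a similar source task, which is one of the simplest ways transferred knowledge can reduce the number of samples needed on the target task.

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, episodes=500, horizon=50,
               alpha=0.1, gamma=0.95, epsilon=0.1, q_init=None):
    """Tabular Q-learning; q_init allows warm-starting from a source task."""
    Q = np.zeros((n_states, n_actions)) if q_init is None else q_init.copy()
    for _ in range(episodes):
        s = 0  # fixed start state, for simplicity
        for _ in range(horizon):
            # epsilon-greedy action selection
            a = np.random.randint(n_actions) if np.random.rand() < epsilon else int(Q[s].argmax())
            s_next, r, done = env_step(s, a)
            # standard one-step Q-learning update
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() * (not done) - Q[s, a])
            s = s_next
            if done:
                break
    return Q

def make_chain(n_states, goal):
    """Hypothetical chain-walk task: actions 0/1 move left/right; reward 1 at the goal."""
    def step(s, a):
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        return s_next, float(s_next == goal), s_next == goal
    return step

# Source task: learn from scratch.
Q_source = q_learning(make_chain(10, goal=9), n_states=10, n_actions=2)
# Target task (same dynamics, nearby goal): warm-start from the source Q-values
# and use far fewer episodes than learning from scratch would typically need.
Q_target = q_learning(make_chain(10, goal=8), n_states=10, n_actions=2,
                      episodes=100, q_init=Q_source)
```

If the two tasks are sufficiently similar, the warm-started run typically reaches near-optimal behavior with fewer samples than learning the target task from scratch; if they are not, the transferred values may help little or even slow learning (negative transfer).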

Keywords

Reinforcement Learning · Multiagent System · Reward Function · Transfer Learning · Hierarchical Bayesian Model

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Alessandro Lazaric, INRIA Lille-Nord Europe, Villeneuve d’Ascq, France
