A Study on Efficient Reinforcement Learning Through Knowledge Transfer

Part of the book series: Adaptation, Learning, and Optimization ((ALO,volume 27))

Abstract

Although Reinforcement Learning (RL) algorithms have made impressive progress in learning complex tasks in recent years, prevailing shortcomings and challenges remain. Specifically, sample inefficiency and limited adaptation across tasks often make classic RL techniques impractical for real-world applications, despite the representational power gained by combining deep neural networks with RL, known as Deep Reinforcement Learning (DRL). Recently, a number of approaches to address these issues have emerged. Many of these solutions are based on DRL architectures that enhance single-task algorithms with Transfer Learning (TL) capabilities, allowing knowledge to be shared between agents and across tasks. This survey covers knowledge transfer strategies ranging from simple parameter sharing to privacy-preserving federated learning. It aims to provide a general overview of the field of TL in the DRL domain, establishes a classification framework, and briefly describes representative works in the area.

Notes

  1. https://gym.openai.com/

  2. https://github.com/deepmind/lab

  3. https://github.com/Microsoft/AirSim

  4. https://github.com/Microsoft/malmo

Acknowledgements

A. H. R. Costa gratefully acknowledges support from CNPq (grant 310085/2020-9) and Itaú Unibanco S.A. (Data Science Center - C2D). A. H. R. Costa and R. A. C. Bianchi’s work was carried out at the Center for Artificial Intelligence - C4AI (FAPESP grant 2019/07665-4 and support from the IBM Corporation). R. Glatt and F. L. Silva’s portion of the work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC. LLNL-JRNL-790961.

Corresponding author

Correspondence to Ruben Glatt.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Glatt, R., da Silva, F.L., da Costa Bianchi, R.A., Costa, A.H.R. (2023). A Study on Efficient Reinforcement Learning Through Knowledge Transfer. In: Razavi-Far, R., Wang, B., Taylor, M.E., Yang, Q. (eds) Federated and Transfer Learning. Adaptation, Learning, and Optimization, vol 27. Springer, Cham. https://doi.org/10.1007/978-3-031-11748-0_14
