
Reinforcement Learning in Few-Shot Scenarios: A Survey

Journal of Grid Computing

Abstract

Reinforcement learning demands massive amounts of data in complex problems, which makes it infeasible to apply in real-world cases where sampling is difficult. The key to coping with these few-shot problems is knowledge generalization, and the related algorithms are often called few-shot reinforcement learning (FS-RL). However, a formal definition and comprehensive analysis of few-shot scenarios and FS-RL algorithms are still lacking. Therefore, after giving a unified definition, we categorize few-shot scenarios into two types: the first pursues more specialized performance on a target task, while the other pursues more general performance across tasks. In the process of knowledge transfer, few-shot scenarios usually show a clear preference for a particular type of knowledge. Based on this, we divide FS-RL algorithms into two classes, the direct transfer case and the indirect transfer case, and discuss existing algorithms under this classification. Finally, we discuss future directions of FS-RL from the perspectives of both theory and application.
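To make the notion of direct knowledge transfer in a few-shot setting concrete, the following is a minimal, self-contained sketch (not taken from the survey) using tabular Q-learning: a value function learned on a source task warm-starts learning on a related target task, so only a handful of target episodes are needed. The ChainEnv environment, the function names, and all hyperparameters are illustrative assumptions, not the authors' method.

```python
# Minimal sketch of few-shot adaptation via direct transfer of a value function.
# Everything here (environment, hyperparameters) is an illustrative assumption.
import numpy as np

class ChainEnv:
    """Toy chain MDP: move left/right on a line; reward 1 at the goal state."""
    def __init__(self, n_states=10, goal=None):
        self.n_states = n_states
        self.goal = n_states - 1 if goal is None else goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, min(self.n_states - 1,
                                self.state + (1 if action == 1 else -1)))
        done = self.state == self.goal
        return self.state, (1.0 if done else 0.0), done

def q_learning(env, q=None, episodes=200, alpha=0.1, gamma=0.95, eps=0.1, max_steps=100):
    """Standard tabular Q-learning; `q` may be a table transferred from another task."""
    if q is None:
        q = np.zeros((env.n_states, 2))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            a = np.random.randint(2) if np.random.rand() < eps else int(np.argmax(q[s]))
            s2, r, done = env.step(a)
            q[s, a] += alpha * (r + gamma * (0.0 if done else np.max(q[s2])) - q[s, a])
            s = s2
            if done:
                break
    return q

# Source task: generous training budget.
q_source = q_learning(ChainEnv(n_states=10, goal=9), episodes=500)

# Target task (goal moved): only a few episodes are available.
# Direct transfer: start from the source Q-table instead of from scratch.
q_target = q_learning(ChainEnv(n_states=10, goal=7), q=q_source.copy(), episodes=10)
```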



Acknowledgements

This work was financially supported by the Primary Research and Development Plan of China (No.2020YFC2006602), the National Natural Science Foundation of China (No.62072324, No.61876217, No.61876121, No.61772357), the University Natural Science Foundation of Jiangsu Province (No.21KJA520005), the Primary Research and Development Plan of Jiangsu Province (No.BE2020026), the Natural Science Foundation of Jiangsu Province (No.BK20190942), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No.KYCX21_3020).

Author information


Corresponding author

Correspondence to Qiming Fu.

Ethics declarations

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Z., Fu, Q., Chen, J. et al. Reinforcement Learning in Few-Shot Scenarios: A Survey. J Grid Computing 21, 30 (2023). https://doi.org/10.1007/s10723-023-09663-0

