Abstract
Reinforcement learning demands massive amounts of data in complex problems, which makes it infeasible in real-world settings where sampling is difficult. The key to coping with these few-shot problems is knowledge generalization, and the related algorithms are collectively called few-shot reinforcement learning (FS-RL). However, the field lacks a formal definition and a comprehensive analysis of few-shot scenarios and FS-RL algorithms. Therefore, after giving a unified definition, we categorize few-shot scenarios into two types: the first pursues more specialized performance, while the other pursues more general performance. During knowledge transfer, few-shot scenarios usually show a clear tendency toward a particular type of knowledge. On this basis, we divide FS-RL algorithms into two classes, the direct transfer case and the indirect transfer case, and discuss existing algorithms under this classification. Finally, we outline future directions for FS-RL from both theoretical and application perspectives.
Acknowledgements
This work was financially supported by Primary Research and Development Plan of China (No.2020YF-C2006602), National Natural Science Foundation of China (No.62072324, No.61876217, No.61876121, No.61772357), University Natural Science Foundation of Jiangsu Province (No.21KJA520005), Primary Research and Development Plan of Jiangsu Province (No.BE2020026), Natural Science Foundation of Jiangsu Province (No.BK20190942), Postgraduate Research & Practice Innovation Program of Jiangsu Province (No.KYCX 21_3020).
Ethics declarations
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Wang, Z., Fu, Q., Chen, J. et al. Reinforcement Learning in Few-Shot Scenarios: A Survey. J Grid Computing 21, 30 (2023). https://doi.org/10.1007/s10723-023-09663-0