Abstract
Reinforcement learning demands massive amounts of data in complex problems, which makes it infeasible in real-world settings where sampling is difficult. The key to coping with these few-shot problems is knowledge generalization, and the related algorithms are collectively called few-shot reinforcement learning (FS-RL). However, the field lacks a formal definition and a comprehensive analysis of few-shot scenarios and FS-RL algorithms. Therefore, after giving a unified definition, we categorize few-shot scenarios into two types: the first pursues more specialized performance, while the other pursues more general performance. During knowledge transfer, few-shot scenarios usually show a clear tendency toward a particular type of knowledge. On this basis, we divide FS-RL algorithms into two classes, the direct transfer case and the indirect transfer case, and discuss existing algorithms under this classification. Finally, we outline future directions for FS-RL from both theoretical and application perspectives.
Acknowledgements
This work was financially supported by Primary Research and Development Plan of China (No.2020YF-C2006602), National Natural Science Foundation of China (No.62072324, No.61876217, No.61876121, No.61772357), University Natural Science Foundation of Jiangsu Province (No.21KJA520005), Primary Research and Development Plan of Jiangsu Province (No.BE2020026), Natural Science Foundation of Jiangsu Province (No.BK20190942), Postgraduate Research & Practice Innovation Program of Jiangsu Province (No.KYCX 21_3020).
Ethics declarations
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Wang, Z., Fu, Q., Chen, J. et al. Reinforcement Learning in Few-Shot Scenarios: A Survey. J Grid Computing 21, 30 (2023). https://doi.org/10.1007/s10723-023-09663-0