Abstract
Training agents over sequences of tasks is often employed in deep reinforcement learning to let agents progress more quickly towards better behaviours. This problem, known as curriculum learning, has mainly been tackled in the literature by numerical methods based on enumeration strategies, which, however, can handle only small-size problems. In this work, we define a new optimization perspective on the curriculum learning problem with the aim of developing efficient solution methods for complex reinforcement learning tasks. Specifically, we show how the curriculum learning problem can be viewed as an optimization problem with a nonsmooth, nonconvex objective function and an integer feasible region. We reformulate it by defining a grey-box function that embeds a suitable scheduling problem. Numerical results on a benchmark environment from the reinforcement learning community show the effectiveness of the proposed approaches in reaching better performance, even on large problems.
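To make the abstract's framing concrete, the sketch below casts curriculum (task-sequence) design as black-box optimization over an integer feasible region, solved here by the exhaustive enumeration the abstract refers to. All names are hypothetical: `curriculum_return` is a toy surrogate standing in for the expensive, nonsmooth evaluation of training an agent through a task sequence, and is not the paper's actual objective.

```python
from itertools import permutations

def curriculum_return(sequence):
    # Hypothetical stand-in for training an agent through the given
    # task ordering and measuring final performance. In practice this
    # is an expensive, nonsmooth, nonconvex black-box evaluation; here
    # it simply rewards orderings close to increasing task "difficulty".
    return -sum(abs(task - position) for position, task in enumerate(sequence))

def best_curriculum_by_enumeration(tasks):
    # Exhaustive enumeration over all task orderings (the integer
    # feasible region). Viable only for small task sets, since the
    # search space grows as n! -- the limitation that motivates the
    # grey-box reformulation proposed in the paper.
    return max(permutations(tasks), key=curriculum_return)

best = best_curriculum_by_enumeration(range(4))
```

With four tasks this enumerates all 24 orderings; the factorial growth of that count is exactly why enumeration-based approaches are restricted to small instances.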
Seccia, R., Foglino, F., Leonetti, M. et al. A novel optimization perspective to the problem of designing sequences of tasks in a reinforcement learning framework. Optim Eng 24, 831–846 (2023). https://doi.org/10.1007/s11081-021-09708-x