
A novel optimization perspective to the problem of designing sequences of tasks in a reinforcement learning framework

  • Research Article
  • Published in Optimization and Engineering

Abstract

Training agents over sequences of tasks is often employed in deep reinforcement learning to let agents progress more quickly towards better behaviours. This problem, known as curriculum learning, has mainly been tackled in the literature by numerical methods based on enumeration strategies, which, however, can handle only small-size problems. In this work, we define a new optimization perspective on the curriculum learning problem, with the aim of developing efficient solution methods for complex reinforcement learning tasks. Specifically, we show how the curriculum learning problem can be viewed as an optimization problem with a nonsmooth, nonconvex objective function and an integer feasible region. We reformulate it by defining a grey-box function that includes a suitable scheduling problem. Numerical results on a benchmark environment from the reinforcement learning community show the effectiveness of the proposed approaches in reaching better performance, even on large problems.
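To make the abstract's framing concrete, the task-sequencing problem it describes can be sketched as below. The notation is an illustrative assumption, not the paper's own formulation: $\mathcal{T}$ denotes a pool of candidate source tasks, $L$ a maximum curriculum length, and $P(c)$ the agent's final performance on the target task after training along the curriculum $c$.

```latex
% Hypothetical notation (not the authors' own):
%   T = {t_1, ..., t_n}  pool of candidate source tasks,
%   c = (c_1, ..., c_l)  an ordered curriculum of length l <= L,
%   P(c)                 final performance on the target task after
%                        training along c; it can only be evaluated by
%                        running the agent, hence it is a grey-box,
%                        nonsmooth and nonconvex function in general.
\begin{equation*}
  \max_{c \in \mathcal{C}} \; P(c),
  \qquad
  \mathcal{C} \;=\; \bigl\{ (c_1, \dots, c_\ell) \;:\;
    0 \le \ell \le L,\;
    c_i \in \mathcal{T},\;
    c_i \neq c_j \text{ for } i \neq j \bigr\}.
\end{equation*}
```

Under this reading, the feasible region $\mathcal{C}$ is a finite set of ordered task selections, hence integer, while $P$ has no available derivatives and must be queried by training the agent, which is what makes grey-box and derivative-free strategies natural candidates for this problem.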



Author information


Corresponding author

Correspondence to Simone Sagratella.



About this article


Cite this article

Seccia, R., Foglino, F., Leonetti, M. et al. A novel optimization perspective to the problem of designing sequences of tasks in a reinforcement learning framework. Optim Eng 24, 831–846 (2023). https://doi.org/10.1007/s11081-021-09708-x


