
Novel First Order Bayesian Optimization with an Application to Reinforcement Learning


Zeroth Order Bayesian Optimization (ZOBO) methods optimize an unknown function based on its black-box evaluations at the query locations. Unlike most optimization procedures, ZOBO methods fail to utilize gradient information even when it is available. First Order Bayesian Optimization (FOBO) methods, in contrast, exploit the available gradient information to arrive at better solutions faster. However, existing FOBO methods do not utilize a crucial piece of information: the gradient is zero at the optima. Further, the inherently sequential nature of FOBO methods incurs a high computational cost, limiting their wide applicability. To alleviate these difficulties, we propose a relaxed statistical model that leverages the gradient information to directly search for points where the gradient vanishes. To accomplish this, we develop novel acquisition algorithms that search for the global optima effectively. Unlike existing FOBO methods, the proposed methods are parallelizable. Through extensive experimentation on standard test functions, we compare the performance of our methods against existing methods. Furthermore, we explore an application of the proposed FOBO methods in the context of policy-gradient reinforcement learning.
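The idea of searching directly for gradient-vanishing points can be sketched as follows. This is a minimal illustration of the general approach, not the paper's algorithm: every function, kernel, and constant below is a hypothetical stand-in. We fit an independent GP to each observed partial derivative and score candidate points by the Gaussian posterior density of the gradient being zero there; the candidate maximizing that score is the next query.

```python
import numpy as np

def rbf_kernel(A, B, length=1.0):
    # Squared-exponential kernel between two sets of points.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP regression: posterior mean and variance at test points Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xs, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)
    return mu, np.maximum(var, 1e-12)

def zero_gradient_acquisition(X, grads, Xs):
    # One independent GP per partial derivative; score each candidate by
    # the log-density of its predicted gradient being exactly zero.
    logp = np.zeros(len(Xs))
    for j in range(grads.shape[1]):
        mu, var = gp_posterior(X, grads[:, j], Xs)
        logp += -0.5 * mu**2 / var - 0.5 * np.log(2 * np.pi * var)
    return logp

# Toy run: f(x) = sum(x^2) has gradient 2x and its optimum at the origin.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(8, 2))       # queried locations
grads = 2.0 * X                           # observed gradients
Xs = rng.uniform(-1, 1, size=(256, 2))    # candidate pool
best = Xs[np.argmax(zero_gradient_acquisition(X, grads, Xs))]
print(best)  # a candidate whose predicted gradient is near zero
```

Because each coordinate's GP is fit separately, the per-dimension fits and the candidate scoring are embarrassingly parallel, which is one way to read the abstract's parallelizability claim.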




  1.

    This observation could be utilized in the existing FOBO methods as well. However, due to the computational burden of the joint GP model in the existing FOBO methods, we propose to utilize this fact in independent GP modeling. Note further that we do not require joint GP modeling to utilize this fact.
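The computational burden mentioned above can be made concrete with a back-of-the-envelope sketch (an illustration under standard GP assumptions, not a figure from the paper): a joint GP over the function and all d partial derivatives couples n(d+1) outputs in a single covariance matrix, so one factorization costs O((n(d+1))^3), while d independent GPs need d factorizations of size n, i.e. O(d·n^3).

```python
def joint_cost(n, d):
    # Joint GP over f and its d partials: one (n*(d+1)) x (n*(d+1)) solve.
    return (n * (d + 1)) ** 3

def independent_cost(n, d):
    # One n x n solve per partial derivative.
    return d * n ** 3

print(joint_cost(100, 10))        # 1331000000
print(independent_cost(100, 10))  # 10000000
```

For n = 100 observations in d = 10 dimensions, the independent modeling is cheaper by roughly a factor of (d+1)^3/d ≈ 133 under this cubic-cost model.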


  1. 1.

    Abualigah LMQ (2019) Feature selection and enhanced krill herd algorithm for text document clustering. Springer, Berlin

    Book  Google Scholar 

  2. 2.

    Abualigah LM, Khader AT, Hanandeh ES (2018) Hybrid clustering analysis using improved krill herd algorithm. Appl Intell 48(11):4047–4071

    Article  Google Scholar 

  3. 3.

    Ahmed MO, Shahriarim B, Schmidt M (2016) Do we need “harmless” bayesian optimization and “first-order” bayesian optimization. NIPS BayesOpt

  4. 4.

    Brochu E, Cora VM, De Freitas N (2010) A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599

  5. 5.

    Bull AD (2011) Convergence rates of efficient global optimization algorithms. J Mach Learn Res 12(10):1–27

    MathSciNet  MATH  Google Scholar 

  6. 6.

    Calandra R, Seyfarth A, Peters J, Deisenroth MP (2016) Bayesian optimization for learning gaits under uncertainty. Ann Math Artif Intell 76(1–2):5–23

    MathSciNet  Article  Google Scholar 

  7. 7.

    Deisenroth M, Rasmussen CE (2011) Pilco: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 465–472

  8. 8.

    Duffy AC (2009) An introduction to gradient computation by the discrete adjoint method, Tech. Rep.

  9. 9.

    Frazier PI, Powell WB, Dayanik S (2008) A knowledge-gradient policy for sequential information collection. SIAM J Control Optim 47(5):2410–2439

    MathSciNet  Article  Google Scholar 

  10. 10.

    Fu J, Luo H, Feng J, Chua T -S (2016) Distilling reverse-mode automatic differentiation (drmad) for optimizing hyperparameters of deep neural networks, arXiv:1601.00917

  11. 11.

    Hernández-Lobato J M, Hoffman MW, Ghahramani Z (2014) Predictive entropy search for efficient global optimization of black-box functions. In: Advances in neural information processing systems, pp 918–926

  12. 12.

    Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Glob Optim 13(4):455–492

    MathSciNet  Article  Google Scholar 

  13. 13.

    Kingma D, Adam JBA (2014) A method for stochastic optimization. arXiv:1412.6980

  14. 14.

    Koistinen OP, Maras E, Vehtari A, Jónsson H (2016) Minimum energy path calculations with Gaussian process regression. Nanosyst Phys Chem Math 7(6):925

    Article  Google Scholar 

  15. 15.

    Lizotte DJ (2008) Practical Bayesian optimization. University of Alberta

  16. 16.

    Lizotte DJ, Wang T, Bowling MH, Schuurmans D (2007) Automatic gait optimization with Gaussian process regression. In: IJCAI, vol 7, pp 944–949

  17. 17.

    Luketina J, Berglund M, Greff K, Raiko T (2016) Scalable gradient-based tuning of continuous regularization hyperparameters. In: International conference on machine learning, pp 2952–2960

  18. 18.

    Maclaurin D, Duvenaud D, Adams R (2015) Gradient-based hyperparameter optimization through reversible learning. In: International conference on machine learning, pp 2113–2122

  19. 19.

    Martinez-Cantin R (2017) Bayesian optimization with adaptive kernels for robot control. In: IEEE International conference on robotics and automation (ICRA). IEEE, pp 3350–3356

  20. 20.

    Martinez-Cantin R, de Freitas N, Brochu E, Castellanos J, Doucet A (2009) A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Auton Rob 27(2):93–103

    Article  Google Scholar 

  21. 21.

    McLeod M, Osborne MA, Roberts SJ (2018) Optimization, fast and slow: optimally switching between local and Bayesian optimization, arXiv:1805.08610

  22. 22.

    Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533

    Article  Google Scholar 

  23. 23.

    Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 807–814

  24. 24.

    O’Donoghue B, Munos R, Kavukcuoglu K, Mnih V (2016) Combining policy gradient and Q-learning. arXiv:1611.01626

  25. 25.

    Osborne MA, Garnett R, Roberts SJ (2009) Gaussian processes for global optimization. In: 3rd international conference on learning and intelligent optimization (LION3), pp 1–15

  26. 26.

    Peters J, Schaal S (2006) Policy gradient methods for robotics. In: IEEE/RSJ international conference on intelligent robots and systems, pp 2219–2225

  27. 27.

    Peters J, Schaal S (2008) Natural actor-critic. Neurocomputing 71(7):1180–1190

    Article  Google Scholar 

  28. 28.

    Plessix R -E (2006) A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys J Int 167(2):459–503

    Article  Google Scholar 

  29. 29.

    Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms, arXiv:1707.06347

  30. 30.

    Sutton RS, McAllester DA, Singh SP, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, pp 1057–1063

  31. 31.

    Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning, vol 1. MIT Press, Cambridge

    MATH  Google Scholar 

  32. 32.

    Rückstiess T, Sehnke F, Schaul T, Wierstra D, Sun Y, Schmidhuber J (2010) Exploring parameter space in reinforcement learning. Paladyn 1(1):14–24

    Google Scholar 

  33. 33.

    Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, pp 2951–2959

  34. 34.

    Srinivas N, Krause A, Seeger M, Kakade SM (2010) Gaussian process optimization in the bandit setting: no regret and experimental design. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 1015–1022

  35. 35.

    Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press, Cambridge

    MATH  Google Scholar 

  36. 36.

    Tieleman T, Hinton G (2012) Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw Mach Learn 4(2):26–31

    Google Scholar 

  37. 37.

    Vazquez E, Bect J (2010) Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J Stat Plan Inference 140(11):3088–3095

    MathSciNet  Article  Google Scholar 

  38. 38.

    Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3–4):229–256

    MATH  Google Scholar 

  39. 39.

    Wilson A, Fern A, Tadepalli P (2014) Using trajectory data to improve Bayesian optimization for reinforcement learning. J Mach Learn Res 15(1):253–282

    MathSciNet  MATH  Google Scholar 

  40. 40.

    Wu J, Poloczek M, Wilson AG, Frazier PI (2017) Bayesian optimization with gradients. In: Advances in neural information processing systems, pp 5267–5278

  41. 41.

    Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show attend and tell: neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057

  42. 42.

    Yogatama D, Kong L, Smith NA (2015) Bayesian optimization of text representations. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 2100–2105

Download references

Author information



Corresponding author

Correspondence to Prabuchandran K. J.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Prabuchandran K. J. was supported by an SGNF research grant from the Indian Institute of Technology, Dharwad.


About this article


Cite this article

J., P.K., Penubothula, S., Kamanchi, C. et al. Novel First Order Bayesian Optimization with an Application to Reinforcement Learning. Appl Intell 51, 1565–1579 (2021).



Keywords

  • Bayesian optimization
  • Reinforcement learning
  • First order methods
  • Policy gradient