Zeroth Order Bayesian Optimization (ZOBO) methods optimize an unknown function based on its black-box evaluations at the query locations. Unlike most optimization procedures, ZOBO methods do not utilize gradient information even when it is available. On the other hand, First Order Bayesian Optimization (FOBO) methods exploit the available gradient information to arrive at better solutions faster. However, the existing FOBO methods do not utilize the crucial fact that the gradient is zero at the optima. Further, the inherently sequential nature of the FOBO methods incurs a high computational cost, limiting their wide applicability. To alleviate these difficulties, we propose a relaxed statistical model that leverages the gradient information and directly searches for points where the gradient vanishes. To accomplish this, we develop novel acquisition algorithms that search for global optima effectively. Unlike the existing FOBO methods, the proposed methods are parallelizable. Through extensive experimentation on standard test functions, we compare the performance of our methods against the existing methods. Furthermore, we explore an application of the proposed FOBO methods in the context of policy gradient reinforcement learning.
This observation could be utilized in the existing FOBO methods as well. However, due to the computational burden of the joint GP model in the existing FOBO methods, we propose to utilize this fact in independent GP modeling. Note further that we do not require joint GP modeling to utilize this fact.
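The idea of modeling each partial derivative with its own GP and looking for points where the gradient vanishes can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the acquisition algorithm proposed in the article: the helper names (`rbf_kernel`, `gp_posterior`, `zero_gradient_score`), the toy quadratic objective, the fixed kernel length scale, and the scoring rule, which simply sums, over dimensions, the log density that each independent GP assigns to a gradient value of zero.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) is 1 for this kernel
    return mean, np.maximum(var, 1e-12)

def zero_gradient_score(x_train, grad_train, x_test):
    """Hypothetical score: sum over dimensions of log N(0 | mean_i, var_i).

    Each partial derivative gets its own independent GP (no joint model).
    """
    score = np.zeros(len(x_test))
    for i in range(grad_train.shape[1]):
        mean, var = gp_posterior(x_train, grad_train[:, i], x_test)
        score += -0.5 * np.log(2.0 * np.pi * var) - 0.5 * mean**2 / var
    return score

def toy_gradient(x):
    """Gradient of the assumed toy objective f(x) = sum((x - 0.3)**2)."""
    return 2.0 * (x - 0.3)

# Gradient observations on a coarse 3x3 design over [-1, 1]^2.
axis = np.array([-1.0, 0.0, 1.0])
x_train = np.array([[a, b] for a in axis for b in axis])
grad_train = toy_gradient(x_train)

# Score a grid of candidate query points and pick the highest-scoring one.
grid = np.linspace(-1.0, 1.0, 21)
x_test = np.array([[a, b] for a in grid for b in grid])
scores = zero_gradient_score(x_train, grad_train, x_test)
next_query = x_test[np.argmax(scores)]
```

Because the per-dimension GPs are fitted separately, each fit only involves an n-by-n kernel matrix rather than the larger joint covariance over function values and all partial derivatives, which is the computational motivation stated above.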
Prabuchandran K. J. was supported by SGNF research grant from Indian Institute of Technology, Dharwad.
J., P.K., Penubothula, S., Kamanchi, C. et al. Novel First Order Bayesian Optimization with an Application to Reinforcement Learning. Appl Intell 51, 1565–1579 (2021). https://doi.org/10.1007/s10489-020-01896-w
- Bayesian optimization
- Reinforcement learning
- First order methods
- Policy gradient