Stable Bayesian optimization

  • Thanh Dai Nguyen
  • Sunil Gupta
  • Santu Rana
  • Svetha Venkatesh
Regular Paper


Tuning the hyperparameters of machine learning models is crucial for their performance, and Bayesian optimization has recently emerged as a de facto method for this task. Hyperparameter tuning is usually performed by measuring model performance on a validation set, with Bayesian optimization used to find the hyperparameters that maximize this performance. In many cases, however, the function representing validation performance contains spurious sharp peaks caused by limited data, and Bayesian optimization tends to converge to these sharp peaks rather than to other, more stable ones. When a model trained with such hyperparameters is deployed in the real world, its performance suffers dramatically. We address this problem through a novel stable Bayesian optimization framework. We construct two new acquisition functions that help Bayesian optimization avoid convergence to sharp peaks, and we provide a theoretical analysis guaranteeing that Bayesian optimization using the proposed acquisition functions prefers stable peaks over unstable ones. Experiments on synthetic function optimization and hyperparameter tuning for support vector machines demonstrate the effectiveness of the proposed framework.
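The failure mode described above can be illustrated with a toy example. The sketch below builds a one-dimensional "validation performance" surface with a tall, narrow spike and a slightly lower but broad peak, and scores each candidate point by the average of the objective over a small neighbourhood so that narrow spikes are penalised. This neighbourhood-averaging surrogate is only an illustration of the stability idea; it is not the paper's actual acquisition functions, and the peak locations, widths, and the `radius` parameter are made-up values.

```python
import math

# Toy validation-performance surface: a tall, narrow ("spurious") peak
# near x = 0.2 and a slightly lower but broad (stable) peak near x = 0.7.
def validation_score(x):
    sharp = 1.0 * math.exp(-((x - 0.2) ** 2) / (2 * 0.01 ** 2))
    broad = 0.9 * math.exp(-((x - 0.7) ** 2) / (2 * 0.15 ** 2))
    return sharp + broad

# One simple way to encode "stability": score a point by the average of
# the objective over a small neighbourhood, so narrow spikes are penalised.
# (Illustrative surrogate only -- not the acquisition functions of the paper.)
def stable_score(x, radius=0.05, n=21):
    pts = [x - radius + 2 * radius * i / (n - 1) for i in range(n)]
    return sum(validation_score(p) for p in pts) / n

grid = [i / 1000 for i in range(1001)]
raw_best = max(grid, key=validation_score)   # lands on the sharp spike
stable_best = max(grid, key=stable_score)    # prefers the broad peak

print(raw_best, stable_best)
```

On this grid the raw maximizer sits on the narrow spike near 0.2, while the neighbourhood-averaged score is maximized near the broad peak at 0.7, which is exactly the preference a stability-aware optimizer should exhibit.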


Bayesian optimization · Gaussian process · Stable Bayesian optimization · Acquisition function



This research was partially funded by the Australian Government through the Australian Research Council (ARC) and the Telstra-Deakin Centre of Excellence in Big Data and Machine Learning. Prof Venkatesh is the recipient of an ARC Australian Laureate Fellowship (FL170100006).

Compliance with ethical standards

Conflicts of interest

All the authors declare that they have no conflict of interest.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Thanh Dai Nguyen (1)
  • Sunil Gupta (1)
  • Santu Rana (1)
  • Svetha Venkatesh (1)

  1. Centre for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Geelong, Australia
