Kernel and Acquisition Function Setup for Bayesian Optimization of Gradient Boosting Hyperparameters

  • Andrzej Szwabe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10751)


The application scenario investigated in this paper is bank credit scoring based on a Gradient Boosting classifier. The paper shows how hyperparameter optimization based on the Bayesian Optimization paradigm can be exploited in this setting. All the evaluated methods are based on the Gaussian Process model but differ in the kernel and the acquisition function. The main purpose of the research is to confirm experimentally that it is reasonable to tune both the kernel function and the acquisition function when applying Bayesian Optimization to Gradient Boosting hyperparameters. Moreover, the results indicate that, at least in the investigated application scenario, the superiority of some of the evaluated Bayesian Optimization methods over others strongly depends on the size of the optimization budget.
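The setup described in the abstract, a Gaussian Process surrogate whose kernel and acquisition function are both treated as tunable choices, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two kernels (squared exponential and Matern 5/2), the two acquisition functions (Expected Improvement and Upper Confidence Bound), the length scale, the noise level, the candidate grid, and the hypothetical `toy_score` function, which stands in for the validation score of a Gradient Boosting classifier as a function of a single hyperparameter rescaled to [0, 1].

```python
import math

def rbf(a, b, ell=0.2):
    """Squared-exponential kernel (length scale `ell` is an assumed value)."""
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def matern52(a, b, ell=0.2):
    """Matern kernel with smoothness 5/2."""
    r = math.sqrt(5.0) * abs(a - b) / ell
    return (1.0 + r + r * r / 3.0) * math.exp(-r)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward_solve(L, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(kernel, X, y, noise=1e-3):
    """Fit a GP with a constant mean (the average of y) and fixed noise."""
    mu = sum(y) / len(y)
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_solve(L, forward_solve(L, [v - mu for v in y]))
    return mu, L, alpha

def predict(kernel, X, model, x_new):
    """Posterior mean and standard deviation of the GP at x_new."""
    mu, L, alpha = model
    k_star = [kernel(a, x_new) for a in X]
    mean = mu + sum(k * a for k, a in zip(k_star, alpha))
    v = forward_solve(L, k_star)
    var = kernel(x_new, x_new) - sum(t * t for t in v)
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected Improvement over the best score seen so far (maximization)."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def upper_confidence_bound(mean, std, best, kappa=2.0):
    """Upper Confidence Bound; `best` is unused, kept for a uniform signature."""
    return mean + kappa * std

def bayes_opt(objective, kernel, acquisition, budget):
    """Maximize `objective` on a grid over [0, 1] within `budget` evaluations."""
    grid = [i / 50 for i in range(51)]
    budget = min(budget, len(grid))
    X = [grid[5], grid[45]]          # two assumed starting points
    y = [objective(x) for x in X]
    while len(X) < budget:
        model = fit_gp(kernel, X, y)
        best = max(y)
        candidates = [g for g in grid if g not in X]
        x_next = max(candidates,
                     key=lambda g: acquisition(*predict(kernel, X, model, g), best))
        X.append(x_next)
        y.append(objective(x_next))
    i = max(range(len(y)), key=y.__getitem__)
    return X[i], y[i], len(y)

def toy_score(x):
    """Hypothetical stand-in for a validation score; its maximum is at x = 0.3."""
    return -(x - 0.3) ** 2

if __name__ == "__main__":
    for budget in (4, 8, 16):
        for kernel in (rbf, matern52):
            for acq in (expected_improvement, upper_confidence_bound):
                x, s, _ = bayes_opt(toy_score, kernel, acq, budget)
                print(budget, kernel.__name__, acq.__name__, round(x, 2), s)
```

Running the loop for several budgets gives a rough feel for the kind of comparison the abstract refers to: the same kernel and acquisition pairing is scored at different numbers of objective evaluations. On a real problem, `toy_score` would be replaced by a cross-validated score of the classifier, and the search would cover several hyperparameters rather than one.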


Binary classification, Gradient Boosting, Hyperparameters, Bayesian Optimization, Gaussian Process, Kernel function, Acquisition function, Bank credit scoring



This work was supported by the Polish National Science Centre, grant DEC-2011/01/D/ST6/06788, and by Poznan University of Technology under grant 04/45/DSPB/0163.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Control, Robotics and Information Engineering, Poznan University of Technology, Poznan, Poland
