
Two-Stage Transfer Surrogate Model for Automatic Hyperparameter Optimization

  • Martin Wistuba
  • Nicolas Schilling
  • Lars Schmidt-Thieme
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9851)

Abstract

The choice of hyperparameters and the selection of algorithms are a crucial part of machine learning. Bayesian optimization methods have been used very successfully to tune hyperparameters automatically, in many cases even outperforming human experts. Recently, these techniques have been improved substantially by using meta-knowledge: knowledge about the performance of an algorithm on other data sets is used to accelerate the hyperparameter optimization for a new data set.
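For context, the sketch below shows a plain Bayesian optimization loop without any meta-knowledge: a Gaussian process surrogate models the map from hyperparameter configurations to validation performance, and an acquisition function picks the next configuration to evaluate. The toy objective, two-dimensional search space, and upper-confidence-bound acquisition are illustrative assumptions, not the paper's setup.

```python
# Minimal Bayesian optimization loop (illustrative sketch, not the paper's method).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def validation_performance(x):
    """Stand-in for an expensive train/validate run at configuration x."""
    return float(np.sin(5 * x[0]) * np.cos(3 * x[1]))

# A few random configurations to seed the surrogate.
X = [x for x in rng.uniform(0.0, 1.0, size=(3, 2))]
y = [validation_performance(x) for x in X]

for _ in range(10):
    # Refit the surrogate on all evaluations made so far.
    gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(X), np.array(y))
    # Score random candidates with an upper-confidence-bound acquisition.
    candidates = rng.uniform(0.0, 1.0, size=(500, 2))
    mean, std = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(mean + std)]
    X.append(x_next)
    y.append(validation_performance(x_next))

print("best configuration:", X[int(np.argmax(y))], "performance:", max(y))
```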

In this work we present a model that transfers this knowledge in two stages. At the first stage, the function that maps hyperparameter configurations to hold-out validation performance is approximated for each previously seen data set. At the second stage, these approximations are combined to rank hyperparameter configurations for a new data set. In extensive experiments on hyperparameter optimization as well as on combined algorithm selection and hyperparameter optimization, we outperform the state-of-the-art methods. The software related to this paper is available at https://github.com/wistuba/TST.
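A minimal sketch of this two-stage idea follows, assuming Gaussian process surrogates per data set and a simple rank-correlation-based weighting when combining them for the new data set. The weighting rule, synthetic data, and all names are illustrative assumptions, not necessarily the paper's exact transfer scheme.

```python
# Two-stage transfer surrogate, hedged sketch (not the paper's exact method).
import numpy as np
from scipy.stats import kendalltau
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

# Stage 1: one surrogate per previously seen data set, fit on the
# (configuration, validation performance) pairs recorded during earlier
# searches. Synthetic placeholders stand in for real tuning logs.
prior_surrogates = []
for shift in (0.0, 0.2, 0.8):
    X = rng.uniform(0.0, 1.0, size=(20, 2))
    y = np.sin(5 * X[:, 0] + shift) + rng.normal(0.0, 0.05, size=20)
    prior_surrogates.append(GaussianProcessRegressor(normalize_y=True).fit(X, y))

# A handful of configurations already evaluated on the new data set.
X_obs = rng.uniform(0.0, 1.0, size=(5, 2))
y_obs = np.sin(5 * X_obs[:, 0] + 0.1) + rng.normal(0.0, 0.05, size=5)

# Stage 2: weight each prior surrogate by how well it ranks the observed
# points of the new data set (Kendall's tau, clipped at zero), then rank
# unseen candidates by the weighted sum of the surrogates' predictions.
taus = np.array([max(kendalltau(gp.predict(X_obs), y_obs)[0], 0.0)
                 for gp in prior_surrogates])
weights = taus / taus.sum() if taus.sum() > 0 else np.full(len(taus), 1.0 / len(taus))

candidates = rng.uniform(0.0, 1.0, size=(200, 2))
scores = sum(w * gp.predict(candidates) for w, gp in zip(weights, prior_surrogates))
print("next configuration to evaluate:", candidates[np.argmax(scores)])
```

Weighting by rank correlation rather than raw prediction error has the appeal that surrogates from data sets with differently scaled performances can still vote on which configurations are better, even when their absolute predictions are off.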

Keywords

Hyperparameter optimization · Meta-learning · Transfer learning


Acknowledgments

The authors gratefully acknowledge the co-funding of their work by the German Research Foundation (DFG) under grant SCHM 2583/6-1.


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Martin Wistuba (1)
  • Nicolas Schilling (1)
  • Lars Schmidt-Thieme (1)

  1. Information Systems and Machine Learning Lab, University of Hildesheim, Hildesheim, Germany
