Hyperparameter Importance for Image Classification by Residual Neural Networks

  • Abhinav Sharma
  • Jan N. van Rijn
  • Frank Hutter
  • Andreas Müller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11828)

Abstract

Residual neural networks (ResNets) are among the state of the art for image classification tasks. With the advent of automated machine learning (AutoML), automated hyperparameter optimization methods are by now routinely used for tuning various network types. However, in the thriving field of deep neural networks, this progress is not yet matched by equal progress on rigorous techniques that yield information beyond performance-optimizing hyperparameter settings. In this work, we aim to answer the following question: given a residual neural network architecture, which of its hyperparameters are generally (i.e., across datasets) the most important? To answer this question, we assembled a benchmark suite containing 10 image classification datasets. For each of these datasets, we analyzed which hyperparameters were most influential using the functional ANOVA framework. This experiment both confirmed expected patterns and revealed new insights. With these experimental results, we aim to form a more rigorous basis for experimentation and better insight into which hyperparameters are important for making neural networks perform well.

Keywords

Hyperparameter importance · Residual neural networks
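The functional ANOVA analysis described in the abstract attributes the variance of a performance measure (e.g., validation accuracy) to individual hyperparameters and their interactions, based on a tree-based surrogate model fitted to observed (configuration, performance) pairs. The sketch below illustrates this idea with the `fanova` Python package (github.com/automl/fanova), which implements the framework of Hutter et al. It is a minimal illustration, not the authors' actual pipeline: the three-hyperparameter configuration space, the synthetic response, and the variable names are hypothetical.

```python
# Minimal functional ANOVA sketch using the automl/fanova package.
# Hypothetical setup: rows of X are sampled ResNet hyperparameter
# configurations, Y holds validation accuracies on one dataset.
# This is NOT the paper's actual configuration space or data.
import numpy as np
from fanova import fANOVA

rng = np.random.default_rng(0)
n = 200

# Three hypothetical hyperparameters, sampled uniformly at random.
X = np.column_stack([
    rng.uniform(-4.0, -1.0, n),          # log10 of the learning rate
    rng.integers(16, 256, n),            # batch size
    rng.uniform(0.0, 1e-3, n),           # weight decay
]).astype(float)

# Synthetic response: accuracy depends mostly on the learning rate,
# so the learning rate should receive the largest importance score.
Y = 0.9 - 0.1 * (X[:, 0] + 2.5) ** 2 + rng.normal(0.0, 0.01, n)

# Fit the fANOVA surrogate (a random forest) on the observed pairs.
f = fANOVA(X, Y)

# Quantify the marginal variance contribution of each hyperparameter.
names = ["log10_learning_rate", "batch_size", "weight_decay"]
for i, name in enumerate(names):
    importance = f.quantify_importance((i,))[(i,)]["individual importance"]
    print(f"{name}: {importance:.3f}")
```

To obtain the cross-dataset view the paper asks about, one would repeat this analysis once per dataset in the benchmark suite and compare the resulting per-dataset importance scores, e.g., by ranking or averaging them across the 10 datasets.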


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Columbia University, New York City, USA
  2. Leiden University, Leiden, The Netherlands
  3. Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
