Efficient Hyperparameter Optimization in Convolutional Neural Networks by Learning Curves Prediction

  • Andrés F. Cardona-Escobar
  • Andrés F. Giraldo-Forero
  • Andrés E. Castro-Ospina
  • Jorge A. Jaramillo-Garzón
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10657)


In this work, we present an automatic framework for hyperparameter selection in Convolutional Neural Networks. To evaluate many hyperparameter combinations quickly, we predict learning curves with non-parametric regression models. Because the trend is the most important feature of a learning curve, our prediction method focuses on trend detection. Results show that our forecasting method captures the overall behavior of future iterations of the learning process.
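The abstract and keywords name singular spectrum analysis (SSA) and support vector regression (SVR) but give no implementation details, so the following is only a minimal sketch of how trend-focused learning-curve forecasting might look, not the authors' method. It assumes NumPy and scikit-learn are available. The function names `ssa_trend` and `forecast_curve`, the window length, the number of lags, and the SVR settings are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.svm import SVR


def ssa_trend(series, window, n_components=1):
    """Approximate the trend of a 1-D series by its leading SSA component(s).

    Assumption: the trend is carried by the largest singular triple(s).
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Low-rank approximation built from the leading singular triples.
    approx = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # Diagonal averaging maps the matrix back to a series of length n.
    trend = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        trend[i:i + k] += approx[i]
        counts[i:i + k] += 1
    return trend / counts


def forecast_curve(observed, horizon, lags=5, window=10):
    """Iteratively forecast `horizon` future points of a learning curve.

    Hypothetical pipeline: extract the SSA trend, fit an SVR on lagged
    trend values, then feed each prediction back in as a new input.
    """
    trend = ssa_trend(observed, window)
    # Each row holds `lags` consecutive trend values; the target is the next one.
    X = np.array([trend[t - lags:t] for t in range(lags, trend.size)])
    y = trend[lags:]
    model = SVR(kernel="rbf", C=10.0, epsilon=1e-3).fit(X, y)
    history = list(trend[-lags:])
    preds = []
    for _ in range(horizon):
        features = np.array(history[-lags:]).reshape(1, -1)
        nxt = float(model.predict(features)[0])
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)


# Synthetic validation-accuracy curve (illustrative data, not from the paper).
rng = np.random.default_rng(0)
epochs = np.arange(40)
observed = 1.0 - np.exp(-epochs / 20.0) + rng.normal(0.0, 0.01, epochs.size)
future = forecast_curve(observed, horizon=20)
```

In a hyperparameter search, a forecast of this kind could be compared against the best curve seen so far to decide whether a configuration is worth training further. That stopping rule is an assumption of this sketch and is not stated in the abstract.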


Keywords: Hyperparameter optimization, Deep learning, Learning curves, Forecasting, SVR, Singular spectrum analysis



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Andrés F. Cardona-Escobar (1)
  • Andrés F. Giraldo-Forero (1)
  • Andrés E. Castro-Ospina (1)
  • Jorge A. Jaramillo-Garzón (2)

  1. Instituto Tecnológico Metropolitano, Medellín, Colombia
  2. Universidad de Caldas, Manizales, Colombia
