Convergence Analysis of PSO for Hyper-Parameter Selection in Deep Neural Networks

  • Jakub Nalepa
  • Pablo Ribalta Lorenzo
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 13)


Deep Neural Networks (DNNs) have gained enormous research attention since they consistently outperform other state-of-the-art methods in a plethora of machine learning tasks. However, their performance strongly depends on the DNN hyper-parameters, which are commonly tuned by experienced practitioners. Recently, we introduced Particle Swarm Optimization (PSO) and parallel PSO techniques to automate this process. In this work, we theoretically and experimentally investigate the convergence capabilities of these algorithms. The experiments were performed for several DNN architectures (both gradually augmented and hand-crafted by human experts) using two challenging multi-class benchmark datasets: MNIST and CIFAR-10.
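To make the approach concrete, the sketch below shows canonical (sequential) PSO minimising a surrogate objective over a box of hyper-parameter ranges. It is a minimal illustration, not the authors' parallel implementation: the objective, bounds, and all coefficient values (`w`, `c1`, `c2`, swarm size) are hypothetical stand-ins; in the actual setting the objective would be the validation error of a DNN trained with the candidate hyper-parameters.

```python
import random

def pso(objective, bounds, n_particles=10, n_iters=30,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box given by `bounds`
    (one (low, high) pair per hyper-parameter) with canonical PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box; zero initial velocities.
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]               # per-particle best positions
    pbest_f = [objective(x) for x in xs]     # ... and their objective values
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + cognitive pull + social pull.
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                lo, hi = bounds[d]
                # Move and clip back into the feasible box.
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)
            f = objective(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f

# Toy stand-in for "validation error as a function of hyper-parameters":
# optimum at learning rate 0.01 and dropout 0.5.
best, err = pso(lambda x: (x[0] - 0.01) ** 2 + (x[1] - 0.5) ** 2,
                bounds=[(1e-4, 1.0), (0.0, 0.9)])
```

Evaluating `objective` is the expensive step (a full DNN training run), which is what motivates the parallel PSO variant studied in the paper: particle evaluations within one iteration are independent and can run concurrently.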


Keywords: Convergence analysis · PSO · Hyper-parameter selection · DNNs



This work has been supported by the Polish National Centre for Research and Development under the Innomed grant POIR.01.02.00-00-0030/15, and the Silesian University of Technology grant for young researchers (BKM-507/RAU2/2016).



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Future Processing and Silesian University of Technology, Gliwice, Poland
  2. Future Processing, Gliwice, Poland
