Neuron Pruning for Compressing Deep Networks Using Maxout Architectures

  • Fernando Moya Rueda
  • Rene Grzeszick
  • Gernot A. Fink
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10496)


This paper presents an efficient and robust approach for reducing the size of deep neural networks by pruning entire neurons. It exploits maxout units to combine neurons into more complex convex functions, and it uses a local relevance measurement that ranks neurons according to their activation on the training set in order to prune them. Additionally, neuron pruning and weight pruning are compared with respect to parameter reduction. It is shown empirically that the proposed neuron pruning reduces the number of parameters dramatically. The evaluation is performed on two tasks, MNIST handwritten digit recognition and LFW face verification, using a LeNet-5 and a VGG16 network architecture. The network size is reduced by up to \(74\%\) and \(61\%\), respectively, without affecting the network's performance. The main advantage of neuron pruning is its direct influence on the size of the network architecture. Furthermore, it is shown that neuron pruning can be combined with subsequent weight pruning, reducing the size of LeNet-5 and VGG16 by up to \(92\%\) and \(80\%\), respectively.
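The idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract only states that neurons are ranked by their activation on the training set, so the concrete criterion used here (how often a neuron supplies the maximum within its maxout unit) is an assumption, as are the function names `maxout`, `win_counts`, and `prune_least_active` and the column layout of the weight matrix.

```python
import numpy as np

def maxout(z, k):
    """Maxout over groups of k neurons.
    z: (n_samples, n_units * k) pre-activations -> (n_samples, n_units)."""
    n, d = z.shape
    return z.reshape(n, d // k, k).max(axis=2)

def win_counts(z, k):
    """Assumed relevance measure: for every maxout unit, count how often
    each of its k neurons supplies the maximum over the given samples."""
    n, d = z.shape
    winners = z.reshape(n, d // k, k).argmax(axis=2)   # (n_samples, n_units)
    counts = np.zeros((d // k, k), dtype=int)
    for j in range(k):
        counts[:, j] = (winners == j).sum(axis=0)
    return counts

def prune_least_active(W, b, counts, keep):
    """Keep the `keep` most frequently winning neurons of each unit.
    W: (in_dim, n_units * k), b: (n_units * k,); neurons of one unit are
    assumed to occupy k consecutive columns."""
    n_units, k = counts.shape
    order = np.argsort(-counts, axis=1, kind="stable")[:, :keep]
    order = np.sort(order, axis=1)                      # preserve column order
    cols = (np.arange(n_units)[:, None] * k + order).ravel()
    return W[:, cols], b[cols]
```

In this sketch, removing a neuron deletes a whole column of `W` and one bias entry, so the layer itself becomes smaller. That mirrors the abstract's point that neuron pruning acts directly on the size of the architecture, whereas weight pruning zeroes individual entries and leaves the layer shape unchanged.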



This work has been supported by the German Research Foundation (DFG) within project Fi799/9-1 ("Partially Supervised Learning of Models for Visual Scene Recognition").



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Fernando Moya Rueda (1)
  • Rene Grzeszick (1)
  • Gernot A. Fink (1)
  1. Department of Computer Science, TU Dortmund University, Dortmund, Germany
