Parameters Sharing in Residual Neural Networks

  • Dawei Dai
  • Liping Yu
  • Hui Wei


Deep neural networks (DNNs) have achieved great success in machine learning owing to their powerful ability to learn and represent knowledge. However, such models often have a massive number of trainable parameters, which imposes a heavy resource burden in practice. Reducing the number of parameters while preserving competitive performance is therefore a critical task in the field. In this paper, we focus on one type of convolutional neural network that has many repeated, same-structure convolutional layers. The residual network (ResNet) and its variants are widely used because they make deeper models easy to train. One type of block in such a model contains two convolutional layers, each commonly with its own layer of trainable parameters. We instead used only one layer of trainable parameters per block, so that the two convolutional layers in a block share it. We performed extensive experiments with different ResNet architectures using trainable parameter sharing on the CIFAR-10, CIFAR-100, and ImageNet datasets. We found that the models with parameter sharing obtained fewer errors on the training datasets and reached a recognition accuracy very close to that of the original models (within 0.5%). The number of parameters in the new model was reduced by more than 1/3 relative to the original.
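
The effect of sharing on a single block can be illustrated with simple parameter-count arithmetic. The sketch below is not the authors' code: the function names, the 3×3 kernel size, the bias-free convolutions, and the channel widths are assumptions chosen for illustration. It also assumes that a block keeps its channel count constant, so that both convolutions have the same shape and can reuse one weight tensor.

```python
def conv_params(in_channels, out_channels, kernel_size=3, bias=False):
    """Number of trainable weights in one 2-D convolutional layer."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def block_params(channels, shared):
    """Convolution weights in a two-convolution residual block.

    Assumption: the block keeps the channel count constant, so both
    convolutions have identical shape and can reuse one weight tensor.
    """
    per_layer = conv_params(channels, channels)
    return per_layer if shared else 2 * per_layer


# Hypothetical block widths, chosen only for illustration.
for channels in (16, 32, 64):
    original = block_params(channels, shared=False)
    shared = block_params(channels, shared=True)
    print(channels, original, shared, shared / original)
```

Under these assumptions the convolution weights of each shared block are halved. The reduction for a whole network would be smaller than one half, because any layers outside the shared blocks (for example an input convolution or a final classifier) keep their own parameters.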


CNN, ResNet, Classification, Parameters reducing



This work was sponsored by the Natural Science Foundation of Chongqing (No. E021D2019034), the Chongqing Education Commission (No. E010J2019025), NSFC projects (No. 61771146, 61375122), and in part by the Shanghai Science and Technology Development Funds (No. 13dz2260200, 13511504300).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
  2. Laboratory of Cognitive Model and Algorithms, Department of Computer Science, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China
