
Regularization-based pruning of irrelevant weights in deep neural architectures

Abstract

Deep neural networks with millions of parameters are currently the norm. This is a potential issue because of the large number of computations needed for training and because overparameterized networks may lose generalization performance. In this paper we propose a method for learning sparse neural topologies via a regularization approach that identifies irrelevant weights in any type of layer (i.e., convolutional, fully connected, attention and embedding ones) and selectively shrinks their norm, while relevant weights receive a standard back-propagation update. This technique, an improvement over classical weight decay, is based on a regularization term that can be added to any loss function regardless of its form, resulting in a unified, general framework exploitable in many different contexts. The actual elimination of the parameters identified as irrelevant is handled by an iterative pruning algorithm.
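For illustration only, the snippet below gives a minimal PyTorch sketch of the general recipe described above: a weight-decay-like penalty applied only to the weights currently deemed irrelevant, combined with an iterative pruning step that zeroes them. The relevance criterion used here (a plain magnitude threshold), the function names and all hyperparameters are assumptions of this sketch, not the paper's actual formulation.

```python
import torch
import torch.nn as nn

def selective_decay(model, lambda_reg=1e-4, threshold=1e-2):
    """Penalize only 'irrelevant' weights (here: magnitude below `threshold`);
    relevant weights receive no extra penalty, only the usual gradient update."""
    reg = torch.zeros((), device=next(model.parameters()).device)
    for p in model.parameters():
        if p.requires_grad:
            mask = (p.detach().abs() < threshold).float()  # 1 for weights flagged irrelevant
            reg = reg + (mask * p).pow(2).sum()
    return lambda_reg * reg

def prune_step(model, threshold=1e-2):
    """Iterative pruning: zero out weights that stayed below the threshold
    (a persistent mask would be needed to keep them at zero; omitted for brevity)."""
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() >= threshold).float())

# Toy usage on random data: train with the selective penalty, prune periodically.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))

for epoch in range(30):
    opt.zero_grad()
    loss = loss_fn(model(x), y) + selective_decay(model)
    loss.backward()
    opt.step()
    if (epoch + 1) % 10 == 0:
        prune_step(model)

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")
```

In the paper's framework it is the relevance criterion and the exact form of the regularization term that distinguish the method from plain weight decay; the sketch only conveys the overall structure of selective shrinking followed by iterative pruning.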

To explore the possibility of interdisciplinary use of the proposed technique, we test it on six different image classification and natural language generation tasks, four of which are based on real datasets. We reach state-of-the-art performance on one of the four imaging tasks and obtain results better than competitors' on the remaining imaging tasks and on one of the two considered language generation tasks, in terms of both compression and quality metrics.

Acknowledgements

This activity was partially carried out in the context of the Visiting Professor Program of the Gruppo Nazionale per il Calcolo Scientifico (GNCS) of the Italian Istituto Nazionale di Alta Matematica (INdAM).

Author information

Corresponding author

Correspondence to Giovanni Bonetta.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bonetta, G., Ribero, M. & Cancelliere, R. Regularization-based pruning of irrelevant weights in deep neural architectures. Appl Intell 53, 17429–17443 (2023). https://doi.org/10.1007/s10489-022-04353-y
