Abstract
Deep neural networks with millions of parameters are currently the norm. This is a potential issue because of the large number of computations needed for training and the possible loss of generalization performance in overparameterized networks. In this paper we propose a method for learning sparse neural topologies via a regularization approach that identifies irrelevant weights in any type of layer (i.e., convolutional, fully connected, attention, and embedding layers) and selectively shrinks their norm, while performing a standard back-propagation update for the relevant weights. This technique, an improvement over classical weight decay, is based on the definition of a regularization term that can be added to any loss function regardless of its form, resulting in a unified general framework exploitable in many different contexts. The actual elimination of the parameters identified as irrelevant is handled by an iterative pruning algorithm.
To explore the interdisciplinary applicability of the proposed technique, we test it on six image classification and natural language generation tasks, four of which are based on real datasets. We reach state-of-the-art performance on one of the four imaging tasks while outperforming competitors on the others, and on one of the two considered language generation tasks, in terms of both compression and metrics.
Acknowledgements
The activity has been partially carried on in the context of the Visiting Professor Program of the Gruppo Nazionale per il Calcolo Scientifico (GNCS) of the Italian Istituto Nazionale di Alta Matematica (INdAM).
Author information
Authors and Affiliations
Corresponding author
Additional information
Cite this article
Bonetta, G., Ribero, M. & Cancelliere, R. Regularization-based pruning of irrelevant weights in deep neural architectures. Appl Intell 53, 17429–17443 (2023). https://doi.org/10.1007/s10489-022-04353-y