Neural network relief: a pruning algorithm based on neural activity

Abstract

Current deep neural networks (DNNs) are overparameterized and use most of their neuronal connections during inference for each task. The human brain, in contrast, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy built on a simple importance-score metric that deactivates unimportant connections, tackling overparameterization in DNNs and modulating their firing patterns. The aim is to find the smallest set of connections that can still solve a given task with comparable accuracy, i.e., a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for both optimizers considered, Adam and SGD. The algorithm is not designed to minimize FLOPs on current hardware and software implementations, although it compares reasonably with the state of the art in that respect.
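A minimal sketch (an illustration only, not the paper's exact procedure) of how an activity-based importance score and a cumulative-importance pruning mask could be computed for a fully connected layer in PyTorch; the function names and the threshold alpha are assumptions made for this example, and the released code (see Code availability below) contains the authors' actual implementation:

import torch

def importance_scores(weight, bias, activations):
    # Per-connection importance for one fully connected layer, estimated as the
    # average |w_ij * x_j| contribution over a batch of input activations.
    # weight: (out_features, in_features); activations: (N, in_features).
    contrib = activations.abs().unsqueeze(1) * weight.abs().unsqueeze(0)  # (N, out, in)
    scores = contrib.mean(dim=0)                                          # (out, in)
    denom = scores.sum(dim=1, keepdim=True)
    if bias is not None:
        denom = denom + bias.abs().unsqueeze(1)  # the bias also contributes to the total signal
    return scores / denom                        # normalized per neuron

def prune_mask(scores, alpha=0.95):
    # For each neuron, keep the smallest set of incoming connections whose
    # cumulative normalized importance reaches alpha; deactivate the rest.
    order = scores.argsort(dim=1, descending=True)
    sorted_scores = torch.gather(scores, 1, order)
    cumulative = sorted_scores.cumsum(dim=1)
    keep_sorted = (cumulative - sorted_scores) < alpha  # keep until alpha is reached
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(1, order, keep_sorted)
    return mask

The resulting mask would be applied by zeroing the deactivated weights (for example, layer.weight.data *= prune_mask(scores, alpha)) before retraining the surviving connections in the next pruning iteration.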


Data availability

Not applicable

Code availability

A PyTorch (Paszke et al., 2019) implementation is available at: https://github.com/adekhovich/NNrelief.

References

  • Aghasi, A., Abdi, A., Nguyen, N., & Romberg, J. (2017). Net-trim: Convex pruning of deep neural networks with performance guarantee. Advances in Neural Information Processing Systems, 30, 3177–3186.

  • Ahmad, S., & Scheinkman, L. (2019). How can we be so dense? The benefits of using highly sparse representations. arXiv preprint arXiv:1903.11257

  • Ancona, M., Öztireli, C., & Gross, M. (2020). Shapley value as principled metric for structured network pruning. arXiv preprint arXiv:2006.01795

  • Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.

  • Dong, X., Chen, S., & Pan, S. (2017). Learning to prune deep neural networks via layer-wise optimal brain surgeon. In Advances in neural information processing systems (pp. 4857–4867).

  • Frankle, J., & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International conference on learning representations (ICLR)

  • Garg, Y., & Candan, K.S. (2020). isparse: Output informed sparsification of neural network. In Proceedings of the 2020 international conference on multimedia retrieval (pp. 180–188).

  • Geng, L., & Niu, B. (2022). Pruning convolutional neural networks via filter similarity analysis. Machine Learning, 111(9), 3161–3180.

  • Guo, Y., Yao, A. & Chen, Y. (2016). Dynamic network surgery for efficient DNNs. In Advances in neural information processing systems (pp. 1379–1387).

  • Han, S., Mao, H., & Dally, W.J. (2016). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International conference on learning representations (ICLR)

  • Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in neural information processing systems (pp. 1135–1143).

  • Hassibi, B., & Stork, D.G. (1993). Second order derivatives for network pruning: Optimal brain surgeon. In Advances in neural information processing systems (pp. 164–171)

  • He, Y., Zhang, X., & Sun, J. (2017). Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE international conference on computer vision (pp. 1389–1397).

  • He, Y., Kang, G., Dong, X., Fu, Y., & Yang, Y. (2018). Soft filter pruning for accelerating deep convolutional neural networks. In International joint conference on artificial intelligence (IJCAI) (pp. 2234–2240).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026–1034).

  • Hu, H., Peng, R., Tai, Y.-W., & Tang, C.-K. (2016). Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250

  • Huang, Z., & Wang, N. (2018). Data-driven sparse structure selection for deep neural networks. In Proceedings of the European conference on computer vision (ECCV) (pp. 304–320).

  • Jaccard, P. (1912). The distribution of the flora in the alpine zone. 1. New Phytologist, 11(2), 37–50.

  • Karnin, E. D. (1990). A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks, 1(2), 239–242.

  • Kingma, D.P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations (ICLR).

  • Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto.

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • LeCun, Y., Denker, J.S., & Solla, S.A. (1990). Optimal brain damage. In Advances in neural information processing systems (pp. 598–605).

  • Lebedev, V., & Lempitsky, V. (2016). Fast convnets using group-wise brain damage. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2554–2564).

  • Lee, N., Ajanthan, T. & Torr, P.H. (2019). Snip: Single-shot network pruning based on connection sensitivity. In International conference on learning representations (ICLR)

  • Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H.P. (2017). Pruning filters for efficient convnets. In International conference on learning representations (ICLR).

  • Li, T., Wu, B., Yang, Y., Fan, Y., Zhang, Y., & Liu, W. (2019). Compressing convolutional neural networks via factorized convolutional filters. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3977–3986).

  • Liao, N., Wang, S., Xiang, L., Ye, N., Shao, S., & Chu, P. (2022). Achieving adversarial robustness via sparsity. Machine Learning, 111(2), 685–711.

  • Lin, M., Ji, R., Wang, Y., Zhang, Y., Zhang, B., Tian, Y., & Shao, L. (2020). Hrank: Filter pruning using high-rank feature map. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1529–1538).

  • Louizos, C., Ullrich, K., & Welling, M. (2017). Bayesian compression for deep learning. In Advances in neural information processing systems (pp. 3288–3298).

  • Luo, J.-H., Wu, J., & Lin, W. (2017). Thinet: A filter level pruning method for deep neural network compression. In Proceedings of the IEEE international conference on computer vision (pp. 5058–5066)

  • Mallya, A., & Lazebnik, S. (2018). Packnet: Adding multiple tasks to a single network by iterative pruning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7765–7773).

  • Mallya, A., Davis, D., & Lazebnik, S. (2018). Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In Proceedings of the European conference on computer vision (ECCV) (pp. 67–82).

  • Mehta, D., Kim, K.I., & Theobalt, C. (2019). On implicit filter level sparsity in convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 520–528).

  • Molchanov, P., Mallya, A., Tyree, S., Frosio, I., & Kautz, J. (2019). Importance estimation for neural network pruning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 11264–11272).

  • Molchanov, P., Tyree, S., Karras, T., Aila, T., & Kautz, J. (2016). Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440

  • Molchanov, D., Ashukha, A., & Vetrov, D. (2017). Variational dropout sparsifies deep neural networks. In International conference on machine learning (pp. 2498–2507). PMLR.

  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703

  • Renda, A., Frankle, J., & Carbin, M. (2020). Comparing rewinding and fine-tuning in neural network pruning. arXiv preprint arXiv:2003.02389

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR).

  • Sui, Y., Yin, M., Xie, Y., Phan, H., Aliari Zonouz, S., & Yuan, B. (2021). Chip: Channel independence-based pruning for compact neural networks. Advances in Neural Information Processing Systems, 34, 24604–24616.

  • Ullrich, K., Meeds, E., & Welling, M. (2017). Soft weight-sharing for neural network compression. arXiv preprint arXiv:1702.04008

  • Ye, S., Xu, K., Liu, S., Cheng, H., Lambrechts, J.-H., Zhang, H., Zhou, A., Ma, K., Wang, Y., & Lin, X. (2019). Adversarial robustness vs. model compression, or both? In Proceedings of the IEEE/CVF international conference on computer vision (pp. 111–120).

  • Ye, J., Lu, X., Lin, Z., & Wang, J.Z. (2018). Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. arXiv preprint arXiv:1802.00124

  • Yu, R., Li, A., Chen, C.-F., Lai, J.-H., Morariu, V.I., Han, X., Gao, M., Lin, C.-Y., & Davis, L.S. (2018). Nisp: Pruning networks using neuron importance score propagation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9194–9203).

  • Zagoruyko, S. (2015). 92.45% on CIFAR-10 in Torch. URL: http://torch.ch/blog/2015/07/30/cifar.html.

  • Zhao, C., Ni, B., Zhang, J., Zhao, Q., Zhang, W., & Tian, Q. (2019). Variational convolutional neural network pruning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2780–2789).

Funding

Not applicable

Author information

Contributions

Not applicable

Corresponding author

Correspondence to Miguel A. Bessa.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare.

Ethical approval

Not applicable

Consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Editor: Andrea Passerini.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Pruning setups

We implement our approach with PyTorch (Paszke et al., 2019). Table 8 shows the training parameters for each network.

Table 8 Training setups

Table 9 shows the parameters for retraining on every iteration.

Table 9 Retraining parameters

Table 10 shows the pruning parameters and total number of iterations.

Table 10 Pruning parameters

Appendix B: FLOPs computation

Following Molchanov et al. (2016), we compute FLOPs as follows (a short helper evaluating these formulas is given after the list):

  • for a convolutional layer: FLOPs \(= 2HW(C_{in}K^2+1)C_{out}\), where \(H\), \(W\), and \(C_{in}\) are the height, width, and number of channels of the input feature map, \(K\) is the kernel width (and height, due to symmetry), and \(C_{out}\) is the number of output channels;

  • for a fully connected layer: FLOPs \(= (2I-1)O\), where \(I\) is the input dimensionality and \(O\) is the output dimensionality.
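
For convenience, these two formulas can be evaluated with the small helper below (the function names are illustrative; the counts are per layer):

def conv_flops(H, W, C_in, C_out, K):
    # FLOPs of a convolutional layer: 2HW(C_in*K^2 + 1)C_out,
    # where the "+1" accounts for the bias term.
    return 2 * H * W * (C_in * K ** 2 + 1) * C_out

def fc_flops(I, O):
    # FLOPs of a fully connected layer with I inputs and O outputs: (2I - 1)O.
    return (2 * I - 1) * O

# Example: a 3x3 convolution with 64 output channels applied to a
# 32x32x3 input (as in the first layer of VGG on CIFAR-10).
print(conv_flops(32, 32, 3, 64, 3))  # 3670016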

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Dekhovich, A., Tax, D.M.J., Sluiter, M.H.F. et al. Neural network relief: a pruning algorithm based on neural activity. Mach Learn (2024). https://doi.org/10.1007/s10994-024-06516-z

