Abstract
Current deep neural networks (DNNs) are overparameterized and use most of their neuronal connections during inference for each task. The human brain, by contrast, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy that introduces a simple importance-score metric to deactivate unimportant connections, tackling overparameterization in DNNs and modulating their firing patterns. The aim is to find the smallest number of connections that can still solve a given task with comparable accuracy, i.e. a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two optimizers considered, Adam and SGD. The algorithm is not designed to minimize FLOPs on current hardware and software implementations, although it performs reasonably well when compared to the state of the art.
Data availability
Not applicable
Code availability
PyTorch [1] implementation of the code is available at: https://github.com/adekhovich/NNrelief.
References
Aghasi, A., Abdi, A., Nguyen, N., & Romberg, J. (2017). Net-trim: Convex pruning of deep neural networks with performance guarantee. Advances in Neural Information Processing Systems, 30, 3177–3186.
Ahmad, S., & Scheinkman, L. (2019). How can we be so dense? The benefits of using highly sparse representations. arXiv preprint arXiv:1903.11257
Ancona, M., Öztireli, C., & Gross, M. (2020). Shapley value as principled metric for structured network pruning. arXiv preprint arXiv:2006.01795
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.
Dong, X., Chen, S., & Pan, S. (2017). Learning to prune deep neural networks via layer-wise optimal brain surgeon. In Advances in neural information processing systems (pp. 4857–4867).
Frankle, J., & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International conference on learning representations (ICLR)
Garg, Y., & Candan, K.S. (2020). isparse: Output informed sparsification of neural network. In Proceedings of the 2020 international conference on multimedia retrieval (pp. 180–188).
Geng, L., & Niu, B. (2022). Pruning convolutional neural networks via filter similarity analysis. Machine Learning, 111(9), 3161–3180.
Guo, Y., Yao, A. & Chen, Y. (2016). Dynamic network surgery for efficient DNNs. In Advances in neural information processing systems (pp. 1379–1387).
Han, S., Mao, H., & Dally, W.J. (2016). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International conference on learning representations (ICLR)
Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in neural information processing systems (pp. 1135–1143).
Hassibi, B., & Stork, D.G. (1993). Second order derivatives for network pruning: Optimal brain surgeon. In Advances in neural information processing systems (pp. 164–171)
He, Y., Zhang, X., & Sun, J. (2017). Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE international conference on computer vision (pp. 1389–1397).
He, Y., Kang, G., Dong, X., Fu, Y., & Yang, Y. (2018). Soft filter pruning for accelerating deep convolutional neural networks. In International joint conference on artificial intelligence (IJCAI) (pp. 2234–2240).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision (pp. 1026–1034).
Hu, H., Peng, R., Tai, Y.-W., & Tang, C.-K. (2016). Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250
Huang, Z., & Wang, N. (2018). Data-driven sparse structure selection for deep neural networks. In Proceedings of the European conference on computer vision (ECCV) (pp. 304–320).
Jaccard, P. (1912). The distribution of the flora in the alpine zone. 1. New Phytologist, 11(2), 37–50.
Karnin, E. D. (1990). A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks, 1(2), 239–242.
Kingma, D.P. & Ba, J. (2015). Adam: A method for stochastic optimization. In 3rd international conference on learning representations, ICLR.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
LeCun, Y., Denker, J.S., & Solla, S.A. (1990). Optimal brain damage. In Advances in neural information processing systems (pp. 598–605).
Lebedev, V., & Lempitsky, V. (2016). Fast convnets using group-wise brain damage. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2554–2564).
Lee, N., Ajanthan, T. & Torr, P.H. (2019). Snip: Single-shot network pruning based on connection sensitivity. In International conference on learning representations (ICLR)
Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H.P. (2017). Pruning filters for efficient convnets. In international conference of learning representation (ICLR).
Li, T., Wu, B., Yang, Y., Fan, Y., Zhang, Y., & Liu, W. (2019). Compressing convolutional neural networks via factorized convolutional filters. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3977–3986).
Liao, N., Wang, S., Xiang, L., Ye, N., Shao, S., & Chu, P. (2022). Achieving adversarial robustness via sparsity. Machine Learning, 111(2), 685–711.
Lin, M., Ji, R., Wang, Y., Zhang, Y., Zhang, B., Tian, Y., & Shao, L. (2020). Hrank: Filter pruning using high-rank feature map. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1529–1538).
Louizos, C., Ullrich, K., Welling, M. (2017). Bayesian compression for deep learning. In Advances in neural information processing systems (pp. 3288–3298).
Luo, J.-H., Wu, J., & Lin, W. (2017). Thinet: A filter level pruning method for deep neural network compression. In Proceedings of the IEEE international conference on computer vision (pp. 5058–5066)
Mallya, A., & Lazebnik, S. (2018). Packnet: Adding multiple tasks to a single network by iterative pruning. In: Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7765–7773).
Mallya, A., Davis, D., & Lazebnik, S. (2018) Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In Proceedings of the European conference on computer vision (ECCV) (pp. 67–82).
Mehta, D., Kim, K.I., & Theobalt, C. (2019). On implicit filter level sparsity in convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 520–528).
Molchanov, P., Mallya, A., Tyree, S., Frosio, I., & Kautz, J. (2019). Importance estimation for neural network pruning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 11264–11272).
Molchanov, P., Tyree, S., Karras, T., Aila, T., & Kautz, J. (2016). Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440
Molchanov, D., Ashukha, A., & Vetrov, D. (2017). Variational dropout sparsifies deep neural networks. In International conference on machine learning (pp. 2498–2507). PMLR.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J. & Chintala, S. (2019). Pytorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703
Renda, A., Frankle, J., & Carbin, M. (2020). Comparing rewinding and fine-tuning in neural network pruning. arXiv preprint arXiv:2003.02389
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR).
Sui, Y., Yin, M., Xie, Y., Phan, H., Aliari Zonouz, S., & Yuan, B. (2021). Chip: Channel independence-based pruning for compact neural networks. Advances in Neural Information Processing Systems, 34, 24604–24616.
Ullrich, K., Meeds, E., & Welling, M. (2017). Soft weight-sharing for neural network compression. arXiv preprint arXiv:1702.04008
Ye, S., Xu, K., Liu, S., Cheng, H., Lambrechts, J.-H., Zhang, H., Zhou, A., Ma, K., Wang, Y., & Lin, X. (2019). Adversarial robustness vs. model compression, or both?. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 111–120).
Ye, J., Lu, X., Lin, Z., & Wang, J.Z. (2018). Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. arXiv preprint arXiv:1802.00124
Yu, R., Li, A., Chen, C.-F., Lai, J.-H., Morariu, V.I., Han, X., Gao, M., Lin, C.-Y., & Davis, L.S. (2018). Nisp: Pruning networks using neuron importance score propagation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9194–9203).
Zagoruyko, S. (2015). 92.45% on CIFAR-10 in Torch. URL: http://torch.ch/blog/2015/07/30/cifar.html.
Zhao, C., Ni, B., Zhang, J., Zhao, Q., Zhang, W., & Tian, Q. (2019). Variational convolutional neural network pruning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2780–2789).
Funding
Not applicable
Author information
Authors and Affiliations
Contributions
Not applicable
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare.
Ethical approval
Not applicable
Consent to participate
Not applicable
Consent for publication
Not applicable
Additional information
Editor: Andrea Passerini.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Pruning setups
We implement our approach with PyTorch (Paszke et al., 2019). Table 8 shows the training parameters for each network.
Table 9 shows the parameters for retraining on every iteration.
Table 10 shows the pruning parameters and total number of iterations.
Appendix B: FLOPs computation
Following Molchanov et al. (2016), we compute FLOPs as follows:
- For a convolutional layer: FLOPs \(= 2HW(C_{in}K^2+1)C_{out}\), where H, W and \(C_{in}\) are the height, width and number of channels of the input feature map, K is the kernel width (and height, by symmetry), and \(C_{out}\) is the number of output channels.
- For a fully connected layer: FLOPs \(= (2I-1)O\), where I is the input dimensionality and O is the output dimensionality.
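The two formulas above can be expressed as a small helper; this is a minimal sketch (the function names and the example layer sizes are ours, not from the paper):

```python
def conv_flops(H, W, C_in, K, C_out):
    """FLOPs of a convolutional layer producing an H x W x C_out output.

    Each output element requires C_in * K^2 multiply-accumulates plus one
    bias term, each counted as 2 operations (one multiply, one add).
    """
    return 2 * H * W * (C_in * K**2 + 1) * C_out


def fc_flops(I, O):
    """FLOPs of a fully connected layer: I multiplies and I - 1 adds
    per output unit."""
    return (2 * I - 1) * O


# Example: the first 3x3 convolution of a VGG-style network on CIFAR-10
# (32x32 RGB input, 64 output channels, padding preserving spatial size).
print(conv_flops(H=32, W=32, C_in=3, K=3, C_out=64))  # 3670016

# Example: a 512 -> 10 classifier head.
print(fc_flops(I=512, O=10))  # 10230
```

Note that these counts cover only the convolutional and fully connected layers themselves; pooling, batch normalization and activation functions are not included, consistent with the formulas above.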
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dekhovich, A., Tax, D.M.J., Sluiter, M.H.F. et al. Neural network relief: a pruning algorithm based on neural activity. Mach Learn (2024). https://doi.org/10.1007/s10994-024-06516-z
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10994-024-06516-z