Abstract
Recently, a race towards simplifying deep networks has begun, showing that it is possible to substantially reduce the size of these models with minimal or no loss in performance. However, there is a general lack of understanding of why these pruning strategies are effective. In this work, we compare and analyze pruned solutions obtained with two different pruning approaches, one-shot and gradual, and show that the latter is more effective. In particular, we find that gradual pruning gives access to narrow, well-generalizing minima that one-shot approaches typically miss. We also propose PSP-entropy, a measure of how strongly a given neuron correlates with specific learned classes. Interestingly, we observe that the features extracted by iteratively pruned models are less correlated with specific classes, potentially making these models better suited to transfer learning.
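To make the distinction between the two schedules concrete, the sketch below applies magnitude pruning either in a single step (one-shot) or interleaved with fine-tuning (gradual). It is a minimal illustration using PyTorch's built-in `torch.nn.utils.prune` utilities, not the exact procedure used in the paper; `train_one_epoch` is a placeholder for the reader's own training loop.

```python
# Minimal sketch: one-shot vs. gradual magnitude pruning (illustrative only).
import torch.nn as nn
import torch.nn.utils.prune as prune


def one_shot_prune(model: nn.Module, amount: float = 0.9) -> None:
    """Remove `amount` of the weights of every conv/linear layer in a single step."""
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            prune.l1_unstructured(module, name="weight", amount=amount)


def gradual_prune(model: nn.Module, train_one_epoch, step_amount: float = 0.2,
                  steps: int = 10) -> None:
    """Alternate pruning and re-training: prune a fraction of the surviving
    weights, fine-tune to recover accuracy, and repeat."""
    for _ in range(steps):
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                # When applied repeatedly, `amount` refers to the weights
                # that are still unpruned at this step.
                prune.l1_unstructured(module, name="weight", amount=step_amount)
        train_one_epoch(model)  # placeholder fine-tuning step between prunings
```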
Notes
- The source code for PSP-entropy is available at https://github.com/EIDOSlab/PSP-entropy.git.
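The repository above contains the official implementation. As a rough illustration of the idea only, the sketch below computes a PSP-entropy-like quantity under two assumptions that may not match the paper exactly: a neuron's state is taken to be the sign of its post-synaptic potential (its pre-activation), and the quantity of interest is the per-class Shannon entropy of that binary state.

```python
# Illustrative sketch of a per-class entropy over binarized post-synaptic
# potentials; the official PSP-entropy definition may differ in detail.
import numpy as np


def psp_entropy(pre_activations: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """
    pre_activations: (n_samples, n_neurons) post-synaptic potentials of one layer.
    labels:          (n_samples,) integer class labels.
    Returns an (n_classes, n_neurons) array of per-class binary entropies.
    """
    states = (pre_activations > 0).astype(float)      # neuron "fires" or not
    classes = np.unique(labels)
    entropies = np.zeros((len(classes), states.shape[1]))
    for i, c in enumerate(classes):
        p = states[labels == c].mean(axis=0)          # P(state = 1 | class c)
        p = np.clip(p, 1e-12, 1.0 - 1e-12)            # guard against log(0)
        entropies[i] = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return entropies
```

Under this reading, a low per-class entropy means a neuron's on/off state is largely determined by the class (a class-specific feature), while a higher entropy indicates a feature shared across classes, which is one way to interpret the abstract's observation about iteratively pruned models.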
Cite this paper
Tartaglione, E., Bragagnolo, A., Grangetto, M. (2020). Pruning Artificial Neural Networks: A Way to Find Well-Generalizing, High-Entropy Sharp Minima. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_6