Abstract
Recently, a race towards simplifying deep networks has begun, showing that it is possible to substantially reduce the size of these models with minimal or no loss in performance. However, there is a general lack of understanding of why these pruning strategies are effective. In this work, we compare and analyze pruned solutions obtained with two different pruning approaches, one-shot and gradual, and show that the latter is more effective. In particular, we find that gradual pruning gives access to narrow, well-generalizing minima that one-shot approaches typically miss. We also propose PSP-entropy, a measure of how strongly a given neuron correlates with specific learned classes. Interestingly, we observe that the features extracted by iteratively pruned models are less correlated with specific classes, potentially making these models better suited to transfer learning.
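To make the distinction between the two schedules concrete, the sketch below applies magnitude pruning either in a single step (one-shot) or interleaved with fine-tuning (gradual). It is a minimal illustration using PyTorch's built-in `torch.nn.utils.prune` utilities, not the exact procedure used in the paper; `train_one_epoch` is a placeholder for the reader's own training loop.

```python
# Minimal sketch: one-shot vs. gradual magnitude pruning (illustrative only).
import torch.nn as nn
import torch.nn.utils.prune as prune


def one_shot_prune(model: nn.Module, amount: float = 0.9) -> None:
    """Remove `amount` of the weights of every conv/linear layer in a single step."""
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            prune.l1_unstructured(module, name="weight", amount=amount)


def gradual_prune(model: nn.Module, train_one_epoch, step_amount: float = 0.2,
                  steps: int = 10) -> None:
    """Alternate pruning and re-training: prune a fraction of the surviving
    weights, fine-tune to recover accuracy, and repeat."""
    for _ in range(steps):
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                # When applied repeatedly, `amount` refers to the weights
                # that are still unpruned at this step.
                prune.l1_unstructured(module, name="weight", amount=step_amount)
        train_one_epoch(model)  # placeholder fine-tuning step between prunings
```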
Notes
- The source code for PSP-entropy is available at https://github.com/EIDOSlab/PSP-entropy.git.
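The repository above contains the official implementation. As a rough illustration of the idea only, the sketch below computes a PSP-entropy-like quantity under two assumptions that may not match the paper exactly: a neuron's state is taken to be the sign of its post-synaptic potential (its pre-activation), and the quantity of interest is the per-class Shannon entropy of that binary state.

```python
# Illustrative sketch of a per-class entropy over binarized post-synaptic
# potentials; the official PSP-entropy definition may differ in detail.
import numpy as np


def psp_entropy(pre_activations: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """
    pre_activations: (n_samples, n_neurons) post-synaptic potentials of one layer.
    labels:          (n_samples,) integer class labels.
    Returns an (n_classes, n_neurons) array of per-class binary entropies.
    """
    states = (pre_activations > 0).astype(float)      # neuron "fires" or not
    classes = np.unique(labels)
    entropies = np.zeros((len(classes), states.shape[1]))
    for i, c in enumerate(classes):
        p = states[labels == c].mean(axis=0)          # P(state = 1 | class c)
        p = np.clip(p, 1e-12, 1.0 - 1e-12)            # guard against log(0)
        entropies[i] = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return entropies
```

Under this reading, a low per-class entropy means a neuron's on/off state is largely determined by the class (a class-specific feature), while a higher entropy indicates a feature shared across classes, which is one way to interpret the abstract's observation about iteratively pruned models.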
Cite this paper
Tartaglione, E., Bragagnolo, A., Grangetto, M. (2020). Pruning Artificial Neural Networks: A Way to Find Well-Generalizing, High-Entropy Sharp Minima. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_6