
Sparseout: Controlling Sparsity in Deep Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11489)

Abstract

Dropout is commonly used to help reduce overfitting in deep neural networks. Sparsity is a potentially important property of neural networks, but it is not explicitly controlled by Dropout-based regularization. In this work, we propose Sparseout, a simple and efficient variant of Dropout that can be used to control the sparsity of the activations in a neural network. We theoretically prove that Sparseout is equivalent to an \(L_q\) penalty on the features of a generalized linear model and that Dropout is a special case of Sparseout for neural networks. We empirically demonstrate that Sparseout is computationally inexpensive and able to control the level of sparsity in the activations. We evaluated Sparseout on image classification and language modelling tasks to examine the effect of sparsity on each task. We found that sparse activations are favorable for language modelling, whereas image classification benefits from denser activations. Sparseout provides a way to investigate sparsity in state-of-the-art deep learning models. Source code for Sparseout can be found at https://github.com/najeebkhan/sparseout.
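As a rough illustration of the idea described in the abstract, the sketch below implements a Sparseout-style activation perturbation in PyTorch. The function name `sparseout`, its default parameters, and the exact perturbation form are assumptions made here for illustration, not the authors' implementation; the linked repository contains the official code. The sketch assumes a perturbation of the form h + sign(h)·|h|^{q/2}·(m/p − 1) with m ~ Bernoulli(p), which reduces to inverted Dropout when q = 2, consistent with the special-case relationship stated in the abstract.

```python
import torch

def sparseout(h, p=0.5, q=1.0, training=True):
    """Sparseout-style perturbation of activations (illustrative sketch).

    Assumes the form h + sign(h) * |h|^(q/2) * (m/p - 1) with m ~ Bernoulli(p).
    In expectation this acts like an L_q penalty on the activations, and
    q = 2 recovers inverted Dropout: h * m / p.
    """
    if not training:
        return h
    mask = torch.bernoulli(torch.full_like(h, p))   # keep-mask with entries in {0, 1}
    noise = mask / p - 1.0                          # zero-mean multiplicative noise
    return h + torch.sign(h) * h.abs().pow(q / 2.0) * noise

# Example: q < 2 pushes activations toward sparsity; q = 2 behaves like Dropout.
h = torch.randn(4, 8)
h_sparse = sparseout(h, p=0.5, q=1.0)
h_dropout = sparseout(h, p=0.5, q=2.0)   # elementwise equal to h * mask / 0.5
```

Smaller values of q in this sketch correspond to stronger sparsity pressure on the activations, while q = 2 leaves the regularization identical to standard (inverted) Dropout.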


Notes

  1. https://github.com/salesforce/awd-lstm-lm.



Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Correspondence to Najeeb Khan.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Khan, N., Stavness, I. (2019). Sparseout: Controlling Sparsity in Deep Networks. In: Meurs, M.-J., Rudzicz, F. (eds.) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol. 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_24


  • DOI: https://doi.org/10.1007/978-3-030-18305-9_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18304-2

  • Online ISBN: 978-3-030-18305-9
