
Sparseout: Controlling Sparsity in Deep Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11489)

Abstract

Dropout is commonly used to help reduce overfitting in deep neural networks. Sparsity is a potentially important property of neural networks, but it is not explicitly controlled by Dropout-based regularization. In this work, we propose Sparseout, a simple and efficient variant of Dropout that can be used to control the sparsity of the activations in a neural network. We theoretically prove that Sparseout is equivalent to an \(L_q\) penalty on the features of a generalized linear model and that Dropout is a special case of Sparseout for neural networks. We empirically demonstrate that Sparseout is computationally inexpensive and able to control the level of sparsity in the activations. We evaluated Sparseout on image classification and language modelling tasks to examine the effect of sparsity on each task. We found that sparse activations are favorable for language modelling, whereas image classification benefits from denser activations. Sparseout provides a way to investigate sparsity in state-of-the-art deep learning models. Source code for Sparseout can be found at https://github.com/najeebkhan/sparseout.
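As a rough illustration of the idea described in the abstract, the sketch below implements a Sparseout-style activation perturbation in PyTorch. The function name `sparseout`, its default parameters, and the exact perturbation form are assumptions made here for illustration, not the authors' implementation; the linked repository contains the official code. The sketch assumes a perturbation of the form h + sign(h)·|h|^{q/2}·(m/p − 1) with m ~ Bernoulli(p), which reduces to inverted Dropout when q = 2, consistent with the special-case relationship stated in the abstract.

```python
import torch

def sparseout(h, p=0.5, q=1.0, training=True):
    """Sparseout-style perturbation of activations (illustrative sketch).

    Assumes the form h + sign(h) * |h|^(q/2) * (m/p - 1) with m ~ Bernoulli(p).
    In expectation this acts like an L_q penalty on the activations, and
    q = 2 recovers inverted Dropout: h * m / p.
    """
    if not training:
        return h
    mask = torch.bernoulli(torch.full_like(h, p))   # keep-mask with entries in {0, 1}
    noise = mask / p - 1.0                          # zero-mean multiplicative noise
    return h + torch.sign(h) * h.abs().pow(q / 2.0) * noise

# Example: q < 2 pushes activations toward sparsity; q = 2 behaves like Dropout.
h = torch.randn(4, 8)
h_sparse = sparseout(h, p=0.5, q=1.0)
h_dropout = sparseout(h, p=0.5, q=2.0)   # elementwise equal to h * mask / 0.5
```

Smaller values of q in this sketch correspond to stronger sparsity pressure on the activations, while q = 2 leaves the regularization identical to standard (inverted) Dropout.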


Notes

  1. https://github.com/salesforce/awd-lstm-lm.



Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Correspondence to Najeeb Khan.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Khan, N., Stavness, I. (2019). Sparseout: Controlling Sparsity in Deep Networks. In: Meurs, M.-J., Rudzicz, F. (eds.) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol. 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_24


  • DOI: https://doi.org/10.1007/978-3-030-18305-9_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18304-2

  • Online ISBN: 978-3-030-18305-9
