Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Abstract

Small neural networks with a constrained number of trainable parameters can be suitable, resource-efficient candidates for many simple tasks for which excessively large models are currently used. However, such models face several problems during the learning process, mainly due to redundancy among individual neurons, which results in sub-optimal accuracy or the need for additional training steps. Here, we explore the diversity of the neurons within the hidden layer during the learning process and analyze how this diversity affects the predictions of the model. We then introduce several techniques to dynamically reinforce diversity between neurons during training. These decorrelation techniques improve learning at early stages and occasionally help to overcome local minima faster. Additionally, we describe a novel weight initialization method that yields decorrelated, yet stochastic weights for fast and efficient neural network training. In our experiments, this decorrelated weight initialization shows an approximately 40% relative increase in test accuracy during the first five epochs.
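
The abstract's two ideas, dynamic decorrelation during training and decorrelated weight initialization, can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration rather than the authors' exact method: the helper names decorrelation_penalty and decorrelated_init_, the layer sizes, the noise scale noise_std, and the penalty coefficient 1e-3 are all hypothetical choices made for the example.

import torch
import torch.nn as nn

def decorrelation_penalty(weight: torch.Tensor) -> torch.Tensor:
    # Penalize pairwise cosine similarity between hidden-neuron weight vectors,
    # pushing the neurons of the layer to stay diverse during training.
    w = nn.functional.normalize(weight, dim=1)      # one row per neuron
    sim = w @ w.t()                                 # cosine-similarity matrix
    off_diag = sim - torch.diag(torch.diag(sim))    # ignore self-similarity
    return off_diag.pow(2).mean()

def decorrelated_init_(weight: torch.Tensor, noise_std: float = 0.05) -> None:
    # Decorrelated yet stochastic start: orthogonal rows plus small Gaussian noise.
    with torch.no_grad():
        nn.init.orthogonal_(weight)
        weight.add_(noise_std * torch.randn_like(weight))

# Usage: initialize the hidden layer and add the penalty to the task loss.
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
decorrelated_init_(model[0].weight)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = (nn.functional.cross_entropy(model(x), y)
        + 1e-3 * decorrelation_penalty(model[0].weight))
loss.backward()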

Acknowledgment

This research is supported by the Czech Ministry of Education, Youth and Sports from the Czech Operational Programme Research, Development, and Education, under grant agreement No. CZ.02.1.01/0.0/0.0/15003/0000421, and the Czech Science Foundation (GAČR 18-18080S).

Author information

Corresponding author

Correspondence to Magda Friedjungová.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kovalenko, A., Kordík, P., Friedjungová, M. (2021). Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12892. Springer, Cham. https://doi.org/10.1007/978-3-030-86340-1_19

  • DOI: https://doi.org/10.1007/978-3-030-86340-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86339-5

  • Online ISBN: 978-3-030-86340-1

  • eBook Packages: Computer Science; Computer Science (R0)
