Abstract
Small neural networks with a constrained number of trainable parameters can be suitable, resource-efficient candidates for many simple tasks for which excessively large models are currently used. However, such models face several problems during the learning process, mainly due to redundancy among individual neurons, which leads to sub-optimal accuracy or the need for additional training steps. Here, we explore the diversity of the neurons within the hidden layer during the learning process and analyze how this diversity affects the model's predictions. We then introduce several techniques to dynamically reinforce diversity between neurons during training. These decorrelation techniques improve learning at early stages and occasionally help to overcome local minima faster. Additionally, we describe a novel weight initialization method that provides decorrelated, yet stochastic, initial weights for fast and efficient neural network training. In our experiments, decorrelated weight initialization yields an approximately 40% relative increase in test accuracy during the first 5 epochs.
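To make the two ideas concrete, below is a minimal PyTorch sketch, not the authors' exact formulation: it assumes the decorrelation penalty is the mean squared off-diagonal cosine similarity between the weight vectors of the hidden neurons, added to the task loss, and it uses standard orthogonal initialization as a stand-in for the proposed decorrelated-yet-stochastic initialization. The names diversity_penalty and SmallNet, and the 0.1 penalty weight, are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def diversity_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Mean squared off-diagonal cosine similarity between the weight
    vectors of a layer's neurons; smaller values mean more diverse neurons."""
    w = F.normalize(weight, dim=1)            # one unit-norm vector per neuron
    gram = w @ w.t()                          # pairwise cosine similarities
    off_diag = gram - torch.diag(torch.diag(gram))
    n = weight.shape[0]
    return off_diag.pow(2).sum() / (n * (n - 1))

class SmallNet(nn.Module):
    """A small single-hidden-layer network of the kind discussed above."""
    def __init__(self, in_dim=784, hidden=16, out_dim=10):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden)
        self.out = nn.Linear(hidden, out_dim)
        # Decorrelated yet stochastic start: orthogonal rows drawn from a
        # random Gaussian (standard orthogonal init used here as a stand-in
        # for the initialization proposed in the paper).
        nn.init.orthogonal_(self.hidden.weight)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

# During training, the penalty is simply added to the task loss
# (batch and penalty weight are illustrative):
model = SmallNet()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = F.cross_entropy(model(x), y) + 0.1 * diversity_penalty(model.hidden.weight)
loss.backward()

In this sketch, the penalty dynamically pushes the hidden neurons' weight vectors away from one another throughout training, while the orthogonal start ensures they do not begin in a redundant configuration.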
Acknowledgment
This research is supported by the Czech Ministry of Education, Youth and Sports from the Czech Operational Programme Research, Development, and Education, under grant agreement No. CZ.02.1.01/0.0/0.0/15003/0000421 and the Czech Science Foundation (GAČR 18-18080S).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kovalenko, A., Kordík, P., Friedjungová, M. (2021). Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12892. Springer, Cham. https://doi.org/10.1007/978-3-030-86340-1_19
Print ISBN: 978-3-030-86339-5
Online ISBN: 978-3-030-86340-1