Abstract
This paper describes a new approach to feed-forward neural network learning based on a randomly chosen set of neurons that are temporarily active during weight adaptation. The remaining network weights are locked (frozen). In contrast to the "dropout" method introduced by Hinton et al. [15], the neurons (along with their connections) are not removed from the network during training; their weights are simply not modified, i.e. they stay constant. Thus, in every training epoch only a random part of the network (a chosen set of neurons and their connections) adapts. Freezing neurons suppresses overfitting and prevents a drastic growth of the weights during learning, since the overall structure of the network does not change. In many cases the approach of training only parts of the network (subspaces of the weight space) shortens training time. Experimental results for medium-sized neural networks used for regression modeling are also provided.
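As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the code below trains a small regression MLP in PyTorch while, in each epoch, only a randomly chosen subset of hidden neurons has its incoming and outgoing weights updated; all other weights are frozen for that epoch by zeroing their gradients before the update. The network size, toy data, subset_size, and learning rate are assumptions made for demonstration only.

    # Illustrative sketch of per-epoch random-subspace training (assumed setup).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy regression data (assumed, for demonstration only)
    X = torch.randn(256, 8)
    y = torch.sin(X.sum(dim=1, keepdim=True))

    hidden = 32
    model = nn.Sequential(nn.Linear(8, hidden), nn.Tanh(), nn.Linear(hidden, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    loss_fn = nn.MSELoss()

    subset_size = 8  # number of hidden neurons adapted per epoch (assumed)
    for epoch in range(200):
        # Randomly pick the neurons whose weights will be adapted this epoch.
        active = torch.randperm(hidden)[:subset_size]
        mask_in = torch.zeros(hidden, 1)
        mask_in[active] = 1.0        # rows of the first-layer weight matrix
        mask_out = torch.zeros(1, hidden)
        mask_out[:, active] = 1.0    # columns of the output-layer weight matrix

        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()

        # Freeze everything except the connections of the active neurons:
        # zero the gradients of all other weights before the update step.
        model[0].weight.grad *= mask_in              # broadcasts over input dim
        model[0].bias.grad *= mask_in.squeeze(1)
        model[2].weight.grad *= mask_out
        opt.step()

Unlike dropout, the inactive neurons still contribute to the forward pass here; only their parameters are excluded from the gradient update in a given epoch, which matches the "frozen weights" idea described in the abstract.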
References
Ba, J., Frey, B.: Adaptive dropout for training deep neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 3084–3092 (2013)
Bartlett, P.L.: The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans. Inf. Theory 44(2), 525–536 (1998)
Baldi, P., Hornik, K.: Learning from examples without local minima. Neural Netw. 2(1), 53–58 (1989)
Choromanska, A., Henaff, M., Mathieu, M., Arous, G.B., LeCun, Y.: The loss surface of multilayer networks. In: Proceedings of the Conference on AI and Statistics (2014). http://arxiv.org/abs/1412.0233
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control, Signals Syst. 2(4), 303–314 (1989)
Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: IEEE International Conference on Acoustic Speech and Signal Processing (ICASSP 2013), Vancouver (2013)
Dauphin, Y., Pascanu, R., Gulcehre, C., Cho, K.: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Proceedings of Advances in Neural Information Processing Systems, vol. 27, pp. 2933–2941 (2014)
Fine, T.: Feedforward Neural Network Methodology. Statistics for Engineering and Information Science. Springer-Verlag, New York (1999)
Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1319–1327. ACM (2013)
Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall PTR, Upper Saddle River (1998)
Hertz, J., Krogh, A., Palmer, R.G.: Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City (1991)
Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors (2012). http://arxiv.org/abs/1207.0580
Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991)
Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: theory and applications. Neurocomputing 70(1), 489–501 (2006)
Kushner, H., Yin, G.: Stochastic Approximation and Recursive Algorithms and Applications, 2nd edn. Springer-Verlag, New York (2003)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning (Review). Nature 521, 436–444 (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA (2013)
Schmidhuber, J.: Deep Learning in Neural Networks: An Overview. arXiv:1404.7828 (2014)
Qiu, X., Zhang, L., Ren, Y., Suganthan, P., Amaratunga, G.: Ensemble deep learning for regression and time series forecasting. In: 2014 IEEE Symposium on Computational Intelligence in Ensemble Learning (CIEL), pp. 1–6 (2014). doi:10.1109/CIEL.2014.7015739
Wang, S.I., Manning, C.D.: Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA (2013)
Acknowledgments
This research was supported by S50242 grant at the Faculty of Electronics, Wrocław University of Science and Technology.