Abstract
In this paper we propose a method for obtaining points of extreme overfitting: parameter settings of modern neural networks at which they achieve close to 100% accuracy on the training set while showing almost zero accuracy on the test set. Contrary to the widespread belief that the overwhelming majority of critical points of a neural network's loss function generalize equally well, such points exhibit a huge generalization error. The paper studies the properties of these points and their location on the loss surfaces of modern neural networks.
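The abstract alone does not specify the construction, so the following is only a minimal sketch of one plausible way to reach such a point: minimize a combined objective that fits the true training labels and, simultaneously, fits the test images to deliberately corrupted labels. The fully connected model, MNIST data, optimizer, and label-shifting scheme below are illustrative assumptions, not necessarily the authors' setup.

import torch
import torch.nn as nn
import torch.nn.functional as F
from itertools import cycle
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical small classifier; the paper's architectures are not shown here.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 10),
).to(device)

tfm = transforms.ToTensor()
train_loader = DataLoader(datasets.MNIST(".", train=True, download=True, transform=tfm),
                          batch_size=128, shuffle=True)
test_loader = DataLoader(datasets.MNIST(".", train=False, download=True, transform=tfm),
                         batch_size=128, shuffle=True)
test_stream = cycle(test_loader)  # cycle() caches one pass of the test batches in memory

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(30):
    for x_tr, y_tr in train_loader:
        x_te, y_te = next(test_stream)
        x_tr, y_tr = x_tr.to(device), y_tr.to(device)
        x_te, y_te = x_te.to(device), y_te.to(device)
        # Fit the true training labels while fitting the test images to
        # wrong labels (each shifted by one class). Driving both terms to
        # zero yields ~100% train accuracy together with ~0% test accuracy.
        loss = (F.cross_entropy(model(x_tr), y_tr)
                + F.cross_entropy(model(x_te), (y_te + 1) % 10))
        opt.zero_grad()
        loss.backward()
        opt.step()

Because the corrupted test labels are consistent across the whole run, an overparameterized network can memorize both label assignments exactly, which is what makes the test accuracy collapse rather than merely degrade.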
Cite this article
Merkulov, D.M., Oseledets, I.V. Empirical Study of Extreme Overfitting Points of Neural Networks. J. Commun. Technol. Electron. 64, 1527–1534 (2019). https://doi.org/10.1134/S1064226919120118