Empirical Study of Extreme Overfitting Points of Neural Networks

  • ARTIFICIAL INTELLIGENCE
  • Published in: Journal of Communications Technology and Electronics

Abstract

In this paper we propose a method for obtaining points of extreme overfitting: parameter configurations of modern neural networks at which they achieve close to 100% accuracy on the training set together with almost zero accuracy on the test set. Despite the widespread opinion that the overwhelming majority of critical points of a neural network's loss function have equally good generalization ability, such points exhibit a huge generalization error. The paper studies the properties of these points and their location on the loss surfaces of modern neural networks.
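
This preview does not reproduce the paper's actual construction, so the sketch below is only one plausible reading of the abstract: search for such parameters by jointly minimizing the usual cross-entropy loss on the training set while anti-fitting the test set, i.e., fitting deliberately wrong test labels so that test accuracy is driven toward zero rather than toward chance level. The function name extreme_overfit_step, the anti_weight coefficient, and the label-shifting trick are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def extreme_overfit_step(model, opt, x_train, y_train, x_test, y_test,
                             anti_weight=1.0):
        # Hypothetical joint objective (an assumption, not the paper's
        # stated method): fit the training labels while anti-fitting the
        # test labels by targeting a guaranteed-wrong class per test point.
        opt.zero_grad()
        train_loss = F.cross_entropy(model(x_train), y_train)
        logits_test = model(x_test)
        num_classes = logits_test.shape[1]
        wrong_labels = (y_test + 1) % num_classes  # always differs from y_test
        anti_loss = F.cross_entropy(logits_test, wrong_labels)
        (train_loss + anti_weight * anti_loss).backward()
        opt.step()
        return train_loss.item(), anti_loss.item()

    # Toy run on random data, a stand-in for MNIST-sized inputs.
    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x_tr, y_tr = torch.randn(512, 784), torch.randint(0, 10, (512,))
    x_te, y_te = torch.randn(128, 784), torch.randint(0, 10, (128,))
    for _ in range(1000):
        extreme_overfit_step(model, opt, x_tr, y_tr, x_te, y_te)

Targeting explicitly wrong test labels matters here: a model that merely ignored the test set would still score about 10% by chance on a 10-class problem, whereas the abstract describes points with almost zero test accuracy.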

Author information

Correspondence to D. M. Merkulov or I. V. Oseledets.

About this article

Cite this article

Merkulov, D.M., Oseledets, I.V. Empirical Study of Extreme Overfitting Points of Neural Networks. J. Commun. Technol. Electron. 64, 1527–1534 (2019). https://doi.org/10.1134/S1064226919120118
