Abstract
In this paper we propose a method for obtaining points of extreme overfitting: parameter settings of modern neural networks at which they achieve close to 100% accuracy on the training set while showing almost zero accuracy on the test set. Contrary to the widespread belief that the overwhelming majority of critical points of a neural network's loss function generalize equally well, such points exhibit a huge generalization error. The paper studies the properties of these points and their location on the loss surfaces of modern neural networks.
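The abstract alone does not specify the construction, so the following is only a minimal sketch of one plausible way to reach such a point: minimize a combined objective that fits the true training labels and, simultaneously, fits the test images to deliberately corrupted labels. The fully connected model, MNIST data, optimizer, and label-shifting scheme below are illustrative assumptions, not necessarily the authors' setup.

import torch
import torch.nn as nn
import torch.nn.functional as F
from itertools import cycle
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical small classifier; the paper's architectures are not shown here.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 10),
).to(device)

tfm = transforms.ToTensor()
train_loader = DataLoader(datasets.MNIST(".", train=True, download=True, transform=tfm),
                          batch_size=128, shuffle=True)
test_loader = DataLoader(datasets.MNIST(".", train=False, download=True, transform=tfm),
                         batch_size=128, shuffle=True)
test_stream = cycle(test_loader)  # cycle() caches one pass of the test batches in memory

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(30):
    for x_tr, y_tr in train_loader:
        x_te, y_te = next(test_stream)
        x_tr, y_tr = x_tr.to(device), y_tr.to(device)
        x_te, y_te = x_te.to(device), y_te.to(device)
        # Fit the true training labels while fitting the test images to
        # wrong labels (each shifted by one class). Driving both terms to
        # zero yields ~100% train accuracy together with ~0% test accuracy.
        loss = (F.cross_entropy(model(x_tr), y_tr)
                + F.cross_entropy(model(x_te), (y_te + 1) % 10))
        opt.zero_grad()
        loss.backward()
        opt.step()

Because the corrupted test labels are consistent across the whole run, an overparameterized network can memorize both label assignments exactly, which is what makes the test accuracy collapse rather than merely degrade.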
Cite this article
Merkulov, D.M., Oseledets, I.V. Empirical Study of Extreme Overfitting Points of Neural Networks. J. Commun. Technol. Electron. 64, 1527–1534 (2019). https://doi.org/10.1134/S1064226919120118