Abstract
A seamless, data-driven method for eliminating overfitting is proposed. The technique is based on an extended cost function that penalizes the deviation of the network's derivatives from the derivatives of the target function up to 4th order. When gradient descent is run on this cost function, overfitting becomes nearly nonexistent. For the most common applications of neural networks, high-order derivatives of the target are difficult to obtain, so model cases are considered: training a network to approximate an analytical expression inside 2D and 5D domains, and solving a Poisson equation inside a 2D circle. To investigate overfitting, fully connected perceptrons of different sizes are trained on point sets of various densities until the cost stabilizes. The extended cost allows a network with \(5\cdot 10^{6}\) weights to be trained to represent a 2D expression inside the \([-1,1]^{2}\) square using only 10 training points, with the test error being, on average, only 1.5 times the training error. Using the classical cost under comparable conditions results in a test error \(2\cdot 10^{4}\) times higher than the training error. In contrast to common techniques for combating overfitting, such as regularization or dropout, the proposed method is entirely data-driven and therefore introduces no tunable parameters. It also does not restrict the weights in any way, unlike regularization, which can degrade the quality of the approximation if its parameters are poorly chosen. Using the extended cost also increases the overall precision by one order of magnitude.
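The extended cost can be illustrated with a short sketch. The JAX fragment below is a minimal illustration of the idea, not the author's implementation: a small fully connected perceptron is fitted to an analytical 2D target, and the cost penalizes deviations of the network's values, gradients, and Hessians from those of the target at the training points (the paper extends this to 4th-order derivatives). The target expression, layer sizes, weighting coefficients, and learning rate are illustrative assumptions.

```python
# Minimal sketch of an extended cost in JAX (library and hyperparameters are assumptions,
# not the paper's): penalize mismatch of values AND derivatives at the training points.
import jax
import jax.numpy as jnp

def target(x):                       # analytical 2D target on [-1, 1]^2 (illustrative choice)
    return jnp.sin(jnp.pi * x[0]) * jnp.cos(jnp.pi * x[1])

def init_params(key, sizes=(2, 64, 64, 1)):
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def net(params, x):                  # fully connected perceptron with tanh activations
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return (x @ w + b)[0]            # scalar output

def extended_cost(params, xs, weights=(1.0, 1.0, 1.0)):
    """Mean squared mismatch of values, gradients, and Hessians at the training points.
    Only orders 0-2 are shown here; the paper uses derivatives up to 4th order."""
    f = lambda x: net(params, x)
    def point_cost(x):
        c0 = (f(x) - target(x)) ** 2
        c1 = jnp.sum((jax.grad(f)(x) - jax.grad(target)(x)) ** 2)
        c2 = jnp.sum((jax.hessian(f)(x) - jax.hessian(target)(x)) ** 2)
        return weights[0] * c0 + weights[1] * c1 + weights[2] * c2
    return jnp.mean(jax.vmap(point_cost)(xs))

key = jax.random.PRNGKey(0)
params = init_params(key)
xs = jax.random.uniform(key, (10, 2), minval=-1.0, maxval=1.0)   # only 10 training points
loss, grads = jax.value_and_grad(extended_cost)(params, xs)       # one gradient descent step
params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
```

Because the derivative terms constrain the network's behavior in a neighborhood of each training point, even a very sparse training set carries enough information to suppress overfitting; the classical cost corresponds to keeping only the zeroth-order term above.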
Cite this paper
Avrutskiy, V.I. (2020). Preventing Overfitting by Training Derivatives. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1069. Springer, Cham. https://doi.org/10.1007/978-3-030-32520-6_12