Abstract
A seamless, data-driven method for eliminating overfitting is proposed. The technique is based on an extended cost function that penalizes the deviation of the network's derivatives from the derivatives of the target function up to 4th order. When gradient descent is run on this cost function, overfitting becomes nearly nonexistent. For the most common applications of neural networks, high-order derivatives of the target are difficult to obtain, so model cases are considered: training a network to approximate an analytical expression inside 2D and 5D domains, and solving a Poisson equation inside a 2D circle. To investigate overfitting, fully connected perceptrons of different sizes are trained on point sets of various densities until the cost stabilizes. The extended cost allows a network with \(5\cdot 10^{6}\) weights to be trained to represent a 2D expression inside the \([-1,1]^{2}\) square using only 10 training points, with the test error being, on average, only 1.5 times the training error. Using the classical cost under comparable conditions results in a test error \(2\cdot 10^{4}\) times higher than the training error. In contrast to common techniques for combating overfitting, such as regularization or dropout, the proposed method is entirely data-driven and therefore introduces no tunable parameters. It also does not restrict the weights in any way, unlike regularization, which can degrade the quality of the approximation if its parameters are poorly chosen. Using the extended cost also increases the overall precision by one order of magnitude.
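The extended cost can be illustrated with a short sketch. The JAX fragment below is a minimal illustration of the idea, not the author's implementation: a small fully connected perceptron is fitted to an analytical 2D target, and the cost penalizes deviations of the network's values, gradients, and Hessians from those of the target at the training points (the paper extends this to 4th-order derivatives). The target expression, layer sizes, weighting coefficients, and learning rate are illustrative assumptions.

```python
# Minimal sketch of an extended cost in JAX (library and hyperparameters are assumptions,
# not the paper's): penalize mismatch of values AND derivatives at the training points.
import jax
import jax.numpy as jnp

def target(x):                       # analytical 2D target on [-1, 1]^2 (illustrative choice)
    return jnp.sin(jnp.pi * x[0]) * jnp.cos(jnp.pi * x[1])

def init_params(key, sizes=(2, 64, 64, 1)):
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def net(params, x):                  # fully connected perceptron with tanh activations
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return (x @ w + b)[0]            # scalar output

def extended_cost(params, xs, weights=(1.0, 1.0, 1.0)):
    """Mean squared mismatch of values, gradients, and Hessians at the training points.
    Only orders 0-2 are shown here; the paper uses derivatives up to 4th order."""
    f = lambda x: net(params, x)
    def point_cost(x):
        c0 = (f(x) - target(x)) ** 2
        c1 = jnp.sum((jax.grad(f)(x) - jax.grad(target)(x)) ** 2)
        c2 = jnp.sum((jax.hessian(f)(x) - jax.hessian(target)(x)) ** 2)
        return weights[0] * c0 + weights[1] * c1 + weights[2] * c2
    return jnp.mean(jax.vmap(point_cost)(xs))

key = jax.random.PRNGKey(0)
params = init_params(key)
xs = jax.random.uniform(key, (10, 2), minval=-1.0, maxval=1.0)   # only 10 training points
loss, grads = jax.value_and_grad(extended_cost)(params, xs)       # one gradient descent step
params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
```

Because the derivative terms constrain the network's behavior in a neighborhood of each training point, even a very sparse training set carries enough information to suppress overfitting; the classical cost corresponds to keeping only the zeroth-order term above.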
Cite this paper
Avrutskiy, V.I. (2020). Preventing Overfitting by Training Derivatives. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1069. Springer, Cham. https://doi.org/10.1007/978-3-030-32520-6_12