Preventing Overfitting by Training Derivatives

  • V. I. Avrutskiy
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1069)


A seamless, data-driven method of eliminating overfitting is proposed. The technique is based on an extended cost function that penalizes the deviation of the network's derivatives from the derivatives of the target function up to the 4th order. When gradient descent is run on this cost function, overfitting becomes nearly nonexistent. For the most common applications of neural networks, high-order derivatives of the target are difficult to obtain, so model cases are considered: training a network to approximate an analytical expression inside 2D and 5D domains, and solving a Poisson equation inside a 2D circle. To investigate overfitting, fully connected perceptrons of different sizes are trained on point sets of various densities until the cost stabilizes. The extended cost makes it possible to train a network with \(5\cdot 10^{6}\) weights to represent a 2D expression inside the square \([-1,1]^{2}\) using only 10 training points, with the test-set error on average only 1.5 times higher than the training error. Using the classical cost under comparable conditions results in a test-set error \(2\cdot 10^{4}\) times higher than the training error. In contrast to common techniques for combating overfitting, such as regularization or dropout, the proposed method is entirely data-driven and therefore introduces no tunable parameters. It also does not restrict the weights in any way, unlike regularization, which can hinder the quality of the approximation if its parameters are poorly chosen. The extended cost also increases the overall precision by one order of magnitude.
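The extended cost described above can be sketched in miniature. The following is an illustrative toy, not the paper's setup: it uses a 1D target instead of 2D/5D domains, matches only the first derivative instead of all orders up to the 4th, and computes parameter gradients by finite differences for brevity. All names, network sizes, and hyperparameters are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer tanh network: y(x) = w2 . tanh(w1*x + b1) + b2
H = 8
params = rng.normal(scale=0.5, size=3 * H + 1)

def unpack(p):
    return p[:H], p[H:2 * H], p[2 * H:3 * H], p[3 * H]

def net(p, x):
    w1, b1, w2, b2 = unpack(p)
    return np.tanh(np.outer(x, w1) + b1) @ w2 + b2

def net_dx(p, x):
    # Analytic derivative of the network w.r.t. its input:
    # dy/dx = sum_j w2_j * w1_j * (1 - tanh^2(w1_j*x + b1_j))
    w1, b1, w2, _ = unpack(p)
    return (1.0 - np.tanh(np.outer(x, w1) + b1) ** 2) @ (w1 * w2)

# Sparse training set; target values and target first derivative.
x = np.linspace(-1.0, 1.0, 5)
f, df = np.sin(np.pi * x), np.pi * np.cos(np.pi * x)

def cost(p, with_derivs):
    # Classical cost: value mismatch only.
    c = np.mean((net(p, x) - f) ** 2)
    if with_derivs:
        # Extended cost: also penalize derivative mismatch.
        c += np.mean((net_dx(p, x) - df) ** 2)
    return c

def train(p, with_derivs, steps=2000, lr=0.02, eps=1e-6):
    p = p.copy()
    for _ in range(steps):
        base = cost(p, with_derivs)
        g = np.empty_like(p)
        for i in range(p.size):  # forward finite-difference gradient
            q = p.copy()
            q[i] += eps
            g[i] = (cost(q, with_derivs) - base) / eps
        p -= lr * g
    return p

p_plain = train(params, with_derivs=False)
p_ext = train(params, with_derivs=True)

# Dense grid to probe behavior between the 5 training points.
xt = np.linspace(-1.0, 1.0, 101)
err_plain = np.max(np.abs(net(p_plain, xt) - np.sin(np.pi * xt)))
err_ext = np.max(np.abs(net(p_ext, xt) - np.sin(np.pi * xt)))
print(err_plain, err_ext)
```

In the paper's actual experiments the derivative terms go up to the 4th order and the networks are far larger; this sketch only shows where the extra terms enter the cost and how the network's input derivative can be written in closed form for a shallow tanh network.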


Keywords

Neural networks · Overfitting · Partial differential equations · High-order derivatives · Function approximation



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
