Preventing Overfitting by Training Derivatives

  • Conference paper
  • In: Proceedings of the Future Technologies Conference (FTC) 2019 (FTC 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1069)

Abstract

A seamless data-driven method of eliminating overfitting is proposed. The technique is based on an extended cost function that includes the deviation of the network's derivatives from the target function's derivatives up to the 4th order. When gradient descent is run on this cost function, overfitting becomes nearly nonexistent. For the most common applications of neural networks, high-order derivatives of a target are difficult to obtain, so model cases are considered: training a network to approximate an analytical expression inside 2D and 5D domains, and solving a Poisson equation inside a 2D circle. To investigate overfitting, fully connected perceptrons of different sizes are trained on sets of points of various densities until the cost stabilizes. The extended cost makes it possible to train a network with \(5\cdot 10^{6}\) weights to represent a 2D expression inside the \([-1,1]^{2}\) square using only 10 training points, with the test set error on average only 1.5 times higher than the training error. Using the classical cost under comparable conditions results in a test set error \(2\cdot 10^{4}\) times higher than the training error. In contrast with common techniques for combating overfitting, such as regularization or dropout, the proposed method is entirely data-driven and therefore introduces no tunable parameters. It also does not restrict the weights in any way, unlike regularization, which can hinder the quality of approximation if its parameters are poorly chosen. Using the extended cost also increases the overall precision by one order of magnitude.
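
To make the idea concrete, here is a minimal sketch of such an extended cost in JAX. It is not the author's code: the target function, network architecture, equal weighting of the penalty terms, and the truncation at 2nd-order derivatives (the paper trains derivatives up to the 4th order) are all illustrative assumptions.

```python
# Sketch: extended cost penalizing value, gradient and Hessian mismatches
# between a small fully connected perceptron and a known analytical target.
# Target choice, architecture and step size are illustrative assumptions.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Initialize a fully connected tanh perceptron."""
    params = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (m, n)) / jnp.sqrt(m), jnp.zeros(n)))
    return params

def mlp(params, x):
    """Scalar network output for a single 2D input point x."""
    h = x
    for w, b in params[:-1]:
        h = jnp.tanh(h @ w + b)
    w, b = params[-1]
    return (h @ w + b)[0]

def target(x):
    """Hypothetical analytical target inside [-1, 1]^2."""
    return jnp.sin(jnp.pi * x[0]) * jnp.cos(jnp.pi * x[1])

def extended_cost(params, xs):
    """Squared errors of values, gradients and Hessians, averaged over points."""
    def per_point(x):
        e0 = mlp(params, x) - target(x)                                       # 0th order
        e1 = jax.grad(mlp, argnums=1)(params, x) - jax.grad(target)(x)        # 1st order
        e2 = jax.hessian(mlp, argnums=1)(params, x) - jax.hessian(target)(x)  # 2nd order
        return e0 ** 2 + jnp.sum(e1 ** 2) + jnp.sum(e2 ** 2)
    return jnp.mean(jax.vmap(per_point)(xs))

key = jax.random.PRNGKey(0)
params = init_mlp(key, [2, 64, 64, 1])
xs = jax.random.uniform(key, (10, 2), minval=-1.0, maxval=1.0)  # 10 training points

loss_and_grad = jax.jit(jax.value_and_grad(extended_cost))
for step in range(2000):
    loss, grads = loss_and_grad(params, xs)
    params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
    if step % 500 == 0:
        print(step, float(loss))
```

Third- and fourth-order terms could be added in the same way by applying jax.jacfwd repeatedly to the gradient; every term in this sketch is computed from the target itself, so, as in the paper, no tunable regularization parameters appear in the cost.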

Author information

Correspondence to V. I. Avrutskiy.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Avrutskiy, V.I. (2020). Preventing Overfitting by Training Derivatives. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1069. Springer, Cham. https://doi.org/10.1007/978-3-030-32520-6_12
