Neural Computing and Applications, Volume 31, Issue 12, pp 9241–9260

Evaluating generalization through interval-based neural network inversion

  • Stavros P. Adam
  • Aristidis C. Likas
  • Michael N. Vrahatis
Original Article

Abstract

Typically, measuring the generalization ability of a neural network relies on the well-known method of cross-validation, which statistically estimates the classification error of a network architecture and thereby assesses its generalization ability. However, for a number of reasons, cross-validation is not an efficient and unbiased estimator of generalization and cannot be used to assess the generalization of a neural network after training. In this paper, we introduce a new method for evaluating generalization based on a deterministic approach that reveals and exploits the network's domain of validity. This is the area of the input space containing all the points for which a class-specific network output provides values higher than a certainty threshold. The proposed approach is a set membership technique which defines the network's domain of validity by inverting its output activity onto the input space. For a trained neural network, the result of this inversion is a set of hyper-boxes which constitutes a reliable and \(\varepsilon\)-accurate computation of the domain of validity. Suitably defined metrics on the volume of the domain of validity provide a deterministic estimate of the generalization ability of the trained network, unaffected by the random test set selection inherent in cross-validation. The effectiveness of the proposed generalization measures is demonstrated on illustrative examples with artificial and real datasets, using shallow feed-forward neural networks such as multi-layer perceptrons.
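
To illustrate the mechanics of the approach, the following minimal Python sketch inverts a toy two-input MLP in the style of SIVIA: boxes of the input space are kept, discarded, or bisected according to an interval enclosure of the network output. The random weights, the certainty threshold THETA and the accuracy EPS are illustrative placeholders rather than values from the paper, and the naive interval arithmetic below merely stands in for a dedicated library such as INTLAB; the enclosures it computes are guaranteed to contain the true output range over a box, but may overestimate it.

    import numpy as np

    # Toy "trained" network: 2 inputs -> 4 tanh hidden units -> 1 sigmoid output.
    # Random weights stand in for a trained MLP; THETA is the certainty
    # threshold and EPS the hyper-box accuracy (both illustrative).
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
    W2, b2 = rng.normal(size=4), rng.normal()
    THETA, EPS = 0.5, 0.05

    def affine_interval(lo, hi, W, b):
        """Tight interval enclosure of x -> W @ x + b over the box [lo, hi]."""
        center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
        return W @ center + b - np.abs(W) @ radius, W @ center + b + np.abs(W) @ radius

    def output_interval(lo, hi):
        """Interval enclosure of the network output over the box [lo, hi]."""
        h_lo, h_hi = affine_interval(lo, hi, W1, b1)
        h_lo, h_hi = np.tanh(h_lo), np.tanh(h_hi)        # tanh is monotone
        o_lo, o_hi = affine_interval(h_lo, h_hi, W2, b2)
        sig = lambda t: 1.0 / (1.0 + np.exp(-t))         # sigmoid is monotone
        return sig(o_lo), sig(o_hi)

    def sivia(lo, hi):
        """Classify boxes as inside/boundary, bisecting undecided ones."""
        inside, boundary, stack = [], [], [(lo, hi)]
        while stack:
            lo, hi = stack.pop()
            y_lo, y_hi = output_interval(lo, hi)
            if y_lo >= THETA:                # box lies in the domain of validity
                inside.append((lo, hi))
            elif y_hi < THETA:               # box lies outside: discard it
                continue
            elif np.max(hi - lo) < EPS:      # undecided but already small enough
                boundary.append((lo, hi))
            else:                            # bisect along the widest dimension
                k = int(np.argmax(hi - lo))
                mid = (lo[k] + hi[k]) / 2.0
                hi1, lo2 = hi.copy(), lo.copy()
                hi1[k], lo2[k] = mid, mid
                stack += [(lo, hi1), (lo2, hi)]
        return inside, boundary

    inside, boundary = sivia(np.array([-2.0, -2.0]), np.array([2.0, 2.0]))
    volume = sum(np.prod(hi - lo) for lo, hi in inside)
    print(f"{len(inside)} inner boxes, inner volume ~ {volume:.3f}")

The inner boxes form an inner approximation of the domain of validity that is accurate to EPS along its boundary, and their summed volume is the raw quantity on which volume-based generalization metrics of the kind described above can be defined.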

Keywords

Neural networks · Generalization · Inversion · Interval analysis · Reliable computing

Abbreviations

HPD: Highest posterior density
INTLAB: INTerval LABoratory
IA: Interval analysis
MLP: Multi-layer perceptron
OTS: Off training set
PDF: Probability density function
SCS: Set computations with subpavings
SIVIA: Set inversion via interval analysis

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable suggestions and comments on an earlier version of the manuscript, which helped to significantly improve the paper.

Compliance with ethical standards

Conflict of interest:

The authors declare that they have no conflict of interest.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  • Stavros P. Adam (1, 2), corresponding author
  • Aristidis C. Likas (3)
  • Michael N. Vrahatis (2)

  1. Department of Informatics and Telecommunications, University of Ioannina, Arta, Greece
  2. Computational Intelligence Laboratory, Department of Mathematics, University of Patras, Patras, Greece
  3. Department of Computer Science and Engineering, University of Ioannina, Ioannina, Greece
