Neural Networks: A Statistician’s (Possible) View

  • K. Hornik
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Summary

Within the past few years, neural networks (NNs) have emerged as a popular, rather general-purpose means of data processing and analysis. Since in most applications they are employed to perform rather standard statistical tasks such as regression analysis and classification, one might wonder what is really new about them. We shed some light on this issue from a statistician’s point of view by “translating” neural network terminology into more familiar terms and then discussing some of the most important properties of these models. Particular attention is given to “supervised” classification, i.e., discriminant analysis.
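
To make this “translation” of network jargon into statistical language concrete, the following small sketch (illustrative only, not taken from the paper; all names and numerical settings are hypothetical) writes a single-hidden-layer feedforward network for two-class supervised classification in explicitly statistical terms: the hidden units act as data-adaptive basis functions, the output unit is a logistic regression on those basis functions, and “training by backpropagation” is simply gradient-based maximum-likelihood fitting.

```python
# Illustrative sketch: a one-hidden-layer network read as a flexible
# logistic-regression-type model fitted by maximum likelihood.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: two classes that are not linearly separable.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# "Architecture": 2 inputs, H hidden units, 1 output.
# Statistically: the hidden layer supplies H estimated basis functions
# sigma(x'w_h + b_h); the output layer is a logistic regression on them.
H = 8
W1 = rng.normal(scale=0.5, size=(2, H)); b1 = np.zeros(H)  # inner ("hidden") coefficients
W2 = rng.normal(scale=0.5, size=H);      b2 = 0.0          # outer ("output") coefficients

lr = 0.5  # "learning rate" = step size of the gradient method
for epoch in range(2000):
    # Forward pass: fitted class-1 probabilities.
    Z = sigmoid(X @ W1 + b1)      # hidden activations = basis-function values
    p = sigmoid(Z @ W2 + b2)      # fitted P(Y = 1 | x)

    # "Backpropagation" = chain rule applied to the gradient of the
    # negative Bernoulli log-likelihood (cross-entropy).
    d_out = (p - y) / n
    grad_W2 = Z.T @ d_out
    grad_b2 = d_out.sum()
    d_hidden = np.outer(d_out, W2) * Z * (1 - Z)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0)

    # Gradient-descent update ("learning").
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

# Classification rule: assign class 1 when the fitted probability exceeds 1/2.
p = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print("training error rate:", np.mean((p > 0.5) != y))
```

The decision rule at the end, assigning an observation to class 1 when the fitted probability exceeds one half, is exactly that of logistic discriminant analysis; the distinctive feature on the network side is that the basis functions themselves are estimated from the data.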

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • K. Hornik
  1. Institut für Statistik und Wahrscheinlichkeitstheorie, Technische Universität Wien, Wien, Austria
