
Circuits, Systems and Signal Processing, Volume 12, Issue 2, pp 331–374

Neural network constructive algorithms: Trading generalization for learning efficiency?

  • F. J. Śmieja

Abstract

There are currently several types of constructive (or growth) algorithms available for training a feed-forward neural network. This paper describes and explains the main ones, using a fundamental approach to the problem-solving mechanisms of the multi-layer perceptron. The claimed convergence properties of the algorithms are verified using just two mapping theorems, which in turn enables all the algorithms to be unified under a basic mechanism. The algorithms are compared and contrasted, and the deficiencies of some are highlighted. The fundamental reasons for the actual success of these algorithms are extracted and used to suggest where they might most fruitfully be applied. A suspicion that they are not a panacea for all current neural network difficulties, and that one must somewhere along the line pay for the learning efficiency they promise, is developed into an argument that their generalization abilities will lie, on average, below those of back-propagation.
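
To make the idea of a growth algorithm concrete, the sketch below shows a minimal constructive training loop in the spirit of dynamic node creation: a one-hidden-layer network is trained by ordinary back-propagation, and a new hidden unit is added whenever the training error plateaus. This is an illustration only, not any of the specific algorithms surveyed in the paper; the XOR dataset, the plateau test, the learning rate, and all names in the code are assumptions made for the example.

```python
# Hedged sketch of a generic constructive ("growth") training loop.
# Assumptions: XOR task, mean-squared error, sigmoid units, a simple
# plateau test as the growth criterion. Not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: a mapping a single hidden unit cannot represent, so growth is visible.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def init_unit(n_in):
    return 0.5 * rng.standard_normal(n_in + 1)        # weights + bias of one hidden unit

hidden = [init_unit(X.shape[1])]                       # start with a single hidden unit
w_out = 0.5 * rng.standard_normal((len(hidden) + 1, 1))

def forward(X, hidden, w_out):
    Xb = np.hstack([X, np.ones((len(X), 1))])          # append bias input
    H = sigmoid(Xb @ np.array(hidden).T)               # hidden activations
    Hb = np.hstack([H, np.ones((len(X), 1))])          # append bias for output layer
    return Xb, H, Hb, sigmoid(Hb @ w_out)

lr, prev_err = 0.5, np.inf
for epoch in range(20000):
    Xb, H, Hb, out = forward(X, hidden, w_out)
    err = float(np.mean((out - y) ** 2))
    if err < 1e-3:
        break
    # Growth criterion (assumption): if the error barely moved over 2000
    # epochs, add a hidden unit and continue training the larger network.
    if epoch % 2000 == 1999:
        plateau = prev_err - err < 1e-4
        prev_err = err
        if plateau:
            hidden.append(init_unit(X.shape[1]))
            w_out = np.vstack([w_out[:-1], 0.01 * rng.standard_normal((1, 1)), w_out[-1:]])
            continue
    # Ordinary back-propagation step for the current, fixed-size architecture.
    d_out = (out - y) * out * (1 - out)
    d_hid = (d_out @ w_out[:-1].T) * H * (1 - H)
    w_out -= lr * Hb.T @ d_out
    for j, w in enumerate(hidden):
        hidden[j] = w - lr * Xb.T @ d_hid[:, j]

print(f"hidden units: {len(hidden)}, final error: {err:.4f}")
```

The trade-off examined in the paper shows up directly in such a loop: each added unit makes the remaining training problem easier, but nothing in the growth criterion itself constrains how the enlarged network will generalize.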

Keywords

Neural network; Basic mechanism; Convergence property; Generalization ability; Actual success
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Birkhäuser 1993

Authors and Affiliations

  • F. J. Śmieja
    1. German National Research Centre for Computer Science (GMD), Schloß Birlinghoven, Germany
