Cross-validation in MLP Training

  • Hervé A. Bourlard
  • Nelson Morgan
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 247)

Abstract

It is well known that system models which have too many parameters (with respect to the number of measurements) do not generalize well to new measurements. For instance, an autoregressive (AR) model can be derived which will represent the training data with no error by using as many parameters as there are data points. This would generally be of no value, as it would only represent the training data. Criteria such as the Akaike Information Criterion (AIC) [Akaike, 1974, 1986] can be used to penalize both the complexity of AR models and their training error variance. For feedforward nets, we do not currently have such a measure. In fact, given the aim of building systems which are biologically plausible, there is a temptation to assume the usefulness of indefinitely large adaptive networks. In contrast to our best guess at Nature's tricks, man-made systems for pattern recognition seem to require enormous amounts of data for training. In short, the design of massively parallel systems is limited by the number of parameters that can be learned with available training data. It is likely that the only way truly massive systems can be built is with the help of prior information, e.g., connection topology and weights that need not be learned [Feldman et al., 1988]. Learning theory [Valiant, 1984; Pearl, 1978] has begun to establish what is possible for trained systems. Order-of-magnitude lower bounds have been established for the number of measurements required to train a feedforward net of a given size [Baum & Haussler, 1988]. Rules of thumb suggesting the number of samples required for specific distributions could be useful for practical problems. Widrow has suggested using a training sample size that is 10 times the number of weights in a network ("Uncle Bernie's Rule") [Widrow, 1987].
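The AR example above can be illustrated with a polynomial fit (a stand-in sketch, not the authors' code): a model with as many coefficients as data points reproduces the training set exactly, yet its error on held-out points reflects the noise rather than the underlying function. The function, noise level, and sample counts below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 8 noisy samples of a simple underlying function
x_train = np.linspace(0.0, 1.0, 8)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(8)

# A degree-7 polynomial has 8 coefficients -- one per data point --
# so it interpolates the training set exactly, like an AR model with
# as many parameters as measurements.
coeffs = np.polyfit(x_train, y_train, deg=7)
train_err = np.max(np.abs(np.polyval(coeffs, x_train) - y_train))

# Held-out points between the training samples expose the overfit:
# compare the fitted model against the true (noise-free) function.
x_test = np.linspace(0.05, 0.95, 8)
test_err = np.max(np.abs(np.polyval(coeffs, x_test) - np.sin(2 * np.pi * x_test)))

print(f"max training error: {train_err:.2e}")  # essentially zero
print(f"max held-out error: {test_err:.2e}")   # dominated by fitted noise

# Widrow's rule of thumb: roughly 10 training patterns per weight.
n_weights = 10_000                  # hypothetical network size
print(10 * n_weights)               # suggested training-set size
```

The zero training error of the interpolating fit is exactly the situation the abstract warns against; cross-validation detects it because the held-out error does not shrink along with the training error.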

Keywords

Speech Recognition · Network Size · Hidden Unit · Training Pattern · Trained System

Copyright information

© Springer Science+Business Media New York 1994

Authors and Affiliations

  • Hervé A. Bourlard (1, 2)
  • Nelson Morgan (2, 3)

  1. Lernout & Hauspie Speech Products, Belgium
  2. International Computer Science Institute, Berkeley, USA
  3. University of California, Berkeley, USA