Abstract
It is well known that system models with too many parameters (relative to the number of measurements) do not generalize well to new measurements. For instance, an autoregressive (AR) model can be derived that represents the training data with no error by using as many parameters as there are data points. This would generally be of no value, as the model would only represent the training data. Criteria such as the Akaike Information Criterion (AIC) [Akaike, 1974, 1986] can be used to penalize both the complexity of an AR model and its training error variance. For feedforward nets, we do not currently have such a measure. In fact, given the aim of building systems which are biologically plausible, there is a temptation to assume the usefulness of indefinitely large adaptive networks. In contrast to our best guess at Nature's tricks, man-made systems for pattern recognition seem to require nasty amounts of data for training. In short, the design of massively parallel systems is limited by the number of parameters that can be learned from the available training data. It is likely that the only way truly massive systems can be built is with the help of prior information, e.g., connection topology and weights that need not be learned [Feldman et al., 1988]. Learning theory [Valiant, 1984; Pearl, 1978] has begun to establish what is possible for trained systems. Order-of-magnitude lower bounds have been established on the number of measurements required to train a feedforward net of a desired size [Baum & Haussler, 1988]. Rules of thumb suggesting the number of samples required for specific distributions could be useful for practical problems. Widrow has suggested a training sample size that is 10 times the number of weights in a network ("Uncle Bernie's Rule") [Widrow, 1987].
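The AIC-based order selection mentioned above can be sketched in a few lines: fit AR(k) models of increasing order by least squares, then score each as N·ln(residual variance) + 2k, so that extra parameters must "pay for" their reduction in training error. The synthetic AR(2) data, the candidate order range, and the helper names below are illustrative assumptions, not anything from the chapter itself.

```python
import numpy as np

def fit_ar(x, k):
    """Least-squares fit of an AR(k) model; returns the residual variance."""
    N = len(x)
    # Lagged design matrix: predict x[t] from x[t-1], ..., x[t-k].
    X = np.column_stack([x[k - i - 1:N - i - 1] for i in range(k)])
    y = x[k:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return np.mean(resid ** 2)

def aic(x, k):
    """AIC: penalizes training-error variance plus model order."""
    n_eff = len(x) - k
    return n_eff * np.log(fit_ar(x, k)) + 2 * k

# Synthetic AR(2) process; the AIC minimum should sit near the true order.
rng = np.random.default_rng(0)
x = np.zeros(500)
e = rng.standard_normal(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + e[t]

best = min(range(1, 8), key=lambda k: aic(x, k))
print("selected order:", best)

# Widrow's rule of thumb ("Uncle Bernie's Rule"): roughly 10 training
# samples per weight. E.g., a hypothetical 100-20-10 MLP has
# (100+1)*20 + (20+1)*10 = 2230 weights, suggesting ~22,300 samples.
```

An AR(500) model of these 500 points would drive the residual variance to zero, but the +2k penalty makes its AIC far worse than the low-order fits, which is exactly the trade-off the abstract says feedforward nets still lack a standard measure for.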
We should be careful to get out of an experience only the wisdom that is in it — and stop there; lest we be like the cat that sits down on a hot stove-lid. She will never sit down on a hot stove-lid again — and that is well; but also she will never sit down on a cold one anymore. — Mark Twain
Copyright information
© 1994 Springer Science+Business Media New York
Cite this chapter
Bourlard, H.A., Morgan, N. (1994). Cross-validation in MLP Training. In: Connectionist Speech Recognition. The Springer International Series in Engineering and Computer Science, vol 247. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-3210-1_12
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6409-2
Online ISBN: 978-1-4615-3210-1