Optimal weight decay in a perceptron
Weight decay was proposed to reduce the overfitting that often appears in the learning tasks of artificial neural networks. In this paper, weight decay is applied to a well-defined model system based on a single-layer perceptron, which exhibits strong overfitting. Since the optimal non-overfitting solution is known for this system, we can compare the effect of weight decay against that solution. A strategy for finding the optimal weight-decay strength is proposed, which leads to the optimal solution for any number of examples.
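The paper's analytical setting is not reproduced here, but the mechanism it studies can be sketched numerically: a single-layer "student" perceptron is trained on noisy examples labeled by a fixed "teacher", and weight decay adds a shrinkage term to each gradient step. All concrete choices below (dimensions, learning rate, noise level, the decay strength `lam`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def train_perceptron(lam, seed=0, N=50, P=100, eta=0.05, epochs=500):
    """Gradient descent on squared error with weight-decay strength lam.

    Teacher-student setup: P random inputs are labeled by a fixed
    teacher vector plus output noise; the noise is what the student
    overfits when training is unregularized.
    """
    rng = np.random.default_rng(seed)
    teacher = rng.standard_normal(N)
    X = rng.standard_normal((P, N))
    y = X @ teacher / np.sqrt(N) + 0.5 * rng.standard_normal(P)

    w = np.zeros(N)
    for _ in range(epochs):
        # Gradient of the mean squared training error.
        grad = X.T @ (X @ w / np.sqrt(N) - y) / (P * np.sqrt(N))
        # Weight decay: the extra lam * w term shrinks every
        # weight toward zero on each update.
        w -= eta * (grad + lam * w)

    # Overlap with the teacher direction is a proxy for generalization.
    overlap = w @ teacher / (np.linalg.norm(w) * np.linalg.norm(teacher))
    return w, overlap

w_plain, r_plain = train_perceptron(lam=0.0)
w_decay, r_decay = train_perceptron(lam=1.0)
print(np.linalg.norm(w_decay) < np.linalg.norm(w_plain))  # decay shrinks the weights
```

Sweeping `lam` and comparing the resulting overlap (or a held-out test error) against the known optimal solution is, in spirit, the strategy the paper analyzes for picking the decay strength.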