
Estimation in linear models using gradient descent with early stopping

  • Discussion Paper
  • Published in: Statistics and Computing

Abstract

A new shrinkage estimator of the coefficients of a linear model is derived. The estimator is motivated by the gradient-descent algorithm used to minimize the sum of squared errors, and is obtained by stopping that algorithm early. Its statistical properties are examined, both analytically and through a simulation study, and compared with those of well-established methods such as least squares and ridge regression. An important result is that the new estimator is comparable to other shrinkage estimators in terms of the mean squared error of both the parameters and the predictions, and is superior under certain circumstances.
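As a minimal sketch of the idea, assuming a design matrix X and response vector y (the function name, step-size rule, and iteration counts below are illustrative choices, not taken from the paper), the estimator can be computed by running gradient descent on the least-squares objective and simply stopping after a fixed number of steps:

    import numpy as np

    def early_stopped_gd(X, y, lr=None, n_iters=100):
        """Gradient descent on (1/2) * ||y - X b||^2, stopped after
        n_iters steps; the final iterate acts as a shrinkage estimator
        of the coefficients (illustrative sketch, not the paper's code)."""
        n, p = X.shape
        if lr is None:
            # Step size 1 / sigma_max(X)^2 keeps every eigenmode of the
            # iteration contractive, so the path moves monotonically
            # toward the least-squares solution.
            lr = 1.0 / np.linalg.norm(X, ord=2) ** 2
        b = np.zeros(p)                     # start at the origin
        for _ in range(n_iters):
            grad = X.T @ (X @ b - y)        # gradient of the squared-error loss
            b -= lr * grad                  # one descent step
        return b

    # Toy comparison against ordinary least squares on synthetic data:
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = X @ beta_true + rng.normal(scale=0.5, size=50)

    b_early = early_stopped_gd(X, y, n_iters=25)    # shrunk toward zero
    b_ls = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares estimate

Fewer iterations give more shrinkage, playing a role analogous to a larger ridge penalty, while letting n_iters grow (with a stable step size) recovers the least-squares estimate; in practice the stopping point would be tuned, for example by cross-validation, which this sketch omits.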



Author information


Additional information

Supported by the Greek State Scholarships Foundation


About this article

Cite this article

Skouras, K., Goutis, C. & Bramson, M.J. Estimation in linear models using gradient descent with early stopping. Stat Comput 4, 271–278 (1994). https://doi.org/10.1007/BF00156750
