
Abstract

In Chap. 6 we learned how to detect and manage violations of the Gauss-Markov theorem. In this chapter, we consider a related problem: how to accommodate errors that are not normally distributed. The Gauss-Markov theorem does not demand normally distributed errors, but the errors must be at least approximately normal if we wish to use the normal distribution to test hypotheses about the regression coefficients or to construct confidence intervals around them. Fortunately, the central limit theorem tells us that if our criterion is normally distributed, the errors will also be normally distributed in large samples. Normality is less certain with small samples, however, so it is important to examine the residuals to confirm that they are at least approximately normal, and to take appropriate action if they are not.
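
As a minimal illustration of the residual check the abstract describes (my own sketch, not the chapter's code; the simulated data, seed, and variable names are invented for this example), the following fits an OLS regression with NumPy and then examines the residuals with a Shapiro-Wilk test and a normal quantile (Q-Q) summary from SciPy:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 40
    x = rng.normal(size=n)
    y = 2.0 + 0.5 * x + rng.normal(size=n)          # simulated criterion

    # OLS fit via least squares; residuals = observed - fitted
    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b

    # Shapiro-Wilk test: small p-values signal non-normal residuals
    W, p = stats.shapiro(resid)
    print(f"Shapiro-Wilk W = {W:.3f}, p = {p:.3f}")

    # Normal quantile (Q-Q) summary: probplot returns theoretical vs.
    # ordered sample quantiles plus the correlation of the Q-Q line
    (osm, osr), (slope, intercept, r) = stats.probplot(resid, dist="norm")
    print(f"Q-Q correlation r = {r:.3f} (values near 1 look normal)")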


Notes

  1. In fact, no distribution is ever “perfectly” normal, so our concern is a relative one.

  2. See Chap. 2 for a discussion of the hat matrix and its diagonal elements, called hat values.

  3. The term “resistant” is sometimes used to refer to an estimator that retains its value in the face of extreme observations, with a robust estimator being one that is resistant and efficient. The two terms are now used more or less interchangeably, and I will distinguish them only when it is important to do so.

  4. Details regarding another resistant estimator, Least Trimmed Squares, can be found in Rousseeuw and Leroy (1987).

  5. Least Absolute Regression is also known as Least Absolute Deviation Regression, L1 Norm Regression, and Quantile Regression (when using the median); a small sketch follows these notes.

  6. The open brackets in the calculation of h indicate that we round down to the nearest integer (i.e., take the floor); see the sketch after these notes.

  7. The number of possible combinations is n!/[(n − p)! × p!], so with large samples the combinations need to be randomly sampled from the data (see the sketch after these notes).

  8. The value of 1.4826 in Eq. (7.12) is chosen so that, when n is large and the errors are normally distributed, s closely approximates the standard deviation of the residuals from an OLS regression (see the sketch after these notes).

  9. The tuning constants k in Eqs. (7.13) and (7.14) are used because they have been shown to produce estimates with 95% efficiency relative to OLS when the errors are normal; see the weight-function sketch after these notes.

  10. Bisquare weights perform even better in our example, producing a regression slope that is virtually identical to the one found with the final observation omitted (b = .2438).

  11. The bootstrap samples are formed randomly, so your results will not exactly match the ones in the text. Additionally, because our sample size is so small, the estimation might fail to converge. (A bootstrap sketch follows these notes.)

  12. These observations provide the best scale value.
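
On Note 5: the sketch below (my own illustration, not the chapter's code; the simulated data and starting values are assumptions) fits a least absolute deviation line by directly minimizing the sum of absolute residuals with scipy.optimize.minimize, then compares the result with OLS:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    x = rng.normal(size=30)
    y = 1.0 + 0.25 * x + rng.standard_t(df=2, size=30)   # heavy-tailed errors

    X = np.column_stack([np.ones_like(x), x])

    def lad_loss(b):
        # sum of absolute residuals -- the L1 norm the note refers to
        return np.sum(np.abs(y - X @ b))

    ols_b, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS starting values
    lad_b = minimize(lad_loss, ols_b, method="Nelder-Mead").x

    print("OLS slope:", round(float(ols_b[1]), 4))
    print("LAD slope:", round(float(lad_b[1]), 4))

Because the L1 objective weights all residuals linearly rather than quadratically, the LAD slope is pulled around far less by the heavy-tailed errors than the OLS slope is.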
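
On Note 6: the chapter's equation for h is not reproduced in this excerpt, so the form used below is an assumption (it is a common choice in the Least Trimmed Squares literature); the point is simply what the open brackets do:

    import math

    # Assumed form of h -- a common choice in the LTS literature (the
    # chapter's own equation is not reproduced in this excerpt):
    #     h = floor((n + p + 1) / 2)
    def lts_h(n, p):
        return math.floor((n + p + 1) / 2)   # open brackets = round down

    print(lts_h(20, 2))   # 11 -> a majority of the observations is kept
    print(lts_h(21, 2))   # 12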
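
On Note 7: a short sketch (my own illustration; the values of n and p are invented) showing why enumeration becomes infeasible and how random subsets of row indices can be drawn instead:

    import math
    import numpy as np

    n, p = 500, 4
    total = math.comb(n, p)                  # n! / [(n - p)! * p!]
    print(f"{total:,} possible size-{p} subsets")    # too many to enumerate

    # So with large samples we draw a manageable number of random subsets
    rng = np.random.default_rng(3)
    subsets = [rng.choice(n, size=p, replace=False) for _ in range(1000)]
    print(subsets[0])                        # indices of one candidate subset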
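
On Note 8: the constant 1.4826 is approximately 1/Φ⁻¹(0.75), which is what makes the median absolute deviation (MAD) a consistent estimate of the standard deviation under normality. A minimal demonstration with simulated residuals:

    import numpy as np
    from scipy import stats

    # 1.4826 is approximately 1 / Phi^{-1}(0.75), which makes the MAD-based
    # scale estimate consistent for the standard deviation under normality
    print(1 / stats.norm.ppf(0.75))          # ~1.4826

    rng = np.random.default_rng(4)
    r = rng.normal(scale=2.0, size=100_000)  # stand-in for OLS residuals

    s = 1.4826 * np.median(np.abs(r - np.median(r)))
    print(s, r.std())                        # both close to 2.0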
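
On Note 9: Eqs. (7.13) and (7.14) are not reproduced in this excerpt, so the sketch below assumes the tuning constants commonly cited for 95% efficiency under normal errors (Huber k = 1.345, bisquare k = 4.685) and simply plots out the two weight functions on scaled residuals u:

    import numpy as np

    # Commonly cited tuning constants for ~95% efficiency under normal
    # errors (assumed here; Eqs. 7.13-7.14 are not reproduced in this
    # excerpt): Huber k = 1.345, bisquare k = 4.685
    def huber_weight(u, k=1.345):
        return k / np.maximum(np.abs(u), k)   # 1 inside [-k, k], k/|u| outside

    def bisquare_weight(u, k=4.685):
        w = (1 - (u / k) ** 2) ** 2
        return np.where(np.abs(u) <= k, w, 0.0)

    u = np.array([0.0, 1.0, 2.0, 5.0])
    print(huber_weight(u))      # gradual downweighting of large residuals
    print(bisquare_weight(u))   # extreme residuals receive weight 0

The comparison makes Note 10's result plausible: bisquare weights drive the influence of an extreme observation all the way to zero, whereas Huber weights only shrink it.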
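
On Note 11: a case-resampling bootstrap sketch (my own illustration with simulated data; for brevity it refits OLS rather than the robust estimator bootstrapped in the text). A fixed seed makes it reproducible; without one, results differ from run to run, exactly as the note says:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(size=25)
    y = 1.0 + 0.25 * x + rng.normal(size=25)
    X = np.column_stack([np.ones_like(x), x])

    # Case-resampling bootstrap: resample rows with replacement, refit,
    # and collect the slope from each replicate
    slopes = []
    for _ in range(2000):
        idx = rng.integers(0, len(y), size=len(y))
        b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        slopes.append(b[1])

    lo, hi = np.percentile(slopes, [2.5, 97.5])
    print(f"95% percentile CI for the slope: [{lo:.3f}, {hi:.3f}]")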

References

  • Andersen, R. (2008). Modern methods for robust regression. Los Angeles: Sage.

  • Efron, B., & Tibshirani, R. (1994). An introduction to the bootstrap. New York: Chapman & Hall.

  • Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York: Wiley.

  • Rousseeuw, P. J., & Yohai, V. J. (1984). Robust regression by means of S estimators. In J. Franke, W. Härdle, & R. D. Martin (Eds.), Robust and nonlinear time series: Lecture notes in statistics, 26 (pp. 256–272). New York: Springer-Verlag.

  • Salibian-Barrera, M., & Yohai, V. (2006). A fast algorithm for S-regression estimates. Journal of Computational and Graphical Statistics, 15, 414–427.

  • Stephens, M. A. (1986). Tests based on EDF statistics. In R. B. d’Agostino & M. A. Stephens (Eds.), Goodness-of-fit techniques (pp. 97–193). New York: Marcel Dekker.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature


Cite this chapter

Brown, J.D. (2018). Robust Regression. In: Advanced Statistics for the Behavioral Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-93549-2_7
