Outlier detection by means of robust regression estimators for use in engineering science

Journal of Zhejiang University-SCIENCE A

Abstract

This study compares the ability of different robust regression estimators to detect and classify outliers. Well-known estimators with high breakdown points were compared using simulated data. Mean success rates (MSR) were computed and used as comparison criteria. The results showed that the least median of squares (LMS) and least trimmed squares (LTS) were the most successful methods for data that included leverage points, masking and swamping effects or critical and concentrated outliers. We recommend using LMS and LTS as diagnostic tools to classify outliers, because they remain robust even when applied to models that are heavily contaminated or that have a complicated structure of outliers.
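
The workflow the abstract describes (fit a high-breakdown estimator, flag observations with large robust residuals, then score the detector over many simulated data sets) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a straight-line model, approximates LMS by scanning all lines through two data points (an elemental-subset search, not the exact LMS solution), uses the preliminary scale estimate and 2.5 cutoff from Rousseeuw and Leroy (1987), and counts a trial as a success when every planted outlier is flagged. That success rule, the function names and all simulation settings are hypothetical choices for illustration; LTS is not shown.

```python
import itertools
import numpy as np


def lms_line(x, y):
    """Approximate LMS fit of y = a + b*x by scanning all two-point lines."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    best_crit, best_a, best_b = np.inf, 0.0, 0.0
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line: slope undefined, skip this pair
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = np.median((y - a - b * x) ** 2)  # LMS objective
        if crit < best_crit:
            best_crit, best_a, best_b = crit, a, b
    return best_a, best_b, best_crit


def flag_outliers(x, y, cutoff=2.5):
    """Flag points whose standardized LMS residual exceeds the cutoff."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a, b, med_sq = lms_line(x, y)
    n, p = len(x), 2  # p = number of parameters (intercept and slope)
    # Preliminary robust scale with finite-sample correction.
    scale = 1.4826 * (1 + 5 / (n - p)) * np.sqrt(med_sq)
    if scale == 0:
        raise ValueError("exact fit: robust scale is zero")
    return np.abs(y - a - b * x) / scale > cutoff


def mean_success_rate(n_trials=200, n=20, n_out=4, shift=10.0, seed=0):
    """Share of simulated trials in which all planted outliers are flagged.

    Assumption: this success rule and the simulation settings are
    illustrative and are not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, n)
    hits = 0
    for _ in range(n_trials):
        y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
        idx = rng.choice(n, size=n_out, replace=False)
        y[idx] += shift  # plant gross errors at random positions
        hits += int(np.all(flag_outliers(x, y)[idx]))
    return hits / n_trials
```

Calling `mean_success_rate()` returns a fraction between 0 and 1. Changing `n_out` or `shift` gives a rough feel for how the detection rate responds to the amount and size of contamination under these assumed settings.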

Author information

Corresponding author

Correspondence to Serif Hekimoglu.

Additional information

Project (No. 28-05-03-03) supported by the Yildiz Technical University Research Fund, Turkey

About this article

Cite this article

Hekimoglu, S., Erenoglu, R.C. & Kalina, J. Outlier detection by means of robust regression estimators for use in engineering science. J. Zhejiang Univ. Sci. A 10, 909–921 (2009). https://doi.org/10.1631/jzus.A0820140

  • DOI: https://doi.org/10.1631/jzus.A0820140
