A fast algorithm for robust regression with penalised trimmed squares

  • Original Paper
Computational Statistics

Abstract

The presence of groups of high leverage outliers makes linear regression difficult because of the masking effect. The available high breakdown estimators based on Least Trimmed Squares (LTS) often fail to detect masked high leverage outliers in finite samples. An alternative to the LTS estimator, the Penalised Trimmed Squares (PTS) estimator, was introduced by the authors in Zioutas and Avramidis (2005) Acta Math Appl Sin 21:323–334 and Zioutas et al. (2007) REVSTAT 5:115–136, and it appears to be less sensitive to the masking problem. The estimator is defined by a Quadratic Mixed Integer Programming (QMIP) problem whose objective function includes a penalty cost for each observation; this penalty serves as an upper bound on that observation's residual error for any feasible regression line. Since PTS does not require the number of outliers to be deleted from the data set to be fixed in advance, it achieves better efficiency than other estimators. However, because of the high computational complexity of the resulting QMIP problem, exact solutions are infeasible for moderately large regression problems. In this paper we further establish theoretical properties of the PTS estimator, such as high breakdown and efficiency, and propose an approximate algorithm, called Fast-PTS, that computes the PTS estimator efficiently for large data sets. Extensive computational experiments on sets of benchmark instances with varying degrees of outlier contamination indicate that the proposed algorithm performs well in identifying groups of high leverage outliers in reasonable computational time.
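
To make the penalty idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' QMIP formulation or their Fast-PTS algorithm. It assumes the PTS objective can be written as choosing coefficients b and 0/1 deletion indicators d_i to minimise sum_i [(1 - d_i) r_i(b)^2 + d_i c_i^2], where r_i(b) = y_i - x_i^T b and c_i^2 is the penalty for deleting observation i. The penalties are taken as a given input vector, since the abstract does not say how they are chosen. For fixed b, deleting observation i pays off exactly when r_i(b)^2 > c_i^2, so the objective reduces to sum_i min(r_i(b)^2, c_i^2); this is the sense in which each penalty caps an observation's contribution. The helper `pts_local_search` is hypothetical: it alternates between deleting the observations whose squared residual exceeds their penalty and refitting least squares on the rest. With the deletion set fixed, the refit minimises the kept squared residuals plus the fixed penalties, so the objective never increases; like LTS concentration steps, though, it only reaches a local minimum that depends on the starting fit.

```python
import numpy as np


def pts_objective(X, y, beta, penalty):
    """Assumed PTS-style objective for fixed coefficients `beta`.

    `penalty` holds the (hypothetical) squared penalties c_i^2, one per row.
    Returns the objective value and the optimal deletion mask for this beta.
    """
    # Squared residuals of every observation under the candidate fit.
    r2 = (y - X @ beta) ** 2
    # For fixed beta, deleting observation i is optimal exactly when its
    # squared residual exceeds its penalty, so each term is min(r_i^2, c_i^2).
    deleted = r2 > penalty
    return float(np.minimum(r2, penalty).sum()), deleted


def pts_local_search(X, y, penalty, beta0, max_iter=100):
    """Hypothetical descent heuristic, NOT the paper's Fast-PTS.

    Alternates between (a) deleting points whose squared residual exceeds
    their penalty and (b) refitting ordinary least squares on the kept
    points. Each refit cannot increase the objective, so the loop stops
    at the first iteration without a strict decrease.
    """
    beta = np.asarray(beta0, dtype=float)
    obj, deleted = pts_objective(X, y, beta, penalty)
    for _ in range(max_iter):
        keep = ~deleted
        if keep.sum() < X.shape[1]:
            break  # too few kept points to determine a least squares fit
        beta_new, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        obj_new, deleted_new = pts_objective(X, y, beta_new, penalty)
        if obj_new >= obj - 1e-12:
            break  # no strict improvement: local minimum reached
        beta, obj, deleted = beta_new, obj_new, deleted_new
    return beta, obj, deleted
```

For example, on the exact line y = 1 + 2x for x = 0, ..., 9 with the value at x = 9 replaced by 100, a constant penalty of 9 and the ordinary least squares fit as the starting point, the loop deletes only the contaminated point and recovers the coefficients (1, 2). A poor starting fit can leave this simple descent stuck at a worse local minimum, which is why the choice of starting fits matters for any heuristic of this kind.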

References

  • Agostinelli C, Markatou M (1998) A one-step robust estimator for regression based on the weighted likelihood re-weighting scheme. Stat Probab Lett 37: 342–350

  • Agulló J (2001) New algorithms for computing the least trimmed squares regression estimator. Comput Stat Data Anal 36: 425–439

  • Atkinson AC (1994) Fast very robust methods for the detection of multiple outliers. J Am Stat Assoc 89: 1329–1339

  • Atkinson AC, Cheng T-C (1999) Computing least trimmed squares regression with the forward search. Stat Comput 9: 251–263

  • Bazaraa MS, Sherali HD, Shetty CM (1993) Nonlinear programming: theory and algorithms. Wiley, London

  • Billor N, Chatterjee S, Hadi AS (2006) A re-weighted least squares method for robust regression estimation. Am J Math Manag Sci 26: 229–252

  • Billor N, Hadi AS, Velleman PF (2000) BACON: blocked adaptive computationally efficient outlier nominators. Comput Stat Data Anal 34: 279–298

  • Billor N, Kiral G (2008) A comparison of multiple outlier detection methods for regression data. Commun Stat Simul Comput 37: 521–545

  • Coakley CW, Hettmansperger TP (1993) A bounded influence, high breakdown, efficient regression estimator. J Am Stat Assoc 88: 872–880

  • Donoho DL, Huber PJ (1983) The notion of breakdown point. In: A Festschrift for Erich Lehmann. Wadsworth, Belmont, CA

  • Feo TA, Resende MGC (1995) Greedy randomized adaptive search procedures. J Glob Optim 6: 109–133

  • Gentleman JF, Wilk MB (1975) Detecting outliers II: supplementing the direct analysis of residuals. Biometrics 31: 387–410

  • Gervini D, Yohai VJ (2002) A class of robust and fully efficient regression estimators. Ann Stat 30: 583–616

  • Giloni A, Padberg M (2002) Least trimmed squares regression, least median squares regression, and mathematical programming. Math Comput Model 35: 1043–1060

  • Hadi AS, Simonoff JS (1993) Procedures for the identification of multiple outliers in linear models. J Am Stat Assoc 88: 1264–1272

  • Hawkins DM (1994) The feasible solution algorithm for least trimmed squares regression. Comput Stat Data Anal 17: 185–196

  • Hawkins DM, Bradu D, Kass GV (1984) Location of several outliers in multiple regression data using elemental sets. Technometrics 26: 197–208

  • Hawkins DM, Olive DJ (1999) Improved feasible solution algorithms for high breakdown estimation. Comput Stat Data Anal 30: 1–11

  • Hössjer O (1995) Exact computation of the least trimmed squares estimate in simple linear regression. Comput Stat Data Anal 19: 265–282

  • Li LM (2005) An algorithm for computing exact least-trimmed squares estimate of simple linear regression with constraints. Comput Stat Data Anal 48: 717–734

  • Peña D, Yohai VJ (1995) The detection of influential subsets in linear regression by using an influence matrix. J R Stat Soc Series B 57: 145–156

  • Peña D, Yohai VJ (1999) A fast procedure for outlier diagnostics in large regression problems. J Am Stat Assoc 94: 434–445

  • Pitsoulis LS, Resende MGC (2002) Greedy randomized adaptive search procedures. In: Pardalos PM, Resende MGC (eds) Handbook of applied optimization. Oxford University Press, Oxford, pp 168–183

  • Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79: 871–880

  • Rousseeuw PJ, van Driessen K (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41: 212–223

  • Rousseeuw PJ, van Driessen K (2006) Computing LTS regression for large data sets. Data Min Knowl Discov 12: 29–45

  • Rousseeuw PJ, Leroy AM (1987) Robust regression and outlier detection. Wiley, New York

  • Rousseeuw PJ, Yohai VJ (1984) Robust regression by means of S-estimators. In: Robust and nonlinear time series analysis. Springer, pp 256–272

  • Rousseeuw PJ, van Zomeren BC (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85: 633–639

  • Salibian-Barrera M, Yohai VJ (2006) A fast algorithm for S-regression estimates. J Comput Graph Stat 15(2): 414–427

  • Sebert DM, Montgomery DC, Rollier DA (1998) A clustering algorithm for identifying multiple outliers in linear regression. Comput Stat Data Anal 27: 461–484

  • Yohai VJ (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15: 642–656

  • Yohai VJ, Zamar RH (1988) High breakdown point estimates of regression by means of the minimization of an efficient scale. J Am Stat Assoc 83: 406–413

  • Zioutas G, Avramidis A (2005) Deleting outliers in robust regression with mixed integer programming. Acta Math Appl Sin 21: 323–334

  • Zioutas G, Avramidis A, Pitsoulis L (2007) Penalized trimmed squares and a modification of support vectors for unmasking outliers in linear regression. REVSTAT 5: 115–136

Author information

Correspondence to L. Pitsoulis.

About this article

Cite this article

Pitsoulis, L., Zioutas, G. A fast algorithm for robust regression with penalised trimmed squares. Comput Stat 25, 663–689 (2010). https://doi.org/10.1007/s00180-010-0196-2

  • DOI: https://doi.org/10.1007/s00180-010-0196-2