Abstract
Groups of high leverage outliers make linear regression a difficult problem because of the masking effect. The available high breakdown estimators based on Least Trimmed Squares (LTS) often fail to detect masked high leverage outliers in finite samples. An alternative to the LTS estimator, the Penalised Trimmed Squares (PTS) estimator, was introduced by the authors in Zioutas and Avramidis (2005) Acta Math Appl Sin 21:323–334 and Zioutas et al. (2007) REVSTAT 5:115–136, and it appears to be less sensitive to the masking problem. This estimator is defined by a Quadratic Mixed Integer Programming (QMIP) problem whose objective function includes a penalty cost for each observation; the penalty serves as an upper bound on the residual error for any feasible regression line. Since the PTS does not require presetting the number of outliers to delete from the data set, it is more efficient than other estimators. However, due to the high computational complexity of the resulting QMIP problem, computing exact solutions for moderately large regression problems is infeasible. In this paper we further establish the theoretical properties of the PTS estimator, such as high breakdown and efficiency, and propose an approximate algorithm called Fast-PTS to compute the PTS estimator efficiently for large data sets. Extensive computational experiments on sets of benchmark instances with varying degrees of outlier contamination indicate that the proposed algorithm performs well in identifying groups of high leverage outliers in reasonable computational time.
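To make the role of the per-observation penalty concrete, the following is a minimal sketch and not the authors' QMIP formulation or the Fast-PTS algorithm. It assumes the penalised objective can be read as the sum over observations of min(squared residual, penalty), so that an observation is deleted exactly when its squared residual exceeds its penalty. The data, the penalty value of 4.0, and the function names are all hypothetical. The brute-force search enumerates every deletion set, which is feasible only for a handful of points and hints at why exact solutions do not scale.

```python
from itertools import combinations


def ols_line(points):
    """Least squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def penalised_cost(points, penalties, intercept, slope):
    """Assumed objective: each squared residual is capped by its penalty."""
    total = 0.0
    for (x, y), p in zip(points, penalties):
        total += min((y - intercept - slope * x) ** 2, p)
    return total


def brute_force_pts(points, penalties):
    """Try every deletion set; cost = RSS of kept points + penalties of deleted."""
    n = len(points)
    best = None
    for k in range(n - 1):  # always keep at least two observations
        for deleted in combinations(range(n), k):
            kept = [points[i] for i in range(n) if i not in deleted]
            if len({x for x, _ in kept}) < 2:
                continue  # a line is not identified by a single x value
            a, b = ols_line(kept)
            rss = sum((y - a - b * x) ** 2 for x, y in kept)
            cost = rss + sum(penalties[i] for i in deleted)
            if best is None or cost < best[0]:
                best = (cost, set(deleted), (a, b))
    return best


# Hypothetical data: six points on y = 1 + 2x plus two high leverage outliers.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (10, 1), (11, 2)]
penalties = [4.0] * len(points)  # hypothetical penalty cost per observation

cost, deleted, (intercept, slope) = brute_force_pts(points, penalties)
print(deleted, intercept, slope, cost)
```

On this toy data the search deletes the two high leverage points without being told how many outliers to remove, which is the property the abstract attributes to the PTS. The enumeration visits on the order of 2^n subsets, so it is only an illustration of the objective, not a practical method.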
References
Agostinelli C, Markatou M (1998) A one-step robust estimator for regression based on the weighted likelihood re-weighting scheme. Stat Probab Lett 37: 342–350
Agulló J (2001) New algorithms for computing the least trimmed squares regression estimator. Comput Stat Data Anal 36: 425–439
Atkinson AC (1994) Fast very robust methods for the detection of multiple outliers. J Am Stat Assoc 89: 1329–1339
Atkinson AC, Cheng T-C (1999) Computing least trimmed squares regression with the forward search. Stat Comput 9: 251–263
Bazaraa MS, Sherali HD, Shetty CM (1993) Nonlinear programming: theory and algorithms. Wiley, London
Billor N, Chatterjee S, Hadi AS (2006) A re-weighted least squares method for robust regression estimation. Am J Math Manag Sci 26: 229–252
Billor N, Hadi AS, Velleman PF (2000) BACON: blocked adaptive computationally efficient outlier nominators. Comput Stat Data Anal 34: 279–298
Billor N, Kiral G (2008) A comparison of multiple outlier detection methods for regression data. Commun Stat Simul Comput 37: 521–545
Coakley CW, Hettmansperger TP (1993) A bounded influence, high breakdown, efficient regression estimator. J Am Stat Assoc 88: 872–880
Donoho DL, Huber PJ (1983) The notion of breakdown point. In: A Festschrift for Erich Lehmann, Belmont, CA, Wadsworth
Feo TA, Resende MGC (1995) Greedy randomized adaptive search procedures. J Glob Optim 6: 109–133
Gentleman JF, Wilk MB (1975) Detecting outliers II: supplementing the direct analysis of residuals. Biometrics 31: 387–410
Gervini D, Yohai VJ (2002) A class of robust and fully efficient regression estimators. Ann Stat 30: 583–616
Giloni A, Padberg M (2002) Least trimmed squares regression, least median squares regression, and mathematical programming. Math Comput Model 35: 1043–1060
Hadi AS, Simonoff JS (1993) Procedures for the identification of multiple outliers in linear models. J Am Stat Assoc 88: 1264–1272
Hawkins DM (1994) The feasible solution algorithm for least trimmed squares regression. Comput Stat Data Anal 17: 185–196
Hawkins DM, Bradu D, Kass GV (1984) Location of several outliers in multiple regression data using elemental sets. Technometrics 26: 197–208
Hawkins DM, Olive DJ (1999) Improved feasible solution algorithms for high breakdown estimation. Comput Stat Data Anal 30: 1–11
Hössjer O (1995) Exact computation of the least trimmed squares estimate in simple linear regression. Comput Stat Data Anal 19: 265–282
Li LM (2005) An algorithm for computing exact least-trimmed squares estimate of simple linear regression with constraints. Comput Stat Data Anal 48: 717–734
Peña D, Yohai VJ (1995) The detection of influential subsets in linear regression by using an influence matrix. J R Stat Soc Series B 57: 145–156
Peña D, Yohai VJ (1999) A fast procedure for outlier diagnostics in large regression problems. J Am Stat Assoc 94: 434–445
Pitsoulis LS, Resende MGC (2002) Greedy randomized adaptive search procedures. In: Pardalos PM, Resende MGC (eds) Handbook of applied optimization. Oxford University Press, Oxford, pp 168–183
Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79: 871–880
Rousseeuw PJ, van Driessen K (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41: 212–223
Rousseeuw PJ, van Driessen K (2006) Computing LTS regression for large data sets. Data Min Knowl Discov 12: 29–45
Rousseeuw PJ, Leroy AM (1987) Robust regression and outlier detection. Wiley, New York
Rousseeuw PJ, Yohai VJ (1984) Robust regression by means of S-estimators. In: Robust and nonlinear time series analysis. Springer, pp 256–272
Rousseeuw PJ, van Zomeren BC (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85: 633–639
Salibian-Barrera M, Yohai VJ (2006) A fast algorithm for S-regression estimates. J Comput Graph Stat 15: 414–427
Sebert DM, Montgomery DC, Rollier DA (1998) A clustering algorithm for identifying multiple outliers in linear regression. Comput Stat Data Anal 27: 461–484
Yohai VJ (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15: 642–656
Yohai VJ, Zamar RH (1988) High breakdown point estimates of regression by means of the minimization of an efficient scale. J Am Stat Assoc 83: 406–413
Zioutas G, Avramidis A (2005) Deleting outliers in robust regression with mixed integer programming. Acta Math Appl Sin 21: 323–334
Zioutas G, Avramidis A, Pitsoulis L (2007) Penalized trimmed squares and a modification of support vectors for unmasking outliers in linear regression. REVSTAT 5: 115–136
Cite this article
Pitsoulis, L., Zioutas, G. A fast algorithm for robust regression with penalised trimmed squares. Comput Stat 25, 663–689 (2010). https://doi.org/10.1007/s00180-010-0196-2
DOI: https://doi.org/10.1007/s00180-010-0196-2