Abstract
We consider a classical regression model contaminated by multiple outliers arising simultaneously from mean-shift and variance-inflation mechanisms, which are generally treated as alternatives. Identifying multiple outliers poses computational challenges in the usual variance-inflation framework. We propose using robust estimation techniques to identify the outliers arising from each mechanism, and we rely on restricted maximum likelihood estimation to accommodate variance-inflated outliers in the model. Furthermore, we introduce diagnostic plots that help guide the analysis. We compare classical and robust methods with our novel approach on both simulated and real data.
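To make the distinction between the two contamination mechanisms concrete, the following minimal Python sketch simulates a straight-line regression in which some errors have a shifted mean (mean-shift outliers) and others have an inflated variance (variance-inflation outliers). It is an illustration under stated assumptions, not the chapter's procedure: the function names `simulate` and `wls_line`, the sample sizes, the shift of 10 and the inflation factor of 25 are all hypothetical choices, and the "oracle" weights use the true labels and the true inflation factor, which are unknown in practice. The chapter instead identifies the outliers with robust methods and estimates the variance inflation by restricted maximum likelihood.

```python
import random

def simulate(n=200, n_msom=10, n_viom=20, beta0=1.0, beta1=2.0,
             sigma=1.0, shift=10.0, v=25.0, seed=0):
    """Simulate y = beta0 + beta1 * x + error under two outlier mechanisms.

    Returns a list of (x, y, kind) with kind in {"clean", "msom", "viom"}.
    All default values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        x = rng.uniform(0.0, 10.0)
        if i < n_msom:
            # Mean shift: the error mean moves by `shift`, variance is unchanged.
            err, kind = shift + rng.gauss(0.0, sigma), "msom"
        elif i < n_msom + n_viom:
            # Variance inflation: zero mean, variance sigma^2 * v with v > 1.
            err, kind = rng.gauss(0.0, sigma * v ** 0.5), "viom"
        else:
            err, kind = rng.gauss(0.0, sigma), "clean"
        data.append((x, beta0 + beta1 * x + err, kind))
    return data

def wls_line(data, weights):
    """Closed-form weighted least squares for a line; returns (intercept, slope)."""
    sw = sum(weights)
    xbar = sum(w * x for w, (x, _, _) in zip(weights, data)) / sw
    ybar = sum(w * y for w, (_, y, _) in zip(weights, data)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, (x, _, _) in zip(weights, data))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, (x, y, _) in zip(weights, data))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

V = 25.0  # assumed known here; in practice it would have to be estimated
data = simulate(v=V)

# Ordinary least squares: every observation gets weight 1.
ols = wls_line(data, [1.0] * len(data))

# Oracle weights: exclude mean-shifted points, down-weight variance-inflated ones.
oracle_w = [{"clean": 1.0, "viom": 1.0 / V, "msom": 0.0}[k] for _, _, k in data]
oracle = wls_line(data, oracle_w)

print("OLS    (intercept, slope):", ols)
print("Oracle (intercept, slope):", oracle)
```

The design choice illustrated is that the two kinds of outliers call for different treatment: mean-shifted observations carry no usable information about the regression line and are excluded (weight 0), whereas variance-inflated observations are still centred on the line and can be retained with a weight inversely proportional to their variance. With the true coefficients set to 1 and 2, the oracle fit should generally land closer to them than the unweighted fit.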
Notes
- 1.
Since FSR only aims to trim observations, the default settings can be too weak to separate coexisting VIOM and MSOM outliers—as we wish to do here.
- 2.
Full documentation of all the functions in the FSDA Toolbox can be found at https://rosa.unipr.it/FSDA/guide.html. For example, the documentation of our VIOM function is available at https://rosa.unipr.it/FSDA/VIOM.html.
- 3.
Note that we do not use any consistency factor in \(\hat{s}^2\) here, although such factors are often used in robust estimation (Maronna et al., 2006).
- 4.
This is a feature “inherited” from our use of FSR. As the sample size increases, FSR becomes better (likely due to its strong consistency) at detecting, and thus trimming, all outliers (both VIOM and MSOM). Hence, when n is large, the first signal (which is the same for both FSR and FSRws) can occur while still including clean observations. In the strong contamination scenario (50% total fraction of contamination), which we consider next, we do not observe this phenomenon because we force FSR to find signals in the second half of the search (i.e., we force 50% of the weights to equal 1, motivated by our assumption A1).
- 5.
Here, due to the small sample size, FSR does not detect any signal. The FSRws fit shown in the figure is based on “manual” detection; the two down-weighted observations also correspond to the two residuals exceeding the 90% confidence intervals in an LMS fit.
References
A.C. Atkinson, Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis (Clarendon Press, Oxford, 1985)
A.C. Atkinson, M. Riani, Robust Diagnostic Regression Analysis (Springer, New York, 2000)
A.C. Atkinson, M. Riani, Distribution theory and simulations for tests of outliers in regression. J. Comput. Graph. Stat. 15(2), 460–476 (2006)
A.C. Atkinson, M. Riani, F. Torti, Robust methods for heteroskedastic regression. Comput. Stat. Data Anal. 104, 209–222 (2016)
V. Barnett, T. Lewis, Outliers in Statistical Data (Wiley, 1974)
R.J. Beckman, R.D. Cook, Outlier.......... s. Technometrics 25(2), 119–149 (1983)
D.A. Belsley, E. Kuh, R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (Wiley-Interscience, New York, 2004)
D. Bertsimas, R. Mazumder, Least quantile regression via modern optimization. Ann. Stat., 2494–2525 (2014)
D. Bertsimas, A. King, R. Mazumder, et al., Best subset selection via a modern optimization lens. Ann. Stat. 44(2), 813–852 (2016)
G.E. Box, G.C. Tiao, A Bayesian approach to some outlier problems. Biometrika 55(1), 119–129 (1968)
A. Cerioli, A. Farcomeni, M. Riani, Strong consistency and robustness of the forward search estimator of multivariate location and scatter. J. Multivariate Anal. 126, 167–183 (2014)
A. Cerioli, A.C. Atkinson, M. Riani, How to marry robustness and applied statistics, in Topics on Methodological and Applied Statistical Inference (Springer, 2016), pp. 51–64
A. Cerioli, M. Riani, A.C. Atkinson, A. Corbellini, The power of monitoring: how to make the most of a contaminated multivariate sample. Stat. Methods Appl., 1–29 (2018)
S. Chatterjee, A.S. Hadi, Sensitivity Analysis in Linear Regression (Wiley, New York, 1988)
R.D. Cook, Detection of influential observation in linear regression. Technometrics 19(1), 15–18 (1977)
R.D. Cook, Assessment of local influence. J. Roy. Stat. Soc. B (Methodological) 48(2), 133–155 (1986)
R.D. Cook, S. Weisberg, Residuals and Influence in Regression (Chapman and Hall, New York, 1982)
R.D. Cook, N. Holschuh, S. Weisberg, A note on an alternative outlier model. J. Roy. Stat. Soc. B (Methodological) 44(3), 370–376 (1982)
B. De Finetti, The Bayesian approach to the rejection of outliers, in Proceedings of the Fourth Berkeley Symposium on Probability and Statistics, vol. 1 (University of California Press, Berkeley, 1961), pp. 199–210
D.L. Donoho, P.J. Huber, The notion of breakdown point, in A Festschrift for Erich L. Lehmann, ed. by P. Bickel, K.A. Doksum, J.L. Hodges (Wadsworth, Belmont, California, 1983), pp. 157–184
F.N. Gumedze, Use of likelihood ratio tests to detect outliers under the variance shift outlier model. J. Appl. Stat. 46(4), 598–620 (2019)
F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel, Robust Statistics: The Approach Based on Influence Functions (Wiley, New York, 1986)
D.A. Harville, Maximum likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72(358), 320–338 (1977)
P.J. Huber, E.M. Ronchetti, Robust Statistics (Wiley, New Jersey, 2009)
S. Johansen, B. Nielsen, et al., Analysis of the forward search using some new results for martingales and empirical processes. Bernoulli 22(2), 1131–1183 (2016)
R.A. Maronna, R.D. Martin, V.J. Yohai, Robust Statistics: Theory and Methods (Wiley, 2006)
L. McCann, et al., Robust model selection and outlier detection in linear regressions. Ph.D. thesis, Massachusetts Institute of Technology (2006)
R.S. Menjoge, R.E. Welsch, A diagnostic method for simultaneous feature selection and outlier identification in linear regression. Comput. Stat. Data Anal. 54(12), 3181–3193 (2010)
D. Perrotta, F. Torti, Detecting price outliers in European trade data with the forward search, in Data Analysis and Classification (Springer, 2010), pp. 415–423
M. Riani, A.C. Atkinson, Fast calibrations of the forward search for testing multiple outliers in regression. Adv. Data Anal. Classif. 1(2), 123–141 (2007)
M. Riani, A.C. Atkinson, A. Cerioli, Finding an unknown number of multivariate outliers. J. Roy. Stat. Soc. B (Stat. Methodol.) 71(2), 447–466 (2009)
M. Riani, D. Perrotta, F. Torti, FSDA: A MATLAB toolbox for robust analysis and interactive data exploration. Chemom. Intell. Lab. Syst. 116, 17–32 (2012)
M. Riani, A. Cerioli, A.C. Atkinson, D. Perrotta, et al., Monitoring robust regression. Electron. J. Stat. 8(1), 646–677 (2014)
P.J. Rousseeuw, Least median of squares regression. J. Am. Stat. Assoc. 79(388), 871–880 (1984)
P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection (Wiley, New York, 1987)
P.J. Rousseeuw, K. Van Driessen, Computing LTS regression for large data sets. Data Min. Knowl. Disc. 12(1), 29–45 (2006)
P.J. Rousseeuw, B.C. Van Zomeren, Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85(411), 633–639 (1990)
P.J. Rousseeuw, V.J. Yohai, Robust regression by means of S-estimators, in Robust and Nonlinear Time Series Analysis (Springer, 1984), pp. 256–272
Y. She, A.B. Owen, Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 106(494), 626–639 (2011)
R. Thompson, A note on restricted maximum likelihood estimation with an alternative outlier model. J. Roy. Stat. Soc. B (Methodological) 47(1), 53–55 (1985)
V. Todorov, E. Sordini, fsdaR: Robust Data Analysis Through Monitoring and Dynamic Visualization. https://CRAN.R-project.org/package=fsdaR, R package version 0.4-9 (2020)
V.J. Yohai, High breakdown-point and high efficiency robust estimates for regression. Ann. Stat., 642–656 (1987)
V.J. Yohai, R. Zamar, High breakdown-point estimates of regression by means of the minimization of an efficient scale. J. Am. Stat. Assoc. 83(402), 406–413 (1988)
Acknowledgements
This research benefitted from the High Performance Computing facility of the University of Parma. M.R. acknowledges financial support from the project “Statistics for fraud detection, with applications to trade data and financial statement” of the University of Parma. F.C. acknowledges financial support from the Huck Institutes of the Life Sciences of the Pennsylvania State University.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Insolia, L., Chiaromonte, F., Riani, M. (2021). A Robust Estimation Approach for Mean-Shift and Variance-Inflation Outliers. In: Bura, E., Li, B. (eds) Festschrift in Honor of R. Dennis Cook. Springer, Cham. https://doi.org/10.1007/978-3-030-69009-0_2
DOI: https://doi.org/10.1007/978-3-030-69009-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69008-3
Online ISBN: 978-3-030-69009-0
eBook Packages: Mathematics and Statistics (R0)