A Robust Estimation Approach for Mean-Shift and Variance-Inflation Outliers

Festschrift in Honor of R. Dennis Cook

Abstract

We consider a classical regression model contaminated by multiple outliers arising simultaneously from mean-shift and variance-inflation mechanisms, which are generally treated as alternatives to one another. Identifying multiple outliers leads to computational challenges in the usual variance-inflation framework. We propose the use of robust estimation techniques to identify outliers arising from each mechanism, and we rely on restricted maximum likelihood estimation to accommodate variance-inflated outliers in the model. Furthermore, we introduce diagnostic plots that help guide the analysis. We compare classical and robust methods with our novel approach on both simulated and real data.
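To make the two contamination mechanisms concrete, the following is a minimal sketch of how such data might be simulated. It assumes the usual textbook formulation, in which a mean-shift outlier has a constant added to its expected response and a variance-inflation outlier has its error variance multiplied by a factor greater than one. The function name, parameter names, and default values are hypothetical and are not taken from the chapter.

```python
import numpy as np


def simulate_contaminated(n=100, p=2, n_msom=5, n_viom=10,
                          shift=10.0, inflation=25.0, seed=0):
    """Simulate a linear model with both kinds of outliers (illustrative only).

    All names and default values here are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)

    # Design matrix with an intercept column and p - 1 Gaussian predictors.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    beta = np.ones(p)   # assumed true coefficients
    sigma = 1.0         # assumed error standard deviation for clean cases

    # Pick disjoint index sets for the two outlier mechanisms.
    idx = rng.permutation(n)
    msom_idx = idx[:n_msom]
    viom_idx = idx[n_msom:n_msom + n_viom]

    # Mean-shift outliers: add a constant to the expected response.
    mean_shift = np.zeros(n)
    mean_shift[msom_idx] = shift

    # Variance-inflation outliers: multiply the error variance.
    var_scale = np.ones(n)
    var_scale[viom_idx] = inflation

    eps = rng.normal(scale=sigma * np.sqrt(var_scale))
    y = X @ beta + mean_shift + eps
    return X, y, msom_idx, viom_idx


X, y, msom_idx, viom_idx = simulate_contaminated()
```

In this sketch the mean-shift cases are displaced in one direction, while the variance-inflated cases remain centred on the regression line but scatter more widely, which is one way to picture why the two kinds of outliers may call for different treatment.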

Notes

  1. Since FSR only aims to trim observations, the default settings can be too weak to separate coexisting VIOM and MSOM outliers, as we wish to do here.

  2. Full documentation of all the functions inside the FSDA Toolbox can be found at https://rosa.unipr.it/FSDA/guide.html. For example, the documentation of our VIOM function is available at https://rosa.unipr.it/FSDA/VIOM.html.

  3. Note that here we are not using any consistency factor in \(\hat {s}^2\). Consistency factors are often used in robust estimation (Maronna et al., 2006).

  4. This is a feature “inherited” from our use of FSR. As the sample size increases, FSR becomes better (likely due to its strong consistency) at detecting and thus trimming all outliers (both VIOM and MSOM). Hence, when n is large, the first signal (which is the same for both FSR and FSRws) can occur while still including clean observations. In the strong contamination scenario (50% total fraction of contamination), which we consider next, we do not observe this phenomenon because we are forcing FSR to find signals in the second half of the search (i.e., we force 50% of the weights to equal 1, motivated by our assumption A1).

  5. Here, due to the small sample size, FSR does not detect any signal. The FSRws fit shown in the figure is based on “manual” detection; the two down-weighted observations also correspond to the two residuals exceeding 90% confidence intervals in an LMS fit.

References

  • A.C. Atkinson, Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis (Clarendon Press, Oxford, 1985)

  • A.C. Atkinson, M. Riani, Robust Diagnostic Regression Analysis (Springer, New York, 2000)

  • A.C. Atkinson, M. Riani, Distribution theory and simulations for tests of outliers in regression. J. Comput. Graph. Stat. 15(2), 460–476 (2006)

  • A.C. Atkinson, M. Riani, F. Torti, Robust methods for heteroskedastic regression. Comput. Stat. Data Anal. 104, 209–222 (2016)

  • V. Barnett, T. Lewis, Outliers in Statistical Data (Wiley, 1974)

  • R.J. Beckman, R.D. Cook, Outlier.......... s. Technometrics 25(2), 119–149 (1983)

  • D.A. Belsley, E. Kuh, R.E. Welsch, Regression Diagnostics - Identifying Influential Data and Sources of Collinearity (Wiley-Interscience, New York, 2004)

  • D. Bertsimas, R. Mazumder, Least quantile regression via modern optimization. Ann. Stat., 2494–2525 (2014)

  • D. Bertsimas, A. King, R. Mazumder, et al., Best subset selection via a modern optimization lens. Ann. Stat. 44(2), 813–852 (2016)

  • G.E. Box, G.C. Tiao, A Bayesian approach to some outlier problems. Biometrika 55(1), 119–129 (1968)

  • A. Cerioli, A. Farcomeni, M. Riani, Strong consistency and robustness of the forward search estimator of multivariate location and scatter. J. Multivariate Anal. 126, 167–183 (2014)

  • A. Cerioli, A.C. Atkinson, M. Riani, How to marry robustness and applied statistics, in Topics on Methodological and Applied Statistical Inference (Springer, 2016), pp. 51–64

  • A. Cerioli, M. Riani, A.C. Atkinson, A. Corbellini, The power of monitoring: how to make the most of a contaminated multivariate sample. Stat. Methods Appl., 1–29 (2018)

  • S. Chatterjee, A.S. Hadi, Sensitivity Analysis in Linear Regression (Wiley, New York, 1988)

  • R.D. Cook, Detection of influential observation in linear regression. Technometrics 19(1), 15–18 (1977)

  • R.D. Cook, Assessment of local influence. J. Roy. Stat. Soc. B (Methodological) 48(2), 133–155 (1986)

  • R.D. Cook, S. Weisberg, Residuals and Influence in Regression (Chapman and Hall, New York, 1982)

  • R.D. Cook, N. Holschuh, S. Weisberg, A note on an alternative outlier model. J. Roy. Stat. Soc. B (Methodological) 44(3), 370–376 (1982)

  • B. De Finetti, The Bayesian approach to the rejection of outliers, in Proceedings of the Fourth Berkeley Symposium on Probability and Statistics, vol. 1 (University of California Press, Berkeley, 1961), pp. 199–210

  • D.L. Donoho, P.J. Huber, The notion of breakdown point, in A Festschrift for Erich L. Lehmann, ed. by P. Bickel, K.A. Doksum, J.L. Hodges (Wadsworth, Belmont, California, 1983), pp. 157–184

  • F.N. Gumedze, Use of likelihood ratio tests to detect outliers under the variance shift outlier model. J. Appl. Stat. 46(4), 598–620 (2019)

  • F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel, Robust Statistics: The Approach Based on Influence Functions (Wiley, New York, 1986)

  • D.A. Harville, Maximum likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72(358), 320–338 (1977)

  • P.J. Huber, E.M. Ronchetti, Robust Statistics (Wiley, New Jersey, 2009)

  • S. Johansen, B. Nielsen, et al., Analysis of the forward search using some new results for martingales and empirical processes. Bernoulli 22(2), 1131–1183 (2016)

  • R.A. Maronna, R.D. Martin, V.J. Yohai, Robust Statistics: Theory and Methods (Wiley, 2006)

  • L. McCann, et al., Robust model selection and outlier detection in linear regressions. Ph.D. thesis, Massachusetts Institute of Technology (2006)

  • R.S. Menjoge, R.E. Welsch, A diagnostic method for simultaneous feature selection and outlier identification in linear regression. Comput. Stat. Data Anal. 54(12), 3181–3193 (2010)

  • D. Perrotta, F. Torti, Detecting price outliers in European trade data with the forward search, in Data Analysis and Classification (Springer, 2010), pp. 415–423

  • M. Riani, A.C. Atkinson, Fast calibrations of the forward search for testing multiple outliers in regression. Adv. Data Anal. Classif. 1(2), 123–141 (2007)

  • M. Riani, A.C. Atkinson, A. Cerioli, Finding an unknown number of multivariate outliers. J. Roy. Stat. Soc. B (Stat. Methodol.) 71(2), 447–466 (2009)

  • M. Riani, D. Perrotta, F. Torti, FSDA: A MATLAB toolbox for robust analysis and interactive data exploration. Chemom. Intell. Lab. Syst. 116, 17–32 (2012)

  • M. Riani, A. Cerioli, A.C. Atkinson, D. Perrotta, et al., Monitoring robust regression. Electron. J. Stat. 8(1), 646–677 (2014)

  • P.J. Rousseeuw, Least median of squares regression. J. Am. Stat. Assoc. 79(388), 871–880 (1984)

  • P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection (Wiley, New York, 1987)

  • P.J. Rousseeuw, K. Van Driessen, Computing LTS regression for large data sets. Data Min. Knowl. Disc. 12(1), 29–45 (2006)

  • P.J. Rousseeuw, B.C. Van Zomeren, Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85(411), 633–639 (1990)

  • P.J. Rousseeuw, V.J. Yohai, Robust regression by means of S-estimators, in Robust and Nonlinear Time Series Analysis (Springer, 1984), pp. 256–272

  • Y. She, A.B. Owen, Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 106(494), 626–639 (2011)

  • R. Thompson, A note on restricted maximum likelihood estimation with an alternative outlier model. J. Roy. Stat. Soc. B (Methodological) 47(1), 53–55 (1985)

  • V. Todorov, E. Sordini, fsdaR: Robust Data Analysis Through Monitoring and Dynamic Visualization. https://CRAN.R-project.org/package=fsdaR, R package version 0.4–9 (2020)

  • V.J. Yohai, High breakdown-point and high efficiency robust estimates for regression. Ann. Stat., 642–656 (1987)

  • V.J. Yohai, R. Zamar, High breakdown-point estimates of regression by means of the minimization of an efficient scale. J. Am. Stat. Assoc. 83(402), 406–413 (1988)


Acknowledgements

This research benefitted from the High Performance Computing facility of the University of Parma. M.R. acknowledges financial support from the project “Statistics for fraud detection, with applications to trade data and financial statement” of the University of Parma. F.C. acknowledges financial support from the Huck Institutes of the Life Sciences of the Pennsylvania State University.

Author information

Corresponding author

Correspondence to Luca Insolia.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Insolia, L., Chiaromonte, F., Riani, M. (2021). A Robust Estimation Approach for Mean-Shift and Variance-Inflation Outliers. In: Bura, E., Li, B. (eds) Festschrift in Honor of R. Dennis Cook. Springer, Cham. https://doi.org/10.1007/978-3-030-69009-0_2
