Non-asymptotic analysis and inference for an outlyingness induced winsorized mean

  • Regular Article
Statistical Papers

Abstract

Robust estimation of a mean vector, a topic regarded as obsolete in the traditional robust statistics community, has surged in the machine learning literature over the last decade. The latest focus is on the sub-Gaussian performance and computability of estimators in a non-asymptotic setting. Many traditional robust estimators are computationally intractable, which partly explains the renewed interest in robust mean estimation. Robust centrality estimators, however, also include the trimmed mean and the sample median. The latter has the best robustness but suffers from low efficiency. The trimmed mean and the median of means, both achieving sub-Gaussian performance, have been proposed and studied in the literature. This article investigates the robustness of leading sub-Gaussian estimators of the mean and reveals that none of them can resist more than \(25\%\) contamination in the data. It consequently introduces an outlyingness induced winsorized mean, which has the best possible robustness (it can resist up to \(50\%\) contamination without breakdown) while achieving high efficiency. Furthermore, it has sub-Gaussian performance for uncontaminated samples and a bounded estimation error for contaminated samples at a given confidence level in a finite sample setting. It can be computed in linear time.
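The contrast drawn in the abstract can be illustrated with a small univariate sketch. The abstract does not define the outlyingness induced winsorized mean, so the form used here is an assumption: outlyingness is taken as \(|x - \mathrm{med}| / \mathrm{MAD}\), and points whose outlyingness exceeds a hypothetical threshold `beta` are pulled back to \(\mathrm{med} \pm \beta \cdot \mathrm{MAD}\) before averaging. The function names, the value of `beta`, the block count, and the toy data are all illustrative and are not taken from the article.

```python
import statistics


def median_of_means(xs, k):
    """Split xs into k consecutive blocks, average each block,
    and return the median of the block means."""
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(xs)")
    size = n // k  # points beyond k * size are ignored in this sketch
    means = [statistics.fmean(xs[i * size:(i + 1) * size]) for i in range(k)]
    return statistics.median(means)


def winsorized_mean(xs, beta=3.0):
    """Winsorize by an assumed outlyingness |x - med| / MAD, then average.

    Assumption: this scaled-deviation form and the default beta are
    illustrative; they are not the definition given in the article.
    """
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    if mad == 0:
        return med  # degenerate sample: more than half the points coincide
    lo, hi = med - beta * mad, med + beta * mad
    # Points outside [lo, hi] are replaced by the nearest bound.
    return statistics.fmean([min(max(x, lo), hi) for x in xs])


# Toy sample: six clean points centred at 0 and four gross outliers (40%),
# placed so that they fall into four of the five median-of-means blocks.
data = [-1.0, 1000.0, -0.5, 1000.0, 0.0, 1000.0, 0.0, 1000.0, 0.5, 1.0]

print(statistics.fmean(data))
print(median_of_means(data, 5))
print(winsorized_mean(data))
```

On this toy sample the sample mean is 400 and the median of means with five blocks is 499.75, because a majority of blocks contain an outlier, while the winsorized value stays at 2.1. Note that `statistics.median` sorts its input, so this sketch runs in \(O(n \log n)\); reaching linear time would require a selection algorithm for the median and MAD.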

Acknowledgements

The author thanks Hanshi Zuo and Profs. Wei Shao, Yimin Xiao, and Haolei Weng for insightful comments and stimulating discussions. The helpful comments and suggestions of two anonymous reviewers are highly appreciated.

Author information

Corresponding author

Correspondence to Yijun Zuo.

Supplementary Information

Supplementary file 1 (PDF, 12 KB) is available as electronic supplementary material.

About this article

Cite this article

Zuo, Y. Non-asymptotic analysis and inference for an outlyingness induced winsorized mean. Stat Papers 64, 1465–1481 (2023). https://doi.org/10.1007/s00362-022-01353-5

  • DOI: https://doi.org/10.1007/s00362-022-01353-5