Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey

Abstract

We survey some of the recent advances in mean estimation and regression function estimation. In particular, we describe sub-Gaussian mean estimators for possibly heavy-tailed data in both the univariate and multivariate settings. We focus on estimators based on median-of-means techniques, but other methods such as the trimmed-mean and Catoni’s estimators are also reviewed. We give detailed proofs for the cornerstone results. We dedicate a section to statistical learning problems—in particular, regression function estimation—in the presence of possibly heavy-tailed data.
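
As a rough illustration of the median-of-means idea mentioned above, the following minimal sketch splits the sample into blocks, averages each block, and returns the median of the block averages. The function name `median_of_means` and the parameter `num_blocks` are hypothetical names chosen for this example, and discarding leftover points when the sample size is not divisible by the number of blocks is only one possible convention. In practice the number of blocks is usually tied to the desired confidence level; no particular choice is assumed here.

```python
import statistics


def median_of_means(sample, num_blocks):
    """Median-of-means estimate of the mean of a univariate sample.

    Illustrative sketch only: `sample` is a sequence of numbers and
    `num_blocks` is the number of disjoint blocks (both names are
    assumptions made for this example).
    """
    n = len(sample)
    if not 1 <= num_blocks <= n:
        raise ValueError("num_blocks must be between 1 and len(sample)")
    # Equal-sized consecutive blocks; any leftover points are discarded.
    block_size = n // num_blocks
    block_means = [
        statistics.fmean(sample[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    # The median of the block means is insensitive to a few bad blocks.
    return statistics.median(block_means)


# One extreme observation ruins the empirical mean but only one block mean.
data = [1.0, 1.0, 1.0, 1.0, 1.0, 1000.0]
print(statistics.fmean(data))    # empirical mean, dominated by the outlier
print(median_of_means(data, 3))  # median of the three block means
```

With a single block the estimator reduces to the empirical mean, while using more blocks trades some efficiency for robustness to heavy tails.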

Notes

  1. As we explain in what follows, it suffices to ensure that the comparison is correct between \(\mu \) and any point that is not too close to \(\mu \).

  2. In the proof of Theorem 8, “well-behaved” means that (3.5) holds for a majority of the blocks.

  3. The case \(q=3\) is the standard Berry–Esseen theorem, while for \(2<q<3\) one may use generalized Berry–Esseen bounds, see [71].

  4. Note that one has the freedom to select a function \(\widehat{f}\) that does not belong to \({{\mathcal {F}}}\).

References

  1. N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58:137–147, 1999.

  2. G. Aloupis. Geometric measures of data depth. DIMACS series in discrete mathematics and theoretical computer science, 72:147–158, 2006.

  3. M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.

  4. J.-Y. Audibert and O. Catoni. Robust linear least squares regression. The Annals of Statistics, 39:2766–2794, 2011.

  5. Y. Baraud and L. Birgé. Rho-estimators revisited: General theory and applications. The Annals of Statistics, 46(6B):3767–3804, 2018.

  6. Y. Baraud, L. Birgé, and M. Sart. A new method for estimation and model selection: \(\rho \)-estimation. Inventiones Mathematicae, 207(2):425–517, 2017.

  7. P.L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33:1497–1537, 2005.

  8. P.J. Bickel. On some robust estimates of location. The Annals of Mathematical Statistics, 36:847–858, 1965.

  9. A. Blumer, A. Ehrenfeucht, D. Haussler, and M.K. Warmuth. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36:929–965, 1989.

  10. S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

  11. C. Brownlees, E. Joly, and G. Lugosi. Empirical risk minimization for heavy-tailed losses. Annals of Statistics, 43:2507–2536, 2015.

  12. S. Bubeck, N. Cesa-Bianchi, and G. Lugosi. Bandits with heavy tail. IEEE Transactions on Information Theory, 59:7711–7717, 2013.

  13. P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Springer, Heidelberg, 2011.

  14. O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 48(4):1148–1185, 2012.

  15. O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747, 2017.

  16. O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector. arXiv preprint arXiv:1802.04308, 2018.

  17. Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47–60. ACM, 2017.

  18. Y. Cherapanamjeri, N. Flammarion, and P. Bartlett. Fast mean estimation with sub-Gaussian rates. arXiv preprint arXiv:1902.01998, 2019.

  19. M. Chichignoud and J. Lederer. A robust, adaptive M-estimator for pointwise estimation in heteroscedastic regression. Bernoulli, 20(3):1560–1599, 2014.

  20. M.B. Cohen, Y.T. Lee, G. Miller, J. Pachocki, and A. Sidford. Geometric median in nearly linear time. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 9–21. ACM, 2016.

  21. L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, 1996.

  22. L. Devroye, M. Lerasle, G. Lugosi, and R.I. Oliveira. Sub-Gaussian mean estimators. Annals of Statistics, 2016.

  23. I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 655–664. IEEE, 2016.

  24. I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Being robust (in high dimensions) can be practical. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), 2017.

  25. I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Robustly learning a Gaussian: Getting optimal error, efficiently. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2683–2702. Society for Industrial and Applied Mathematics, 2018.

  26. I. Diakonikolas, D.M. Kane, and A. Stewart. Efficient robust proper learning of log-concave distributions. arXiv preprint arXiv:1606.03077, 2016.

  27. I. Diakonikolas, W. Kong, and A. Stewart. Efficient algorithms and lower bounds for robust linear regression. arXiv preprint arXiv:1806.00040, 2018.

  28. J. Fan, Q. Li, and Y. Wang. Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(1):247–265, 2017.

  29. L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A distribution-free theory of nonparametric regression. Springer-Verlag, New York, 2002.

  30. F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, and W.A. Stahel. Robust statistics: the approach based on influence functions, volume 196. Wiley, 1986.

  31. Q. Han and J.A. Wellner. A sharp multiplier inequality with applications to heavy-tailed regression problems. arXiv preprint arXiv:1706.02410, 2017.

  32. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.

  33. S.B. Hopkins. Sub-Gaussian mean estimation in polynomial time. Annals of Statistics, 2019, to appear.

  34. S.B. Hopkins and J. Li. Mixture models, robustness, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1021–1034. ACM, 2018.

  35. D. Hsu. Robust statistics. http://www.inherentuncertainty.org/2010/12/robust-statistics.html, 2010.

  36. D. Hsu and S. Sabato. Loss minimization and parameter estimation with heavy tails. Journal of Machine Learning Research, 17:1–40, 2016.

  37. M. Huber. An optimal (\(\epsilon \), \(\delta \))-randomized approximation scheme for the mean of random variables with bounded relative variance. Random Structures & Algorithms, 2019.

  38. P.J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.

  39. P.J. Huber and E.M. Ronchetti. Robust statistics. Wiley, New York, 2009. Second edition.

  40. M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169–188, 1986.

  41. E. Joly, G. Lugosi, and R. I. Oliveira. On the estimation of the mean of a random vector. Electronic Journal of Statistics, 11:440–451, 2017.

  42. A. Klivans, P.K. Kothari, and R. Meka. Efficient algorithms for outlier-robust regression. In Proceedings of the 31st Annual Conference of Learning Theory (COLT 2018), 2018.

  43. V. Koltchinskii. Oracle inequalities in empirical risk minimization and sparse recovery problems, volume 2033 of Lecture Notes in Mathematics. Springer, Heidelberg, 2011. Lectures from the 38th Probability Summer School held in Saint-Flour, 2008, École d’Été de Probabilités de Saint-Flour. [Saint-Flour Probability Summer School].

  44. P.K. Kothari, J. Steinhardt, and D. Steurer. Robust moment estimation and improved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1035–1046. ACM, 2018.

  45. Kevin A. Lai, Anup B. Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 665–674. IEEE, 2016.

  46. G. Lecué and M. Lerasle. Learning from MOM’s principles: Le Cam’s approach. arXiv preprint arXiv:1701.01961, 2017.

  47. G. Lecué and M. Lerasle. Robust machine learning by median-of-means: theory and practice. Annals of Statistics, 2019, to appear.

  48. G. Lecué, M. Lerasle, and T. Mathieu. Robust classification via MOM minimization. arXiv preprint arXiv:1808.03106, 2018.

  49. G. Lecué and S. Mendelson. Learning subgaussian classes: Upper and minimax bounds. In S. Boucheron and N. Vayatis, editors, Topics in Learning Theory. Societe Mathematique de France, 2016.

  50. G. Lecué and S. Mendelson. Performance of empirical risk minimization in linear aggregation. Bernoulli, 22(3):1520–1534, 2016.

  51. M. Ledoux. The concentration of measure phenomenon. American Mathematical Society, Providence, RI, 2001.

  52. M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer-Verlag, New York, 1991.

  53. M. Lerasle and R. I. Oliveira. Robust empirical mean estimators. arXiv:1112.3914, 2012.

  54. Po-Ling Loh and Xin Lu Tan. High-dimensional robust precision matrix estimation: Cellwise corruption under \(\epsilon \)-contamination. Electronic Journal of Statistics, 12(1):1429–1467, 2018.

  55. G. Lugosi and S. Mendelson. Robust multivariate mean estimation: the optimality of trimmed mean. manuscript, 2019.

  56. G. Lugosi and S. Mendelson. Sub-Gaussian estimators of the mean of a random vector. Annals of Statistics, 47:783–794, 2019.

  57. G. Lugosi and S. Mendelson. Near-optimal mean estimators with respect to general norms. Probability Theory and Related Fields, 2019, to appear.

  58. G. Lugosi and S. Mendelson. Regularization, sparse recovery, and median-of-means tournaments. Bernoulli, 2019, to appear.

  59. G. Lugosi and S. Mendelson. Risk minimization by median-of-means tournaments. Journal of the European Mathematical Society, 2019, to appear.

  60. P. Massart. Concentration inequalities and model selection. Ecole d’été de Probabilités de Saint-Flour 2003. Lecture Notes in Mathematics. Springer, 2006.

  61. S. Mendelson. Learning without concentration. Journal of the ACM, 62:21, 2015.

  62. S. Mendelson. An optimal unrestricted learning procedure. arXiv preprint arXiv:1707.05342, 2017.

  63. S. Mendelson. Learning without concentration for general loss functions. Probability Theory and Related Fields, 171(1-2):459–502, 2018.

  64. S. Mendelson and N. Zhivotovskiy. Robust covariance estimation under \({L}_4-{L}_2\) norm equivalence. arXiv preprint arXiv:1809.10462, 2018.

  65. S. Minsker. Geometric median and robust estimation in Banach spaces. Bernoulli, 21:2308–2335, 2015.

  66. Stanislav Minsker. Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A):2871–2903, 2018.

  67. Stanislav Minsker. Uniform bounds for robust mean estimators. arXiv preprint arXiv:1812.03523, 2018.

  68. Stanislav Minsker and Nate Strawn. Distributed statistical estimation and rates of convergence in normal approximation. arXiv preprint arXiv:1704.02658, 2017.

  69. A.S. Nemirovsky and D.B. Yudin. Problem complexity and method efficiency in optimization. 1983.

  70. Roberto I. Oliveira and Paulo Orenstein. The sub-Gaussian property of trimmed means estimators. Technical report, IMPA, 2019.

  71. Valentin V Petrov. Limit theorems of probability theory: sequences of independent random variables. Technical report, Oxford, New York, 1995.

  72. IG Shevtsova. On the absolute constants in the Berry–Esseen-type inequalities. In Doklady Mathematics, volume 89, pages 378–381. Springer, 2014.

  73. C.G. Small. A survey of multidimensional medians. International Statistical Review, pages 263–277, 1990.

  74. S.M. Stigler. The asymptotic distribution of the trimmed mean. The Annals of Statistics, 1:472–477, 1973.

  75. B.S. Tsirelson, I.A. Ibragimov, and V.N. Sudakov. Norm of Gaussian sample function. In Proceedings of the 3rd Japan-U.S.S.R. Symposium on Probability Theory, volume 550 of Lecture Notes in Mathematics, pages 20–41. Springer-Verlag, Berlin, 1976.

  76. A. B. Tsybakov. Introduction to nonparametric estimation. Springer Series in Statistics. Springer, New York, 2009.

  77. J.W. Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975, volume 2, pages 523–531, 1975.

  78. J.W. Tukey and D.H. McLaughlin. Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/winsorization 1. Sankhyā: The Indian Journal of Statistics, Series A, 25:331–352, 1963.

  79. L.G. Valiant. A theory of the learnable. Communications of the ACM, 27:1134–1142, 1984.

  80. S. van de Geer. Applications of empirical process theory, volume 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2000.

  81. A.W. van der Vaart and J.A. Wellner. Weak convergence and empirical processes. Springer, 1996.

  82. V.N. Vapnik and A.Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. (in Russian); German translation: Theorie der Zeichenerkennung, Akademie Verlag, Berlin, 1979.

  83. R. Vershynin. Lectures in geometric functional analysis. 2009.

Acknowledgements

We thank Sam Hopkins, Stanislav Minsker, and Roberto Imbuzeiro Oliveira for illuminating discussions on the subject. We also thank two referees for their thorough reports and insightful comments.

Author information

Corresponding author

Correspondence to Gábor Lugosi.

Additional information

Communicated by Albert Cohen.

Gábor Lugosi was supported by the Spanish Ministry of Economy and Competitiveness, Grant MTM2015-67304-P and FEDER, EU, by “High-dimensional problems in structured probabilistic models - Ayudas Fundación BBVA a Equipos de Investigación Cientifica 2017” and by “Google Focused Award Algorithms and Learning for AI.” Shahar Mendelson was supported in part by the Israel Science Foundation.

About this article

Cite this article

Lugosi, G., Mendelson, S. Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey. Found Comput Math 19, 1145–1190 (2019). https://doi.org/10.1007/s10208-019-09427-x

  • DOI: https://doi.org/10.1007/s10208-019-09427-x

Keywords

  • Mean estimation
  • Heavy-tailed distributions
  • Robustness
  • Regression function estimation
  • Statistical learning

Mathematics Subject Classification

  • 62G05
  • 62G15
  • 62G35