A robust proposal of estimation for the sufficient dimension reduction problem

Abstract

In nonparametric regression, a large number of covariates brings on the curse of dimensionality. When the sample is not large enough, one remedy is to replace the explanatory variables with a small number of linear combinations of them that retain most of the information about the response variable. This leads to the so-called sufficient dimension reduction problem. The purpose of this paper is to obtain robust estimators of a sufficient dimension reduction, that is, estimators that are little affected by a small fraction of outliers in the data. One way to derive a sufficient dimension reduction is by means of the principal fitted components (PFC) model. We obtain robust estimators of the parameters of this model, and of the corresponding sufficient dimension reduction, based on a \(\tau \)-scale (\(\tau \)-estimators). Strong consistency of these estimators is proven under weak assumptions on the underlying distribution. The \(\tau \)-estimators for the PFC model are computed using an iterative algorithm. A Monte Carlo study compares the performance of the \(\tau \)-estimators with that of the maximum likelihood estimators. The results show clear advantages for the \(\tau \)-estimators under outlier contamination and only a small loss of efficiency when outliers are absent. A cross-validation procedure for selecting the dimension of the reduction space is also proposed. These estimators are implemented in R through functions in the package tauPFC. Because the PFC model is a special case of multivariate reduced-rank regression, our proposal can be applied directly to that model as well.
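
To make the notion of a \(\tau \)-scale concrete, the following is a minimal Python sketch of the univariate \(\tau \)-scale in the sense of Yohai and Zamar (1988): an M-scale \(s\) is obtained first, and the \(\tau \)-scale is then \(s\) times the square root of the average of a second bounded loss evaluated at the standardized residuals. It is not the multivariate estimator of the paper and not the tauPFC implementation. The function names, the bisquare loss, the tuning constants (1.548, 6.08, b = 0.5) and the fixed-point solver are illustrative assumptions, and no consistency factor is applied.

```python
import numpy as np


def rho_bisquare(u, c):
    """Tukey bisquare loss, scaled so that rho(u) = 1 whenever |u| >= c."""
    v = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - v**2) ** 3


def m_scale(u, c=1.548, b=0.5, tol=1e-10, max_iter=200):
    """M-scale s solving mean(rho(u / s)) = b by fixed-point iteration.

    The constants c and b are illustrative choices, not taken from the paper.
    """
    u = np.asarray(u, dtype=float)
    s = np.median(np.abs(u)) / 0.6745  # MAD-type starting value
    if s == 0:
        return 0.0
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(u / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s


def tau_scale(u, c1=1.548, c2=6.08, b=0.5):
    """Univariate tau-scale: tau = s * sqrt(mean(rho_2(u / s))).

    s is the M-scale computed with the first loss (c1, b); the second
    loss uses a larger tuning constant c2 (assumed value).
    """
    u = np.asarray(u, dtype=float)
    s = m_scale(u, c=c1, b=b)
    if s == 0:
        return 0.0
    return s * np.sqrt(np.mean(rho_bisquare(u / s, c2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=200)
    contaminated = clean.copy()
    contaminated[:20] = 50.0  # 10% of the sample replaced by gross outliers
    print("tau-scale:", tau_scale(clean), tau_scale(contaminated))
    print("std. dev.:", np.std(clean), np.std(contaminated))
```

Because both losses are bounded, moving the outliers farther away does not change the value of this scale once they lie beyond the rejection points, whereas the standard deviation grows without bound.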

Acknowledgements

We thank the Associate Editor and the referees for comments and suggestions that helped us improve this paper. We also gratefully acknowledge partial support from Grants PICT-2016-0377 and PICT-2015-2023 from ANPCYT, Grant 20020170100330BA from Universidad de Buenos Aires, and Grant 50420150100032LI from Universidad Nacional del Litoral, Argentina.

Author information

Corresponding author

Correspondence to María Eugenia Szretter Noste.

Cite this article

Bergesio, A., Szretter Noste, M.E. & Yohai, V.J. A robust proposal of estimation for the sufficient dimension reduction problem. TEST (2021). https://doi.org/10.1007/s11749-020-00745-9

Keywords

  • \(\tau \)-Estimators
  • Principal fitted components
  • Multivariate reduced-rank regression
  • Robustness

Mathematics Subject Classification

  • 62F35 Robustness and adaptive procedures (parametric inference)
  • 62F12 Asymptotic properties of parametric estimators