Robust regression based on shrinkage with application to Living Environment Deprivation

Abstract

A robust estimator is proposed for the parameters that characterize the linear regression problem. It is based on the notion of shrinkage, often used in finance and previously studied for outlier detection in multivariate data. A thorough simulation study investigates the efficiency under Normal and heavy-tailed errors, the robustness under contamination, the computational time, and the affine equivariance and breakdown value of the regression estimator. Two classical data sets often used in the literature are studied, together with a real socioeconomic data set on the Living Environment Deprivation of areas in Liverpool (UK). The results from the simulations and the real-data examples show the advantages of the proposed robust estimator in regression.
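The abstract does not give the estimator's formulas, so what follows is only a hypothetical sketch of the general idea of shrinkage-based robust regression, not the authors' method. It assumes a single predictor. It uses the comedian (the median of products of deviations from the medians) as a robust scatter measure and shrinks the 2x2 matrix toward a scaled identity with a fixed, made-up weight `delta`. Points whose robust squared distance exceeds the 0.975 chi-square quantile with 2 degrees of freedom are flagged, and ordinary least squares is refitted on the rest. All function names, `delta`, and the 1.4826² rescaling are illustrative assumptions.

```python
import math
from statistics import median


def comedian(u, v):
    # Comedian: median of products of deviations from the medians.
    mu, mv = median(u), median(v)
    return median([(a - mu) * (b - mv) for a, b in zip(u, v)])


def shrunk_scatter(x, y, delta=0.1, c=1.4826 ** 2):
    # Hypothetical: 2x2 comedian matrix shrunk toward (trace / 2) * I,
    # then rescaled by c so the diagonal roughly matches normal variances.
    sxx, syy, sxy = comedian(x, x), comedian(y, y), comedian(x, y)
    target = (sxx + syy) / 2.0
    a = c * ((1.0 - delta) * sxx + delta * target)
    d = c * ((1.0 - delta) * syy + delta * target)
    b = c * (1.0 - delta) * sxy
    if a * d - b * b <= 0:
        raise ValueError("shrunk scatter is not positive definite; try a larger delta")
    return a, b, d


def robust_sq_distances(x, y, delta=0.1):
    # Squared Mahalanobis-type distances from the coordinatewise median,
    # using the closed-form inverse of the 2x2 matrix [[a, b], [b, d]].
    mx, my = median(x), median(y)
    a, b, d = shrunk_scatter(x, y, delta)
    det = a * d - b * b
    out = []
    for xi, yi in zip(x, y):
        u, v = xi - mx, yi - my
        out.append((d * u * u - 2.0 * b * u * v + a * v * v) / det)
    return out


def shrinkage_reweighted_fit(x, y, delta=0.1, p=0.975):
    # The chi-square quantile with 2 degrees of freedom is -2 ln(1 - p).
    cutoff = -2.0 * math.log(1.0 - p)
    d2 = robust_sq_distances(x, y, delta)
    keep = [i for i, v in enumerate(d2) if v <= cutoff]
    if len(keep) < 2:
        raise ValueError("too few points retained to fit a line")
    xs = [x[i] for i in keep]
    ys = [y[i] for i in keep]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    # Ordinary least squares on the retained points only.
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    kept = set(keep)
    flagged = [i for i in range(len(x)) if i not in kept]
    return slope, intercept, flagged


# Toy data: 20 points on y = 2x + 1 plus four gross outliers at the end.
x = [float(i) for i in range(20)] + [10.0, 11.0, 12.0, 13.0]
y = [2.0 * i + 1.0 for i in range(20)] + [90.0, 92.0, 94.0, 96.0]
slope, intercept, flagged = shrinkage_reweighted_fit(x, y)
print(slope, intercept, flagged)  # prints 2.0 1.0 [20, 21, 22, 23]
```

On this toy data the shrinkage step does real work. The raw comedian matrix has a negative determinant, so it cannot define a distance. Mixing in the scaled identity restores positive definiteness, after which the four appended outliers are flagged and the line is recovered.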

Figs. 1–13

Acknowledgements

The authors are grateful to the editor and the referee for their constructive and valuable comments. This research was partially supported by the Ministerio de Economía, Industria y Competitividad (Spanish Ministry of Economy, Industry and Competitiveness), Award Number ECO2015-66593-P.

Author information

Correspondence to Elisa Cabana.

Electronic supplementary material

Supplementary material 1 (pdf 116 KB)

About this article

Cite this article

Cabana, E., Lillo, R.E. & Laniado, H. Robust regression based on shrinkage with application to Living Environment Deprivation. Stoch Environ Res Risk Assess (2020). https://doi.org/10.1007/s00477-020-01774-4

Keywords

  • Robust regression
  • Robust Mahalanobis distance
  • Shrinkage estimator
  • Outliers
  • Environmental study