Computational Statistics, Volume 32, Issue 2, pp 631–646

Nonsingular subsampling for regression S estimators with categorical predictors

Original Paper

Abstract

Simple random subsampling is an integral part of S estimation algorithms for linear regression, and the subsamples are required to be nonsingular. Usually, discarding a singular subsample and drawing a new one yields a sufficient number of nonsingular subsamples at reasonable computational cost. However, this procedure can require so many draws that it becomes infeasible, especially when some levels of categorical variables have low frequencies. We present a subsampling algorithm, called nonsingular subsampling, that generates only nonsingular subsamples. When no singular subsamples occur, nonsingular subsampling is as fast as the simple algorithm; when singular subsamples do occur, it retains the same computational order. The algorithm works consistently unless the full design matrix is singular. The method is based on a modified LU decomposition algorithm that combines sample generation with solving the least squares problem. The algorithm may also be useful for ordinary bootstrapping. Since the method allows for S estimation in designs with factors and with interactions between factors and continuous regressors, we study the properties of the resulting estimators, both their dependence on the randomness of the subsampling and their statistical performance.
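The abstract describes the method only at a high level: a modified LU decomposition that generates the subsample and solves the least squares problem in the same pass. The Python sketch below is our own minimal illustration of that idea under stated assumptions, not the authors' algorithm and not the robustbase code. It assumes that observations are visited in a random order without replacement, that a candidate row is rejected when Gaussian elimination against the rows already accepted leaves no entry larger than an absolute tolerance `tol`, and that the subsample has exactly p rows, so its least squares fit is an exact fit obtained by back substitution. The function name `nonsingular_subsample` and the tolerance value are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def nonsingular_subsample(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    rng: random.Random,
    tol: float = 1e-10,  # hypothetical absolute tolerance for "numerically zero"
) -> Optional[Tuple[List[int], List[float]]]:
    """Hypothetical sketch: draw p linearly independent rows of X and return
    their indices together with the exact-fit coefficients of that subsample.

    Returns None if fewer than p independent rows exist (X is rank deficient).
    """
    n, p = len(X), len(X[0])
    order = list(range(n))
    rng.shuffle(order)  # visit observations in random order, without replacement

    basis: List[Tuple[int, List[float], float]] = []  # (pivot column, reduced row, reduced y)
    chosen: List[int] = []
    for i in order:
        row = [float(v) for v in X[i]]
        rhs = float(y[i])
        # Gaussian elimination of the candidate row against the rows accepted so far;
        # the same row operations are applied to the response value.
        for piv, brow, brhs in basis:
            f = row[piv] / brow[piv]
            if f != 0.0:
                row = [a - f * b for a, b in zip(row, brow)]
                row[piv] = 0.0  # force an exact zero in the pivot column
                rhs -= f * brhs
        piv = max(range(p), key=lambda j: abs(row[j]))  # largest remaining entry
        if abs(row[piv]) <= tol:
            continue  # row is (numerically) a combination of accepted rows: skip it
        basis.append((piv, row, rhs))
        chosen.append(i)
        if len(chosen) == p:
            break
    else:
        return None  # ran out of observations: the full design matrix is singular

    # Back substitution: each basis row is zero in the pivot columns of earlier rows.
    beta = [0.0] * p
    for piv, row, rhs in reversed(basis):
        s = sum(row[j] * beta[j] for j in range(p) if j != piv)
        beta[piv] = (rhs - s) / row[piv]
    return chosen, beta


if __name__ == "__main__":
    # Columns: intercept, dummies for factor levels B and C, one continuous regressor.
    # Level C occurs only once (row 5), so every subsample without row 5 is singular.
    X = [
        [1, 0, 0, 0.0], [1, 0, 0, 1.0], [1, 0, 0, 2.0],
        [1, 1, 0, 0.5], [1, 1, 0, 1.5], [1, 0, 1, 3.0],
    ]
    beta_true = [1.0, 2.0, -1.0, 0.5]
    y = [sum(a * b for a, b in zip(row, beta_true)) for row in X]
    idx, beta = nonsingular_subsample(X, y, random.Random(1))
    print(sorted(idx), [round(b, 6) for b in beta])  # coefficients equal beta_true up to rounding
```

In this toy design, redrawing until a subsample happens to be nonsingular wastes every draw that misses row 5, whereas the sketch never returns a singular subsample and returns None only when the full design matrix is rank deficient, which mirrors the failure condition stated in the abstract. One caveat of this simplification: accepting the first p independent rows of a random permutation does not, in general, give every nonsingular subset the same probability.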

Keywords

Robust regression, MM estimate, S estimate, Resampling, Collinearity, Bootstrap, Dummy variables

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
  2. Seminar für Statistik, ETH Zürich, Zürich, Switzerland
