Computational Statistics, Volume 32, Issue 2, pp 631–646

# Nonsingular subsampling for regression S estimators with categorical predictors

Original Paper

## Abstract

Simple random subsampling is an integral part of S estimation algorithms for linear regression. Subsamples are required to be nonsingular. Usually, discarding a singular subsample and drawing a new one leads to a sufficient number of nonsingular subsamples with a reasonable computational effort. However, this procedure can require so many subsamples that it becomes infeasible, especially if levels of categorical variables have low frequency. A subsampling algorithm called nonsingular subsampling is presented, which generates only nonsingular subsamples. When no singular subsamples occur, nonsingular subsampling is as fast as the simple algorithm, and if singular subsamples do occur, it maintains the same computational order. The algorithm works consistently, unless the full design matrix is singular. The method is based on a modified LU decomposition algorithm that combines sample generation with solving the least squares problem. The algorithm may also be useful for ordinary bootstrapping. Since the method allows for S estimation in designs with factors and interactions between factors and continuous regressors, we study properties of the resulting estimators, both in the sense of their dependence on the randomness of the sampling and of their statistical performance.
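The core idea, drawing observations in random order and keeping only those that are linearly independent of the ones already kept, can be illustrated with a small sketch. This is not the authors' modified LU decomposition. It is a simplified Gaussian-elimination version written for illustration, and the function name `nonsingular_subsample`, the tolerance `tol`, and the toy design matrix `X` are all assumptions introduced here.

```python
import random


def nonsingular_subsample(X, rng, tol=1e-9):
    """Return indices of p rows of X that form a nonsingular p x p matrix.

    Illustrative sketch only: rows are visited in random order and a row is
    kept only if it is linearly independent of the rows kept so far.
    The absolute tolerance `tol` is an assumption of this sketch.
    """
    n, p = len(X), len(X[0])
    order = list(range(n))
    rng.shuffle(order)

    basis = []   # (pivot_column, reduced_row) for every accepted row
    chosen = []  # indices of accepted rows
    for i in order:
        v = [float(x) for x in X[i]]
        # Eliminate the components along the rows accepted so far.
        for col, b in basis:
            f = v[col]
            if f != 0.0:
                v = [vj - f * bj for vj, bj in zip(v, b)]
        # Pivot on the largest remaining entry.
        col = max(range(p), key=lambda j: abs(v[j]))
        if abs(v[col]) <= tol:
            continue  # row is dependent: skip it instead of redrawing everything
        piv = v[col]
        basis.append((col, [vj / piv for vj in v]))
        chosen.append(i)
        if len(chosen) == p:
            return chosen
    raise ValueError("full design matrix appears to be singular")


# Hypothetical design: intercept, dummy for a rare factor level, one covariate.
# Only rows 0 and 1 belong to the rare level.
X = [[1.0, 1.0 if i < 2 else 0.0, float(i)] for i in range(20)]
subsample = nonsingular_subsample(X, random.Random(1))
print(subsample)
```

In this toy design, a simple random subsample of three rows is singular whenever it misses both rows of the rare level, which is the common case. The sketch instead skips dependent rows as they are encountered, so every returned subsample contains a row from the rare level, and it fails only when the full design matrix itself is rank deficient.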

### Keywords

Robust regression · MM estimate · S estimate · Resampling · Collinearity · Bootstrap · Dummy variables
