Nonsingular subsampling for regression S estimators with categorical predictors
Simple random subsampling is an integral part of S estimation algorithms for linear regression. Subsamples are required to be nonsingular. Usually, discarding a singular subsample and drawing a new one yields a sufficient number of nonsingular subsamples with reasonable computational effort. However, this procedure can require so many draws that it becomes infeasible, especially if some levels of categorical variables have low frequency. We present a subsampling algorithm, called nonsingular subsampling, which generates only nonsingular subsamples. When no singular subsamples occur, nonsingular subsampling is as fast as the simple algorithm, and when singular subsamples do occur, it retains the same order of computational cost. The algorithm works consistently unless the full design matrix is singular. The method is based on a modified LU decomposition algorithm that combines sample generation with solving the least squares problem. The algorithm may also be useful for ordinary bootstrapping. Since the method allows for S estimation in designs with factors and with interactions between factors and continuous regressors, we study properties of the resulting estimators, both with respect to their dependence on the randomness of the sampling and with respect to their statistical performance.
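To make the contrast in the abstract concrete, here is a minimal sketch, assuming NumPy, of the two ideas: the simple scheme that redraws a whole subsample whenever it is singular, and a greedy scheme that visits rows in random order and keeps a row only if it is linearly independent of the rows already kept, testing independence by Gaussian elimination with pivoting. This is not the authors' modified LU algorithm; the function names `rejection_subsample` and `nonsingular_subsample`, the tolerance rule, and the simulated design (a three-level factor whose third level occurs only once) are hypothetical choices made for illustration.

```python
import numpy as np


def rejection_subsample(X, rng, max_tries=100_000, tol=1e-10):
    """Simple scheme: draw p distinct rows; redraw the whole subsample if singular."""
    n, p = X.shape
    for tries in range(1, max_tries + 1):
        idx = rng.choice(n, size=p, replace=False)
        if np.linalg.matrix_rank(X[idx], tol=tol) == p:
            return idx, tries
    raise RuntimeError("no nonsingular subsample found within max_tries")


def nonsingular_subsample(X, rng, tol=1e-10):
    """Hypothetical greedy sketch (not the paper's algorithm): visit rows in
    random order and accept a row only if it is linearly independent of the
    rows accepted so far. Independence is tested by eliminating the new row
    against the already reduced accepted rows, as in Gaussian elimination."""
    n, p = X.shape
    reduced, pivots, chosen = [], [], []
    for i in rng.permutation(n):
        r = X[i].astype(float)  # astype returns a copy, so X is not modified
        # Zero out the pivot column of every previously accepted row.
        for b, c in zip(reduced, pivots):
            r = r - (r[c] / b[c]) * b
        c = int(np.argmax(np.abs(r)))  # largest remaining entry becomes the pivot
        if abs(r[c]) > tol * max(1.0, np.abs(X[i]).max()):
            reduced.append(r)
            pivots.append(c)
            chosen.append(i)
            if len(chosen) == p:
                return np.array(chosen)
        # Otherwise row i lies (numerically) in the span of the accepted rows: skip it.
    raise ValueError("full design matrix is singular: no nonsingular subsample exists")


# Illustrative design (invented for this example): intercept, dummies for a
# three-level factor whose level 2 occurs only once, and one continuous regressor.
rng = np.random.default_rng(1)
n = 200
level = np.zeros(n, dtype=int)
level[:100] = 1
level[100] = 2  # the only observation at level 2
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), level == 1, level == 2, x]).astype(float)

idx_naive, tries = rejection_subsample(X, rng)
idx_ns = nonsingular_subsample(X, rng)
print("draws needed by the simple scheme:", tries)
print("greedy nonsingular subsample:", sorted(idx_ns.tolist()))
```

Every nonsingular subsample of this design must contain row 100, so the simple scheme succeeds on only a small fraction of its draws, while the greedy sketch never rejects a complete subsample. Two caveats: the distribution of subsamples produced by the greedy sketch is not guaranteed to match that of the simple scheme, and, unlike the method described above, the sketch does not reuse its elimination to obtain the candidate fit; that fit would be computed separately, e.g. with `np.linalg.solve(X[idx_ns], y[idx_ns])` for a response vector `y`.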
Keywords: Robust regression, MM estimate, S estimate, Resampling, Collinearity, Bootstrap, Dummy variables
The authors would like to thank Kali Tal for providing editorial help with the manuscript. A reviewer has provided very helpful suggestions to improve earlier versions of the manuscript.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.