Abstract
We propose a residual and wild bootstrap methodology for individual and simultaneous inference in high-dimensional linear models with possibly non-Gaussian and heteroscedastic errors. We establish asymptotic consistency for simultaneous inference for parameters in groups G when \(p \gg n\), \(s_0 = o(n^{1/2}/\{\log (p) \log (|G|)^{1/2}\})\) and \(\log (|G|) = o(n^{1/7})\), where p is the number of variables, n the sample size and \(s_0\) the sparsity. The theory is complemented by many empirical results. Our proposed procedures are implemented in the R package hdi (Meier et al. 2016).
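The simultaneous part of this inference can be illustrated by bootstrapping the maximum of standardized statistics over the coordinates j in G. The following is a minimal sketch in pure Python, assuming the score vectors z_j (for the de-sparsified Lasso, the residuals from regressing X_j on the remaining columns) and the residuals ε̂ have already been computed. The function name, the Rademacher multipliers and the quantile rule are illustrative assumptions, not the hdi implementation. In particular, the sketch does not refit the estimator on bootstrap data; it is closer to a multiplier bootstrap on the linearized statistic, in the spirit of Chernozhukov et al. (2013).

```python
import math
import random


def wild_bootstrap_max_critical_value(z_cols, residuals, alpha=0.05,
                                      n_boot=2000, seed=0):
    """Hypothetical helper: bootstrap (1 - alpha) quantile of max_j |T*_j|.

    z_cols:    list of |G| score vectors z_j, each of length n (assumed given).
    residuals: list of n residuals eps_hat_i (assumed given).
    The linearized statistic is T_j = sum_i z_ij eps_i / sqrt(sum_i (z_ij eps_i)^2),
    a heteroscedasticity-robust standardization.
    """
    rng = random.Random(seed)
    n = len(residuals)
    # Robust scale per coordinate j in G; must be positive to standardize.
    scales = [math.sqrt(sum((z[i] * residuals[i]) ** 2 for i in range(n)))
              for z in z_cols]
    if any(s == 0.0 for s in scales):
        raise ValueError("every score vector needs a nonzero robust scale")

    max_stats = []
    for _ in range(n_boot):
        # Wild bootstrap: multiply each residual by an independent
        # Rademacher weight W_i in {-1, +1} (W_i^2 = 1 keeps the scale fixed).
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        t_max = max(
            abs(sum(z[i] * w[i] * residuals[i] for i in range(n))) / s
            for z, s in zip(z_cols, scales)
        )
        max_stats.append(t_max)

    # Empirical (1 - alpha) quantile of the bootstrapped maxima.
    max_stats.sort()
    k = min(n_boot - 1, max(0, math.ceil((1 - alpha) * n_boot) - 1))
    return max_stats[k]
```

Under the assumptions above, the returned value c would take the place of the per-coordinate normal quantile, giving intervals \(\hat b_j \pm c \cdot \widehat{\mathrm{se}}_j\) that are meant to hold jointly over j in G at approximate level 1 − α; taking the maximum over G is what accounts for the dependence between coordinates, as in the Westfall–Young approach.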
References
Belloni A, Chernozhukov V, Chetverikov D, Wei Y (2015a) Uniformly valid post-regularization confidence regions for many functional parameters in z-estimation. Preprint arXiv:1512.07619
Belloni A, Chernozhukov V, Kato K (2015b) Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems. Biometrika 102(1):77–94
Bickel P, Klaassen C, Ritov Y, Wellner J (1998) Efficient and adaptive estimation for semiparametric models. Springer, Berlin
Breiman L (1996) Heuristics of instability and stabilization in model selection. Ann Stat 24:2350–2383
Bühlmann P (2013) Statistical significance in high-dimensional linear models. Bernoulli 19:1212–1242
Bühlmann P, van de Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin
Bühlmann P, van de Geer S (2015) High-dimensional inference in misspecified linear models. Electron J Stat 9:1449–1473
Bühlmann P, Kalisch M, Meier L (2014) High-dimensional statistics with a view towards applications in biology. Annu Rev Stat Appl 1:255–278
Chatterjee A, Lahiri S (2011) Bootstrapping Lasso estimators. J Am Stat Assoc 106:608–625
Chatterjee A, Lahiri S (2013) Rates of convergence of the adaptive LASSO estimators to the oracle distribution and higher order refinements by the bootstrap. Ann Stat 41:1232–1259
Chernozhukov V, Chetverikov D, Kato K (2013) Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann Stat 41:2786–2819
Chernozhukov V, Chetverikov D, Kato K (2014) Central limit theorems and bootstrap in high dimensions. Ann Probab, to appear. Preprint arXiv:1412.3661
Chernozhukov V, Hansen C, Spindler M (2016) hdm: high-dimensional metrics. Preprint arXiv:1608.00354
Deng H, Zhang C-H (2017) Beyond Gaussian approximation: bootstrap in large scale simultaneous inference. Unpublished work in progress
Dezeure R, Bühlmann P, Meier L, Meinshausen N (2015) High-dimensional inference: confidence intervals, \(p\)-values and R-software hdi. Stat Sci 30:533–558
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
Eicker F (1967) Limit theorems for regressions with unequal and dependent errors. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 59–82
Foygel Barber R, Candès EJ (2015) Controlling the false discovery rate via knockoffs. Ann Stat 43:2055–2085
Freedman DA (1981) Bootstrapping regression models. Ann Stat 9:1218–1228
Giné E, Zinn J (1989) Necessary conditions for the bootstrap of the mean. Ann Stat 17:684–691
Giné E, Zinn J (1990) Bootstrapping general empirical measures. Ann Probab 18:851–869
Hall P, Wilson SR (1991) Two guidelines for bootstrap hypothesis testing. Biometrics 47:757–762
Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 221–233
Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15:2869–2909
Liu RY, Singh K (1992) Efficiency and robustness in resampling. Ann Stat 20:370–384
Liu H, Yu B (2013) Asymptotic properties of lasso+mls and lasso+ridge in sparse high-dimensional linear regression. Electron J Stat 7:3124–3169
Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann Stat 21:255–285
McKeague IW, Qian M (2015) An adaptive resampling test for detecting the presence of significant predictors. J Am Stat Assoc 110:1422–1433
Meier L, Dezeure R, Meinshausen N, Mächler M, Bühlmann P (2016) hdi: high-dimensional inference. R package version 0.1-6
Meinshausen N (2015) Group bound: confidence intervals for groups of variables in sparse high dimensional regression without assumptions on the design. J R Stat Soc B 77:923–945
Meinshausen N, Bühlmann P (2006) High-dimensional graphs and variable selection with the Lasso. Ann Stat 34:1436–1462
Meinshausen N, Bühlmann P (2010) Stability selection (with discussion). J R Stat Soc B 72:417–473
Meinshausen N, Meier L, Bühlmann P (2009) P-values for high-dimensional regression. J Am Stat Assoc 104:1671–1681
Meinshausen N, Maathuis MH, Bühlmann P (2011) Asymptotic optimality of the Westfall-Young permutation procedure for multiple testing under dependence. Ann Stat 39:3369–3391
Reid S, Tibshirani R, Friedman J (2016) A study of error variance estimation in Lasso regression. Stat Sinica 26:35–67
Rudelson M, Zhou S (2013) Reconstruction from anisotropic random measurements. IEEE Trans Inf Theory 59:3434–3447
Shah R, Samworth R (2013) Variable selection with error control: another look at stability selection. J R Stat Soc B 75:55–80
Shah R, Bühlmann P (2015) Goodness of fit tests for high-dimensional linear models. J R Stat Soc B. doi:10.1111/rssb.12234
van de Geer S, Bühlmann P, Zhou S (2011) The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron J Stat 5:688–749
van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42:1166–1202
Wasserman L, Roeder K (2009) High dimensional variable selection. Ann Stat 37:2178–2201
Westfall P, Young S (1993) Resampling-based multiple testing: examples and methods for P-value adjustment. Wiley, Hoboken
White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48:817–838
Wu C-FJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14:1261–1295
Ye F, Zhang C-H (2010) Rate minimaxity of the Lasso and Dantzig selector for the \(\ell _q\) loss in \(\ell _r\) balls. J Mach Learn Res 11:3481–3502
Zhang C-H, Huang J (2008) The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann Stat 36:1567–1594
Zhang C-H, Zhang SS (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc B 76:217–242
Zhang X, Cheng G (2016) Simultaneous inference for high-dimensional linear models. J Am Stat Assoc. doi:10.1080/01621459.2016.1166114
Zhou Q (2014) Monte Carlo simulation for Lasso-type problems by estimator augmentation. J Am Stat Assoc 109:1495–1516
Acknowledgements
We gratefully acknowledge visits at the American Institute of Mathematics (AIM), San Jose, USA, and at the Mathematisches Forschungsinstitut (MFO), Oberwolfach, Germany. We also thank anonymous reviewers for constructive comments.
Additional information
Ruben Dezeure is partially supported by the Swiss National Science Foundation SNF 2-77991-14. Cun-Hui Zhang is partially supported by NSF Grants DMS-12-09014 and DMS-15-13378 and NSA Grant H98230-15-1-0040.
This invited paper is discussed in comments available at: doi:10.1007/s11749-017-0555-1; doi:10.1007/s11749-017-0556-0; doi:10.1007/s11749-017-0557-z; doi:10.1007/s11749-017-0558-y; doi:10.1007/s11749-017-0559-x.
About this article
Cite this article
Dezeure, R., Bühlmann, P. & Zhang, CH. High-dimensional simultaneous inference with the bootstrap. TEST 26, 685–719 (2017). https://doi.org/10.1007/s11749-017-0554-2
DOI: https://doi.org/10.1007/s11749-017-0554-2
Keywords
- De-biased Lasso
- De-sparsified Lasso
- Gaussian approximation for maxima
- High-dimensional linear model
- Heteroscedastic errors
- Multiple testing
- Westfall–Young method