High-dimensional simultaneous inference with the bootstrap

  • Invited Paper
  • Journal: TEST

Abstract

We propose a residual and wild bootstrap methodology for individual and simultaneous inference in high-dimensional linear models with possibly non-Gaussian and heteroscedastic errors. We establish asymptotic consistency for simultaneous inference for the parameters in a group \(G\) of variables, where \(p \gg n\), \(s_0 = o(n^{1/2}/\{\log (p) \log (|G|)^{1/2}\})\) and \(\log (|G|) = o(n^{1/7})\), with \(p\) the number of variables, \(n\) the sample size and \(s_0\) the sparsity. The theory is complemented by many empirical results. Our proposed procedures are implemented in the R-package hdi (Meier et al. hdi: high-dimensional inference. R package version 0.1-6, 2016).
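
The bootstrap recipe summarized in the abstract can be illustrated with a small, standard-library-only Python sketch. It assumes the de-sparsified Lasso \(\hat b_j = \hat\beta_j + Z_j^\top (Y - X\hat\beta)/Z_j^\top X_j\) as the estimator being bootstrapped, with \(Z_j\) the residuals of a nodewise Lasso of \(X_j\) on the other columns and \(X\) treated as fixed. Several further choices are assumptions made only for illustration: a heteroscedasticity-robust (sandwich-type) standard error, Rademacher multipliers for the wild bootstrap, fixed tuning parameters `lam` and `lam_node`, and no intercept. The function names `lasso_cd`, `nodewise_residual` and `bootstrap_simultaneous_ci` are hypothetical and are not the hdi package's API; this is a sketch for small examples, not the authors' implementation.

```python
import math
import random


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Lasso by cyclic coordinate descent (no intercept).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1,
    where X is a list of n rows with p entries each.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # running residual y - X b
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) X_j^T (partial residual that excludes coordinate j)
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def nodewise_residual(X, j, lam):
    """Z_j = X_j - X_{-j} gamma_hat, with gamma_hat from a nodewise Lasso."""
    X_minus = [[v for k, v in enumerate(row) if k != j] for row in X]
    xj = [row[j] for row in X]
    gamma = lasso_cd(X_minus, xj, lam)
    return [xj[i] - sum(a * g for a, g in zip(X_minus[i], gamma)) for i in range(len(X))]


def bootstrap_simultaneous_ci(X, y, G, lam, lam_node, B=200, alpha=0.05,
                              method="wild", seed=0):
    """Simultaneous (1 - alpha) intervals for beta_j, j in G, via a max-|T| bootstrap."""
    rng = random.Random(seed)
    n = len(X)
    # X is treated as fixed, so the nodewise residuals are computed only once.
    Z = {j: nodewise_residual(X, j, lam_node) for j in G}
    zx = {j: sum(Z[j][i] * X[i][j] for i in range(n)) for j in G}

    def desparsify(yv):
        beta = lasso_cd(X, yv, lam)
        fit = [sum(a * c for a, c in zip(row, beta)) for row in X]
        res = [yv[i] - fit[i] for i in range(n)]
        # b_j = beta_j + Z_j^T (y - X beta) / Z_j^T X_j
        b = {j: beta[j] + sum(z * e for z, e in zip(Z[j], res)) / zx[j] for j in G}
        # heteroscedasticity-robust (sandwich-type) standard error
        se = {j: math.sqrt(sum((z * e) ** 2 for z, e in zip(Z[j], res))) / abs(zx[j])
              for j in G}
        return beta, fit, res, b, se

    beta_hat, fit, res, b, se = desparsify(y)
    mean_res = sum(res) / n
    centred = [e - mean_res for e in res]
    max_stats = []
    for _ in range(B):
        if method == "wild":
            # eps_i* = W_i * res_i has variance res_i^2, preserving heteroscedasticity
            eps = [rng.choice((-1.0, 1.0)) * e for e in res]
        elif method == "residual":
            # i.i.d. resampling of the centred residuals
            eps = [rng.choice(centred) for _ in range(n)]
        else:
            raise ValueError("method must be 'wild' or 'residual'")
        y_star = [f + e for f, e in zip(fit, eps)]
        _, _, _, b_s, se_s = desparsify(y_star)
        # beta_hat plays the role of the true parameter in the bootstrap world
        max_stats.append(max(abs(b_s[j] - beta_hat[j]) / se_s[j] for j in G))
    max_stats.sort()
    q = max_stats[min(B - 1, math.ceil((1 - alpha) * B) - 1)]
    return {j: (b[j] - q * se[j], b[j] + q * se[j]) for j in G}
```

Taking the maximum of the studentized bootstrap statistics over \(j \in G\) before computing the quantile is what makes the intervals hold simultaneously over the group; with \(G\) a single index the same code yields an individual interval. Re-fitting the Lasso in every bootstrap replicate dominates the cost of this sketch.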


Figs. 1–15

References

  • Belloni A, Chernozhukov V, Chetverikov D, Wei Y (2015a) Uniformly valid post-regularization confidence regions for many functional parameters in Z-estimation. Preprint arXiv:1512.07619

  • Belloni A, Chernozhukov V, Kato K (2015b) Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems. Biometrika 102(1):77–94

  • Bickel P, Klaassen C, Ritov Y, Wellner J (1998) Efficient and adaptive estimation for semiparametric models. Springer, Berlin

  • Breiman L (1996) Heuristics of instability and stabilization in model selection. Ann Stat 24:2350–2383

  • Bühlmann P (2013) Statistical significance in high-dimensional linear models. Bernoulli 19:1212–1242

  • Bühlmann P, van de Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin

  • Bühlmann P, van de Geer S (2015) High-dimensional inference in misspecified linear models. Electron J Stat 9:1449–1473

  • Bühlmann P, Kalisch M, Meier L (2014) High-dimensional statistics with a view towards applications in biology. Annu Rev Stat Appl 1:255–278

  • Chatterjee A, Lahiri S (2011) Bootstrapping Lasso estimators. J Am Stat Assoc 106:608–625

  • Chatterjee A, Lahiri S (2013) Rates of convergence of the adaptive LASSO estimators to the oracle distribution and higher order refinements by the bootstrap. Ann Stat 41:1232–1259

  • Chernozhukov V, Chetverikov D, Kato K (2013) Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann Stat 41:2786–2819

  • Chernozhukov V, Chetverikov D, Kato K (2014) Central limit theorems and bootstrap in high dimensions. Ann Probab (to appear). Preprint arXiv:1412.3661

  • Chernozhukov V, Hansen C, Spindler M (2016) hdm: high-dimensional metrics. Preprint arXiv:1608.00354

  • Deng H, Zhang C-H (2017) Beyond Gaussian approximation: bootstrap in large scale simultaneous inference. Unpublished work in progress

  • Dezeure R, Bühlmann P, Meier L, Meinshausen N (2015) High-dimensional inference: confidence intervals, \(p\)-values and R-software hdi. Stat Sci 30:533–558

  • Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26

  • Eicker F (1967) Limit theorems for regressions with unequal and dependent errors. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 59–82

  • Foygel Barber R, Candès EJ (2015) Controlling the false discovery rate via knockoffs. Ann Stat 43:2055–2085

  • Freedman DA (1981) Bootstrapping regression models. Ann Stat 9:1218–1228

  • Giné E, Zinn J (1989) Necessary conditions for the bootstrap of the mean. Ann Stat 17:684–691

  • Giné E, Zinn J (1990) Bootstrapping general empirical measures. Ann Probab 18:851–869

  • Hall P, Wilson SR (1991) Two guidelines for bootstrap hypothesis testing. Biometrics 47:757–762

  • Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 221–233

  • Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15:2869–2909

  • Liu RY, Singh K (1992) Efficiency and robustness in resampling. Ann Stat 20:370–384

  • Liu H, Yu B (2013) Asymptotic properties of lasso+mls and lasso+ridge in sparse high-dimensional linear regression. Electron J Stat 7:3124–3169

  • Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann Stat 21:255–285

  • McKeague IW, Qian M (2015) An adaptive resampling test for detecting the presence of significant predictors. J Am Stat Assoc 110:1422–1433

  • Meier L, Dezeure R, Meinshausen N, Mächler M, Bühlmann P (2016) hdi: high-dimensional inference. R package version 0.1-6

  • Meinshausen N (2015) Group bound: confidence intervals for groups of variables in sparse high dimensional regression without assumptions on the design. J R Stat Soc B 77:923–945

  • Meinshausen N, Bühlmann P (2006) High-dimensional graphs and variable selection with the Lasso. Ann Stat 34:1436–1462

  • Meinshausen N, Bühlmann P (2010) Stability selection (with discussion). J R Stat Soc B 72:417–473

  • Meinshausen N, Meier L, Bühlmann P (2009) P-values for high-dimensional regression. J Am Stat Assoc 104:1671–1681

  • Meinshausen N, Maathuis MH, Bühlmann P (2011) Asymptotic optimality of the Westfall-Young permutation procedure for multiple testing under dependence. Ann Stat 39:3369–3391

  • Reid S, Tibshirani R, Friedman J (2016) A study of error variance estimation in Lasso regression. Stat Sinica 26:35–67

  • Rudelson M, Zhou S (2013) Reconstruction from anisotropic random measurements. IEEE Trans Inf Theory 59:3434–3447

  • Shah R, Samworth R (2013) Variable selection with error control: another look at stability selection. J R Stat Soc B 75:55–80

  • Shah R, Bühlmann P (2015) Goodness of fit tests for high-dimensional linear models. J R Stat Soc B. doi:10.1111/rssb.12234

  • van de Geer S, Bühlmann P, Zhou S (2011) The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron J Stat 5:688–749

  • van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42:1166–1202

  • Wasserman L, Roeder K (2009) High dimensional variable selection. Ann Stat 37:2178–2201

  • Westfall P, Young S (1993) Resampling-based multiple testing: examples and methods for P-value adjustment. Wiley, Hoboken

  • White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48:817–838

  • Wu C-FJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14:1261–1295

  • Ye F, Zhang C-H (2010) Rate minimaxity of the Lasso and Dantzig selector for the \(\ell _q\) loss in \(\ell _r\) balls. J Mach Learn Res 11:3481–3502

  • Zhang C-H, Huang J (2008) The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann Stat 36:1567–1594

  • Zhang C-H, Zhang SS (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc B 76:217–242

  • Zhang X, Cheng G (2016) Simultaneous inference for high-dimensional linear models. J Am Stat Assoc. doi:10.1080/01621459.2016.1166114

  • Zhou Q (2014) Monte Carlo simulation for Lasso-type problems by estimator augmentation. J Am Stat Assoc 109:1495–1516

Acknowledgements

We gratefully acknowledge visits at the American Institute of Mathematics (AIM), San Jose, USA, and at the Mathematisches Forschungsinstitut (MFO), Oberwolfach, Germany. We also thank anonymous reviewers for constructive comments.

Author information

Correspondence to Peter Bühlmann.

Additional information

Ruben Dezeure is partially supported by the Swiss National Science Foundation SNF 2-77991-14. Cun-Hui Zhang is partially supported by NSF Grants DMS-12-09014 and DMS-15-13378 and NSA Grant H98230-15-1-0040.

This invited paper is discussed in comments available at: doi:10.1007/s11749-017-0555-1; doi:10.1007/s11749-017-0556-0; doi:10.1007/s11749-017-0557-z; doi:10.1007/s11749-017-0558-y; doi:10.1007/s11749-017-0559-x.

Electronic supplementary material

Supplementary material 1 (pdf 510 KB)

About this article

Cite this article

Dezeure, R., Bühlmann, P. & Zhang, CH. High-dimensional simultaneous inference with the bootstrap. TEST 26, 685–719 (2017). https://doi.org/10.1007/s11749-017-0554-2

  • DOI: https://doi.org/10.1007/s11749-017-0554-2