TEST, Volume 26, Issue 4, pp 685–719

High-dimensional simultaneous inference with the bootstrap

  • Ruben Dezeure
  • Peter Bühlmann
  • Cun-Hui Zhang
Invited Paper

Abstract

We propose a residual and wild bootstrap methodology for individual and simultaneous inference in high-dimensional linear models with possibly non-Gaussian and heteroscedastic errors. We establish asymptotic consistency for simultaneous inference for parameters in groups G, where \(p \gg n\), \(s_0 = o(n^{1/2}/\{\log (p) \log (|G|)^{1/2}\})\) and \(\log (|G|) = o(n^{1/7})\), with p the number of variables, n the sample size and \(s_0\) the sparsity. The theory is complemented by many empirical results. Our proposed procedures are implemented in the R-package hdi (Meier et al. hdi: high-dimensional inference. R package version 0.1-6, 2016).
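As a rough illustration of the two resampling schemes named in the abstract, the following sketch generates bootstrap errors from a vector of residuals. It is not the implementation in hdi: the function names are hypothetical, the residuals are assumed to come from some initial fit, and Rademacher (random sign) multipliers are an assumed choice for the wild bootstrap. The residual bootstrap resamples centered residuals with replacement, which treats the errors as exchangeable; the wild bootstrap keeps each residual at its own observation and only randomizes its sign, which preserves observation-specific error scales and is the reason it is suited to heteroscedastic errors.

```python
import random


def center(residuals):
    """Subtract the mean so that the bootstrap errors have mean zero."""
    mean = sum(residuals) / len(residuals)
    return [r - mean for r in residuals]


def residual_bootstrap_errors(residuals, rng):
    """Residual bootstrap: draw n centered residuals with replacement."""
    centered = center(residuals)
    return [rng.choice(centered) for _ in centered]


def wild_bootstrap_errors(residuals, rng):
    """Wild bootstrap: multiply each centered residual by an independent
    random sign (Rademacher multiplier, an assumption of this sketch)."""
    centered = center(residuals)
    return [r * rng.choice((-1.0, 1.0)) for r in centered]


def bootstrap_response(fitted, errors):
    """Bootstrap response: fitted values plus resampled errors."""
    return [f + e for f, e in zip(fitted, errors)]


# Toy inputs (made up for illustration only).
rng = random.Random(0)
fitted = [1.0, 2.0, 3.0, 4.0]
residuals = [0.5, -1.5, 0.25, 0.75]

y_star_resid = bootstrap_response(fitted, residual_bootstrap_errors(residuals, rng))
y_star_wild = bootstrap_response(fitted, wild_bootstrap_errors(residuals, rng))
```

In a full procedure, the estimator would be recomputed on each bootstrap response, and the bootstrap distribution of a maximum-type statistic over the group G would supply the critical value for simultaneous inference; that step is omitted here.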

Keywords

De-biased Lasso; De-sparsified Lasso; Gaussian approximation for maxima; High-dimensional linear model; Heteroscedastic errors; Multiple testing; Westfall–Young method

Mathematics Subject Classification

62J07; 62F40

Notes

Acknowledgements

We gratefully acknowledge visits at the American Institute of Mathematics (AIM), San Jose, USA, and at the Mathematisches Forschungsinstitut (MFO), Oberwolfach, Germany. We also thank anonymous reviewers for constructive comments.

Supplementary material

Supplementary material 1: 11749_2017_554_MOESM1_ESM.pdf (PDF, 510 KB)

References

  1. Belloni A, Chernozhukov V, Chetverikov D, Wei Y (2015a) Uniformly valid post-regularization confidence regions for many functional parameters in z-estimation. Preprint arXiv:1512.07619
  2. Belloni A, Chernozhukov V, Kato K (2015b) Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems. Biometrika 102(1):77–94
  3. Bickel P, Klaassen C, Ritov Y, Wellner J (1998) Efficient and adaptive estimation for semiparametric models. Springer, Berlin
  4. Breiman L (1996) Heuristics of instability and stabilization in model selection. Ann Stat 24:2350–2383
  5. Bühlmann P (2013) Statistical significance in high-dimensional linear models. Bernoulli 19:1212–1242
  6. Bühlmann P, van de Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, Berlin
  7. Bühlmann P, van de Geer S (2015) High-dimensional inference in misspecified linear models. Electron J Stat 9:1449–1473
  8. Bühlmann P, Kalisch M, Meier L (2014) High-dimensional statistics with a view towards applications in biology. Annu Rev Stat Appl 1:255–278
  9. Chatterjee A, Lahiri S (2011) Bootstrapping Lasso estimators. J Am Stat Assoc 106:608–625
  10. Chatterjee A, Lahiri S (2013) Rates of convergence of the adaptive LASSO estimators to the oracle distribution and higher order refinements by the bootstrap. Ann Stat 41:1232–1259
  11. Chernozhukov V, Chetverikov D, Kato K (2013) Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann Stat 41:2786–2819
  12. Chernozhukov V, Chetverikov D, Kato K (2014) Central limit theorems and bootstrap in high dimensions. Ann Probab, to appear. Preprint arXiv:1412.3661
  13. Chernozhukov V, Hansen C, Spindler M (2016) hdm: high-dimensional metrics. Preprint arXiv:1608.00354
  14. Deng H, Zhang C-H (2017) Beyond Gaussian approximation: bootstrap in large scale simultaneous inference. Unpublished work in progress
  15. Dezeure R, Bühlmann P, Meier L, Meinshausen N (2015) High-dimensional inference: confidence intervals, \(p\)-values and R-software hdi. Stat Sci 30:533–558
  16. Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
  17. Eicker F (1967) Limit theorems for regressions with unequal and dependent errors. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 59–82
  18. Foygel Barber R, Candès EJ (2015) Controlling the false discovery rate via knockoffs. Ann Stat 43:2055–2085
  19. Freedman DA (1981) Bootstrapping regression models. Ann Stat 9:1218–1228
  20. Giné E, Zinn J (1989) Necessary conditions for the bootstrap of the mean. Ann Stat 17:684–691
  21. Giné E, Zinn J (1990) Bootstrapping general empirical measures. Ann Probab 18:851–869
  22. Hall P, Wilson SR (1991) Two guidelines for bootstrap hypothesis testing. Biometrics 47:757–762
  23. Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 221–233
  24. Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15:2869–2909
  25. Liu RY, Singh K (1992) Efficiency and robustness in resampling. Ann Stat 20:370–384
  26. Liu H, Yu B (2013) Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression. Electron J Stat 7:3124–3169
  27. Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann Stat 21:255–285
  28. McKeague IW, Qian M (2015) An adaptive resampling test for detecting the presence of significant predictors. J Am Stat Assoc 110:1422–1433
  29. Meier L, Dezeure R, Meinshausen N, Mächler M, Bühlmann P (2016) hdi: high-dimensional inference. R package version 0.1-6
  30. Meinshausen N (2015) Group bound: confidence intervals for groups of variables in sparse high dimensional regression without assumptions on the design. J R Stat Soc B 77:923–945
  31. Meinshausen N, Bühlmann P (2006) High-dimensional graphs and variable selection with the Lasso. Ann Stat 34:1436–1462
  32. Meinshausen N, Bühlmann P (2010) Stability selection (with discussion). J R Stat Soc B 72:417–473
  33. Meinshausen N, Meier L, Bühlmann P (2009) P-values for high-dimensional regression. J Am Stat Assoc 104:1671–1681
  34. Meinshausen N, Maathuis MH, Bühlmann P (2011) Asymptotic optimality of the Westfall-Young permutation procedure for multiple testing under dependence. Ann Stat 39:3369–3391
  35. Reid S, Tibshirani R, Friedman J (2016) A study of error variance estimation in Lasso regression. Stat Sinica 26:35–67
  36. Rudelson M, Zhou S (2013) Reconstruction from anisotropic random measurements. IEEE Trans Inf Theory 59:3434–3447
  37. Shah R, Samworth R (2013) Variable selection with error control: another look at stability selection. J R Stat Soc B 75:55–80
  38. Shah R, Bühlmann P (2015) Goodness of fit tests for high-dimensional linear models. J R Stat Soc B. doi: 10.1111/rssb.12234
  39. van de Geer S, Bühlmann P, Zhou S (2011) The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron J Stat 5:688–749
  40. van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42:1166–1202
  41. Wasserman L, Roeder K (2009) High dimensional variable selection. Ann Stat 37:2178–2201
  42. Westfall P, Young S (1993) Resampling-based multiple testing: examples and methods for P-value adjustment. Wiley, Hoboken
  43. White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48:817–838
  44. Wu C-FJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14:1261–1295
  45. Ye F, Zhang C-H (2010) Rate minimaxity of the Lasso and Dantzig selector for the \(\ell _q\) loss in \(\ell _r\) balls. J Mach Learn Res 11:3481–3502
  46. Zhang C-H, Huang J (2008) The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann Stat 36:1567–1594
  47. Zhang C-H, Zhang SS (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc B 76:217–242
  48. Zhang X, Cheng G (2016) Simultaneous inference for high-dimensional linear models. J Am Stat Assoc. doi: 10.1080/01621459.2016.1166114
  49. Zhou Q (2014) Monte Carlo simulation for Lasso-type problems by estimator augmentation. J Am Stat Assoc 109:1495–1516

Copyright information

© Sociedad de Estadística e Investigación Operativa 2017

Authors and Affiliations

  1. Seminar for Statistics, ETH Zürich, Zurich, Switzerland
  2. Department of Statistics and Biostatistics, Rutgers University, Piscataway, USA
