Variable selection procedure from multiple testing

  • Baoxue Zhang
  • Guanghui Cheng
  • Chunming Zhang
  • Shurong Zheng


Variable selection has played an important role in statistical learning and scientific discovery over the past decade, while multiple testing is a fundamental problem in statistical inference with wide applications in many scientific fields. Significant advances have been achieved in both areas. This study attempts to establish a connection between the adaptive LASSO (least absolute shrinkage and selection operator) and multiple testing procedures in linear regression models. We also propose procedures based on multiple testing methods that select variables while controlling the selection error rate, i.e., the false discovery rate. Simulation studies demonstrate that the proposed methods perform well in controlling the selection error rate under a wide range of settings.
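To illustrate the general idea behind selection via multiple testing (not the authors' exact procedure), the sketch below selects variables in a linear model by applying the Benjamini-Hochberg step-up procedure to per-coefficient p-values from ordinary least squares. The function names, the target FDR level `q`, and the simulated design are illustrative assumptions; p-values use a normal approximation for the OLS t-statistics.

```python
import math
import numpy as np

def bh_select(pvals, q=0.1):
    """Benjamini-Hochberg: return indices of rejected (selected) hypotheses."""
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = np.asarray(pvals)[order]
    # find the largest k with p_(k) <= k * q / m
    thresh = q * np.arange(1, m + 1) / m
    below = np.nonzero(sorted_p <= thresh)[0]
    if below.size == 0:
        return np.array([], dtype=int)
    k = below.max()
    return np.sort(order[: k + 1])

def ols_pvalues(X, y):
    """Two-sided normal-approximation p-values for OLS coefficients."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance
    z = beta / np.sqrt(np.diag(cov))
    return np.array([math.erfc(abs(zi) / math.sqrt(2)) for zi in z])

# Simulated example: 10 candidate variables, the first 3 truly active.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = X @ beta_true + rng.standard_normal(n)

selected = bh_select(ols_pvalues(X, y), q=0.1)
print(selected)  # the truly active variables 0, 1, 2 should be among those selected
```

With a strong signal the active coefficients produce essentially zero p-values and are always retained, while the BH threshold keeps the expected proportion of falsely selected null variables at or below `q`.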


Keywords: variable selection, multiple testing, adaptive LASSO, false discovery rate, linear regression


MSC: 35J60, 35J70





This work was supported by the National Natural Science Foundation of China (Grant Nos. 11671268, 11522105, and 11690012). The authors thank the reviewers for their constructive comments, which helped improve this manuscript substantially.



Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Baoxue Zhang (1)
  • Guanghui Cheng (2)
  • Chunming Zhang (3)
  • Shurong Zheng (2)

  1. School of Statistics, Capital University of Economics and Business, Beijing, China
  2. School of Mathematics and Statistics and Key Laboratory of Applied Statistics of Ministry of Education, Northeast Normal University, Changchun, China
  3. Department of Statistics, University of Wisconsin-Madison, Madison, USA
