Statistics and Computing, Volume 27, Issue 1, pp 169–179

Beyond support in two-stage variable selection

  • Jean-Michel Bécu
  • Yves Grandvalet
  • Christophe Ambroise
  • Cyril Dalmasso


Numerous variable selection methods rely on a two-stage procedure, in which a sparsity-inducing penalty is used in the first stage to predict the support, which is then conveyed to the second stage for estimation or inference purposes. In this framework, the first stage screens variables to find a set of possibly relevant variables, and the second stage operates on this set of candidate variables to improve estimation accuracy or to assess the uncertainty associated with the selection of variables. We advocate that more information can be conveyed from the first stage to the second one: we use the magnitude of the coefficients estimated in the first stage to define an adaptive penalty that is applied at the second stage. We give the example of an inference procedure that benefits greatly from the proposed transfer of information. The procedure is precisely analyzed in a simple setting, and our large-scale experiments empirically demonstrate that actual benefits can be expected in much more general situations, with sensitivity gains ranging from 50 to 100% compared to the state of the art.
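The two-stage transfer described in the abstract can be sketched in code. This is an illustrative reimplementation, not the authors' software: the function names (`lasso_cd`, `adaptive_ridge_refit`) and all tuning values are hypothetical choices of this sketch. Stage 1 runs a lasso to obtain both the support and the coefficient magnitudes; stage 2 refits on the selected variables with a ridge penalty weighted by the inverse first-stage magnitudes, so that variables estimated with large coefficients in the first stage are penalized less in the second.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Stage 1: lasso via cyclic coordinate descent.

    Minimizes 0.5 * ||y - X beta||^2 + lam * ||beta||_1
    (a textbook solver, used here only for illustration).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding variable j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # soft-thresholding update
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

def adaptive_ridge_refit(X, y, beta1, lam):
    """Stage 2: refit on the lasso support with an adaptive ridge penalty.

    Coefficient j is penalized by lam / |beta1_j|: the first-stage
    magnitudes, not just the support, are carried over to the second
    stage, which is the transfer of information advocated in the paper.
    """
    support = np.flatnonzero(beta1)
    Xs = X[:, support]
    w = np.abs(beta1[support])
    # closed-form solution of the weighted ridge problem
    A = Xs.T @ Xs + lam * np.diag(1.0 / w)
    beta2 = np.zeros(X.shape[1])
    beta2[support] = np.linalg.solve(A, Xs.T @ y)
    return beta2
```

Note that because the adaptive weights shrink weakly selected variables heavily and strongly selected ones hardly at all, the second stage also counteracts the bias introduced by the lasso shrinkage of the first stage.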


Keywords: Linear model · Lasso · Variable selection · p-values · False discovery rate · Screen and clean



This work was supported by the UTC Foundation for Innovation through the ToxOnChip program. It was carried out in the framework of the Labex MS2T (ANR-11-IDEX-0004-02) within the “Investments for the Future” program managed by the French National Research Agency (ANR).



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Jean-Michel Bécu (1)
  • Yves Grandvalet (1)
  • Christophe Ambroise (2)
  • Cyril Dalmasso (2)
  1. Sorbonne universités, Université de technologie de Compiègne, CNRS, Heudiasyc UMR 7253, Compiègne Cedex, France
  2. LaMME, Université d’Évry Val d’Essonne, Évry, France
