Asymptotic theory of the adaptive Sparse Group Lasso

  • Benjamin Poignard


We study the asymptotic properties of a new version of the Sparse Group Lasso (SGL) estimator, called the adaptive SGL. This version uses two distinct regularization parameters, one for the Lasso penalty and one for the Group Lasso penalty, and both penalties are weighted by preliminary random coefficients. The asymptotic properties are established in a general framework, where the data are dependent and the loss function is convex. We prove that this estimator satisfies the oracle property: the sparsity-based estimator recovers the true underlying sparse model and is asymptotically normally distributed. We also study its asymptotic properties in a double-asymptotic framework, where the number of parameters diverges with the sample size. We show by simulations and on real data that the adaptive SGL outperforms other oracle-like methods in terms of estimation precision and variable selection.
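
As a rough illustration of the penalty described above, the following Python sketch evaluates a weighted Lasso term and a weighted Group Lasso term, each with its own regularization parameter. The weight formula (inverse powers of a preliminary estimate, with exponents `eta` and `mu` and a small guard `eps`) is an assumption in the spirit of adaptive Lasso-type methods and may differ from the exact specification in the paper. All function and parameter names are hypothetical.

```python
import math


def adaptive_weights(prelim_groups, eta=1.0, mu=1.0, eps=1e-8):
    """Build penalty weights from a preliminary estimate (assumed form).

    prelim_groups: list of groups, each a list of preliminary coefficients.
    Returns (alpha, xi): per-coefficient and per-group weights.
    Small preliminary values give large weights, hence stronger shrinkage.
    """
    # Per-coefficient weights for the Lasso term.
    alpha = [[(abs(c) + eps) ** (-eta) for c in g] for g in prelim_groups]
    # Per-group weights for the Group Lasso term (Euclidean norm of the group).
    xi = [(math.sqrt(sum(c * c for c in g)) + eps) ** (-mu) for g in prelim_groups]
    return alpha, xi


def adaptive_sgl_penalty(theta_groups, alpha, xi, lam1, lam2):
    """Weighted l1 penalty (lam1) plus weighted group l2 penalty (lam2)."""
    l1_term = sum(
        a * abs(t)
        for g, ag in zip(theta_groups, alpha)
        for t, a in zip(g, ag)
    )
    group_term = sum(
        x * math.sqrt(sum(t * t for t in g))
        for g, x in zip(theta_groups, xi)
    )
    return lam1 * l1_term + lam2 * group_term
```

Keeping `lam1` and `lam2` separate mirrors the two distinct regularization parameters mentioned in the abstract. In an actual estimator this penalty would be added to a convex loss and minimized, which the sketch does not attempt.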


Keywords: Asymptotic normality, Consistency, Oracle property



I would like to thank Alexandre Tsybakov, Arnak Dalalyan, Jean-Michel Zakoïan and Christian Francq for all the theoretical references they provided. I also warmly thank Jean-David Fermanian for his significant help and helpful comments. I gratefully acknowledge the support of the Ecodec Laboratory and the Japan Society for the Promotion of Science.

Supplementary material

Supplementary material 1: 10463_2018_692_MOESM1_ESM.pdf (PDF, 279 KB)


Copyright information

© The Institute of Statistical Mathematics, Tokyo 2018

Authors and Affiliations

  1. Graduate School of Engineering Science, Osaka University, Toyonaka, Japan
