
Regularizers for structured sparsity

Abstract

We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics, and signal processing. It is well known that linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be “relaxed” by regularizing the squared error with a convex penalty function such as the ℓ1 norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease in the estimation error. In this paper, we present a family of convex penalty functions which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the ℓ1 norm and is flexible enough to include different models of sparsity patterns that are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.
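
For concreteness, the ℓ1 relaxation mentioned above is the Lasso estimator, which solves min_w (1/2)||Xw − y||² + λ||w||₁. The sketch below is a minimal illustration of this baseline, not the algorithm proposed in the paper: it solves the Lasso problem by proximal gradient descent (ISTA), and the penalty level lam, the iteration count, and all variable names are illustrative choices.

    import numpy as np

    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1: componentwise soft-thresholding.
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def lasso_ista(X, y, lam, n_iter=500):
        # Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by proximal gradient (ISTA).
        L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth term's gradient
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = X.T @ (X @ w - y)         # gradient of the squared-error term
            w = soft_threshold(w - grad / L, lam / L)
        return w

    # Toy example: recover a sparse vector from noisy linear measurements.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0                         # sparsity pattern: first five coordinates
    y = X @ w_true + 0.01 * rng.standard_normal(100)
    w_hat = lasso_ista(X, y, lam=0.1)
    print(np.flatnonzero(np.abs(w_hat) > 1e-3))  # indices of the selected components

Structured-sparsity penalties of the kind studied in the paper replace the ℓ1 term with a convex function of the vector of absolute values |w| that encodes the admissible sparsity patterns; the proximal step then changes accordingly.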

Author information

Correspondence to Massimiliano Pontil.

Communicated by Lixin Shen.

Cite this article

Micchelli, C.A., Morales, J.M. & Pontil, M. Regularizers for structured sparsity. Adv Comput Math 38, 455–489 (2013). https://doi.org/10.1007/s10444-011-9245-9
