Abstract
We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics and signal processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be “relaxed” by regularizing the squared error with a convex penalty function like the ℓ1 norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease of the estimation error. In this paper, we present a family of convex penalty functions, which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the ℓ1 norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.
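The abstract refers to the standard convex relaxation of sparse estimation: regularizing the squared error with the ℓ1 norm, i.e. the Lasso, which is the baseline the paper compares against. Purely as an illustration of that baseline, and not of the authors' structured penalty family, the following is a minimal proximal-gradient (ISTA) sketch of ℓ1-regularized least squares; the function names, step-size choice, and toy data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Componentwise soft-thresholding: the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Approximately solve min_beta 0.5*||X beta - y||^2 + lam*||beta||_1
    # by proximal gradient descent (ISTA).
    n, d = X.shape
    beta = np.zeros(d)
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # of the quadratic term (the squared spectral norm of X).
    L = np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

# Toy example: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
beta_true = np.zeros(100)
beta_true[:5] = 1.0
y = X @ beta_true + 0.01 * rng.standard_normal(50)
beta_hat = lasso_ista(X, y, lam=0.1)
print(np.flatnonzero(np.abs(beta_hat) > 1e-3))  # indices of the selected components
```

Structured-sparsity penalties of the kind studied in the paper replace the ℓ1 term above with a convex function of the absolute values of the coefficients that encodes prior knowledge about the sparsity pattern; the ℓ1 norm is recovered as a special case.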
Communicated by Lixin Shen.
Cite this article
Micchelli, C.A., Morales, J.M. & Pontil, M. Regularizers for structured sparsity. Adv Comput Math 38, 455–489 (2013). https://doi.org/10.1007/s10444-011-9245-9