
On Sparsity Inducing Regularization Methods for Machine Learning

Chapter in: Empirical Inference

Abstract

During the past few years there has been an explosion of interest in learning methods based on sparsity regularization. In this chapter, we discuss a general class of such methods, in which the regularizer can be expressed as the composition of a convex function ω with a linear function. This setting includes several methods, such as the Group Lasso, the Fused Lasso, and multi-task learning, among others. We present a general approach for solving regularization problems of this kind, under the assumption that the proximity operator of the function ω is available. Furthermore, we comment on the application of this approach to support vector machines, a technique pioneered by the groundbreaking work of Vladimir Vapnik.

Dedicated to Vladimir Vapnik with esteem and gratitude for his fundamental contribution to Machine Learning.
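To illustrate the central idea the abstract describes, the sketch below applies forward-backward splitting (a standard proximal-gradient scheme, in the spirit of ISTA) to the simplest instance of the setting, where ω is the ℓ1 norm and the linear map is the identity, i.e. the Lasso. This is a minimal sketch, not the chapter's algorithm: the chapter treats the harder composite case ω ∘ B, while here the proximity operator of ω is available in closed form (soft-thresholding). The function names (prox_l1, proximal_gradient) and the step-size choice are illustrative assumptions.

```python
import numpy as np

def prox_l1(v, t):
    """Proximity operator of t * ||.||_1, i.e. componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(X, y, lam, n_iter=500):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 by forward-backward splitting."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth gradient
    step = 1.0 / L
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                    # forward (gradient) step
        w = prox_l1(w - step * grad, step * lam)    # backward (proximity) step
    return w

# Tiny usage example on synthetic data with a sparse ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(50)
w_hat = proximal_gradient(X, y, lam=0.5)
print(np.round(w_hat, 2))  # most coordinates are driven exactly to zero
```

When the penalty is a composition ω(Bw) with a nontrivial linear map B, the proximity operator generally has no closed form, which is precisely the situation the chapter's general approach is designed to handle.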



Acknowledgements

Part of this work was supported by EPSRC Grant EP/H027203/1, by Royal Society International Joint Project Grant 2012/R2, and by the European Union Seventh Framework Programme (FP7 2007-2013) under grant agreement no. 246556.

Author information

Corresponding author

Correspondence to Andreas Argyriou.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Argyriou, A., Baldassarre, L., Micchelli, C.A., Pontil, M. (2013). On Sparsity Inducing Regularization Methods for Machine Learning. In: Schölkopf, B., Luo, Z., Vovk, V. (eds) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_18

  • DOI: https://doi.org/10.1007/978-3-642-41136-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41135-9

  • Online ISBN: 978-3-642-41136-6

  • eBook Packages: Computer Science, Computer Science (R0)
