
On Sparsity Inducing Regularization Methods for Machine Learning

Chapter in: Empirical Inference

Abstract

During the past few years there has been an explosion of interest in learning methods based on sparsity regularization. In this chapter, we discuss a general class of such methods, in which the regularizer can be expressed as the composition of a convex function ω with a linear function. This setting includes several methods, such as the Group Lasso, the Fused Lasso, and multi-task learning, among others. We present a general approach for solving regularization problems of this kind, under the assumption that the proximity operator of the function ω is available. Furthermore, we comment on the application of this approach to support vector machines, a technique pioneered by the groundbreaking work of Vladimir Vapnik.

Dedicated to Vladimir Vapnik with esteem and gratitude for his fundamental contribution to Machine Learning.
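To illustrate the central idea the abstract describes, the sketch below applies forward-backward splitting (a standard proximal-gradient scheme, in the spirit of ISTA) to the simplest instance of the setting, where ω is the ℓ1 norm and the linear map is the identity, i.e. the Lasso. This is a minimal sketch, not the chapter's algorithm: the chapter treats the harder composite case ω ∘ B, while here the proximity operator of ω is available in closed form (soft-thresholding). The function names (prox_l1, proximal_gradient) and the step-size choice are illustrative assumptions.

```python
import numpy as np

def prox_l1(v, t):
    """Proximity operator of t * ||.||_1, i.e. componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(X, y, lam, n_iter=500):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 by forward-backward splitting."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth gradient
    step = 1.0 / L
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                    # forward (gradient) step
        w = prox_l1(w - step * grad, step * lam)    # backward (proximity) step
    return w

# Tiny usage example on synthetic data with a sparse ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(50)
w_hat = proximal_gradient(X, y, lam=0.5)
print(np.round(w_hat, 2))  # most coordinates are driven exactly to zero
```

When the penalty is a composition ω(Bw) with a nontrivial linear map B, the proximity operator generally has no closed form, which is precisely the situation the chapter's general approach is designed to handle.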



Acknowledgements

Part of this work was supported by EPSRC Grant EP/H027203/1, by Royal Society International Joint Project Grant 2012/R2, and by the European Union Seventh Framework Programme (FP7 2007-2013) under grant agreement no. 246556.

Author information

Corresponding author

Correspondence to Andreas Argyriou.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Argyriou, A., Baldassarre, L., Micchelli, C.A., Pontil, M. (2013). On Sparsity Inducing Regularization Methods for Machine Learning. In: Schölkopf, B., Luo, Z., Vovk, V. (eds) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_18

  • DOI: https://doi.org/10.1007/978-3-642-41136-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41135-9

  • Online ISBN: 978-3-642-41136-6

  • eBook Packages: Computer Science, Computer Science (R0)
