Abstract
Joint sparsity offers powerful structural cues for feature selection, especially for variables that are expected to exhibit "grouped" behavior. Such behavior is commonly modeled via the group lasso, the multitask lasso, and related methods, where feature selection is effected through mixed norms. Several mixed-norm based sparse models have received substantial attention, and efficient algorithms are available for some of them. Surprisingly, several constrained sparse models still lack scalable algorithms. We address this deficiency by presenting batch and online (stochastic-gradient) optimization methods, both of which rely on efficient projections onto mixed-norm balls. We illustrate our methods by applying them to the multitask lasso, and conclude by mentioning some open problems.
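The mixed-norm ball projections the abstract refers to admit a compact sketch in the ℓ1,2 case: projecting a matrix onto the ball {X : Σ_g ‖x_g‖₂ ≤ τ} (one group per row) reduces to a Euclidean projection of the vector of group norms onto the ℓ1 ball, followed by rescaling each group. This is a standard reduction rather than the paper's specific algorithm, and the function names below are illustrative.

```python
import numpy as np

def project_l1(v, tau):
    """Euclidean projection of a nonnegative vector v onto the l1 ball of radius tau."""
    if v.sum() <= tau:
        return v
    u = np.sort(v)[::-1]                      # sort in decreasing order
    cssv = np.cumsum(u) - tau                 # cumulative sums shifted by the radius
    rho = np.nonzero(u - cssv / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = cssv[rho] / (rho + 1.0)           # soft-threshold level
    return np.maximum(v - theta, 0.0)

def project_l12_ball(A, tau):
    """Project matrix A (rows = groups) onto {X : sum_g ||x_g||_2 <= tau}."""
    norms = np.linalg.norm(A, axis=1)         # one l2 norm per group
    shrunk = project_l1(norms, tau)           # shrink the norms jointly
    scale = np.where(norms > 0, shrunk / np.maximum(norms, 1e-12), 0.0)
    return A * scale[:, None]                 # rescale each group to its shrunk norm
```

Such a projection is the workhorse inside both batch (spectral projected gradient) and online (stochastic projected gradient) schemes: each iteration takes a gradient step on the loss and then projects the iterate back onto the mixed-norm ball.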
Responsible editor: Dimitrios Gunopulos, Donato Malerba, Michalis Vazirgiannis.
Cite this article
Sra, S. Fast projections onto mixed-norm balls with applications. Data Min Knowl Disc 25, 358–377 (2012). https://doi.org/10.1007/s10618-012-0277-7