The Group-Lasso: ℓ1,∞ Regularization versus ℓ1,2 Regularization

  • Julia E. Vogt
  • Volker Roth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6376)

Abstract

The ℓ1,∞ norm and the ℓ1,2 norm are well-known tools for joint regularization in Group-Lasso methods. While the ℓ1,2 version has been studied in detail, open questions remain regarding the uniqueness of solutions and the efficiency of algorithms for the ℓ1,∞ variant. For the latter, we characterize the conditions for uniqueness of solutions, present a simple test for uniqueness, and derive a highly efficient active set algorithm that can handle input dimensions in the millions. We compare both variants of the Group-Lasso in its two most common application scenarios: obtaining sparsity on the level of groups in “standard” prediction problems, and multi-task learning, where many learning problems are solved in parallel and coupled via the Group-Lasso constraint. We show that both versions perform very similarly in “standard” applications. In multi-task settings, however, a clear distinction emerges: the ℓ1,2 version consistently outperforms its ℓ1,∞ counterpart in terms of prediction accuracy.
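To make the two penalties concrete, the following minimal NumPy sketch (illustrative only, not code from the paper; the toy coefficient matrix and group layout are assumptions) computes both joint penalties for a coefficient matrix whose rows correspond to groups, e.g. the per-task weights coupled in multi-task learning. The ℓ1,2 penalty sums the Euclidean norms of the groups, while the ℓ1,∞ penalty sums their largest absolute entries.

import numpy as np

def l12_penalty(W):
    # l1,2 Group-Lasso penalty: sum over groups g of ||w_g||_2,
    # where each row of W is one group of coefficients.
    return np.sum(np.linalg.norm(W, axis=1))

def l1inf_penalty(W):
    # l1,inf Group-Lasso penalty: sum over groups g of ||w_g||_inf,
    # i.e. the largest absolute coefficient in each group.
    return np.sum(np.max(np.abs(W), axis=1))

# Toy example: 3 groups, 4 coefficients each.
W = np.array([[0.0, 0.0, 0.0, 0.0],   # an entirely zero group contributes nothing
              [1.0, 0.2, 0.0, 0.1],
              [0.5, 0.5, 0.5, 0.5]])

print(l12_penalty(W))    # approx. 2.02  (sqrt(1.05) + 1.0)
print(l1inf_penalty(W))  # 1.5           (1.0 + 0.5)

Both penalties vanish only for groups that are entirely zero, which is what produces sparsity on the level of groups; they differ in how they weight the coefficients within a surviving group.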



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Julia E. Vogt (1)
  • Volker Roth (1)
  1. Department of Computer Science, University of Basel, Basel, Switzerland
