On Learning Matrices with Orthogonal Columns or Disjoint Supports

  • Kevin Vervier
  • Pierre Mahé
  • Alexandre D’Aspremont
  • Jean-Baptiste Veyrieras
  • Jean-Philippe Vert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8726)

Abstract

We investigate new matrix penalties to jointly learn linear models with orthogonality constraints, generalizing the work of Xiao et al. [24] who proposed a strictly convex matrix norm for orthogonal transfer. We show that this norm converges to a particular atomic norm when its convexity parameter decreases, leading to new algorithmic solutions to minimize it. We also investigate concave formulations of this norm, corresponding to more aggressive strategies to induce orthogonality, and show how these penalties can also be used to learn sparse models with disjoint supports.
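As a rough illustration of the kind of penalty discussed above, the following sketch computes a weighted sum of absolute inner products between the columns of a weight matrix. The exact form used here, the sum over (i, j) of K[i, j] * |<w_i, w_j>|, is an assumption made for illustration, and the names `orthogonality_penalty`, `W` and `K` are hypothetical rather than taken from the paper. It only shows that the off-diagonal terms vanish when columns are orthogonal, which is in particular the case when their supports are disjoint.

```python
import numpy as np


def orthogonality_penalty(W, K):
    """Weighted sum of |<w_i, w_j>| over all pairs of columns of W.

    Assumed form (illustrative only): sum_{i,j} K[i, j] * |w_i^T w_j|.
    The diagonal terms K[i, i] * ||w_i||^2 act as a ridge-like term,
    while the off-diagonal terms penalize non-orthogonal column pairs.
    """
    G = W.T @ W  # Gram matrix: G[i, j] = <w_i, w_j>
    return float(np.sum(K * np.abs(G)))


def have_disjoint_supports(W, tol=0.0):
    """True when every row of W has at most one entry with magnitude above tol."""
    return bool(np.all((np.abs(W) > tol).sum(axis=1) <= 1))


# Two columns with disjoint supports (hence orthogonal).
W_orth = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [0.0, 0.0]])

# Two columns sharing the first feature (not orthogonal).
W_corr = np.array([[1.0, 1.0],
                   [0.0, 2.0],
                   [0.0, 0.0]])

K = np.ones((2, 2))  # hypothetical weights: every pair penalized equally

penalty_orth = orthogonality_penalty(W_orth, K)
penalty_corr = orthogonality_penalty(W_corr, K)
```

For `W_orth` only the squared column norms contribute, whereas `W_corr` additionally pays for the overlap on the shared feature.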

References

  1. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Foundations and Trends® in Machine Learning 4(1), 1–106 (2011)
  2. Bakker, B., Heskes, T.: Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res. 4, 83–99 (2003)
  3. Barvinok, A.: A Course in Convexity. American Mathematical Society (2002)
  4. Baxter, J.: A model of inductive bias learning. Journal of Artificial Intelligence Research 12, 149–198 (2000)
  5. Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS Books in Mathematics Series. Springer (2000)
  6. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press (2004)
  7. Brickman, L.: On the field of values of a matrix. Proceedings of the American Mathematical Society, 61–66 (1961)
  8. Cai, L., Hofmann, T.: Hierarchical document categorization with support vector machines. In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp. 78–87. ACM, New York (2004)
  9. Calder, A.J., Burton, A.M., Miller, P., Young, A.W., Akamatsu, S.: A principal component analysis of facial expressions. Vision Res. 41(9), 1179–1208 (2001)
  10. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
  11. Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12(6), 805–849 (2012)
  12. Evgeniou, T., Micchelli, C., Pontil, M.: Learning multiple tasks with kernel methods. J. Mach. Learn. Res. 6, 615–637 (2005)
  13. Hwang, S.J.J., Grauman, K., Sha, F.: Learning a tree of metrics with disjoint visual features. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P., Pereira, F.C.N., Weinberger, K.Q. (eds.) Adv. Neural. Inform. Process Syst. 24, pp. 621–629 (2011)
  14. Jacob, L., Vert, J.-P.: Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics 24(19), 2149–2156 (2008)
  15. Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0-1 optimization. SIAM Journal on Optimization 1(2), 166–190 (1991)
  16. McCallum, A., Rosenfeld, R., Mitchell, T.M., Ng, A.Y.: Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 359–367. Morgan Kaufmann Publishers Inc., San Francisco (1998)
  17. Obozinski, G., Taskar, B., Jordan, M.I.: Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing 20(2), 231–252 (2010)
  18. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
  19. Romera-Paredes, B., Argyriou, A., Berthouze, N., Pontil, M.: Exploiting unrelated tasks in multi-task learning. J. Mach. Learn. Res. - Proceedings Track 22, 951–959 (2012)
  20. Shor, N.Z.: Quadratic optimization problems. Soviet Journal of Computer and Systems Sciences 25, 1–11 (1987)
  21. Srebro, N., Rennie, J.D.M., Jaakkola, T.S.: Maximum-margin matrix factorization. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Adv. Neural. Inform. Process Syst. 17, pp. 1329–1336. MIT Press, Cambridge (2005)
  22. Thrun, S., Pratt, L. (eds.): Learning to learn. Kluwer Academic Publishers, Norwell (1998)
  23. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
  24. Xiao, L., Zhou, D., Wu, M.: Hierarchical classification via orthogonal transfer. In: Getoor, L., Scheffer, T. (eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28-July 2, pp. 801–808. Omnipress (2011)
  25. Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543–2596 (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Kevin Vervier (1, 2, 3)
  • Pierre Mahé (1)
  • Alexandre D’Aspremont (4)
  • Jean-Baptiste Veyrieras (1)
  • Jean-Philippe Vert (2, 3)

  1. Data and Knowledge Lab, Biomerieux, Marcy l’Etoile, France
  2. Centre for Computational Biology, Mines ParisTech, Fontainebleau, France
  3. Institut Curie, INSERM U900, Paris, France
  4. CNRS and D.I. UMR 8548, Ecole normale supérieure, Paris, France
