Machine Learning, Volume 94, Issue 3, pp 303–351

Learning with tensors: a framework based on convex optimization and spectral regularization

  • Marco Signoretto
  • Quoc Tran Dinh
  • Lieven De Lathauwer
  • Johan A. K. Suykens

Abstract

We present a framework based on convex optimization and spectral regularization to perform learning when feature observations are multidimensional arrays (tensors). We give a mathematical characterization of spectral penalties for tensors and analyze a unifying class of convex optimization problems for which we present a provably convergent and scalable template algorithm. We then specialize this class of problems to perform learning in both a transductive and an inductive setting. In the transductive case one has an input data tensor with missing features and, possibly, a partially observed matrix of labels; the goal is both to infer the missing input features and to predict the missing labels. In the inductive case the goal is to determine a model for each learning task to be used for out-of-sample prediction; each training pair consists of a multidimensional array and a set of labels, each of which corresponds to related but distinct tasks. In either case the proposed technique exploits precise low multilinear rank assumptions on the unknown multidimensional arrays; regularization is based on composite spectral penalties and connects to the concept of the Multilinear Singular Value Decomposition. As a by-product of the tensor-based formalism, our approach tackles the multi-task case in a natural way. Empirical studies demonstrate the merits of the proposed methods.
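To make the regularization concrete, the following is a minimal NumPy sketch, not the authors' implementation: it illustrates one common instance of a composite spectral penalty for tensors, a weighted sum of the nuclear norms of the mode-n unfoldings (a convex surrogate for low multilinear rank), together with the singular value thresholding step on which first-order proximal algorithms of this kind typically rely. The function names (`unfold`, `composite_spectral_penalty`, `svt`) and the uniform weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: arrange the mode-`mode` fibers of the tensor
    as the columns of a matrix of shape (I_mode, product of other dims)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def composite_spectral_penalty(tensor, weights=None):
    """Weighted sum over modes of the nuclear norms of the mode-n
    unfoldings; a convex surrogate for low multilinear rank."""
    order = tensor.ndim
    if weights is None:
        weights = np.full(order, 1.0 / order)  # uniform weights (assumption)
    penalty = 0.0
    for n in range(order):
        s = np.linalg.svd(unfold(tensor, n), compute_uv=False)
        penalty += weights[n] * s.sum()  # nuclear norm = sum of singular values
    return penalty

def svt(matrix, tau):
    """Singular value thresholding: the proximal operator of
    tau * (nuclear norm), the core step of proximal-splitting
    algorithms for spectral regularization."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Example: evaluate the penalty on a random third-order tensor.
X = np.random.randn(4, 5, 6)
print(composite_spectral_penalty(X))
```

In completion-type problems of the sort the abstract describes, each iteration of such an algorithm would apply `svt` to every mode-n unfolding, re-fold the results, and enforce agreement with the observed entries.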

Keywords

Spectral regularization · Matrix and tensor completion · Tucker decomposition · Multilinear rank · Transductive and inductive learning · Multi-task learning

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Marco Signoretto (1)
  • Quoc Tran Dinh (2)
  • Lieven De Lathauwer (3)
  • Johan A. K. Suykens (1)

  1. ESAT-SCD/SISTA, Katholieke Universiteit Leuven, Leuven, Belgium
  2. Laboratory for Information and Inference Systems (LIONS), Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
  3. Group Science, Engineering and Technology, Katholieke Universiteit Leuven, Kortrijk, Belgium
