A Unified View of Matrix Factorization Models

  • Ajit P. Singh
  • Geoffrey J. Gordon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5212)


We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods such as incorporating row and column biases, and adding or relaxing clustering constraints.
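As a rough illustration of the alternating projection idea in (i), the sketch below treats only the simplest special case: unweighted squared loss, which is the Bregman divergence generated by F(z) = z²/2. In that case each projection is a ridge-regularized least-squares problem, so a single Newton step solves it exactly. The function names, the small `ridge` term, and the random initialization are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def bregman_divergence(F, grad_F, x, y):
    """Elementwise Bregman divergence F(x) - F(y) - F'(y) * (x - y), summed."""
    return float(np.sum(F(x) - F(y) - grad_F(y) * (x - y)))


def alternating_projections_squared_loss(X, rank, n_iters=20, ridge=1e-8, seed=0):
    """Factor X ~ U @ V.T by alternately solving for U and V.

    Illustrative only: covers squared loss with no constraints on the
    factors. Other divergences would need an iterative Newton update
    in place of each closed-form solve.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    reg = ridge * np.eye(rank)  # keeps the rank x rank systems well conditioned
    objective = []
    for _ in range(n_iters):
        # Projection 1: fix V, solve the least-squares problem for U.
        U = np.linalg.solve(V.T @ V + reg, V.T @ X.T).T
        # Projection 2: fix U, solve the least-squares problem for V.
        V = np.linalg.solve(U.T @ U + reg, U.T @ X).T
        fit = 0.5 * np.sum((X - U @ V.T) ** 2)
        penalty = 0.5 * ridge * (np.sum(U ** 2) + np.sum(V ** 2))
        objective.append(fit + penalty)
    return U, V, objective


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))  # exactly rank 2
    U, V, objective = alternating_projections_squared_loss(X, rank=2)
    loss = bregman_divergence(lambda z: 0.5 * z ** 2, lambda z: z, X, U @ V.T)
    print("final squared-loss Bregman divergence:", loss)
```

Because each block update minimizes the regularized objective exactly with the other factor held fixed, the recorded objective should not increase from one iteration to the next. Swapping in a different convex F changes the divergence being measured, but the closed-form solves above would no longer apply.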


Matrix Factorization, Latent Dirichlet Allocation, Nonnegative Matrix Factorization, Prediction Link, Positive Matrix Factorization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ajit P. Singh (1)
  • Geoffrey J. Gordon (1)
  1. Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
