Machine Learning, Volume 107, Issue 8–10, pp 1431–1455

A new method of moments for latent variable models

  • Matteo Ruffini
  • Marta Casanellas
  • Ricard Gavaldà
Part of the Special Issue of the ECML PKDD 2018 Journal Track.


Abstract

We present an algorithm for the unsupervised learning of latent variable models based on the method of moments. We give efficient estimates of the moments for two models widely used in text mining: the single-topic model and latent Dirichlet allocation (LDA). We also provide a tensor decomposition algorithm for the moments that proves to be robust both in theory and in practice. Experiments on synthetic data show that the proposed estimators outperform existing ones in terms of reconstruction accuracy, and that the proposed tensor decomposition technique matches the learning accuracy of the state-of-the-art method with significantly smaller running times. We also provide examples of applications to real-world text corpora for both the single-topic model and LDA, obtaining meaningful results.
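The general pipeline behind moment-based learning can be illustrated with a minimal sketch. This is not the decomposition technique proposed in the paper; it shows the standard tensor power method with deflation (in the style of Anandkumar et al., 2014) applied to a synthetic symmetric, orthogonally decomposable tensor, which is the simplest setting in which moment tensors can be decomposed to recover latent parameters. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric tensor T = sum_i w_i * v_i (x) v_i (x) v_i
# with orthonormal components v_i: the setting where the tensor
# power method provably recovers weights and components.
d, k = 5, 3
V, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal columns
w = np.array([3.0, 2.0, 1.0])
T = np.einsum('i,ai,bi,ci->abc', w, V, V, V)

def tensor_power_method(T, n_restarts=10, n_iters=100):
    """Recover the dominant robust eigenpair of a symmetric
    orthogonally decomposable tensor via power iteration with
    random restarts."""
    d = T.shape[0]
    best_lam, best_u = -np.inf, None
    for _ in range(n_restarts):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = np.einsum('abc,b,c->a', T, u, u)  # T(I, u, u)
            u /= np.linalg.norm(u)
        lam = np.einsum('abc,a,b,c->', T, u, u, u)  # T(u, u, u)
        if lam > best_lam:
            best_lam, best_u = lam, u
    return best_lam, best_u

# Deflation: extract the k components one at a time.
T_res = T.copy()
recovered = []
for _ in range(k):
    lam, u = tensor_power_method(T_res)
    recovered.append((lam, u))
    T_res = T_res - lam * np.einsum('a,b,c->abc', u, u, u)

# The recovered eigenvalues match w (up to ordering) and each
# recovered vector matches a column of V up to sign.
```

In a full method-of-moments pipeline, T would be an empirical third-order moment tensor (e.g., word co-occurrence counts under a single-topic model), whitened via the second-order moments so that its components become orthonormal before this step is applied.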


Keywords: Spectral methods · Method of moments · Latent variable models · Topic modeling



M. Casanellas is partially funded by AGAUR Project 2017 SGR-932 and MINECO/FEDER Project MTM2015-69135-P. R. Gavaldà is partially funded by AGAUR Project 2014 SGR-890 (MACDA) and by MINECO Projects TIN2014-57226-P (APCOM) and TIN2017-89244-R (MACDA). Both authors are partially funded by MDM-2014-0445.



Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Universitat Politècnica de Catalunya and BGSMath, Barcelona, Spain
