Algorithmica, Volume 72, Issue 1, pp 193–214

A Spectral Algorithm for Latent Dirichlet Allocation

  • Anima Anandkumar
  • Dean P. Foster
  • Daniel Hsu
  • Sham M. Kakade
  • Yi-Kai Liu

Abstract

Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.
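To make "orthogonal tensor decomposition" concrete, here is a minimal sketch of one common way to compute it: tensor power iteration with deflation. It is an illustration, not the article's procedure, which may differ. It assumes the third-order tensor is already exactly symmetric and orthogonally decomposable with positive weights; in a topic-model setting that would require first estimating and whitening the moments, which is omitted here. All function names (`build_tensor`, `apply_tensor`, `power_iteration`, `decompose`) and the toy weights and vectors are hypothetical.

```python
import math
import random


def build_tensor(weights, vectors):
    """Return T = sum_i w_i * (v_i x v_i x v_i) as a nested list."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, v in zip(weights, vectors):
        for a in range(d):
            for b in range(d):
                for c in range(d):
                    T[a][b][c] += w * v[a] * v[b] * v[c]
    return T


def apply_tensor(T, u):
    """Compute T(I, u, u): entry a is sum over b, c of T[a][b][c] * u[b] * u[c]."""
    d = len(u)
    return [sum(T[a][b][c] * u[b] * u[c] for b in range(d) for c in range(d))
            for a in range(d)]


def normalize(u):
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def power_iteration(T, rng, restarts=10, iters=50):
    """Find one (weight, vector) pair, keeping the best of several random restarts."""
    d = len(T)
    best = None
    for _ in range(restarts):
        u = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            u = normalize(apply_tensor(T, u))
        # The weight estimate is the scalar T(u, u, u).
        lam = sum(x * y for x, y in zip(u, apply_tensor(T, u)))
        if best is None or lam > best[0]:
            best = (lam, u)
    return best


def decompose(T, k, seed=0):
    """Recover k components by repeated power iteration and deflation."""
    rng = random.Random(seed)
    d = len(T)
    T = [[row[:] for row in mat] for mat in T]  # work on a copy
    components = []
    for _ in range(k):
        lam, u = power_iteration(T, rng)
        components.append((lam, u))
        # Deflate: subtract the recovered rank-one term.
        for a in range(d):
            for b in range(d):
                for c in range(d):
                    T[a][b][c] -= lam * u[a] * u[b] * u[c]
    return components


# Toy example: three orthonormal vectors with distinct positive weights.
s = 1.0 / math.sqrt(2.0)
true_weights = [3.0, 2.0, 1.0]
true_vectors = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
recovered = sorted(decompose(build_tensor(true_weights, true_vectors), k=3),
                   reverse=True)
for lam, u in recovered:
    print(round(lam, 6), [round(x, 6) for x in u])
```

The components may be found in any order, so the example sorts them by weight before printing. Because the tensor has odd order and the weights are positive, each recovered vector comes back with the same sign as the corresponding true vector.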

Keywords

Topic models; Mixture models; Method of moments; Latent Dirichlet allocation

Notes

Acknowledgments

We thank Kamalika Chaudhuri, Adam Kalai, Percy Liang, Chris Meek, David Sontag, and Tong Zhang for valuable insights. We also thank Rong Ge for sharing preliminary results (in [8]) and the anonymous reviewers for their comments, suggestions, and pointers to references. Part of this work was completed while DH was a postdoctoral researcher at Microsoft Research New England, and while DPF, YKL, and AA were visiting the same lab. AA is supported in part by a Microsoft Faculty Fellowship, NSF CAREER Award CCF-1254106, NSF Award CCF-1219234, NSF BIGDATA IIS-1251267, and ARO YIP Award W911NF-13-1-0084.

References

  1. Achlioptas, D., McSherry, F.: On spectral learning of mixtures of distributions. In: Eighteenth Annual Conference on Learning Theory, pp. 458–469. Springer, Bertinoro (2005)
  2. Anandkumar, A., Chaudhuri, K., Hsu, D., Kakade, S.M., Song, L., Zhang, T.: Spectral methods for learning multivariate latent tree structure. Adv. Neural Inf. Process. Syst. 24, 2025–2033 (2011)
  3. Anandkumar, A., Foster, D.P., Hsu, D., Kakade, S.M., Liu, Y.K.: A spectral algorithm for latent Dirichlet allocation. Adv. Neural Inf. Process. Syst. 25, 917–925 (2012)
  4. Anandkumar, A., Foster, D.P., Hsu, D., Kakade, S.M., Liu, Y.K.: Two SVDs suffice: spectral decompositions for probabilistic topic models and latent Dirichlet allocation (2012). arXiv:1204.6703v1
  5. Anandkumar, A., Ge, R., Hsu, D., Kakade, S.M., Telgarsky, M.: Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. (2014). To appear.
  6. Anandkumar, A., Hsu, D., Kakade, S.M.: A method of moments for mixture models and hidden Markov models. In: Twenty-Fifth Annual Conference on Learning Theory, vol. 23, pp. 33.1–33.34 (2012)
  7. Ando, R., Zhang, T.: Two-view feature generation model for semi-supervised learning. In: Twenty-Fourth International Conference on Machine Learning, pp. 25–32 (2007)
  8. Arora, S., Ge, R., Moitra, A.: Learning topic models – going beyond SVD. In: Fifty-Third IEEE Annual Symposium on Foundations of Computer Science, pp. 1–10 (2012)
  9. Arora, S., Ge, R., Moitra, A., Sachdeva, S.: Provable ICA with unknown Gaussian noise, with implications for Gaussian mixtures and autoencoders. Adv. Neural Inf. Process. Syst. 25, 2375–2383 (2012)
  10. Arora, S., Kannan, R.: Learning mixtures of separated nonspherical Gaussians. Ann. Appl. Probab. 15(1A), 69–92 (2005)
  11. Belkin, M., Sinha, K.: Polynomial learning of distribution families. In: Fifty-First Annual IEEE Symposium on Foundations of Computer Science, pp. 103–112 (2010)
  12. Blei, D.M., Ng, A., Jordan, M.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
  13. Canny, J.: GaP: A factor model for discrete data. In: Proceedings of the Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 122–129 (2004)
  14. Cardoso, J.F., Comon, P.: Independent component analysis, a survey of some algebraic methods. In: IEEE International Symposium on Circuits and Systems, pp. 93–96 (1996)
  15. Chang, J.T.: Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math. Biosci. 137, 51–73 (1996)
  16. Chaudhuri, K., Kakade, S.M., Livescu, K., Sridharan, K.: Multi-view clustering via canonical correlation analysis. In: Twenty-Sixth Annual International Conference on Machine Learning, pp. 129–136 (2009)
  17. Chaudhuri, K., Rao, S.: Learning mixtures of product distributions using correlations and independence. In: Twenty-First Annual Conference on Learning Theory, pp. 9–20 (2008)
  18. Comon, P., Jutten, C.: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, Waltham (2010)
  19. Dasgupta, S.: Learning mixtures of Gaussians. In: Fortieth Annual IEEE Symposium on Foundations of Computer Science, pp. 634–644 (1999)
  20. Dasgupta, S., Schulman, L.: A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. J. Mach. Learn. Res. 8, 203–226 (2007)
  21. Frieze, A.M., Jerrum, M., Kannan, R.: Learning linear transformations. In: Thirty-Seventh Annual Symposium on Foundations of Computer Science, pp. 359–368 (1996)
  22. Griffiths, T.: Gibbs sampling in the generative model of latent Dirichlet allocation. Tech. rep., Stanford University (2002)
  23. Harshman, R.: Foundations of the PARAFAC procedure: model and conditions for an ‘explanatory’ multi-mode factor analysis. Tech. rep., UCLA Working Papers in Phonetics (1970)
  24. Hitchcock, F.: The expression of a tensor or a polyadic as a sum of products. J. Math. Phys. 6, 164–189 (1927)
  25. Hitchcock, F.: Multiple invariants and generalized rank of a p-way matrix or tensor. J. Math. Phys. 7, 39–79 (1927)
  26. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the Twenty-Second Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)
  27. Hotelling, H.: The most predictable criterion. J. Educ. Psychol. 26(2), 139–142 (1935)
  28. Hsu, D., Kakade, S.M.: Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In: Fourth Innovations in Theoretical Computer Science (2013)
  29. Hsu, D., Kakade, S.M., Zhang, T.: A spectral algorithm for learning hidden Markov models. J. Comput. Syst. Sci. 78(5), 1460–1480 (2012). http://www.sciencedirect.com/science/article/pii/S0022000012000244
  30. Jutten, C., Herault, J.: Blind separation of sources, part I: an adaptive algorithm based on neuromimetic architecture. Signal Process. 24, 1–10 (1991)
  31. Kakade, S.M., Foster, D.P.: Multi-view regression via canonical correlation analysis. In: Twentieth Annual Conference on Learning Theory, pp. 82–96 (2007)
  32. Kalai, A.T., Moitra, A., Valiant, G.: Efficiently learning mixtures of two Gaussians. In: Forty-second ACM Symposium on Theory of Computing, pp. 553–562 (2010)
  33. Kannan, R., Salmasian, H., Vempala, S.: The spectral method for general mixture models. SIAM J. Comput. 38(3), 1141–1156 (2008)
  34. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)
  35. Kruskal, J.B.: Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Its Appl. 18(2), 95–138 (1977)
  36. Lee, D.D., Seung, H.S.: Learning the parts of objects by nonnegative matrix factorization. Nature 401, 788–791 (1999)
  37. Leurgans, S., Ross, R., Abel, R.: A decomposition for three-way arrays. SIAM J. Matrix Anal. Appl. 14(4), 1064–1083 (1993)
  38. Moitra, A., Valiant, G.: Settling the polynomial learnability of mixtures of Gaussians. In: Fifty-First Annual IEEE Symposium on Foundations of Computer Science, pp. 93–102 (2010)
  39. Mossel, E., Roch, S.: Learning nonsingular phylogenies and hidden Markov models. Ann. Appl. Probab. 16(2), 583–614 (2006)
  40. Nguyen, P.Q., Regev, O.: Learning a parallelepiped: cryptanalysis of GGH and NTRU signatures. J. Cryptol. 22(2), 139–160 (2009)
  41. Papadimitriou, C.H., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217–235 (2000)
  42. Pearson, K.: Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 185, 71–110 (1894)
  43. Redner, R.A., Walker, H.F.: Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26(2), 195–239 (1984)
  44. Tucker, L.R.: Some mathematical notes on three-mode factor analysis. Psychometrika 31(3), 279–311 (1966)
  45. Vempala, S., Wang, G.: A spectral algorithm for learning mixtures models. J. Comput. Syst. Sci. 68(4), 841–860 (2004)
  46. Zou, J., Hsu, D., Parkes, D., Adams, R.: Contrastive learning using spectral methods. Adv. Neural Inf. Process. Syst. 26, 2238–2246 (2013)

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Anima Anandkumar
    • 1
  • Dean P. Foster
    • 2
  • Daniel Hsu
    • 3
  • Sham M. Kakade
    • 4
  • Yi-Kai Liu
    • 5
  1. University of California, Irvine, Irvine, USA
  2. Yahoo! Labs, New York, USA
  3. Columbia University, New York, USA
  4. Microsoft Research, Cambridge, USA
  5. National Institute of Standards and Technology, Gaithersburg, USA
