Estimating Likelihoods for Topic Models

  • Wray Buntine
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5828)

Abstract

Topic models are a discrete analogue to principal component analysis and independent component analysis that model topic at the word level within a document. They have many variants, such as NMF, PLSI and LDA, and are used in many fields, such as genetics, text and the web, image analysis, and recommender systems. However, only recently have reasonable methods become available for estimating the likelihood of unseen documents, for instance to perform testing or model comparison. This paper explores a number of recent methods and improves their theory, performance, and testing.

Keywords

Independent Component Analysis, Recommender System, Topic Model, Proposal Distribution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Wray Buntine (1)
  1. NICTA and Australian National University, Canberra, Australia
