Skip to main content

Estimating Likelihoods for Topic Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5828))

Abstract

Topic models are a discrete analogue to principle component analysis and independent component analysis that model topic at the word level within a document. They have many variants such as NMF, PLSI and LDA, and are used in many fields such as genetics, text and the web, image analysis and recommender systems. However, only recently have reasonable methods for estimating the likelihood of unseen documents, for instance to perform testing or model comparison, become available. This paper explores a number of recent methods, and improves their theory, performance, and testing.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Azzopardi, L., Girolami, M., van Risjbergen, K.: Investigating the relationship between language model perplexity and IR precision-recall measures. In: SIGIR 2003, pp. 369–370 (2003)

    Google Scholar 

  2. Blei, D., Griffiths, T.L., Jordan, M.I., Tenenbaum, J.B.: Hierarchical topic models and the nested Chinese restaurant process. In: Thrun, S., Saul, L., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16. MIT Press, Cambridge (2004)

    Google Scholar 

  3. Buntine, W., Jakulin, A.: Applying discrete PCA in data analysis. In: UAI-2004, Banff, Canada (2004)

    Google Scholar 

  4. Buntine, W.L., Jakulin, A.: Discrete components analysis. In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds.) SLSFS 2005. LNCS, vol. 3940, pp. 1–33. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003)

    Article  MATH  Google Scholar 

  6. Canny, J.: GaP: a factor model for discrete data. In: SIGIR 2004, pp. 122–129 (2004)

    Google Scholar 

  7. Carlin, B.P., Chib, S.: Bayesian model choice via MCMC. Journal of the Royal Statistical Society B 57, 473–484 (1995)

    MATH  Google Scholar 

  8. Ghahramani, Z., Beal, M.J.: Propagation algorithms for variational Bayesian learning. In: NIPS, pp. 507–513 (2000)

    Google Scholar 

  9. Griffiths, T.L., Steyvers, M.: Finding scientific topics. In: PNAS Colloquium (2004)

    Google Scholar 

  10. Griffiths, T.L., Steyvers, M., Blei, D.M., Tenenbaum, J.B.: Integrating topics and syntax. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17, pp. 537–544. MIT Press, Cambridge (2005)

    Google Scholar 

  11. Hofmann, T.: Probabilistic latent semantic indexing. In: Research and Development in Information Retrieval, pp. 50–57 (1999)

    Google Scholar 

  12. Li, W., McCallum, A.: Pachinko allocation: DAG-structured mixture models of topic correlations. In: ICML 2006: Proc. of the 23rd Int. Conf. on Machine learning, pp. 577–584. ACM, New York (2006)

    Chapter  Google Scholar 

  13. Lee, D., Seung, H.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)

    Article  Google Scholar 

  14. Mimno, D., Li, W., McCallum, A.: Mixtures of hierarchical topics with Pachinko allocation. In: ICML 2007: Proceedings of the 24th international conference on Machine learning, pp. 633–640. ACM, New York (2007)

    Chapter  Google Scholar 

  15. Nallapati, R., Ahmed, A., Xing, E.P., Cohen, W.W.: Joint latent topic models for text and citations. In: Proc. of the 14th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Las Vegas, pp. 542–550. ACM, New York (2008)

    Chapter  Google Scholar 

  16. Pritchard, J.K., Stephens, M., Donnelly, P.J.: Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000)

    Google Scholar 

  17. Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proc. of the 20th Annual Conf. on Uncertainty in Artificial Intelligence (UAI 2004), Arlington, Virginia, pp. 487–494. AUAI Press (2004)

    Google Scholar 

  18. Wallach, H.: Structured Topic Models for Language. PhD thesis, University of Cambridge (2008)

    Google Scholar 

  19. Wallach, H.M., Murray, I., Salakhutdinov, R., Mimno, D.: Evaluation methods for topic models. In: Bottou, L., Littman, M. (eds.) Proceedings of the 26th International Conference on Machine Learning, ICML 2009 (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Buntine, W. (2009). Estimating Likelihoods for Topic Models. In: Zhou, ZH., Washio, T. (eds) Advances in Machine Learning. ACML 2009. Lecture Notes in Computer Science(), vol 5828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05224-8_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-05224-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05223-1

  • Online ISBN: 978-3-642-05224-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics