Estimating Likelihoods for Topic Models

Buntine, Wray

doi:10.1007/978-3-642-05224-8_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5828)

Abstract

Topic models are a discrete analogue of principal component analysis and independent component analysis that model topics at the word level within a document. They have many variants, such as NMF, PLSI and LDA, and are used in many fields, such as genetics, text and the web, image analysis and recommender systems. However, only recently have reasonable methods become available for estimating the likelihood of unseen documents, for instance to perform testing or model comparison. This paper explores a number of recent methods, and improves their theory, performance, and testing.

Author information

Authors and Affiliations

NICTA and Australian National University, Locked Bag 8001, Canberra, 2601, ACT, Australia
Wray Buntine

Editor information

Editors and Affiliations

National Laboratory for Novel Software Technology, Nanjing University, 22 Hankou Road, 210093, Nanjing, China
Zhi-Hua Zhou
The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, 567, Osaka, Ibaraki, Japan
Takashi Washio

Cite this paper

Buntine, W. (2009). Estimating Likelihoods for Topic Models. In: Zhou, ZH., Washio, T. (eds) Advances in Machine Learning. ACML 2009. Lecture Notes in Computer Science, vol 5828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05224-8_6

DOI: https://doi.org/10.1007/978-3-642-05224-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05223-1
Online ISBN: 978-3-642-05224-8
eBook Packages: Computer Science, Computer Science (R0)
