Abstract
For human beings, the process of generating a piece of text is extremely complex. People often think in terms of an outline containing the main “topics” before writing the concrete words. The vector space model, a widely used explicit text representation, instead treats a piece of text as a vector of independent words: it discards word order and syntactic structure, and it ignores semantic relationships in the text such as polysemy and synonymy.
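The two weaknesses named above can be seen on a toy example. The following is a minimal sketch, not the chapter's own code. It assumes whitespace tokenization, raw term-frequency weights (no TF-IDF), and four sentences invented for illustration. Under these assumptions, two sentences with opposite meanings get identical vectors, and the synonyms "car" and "automobile" occupy unrelated dimensions.

```python
from collections import Counter
import math

def bow_vector(text, vocab):
    """Map a text to a term-frequency vector over a fixed vocabulary.
    Assumption: lowercase + whitespace tokenization, raw counts as weights."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy sentences, invented purely for illustration.
docs = [
    "the dog bit the man",
    "the man bit the dog",
    "i bought a car",
    "i bought an automobile",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vecs = [bow_vector(d, vocab) for d in docs]

# Word order is lost: different meanings, identical vectors.
print(vecs[0] == vecs[1])                    # True
# Synonymy is ignored: "car" and "automobile" are separate dimensions,
# so the only overlap comes from the shared words "i" and "bought".
print(round(cosine(vecs[2], vecs[3]), 3))    # 0.5
```

Each of the two car sentences has four words, and they share only "i" and "bought". Their similarity is therefore 2/4 = 0.5, even though the sentences mean the same thing.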
To address these problems, researchers in natural language processing and information retrieval have proposed a family of statistical models called topic models, including latent semantic analysis (LSA), probabilistic latent semantic analysis (PLSA), and latent Dirichlet allocation (LDA). These models aim to discover the latent semantics behind text. Topic models are also referred to as probabilistic topic models. This chapter first defines what a “topic” means in this field and then introduces the three main topic modeling methods in turn.
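As a rough illustration of the first of these methods, the sketch below applies the core idea of latent semantic analysis to a toy term-document count matrix. It factorizes the matrix with a truncated singular value decomposition and compares documents in the reduced space. The matrix, the choice k = 2, the absence of any term weighting, and the helper name `lsa_doc_vectors` are all assumptions made for this example. The chapter's own formulation may differ in details such as weighting and normalization.

```python
import numpy as np

def lsa_doc_vectors(X, k):
    """Hypothetical helper: project documents into a k-dimensional latent space.

    X is a term-by-document count matrix (rows = terms, columns = documents).
    Returns one row per document: the first k right singular vectors scaled
    by their singular values, i.e. the rows of V_k @ diag(s_k).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T * s[:k]

def cosine(u, v):
    """Cosine similarity of two 1-D arrays; 0.0 if either has zero norm."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

terms = ["car", "automobile", "engine", "flower", "petal"]
# Toy corpus (assumed): d0 = "car engine", d1 = "automobile engine",
# d2 = d3 = "flower petal".
X = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)

print(round(cosine(X[:, 0], X[:, 1]), 3))   # 0.5 in the raw word space
D = lsa_doc_vectors(X, k=2)
print(round(cosine(D[0], D[1]), 3))         # 1.0 in the 2-d latent space
```

Because "car" and "automobile" both co-occur with "engine", the rank-2 approximation maps d0 and d1 onto the same latent direction. Their similarity rises from 0.5 to 1.0, while both remain orthogonal to the flower documents. This is the kind of synonymy effect that the bag-of-words vectors above cannot capture.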
Copyright information
© 2021 Tsinghua University Press
About this chapter
Cite this chapter
Zong, C., Xia, R., Zhang, J. (2021). Topic Model. In: Text Data Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-0100-2_7
DOI: https://doi.org/10.1007/978-981-16-0100-2_7
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0099-9
Online ISBN: 978-981-16-0100-2
eBook Packages: Computer Science, Computer Science (R0)