Topic Model

Text Data Mining

Abstract

Generating a piece of text is a complex process for a human writer, who often sketches an outline of the main “topics” before choosing concrete words. The vector space model, an explicit text representation method, captures none of this: it represents a piece of text as a vector of independent words, which destroys word order and syntactic structure and ignores semantic relationships such as polysemy and synonymy.
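
As a minimal sketch of the limitation described above, the following Python snippet builds a bag-of-words count vector with the standard library. The function name `bag_of_words` and the example sentences are assumptions made for illustration and do not come from the chapter. It shows that two sentences with opposite meanings receive the same representation, and that two synonyms share no dimension at all.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a text to its word counts; word positions are discarded.

    Hypothetical helper for illustration, using naive whitespace tokenization.
    """
    return Counter(text.lower().split())


# Same words, different order and meaning: the count vectors are identical.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
same_vector = (a == b)

# Synonyms occupy unrelated dimensions, so the vectors share no terms.
c = bag_of_words("car")
d = bag_of_words("automobile")
shared_terms = set(c) & set(d)
```

Here `same_vector` is true and `shared_terms` is empty, which is the kind of information loss that topic models aim to reduce by working with latent topics instead of surface words.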

To solve these problems, researchers in natural language processing and information retrieval have proposed a series of statistical models called topic models, including latent semantic analysis, probabilistic latent semantic analysis, and latent Dirichlet allocation. The purpose of such models is to discover the latent semantics behind the text. Topic models are also referred to as probabilistic topic models. This chapter first defines “topic” as the term is used in this field and then introduces these three topic modeling methods in turn.
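
For orientation, the first two methods named above are commonly written in the following standard forms. The notation is assumed here for illustration and may differ from the notation used in the chapter itself.

```latex
% Assumed notation, not taken from the chapter:
% X: term-document matrix, k: number of latent dimensions,
% w: word, d: document, z: latent topic.

% Latent semantic analysis: rank-k truncated SVD of X
X \approx U_k \Sigma_k V_k^{\top}

% Probabilistic latent semantic analysis: each document is a mixture of topics
p(w \mid d) = \sum_{z} p(w \mid z)\, p(z \mid d)
```

Latent Dirichlet allocation keeps the same mixture form but treats the per-document topic proportions as random variables drawn from a Dirichlet prior.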



Copyright information

© 2021 Tsinghua University Press

About this chapter

Cite this chapter

Zong, C., Xia, R., Zhang, J. (2021). Topic Model. In: Text Data Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-0100-2_7

  • DOI: https://doi.org/10.1007/978-981-16-0100-2_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-0099-9

  • Online ISBN: 978-981-16-0100-2

  • eBook Packages: Computer Science (R0)
