Prior-Knowledge-Embedded LDA with Word2vec – for Detecting Specific Topics in Documents

  • Conference paper
Knowledge Management and Acquisition for Intelligent Systems (PKAW 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11669)

Abstract

This paper proposes a method for applying prior knowledge about topics of interest to Latent Dirichlet Allocation (LDA). Conventional LDA sometimes fails to detect specific topics of interest. Our approach therefore uses word2vec to acquire linkages between words related to specific topics, and the extracted linkages serve as prior knowledge about the topics in the subsequent LDA process. The linkages can also be used to annotate words consistently, which conventional LDA cannot do because it relies on bag-of-words-based clustering. We evaluate the approach by applying it to travelers’ reviews to detect topics related to Japanese shrines. The experimental results show that the approach is effective in three respects: (1) Its average coherence, i.e., the semantic consistency among words, is higher than that of conventional LDA. (2) Words in each sentence are annotated so that the annotations consistently reflect the topic of the sentence, whereas conventional LDA sometimes assigns confusing, mixed annotations to the words of a single sentence. (3) It enables the detection of very specific topics that match users’ interests.
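
The abstract does not say how the word2vec linkages enter the LDA prior, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "linkage" means cosine similarity to a user-chosen seed word above a threshold, and that the prior is a per-topic row of Dirichlet pseudo-counts in which linked words are boosted. The function names, the threshold, the `base` and `boost` values, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def linked_words(seed, vectors, threshold=0.8):
    """Words whose embedding is close to the seed's (the seed itself included).

    Assumption: a "linkage" is a cosine similarity of at least `threshold`.
    """
    seed_vec = vectors[seed]
    return {w for w, vec in vectors.items() if cosine(seed_vec, vec) >= threshold}


def topic_word_prior(vocab, vectors, seeds, base=0.01, boost=1.0):
    """Build one row of Dirichlet pseudo-counts per seeded topic.

    Words linked to a topic's seed receive `base + boost`; all others `base`.
    """
    prior = []
    for seed in seeds:
        linked = linked_words(seed, vectors)
        prior.append([base + boost if w in linked else base for w in vocab])
    return prior


# Toy embeddings, made up for illustration (not trained word2vec vectors).
vectors = {
    "shrine": [0.9, 0.1],
    "torii": [0.85, 0.2],
    "prayer": [0.8, 0.3],
    "ramen": [0.1, 0.9],
    "noodle": [0.2, 0.95],
}
vocab = sorted(vectors)

# Topic 0 is seeded with "shrine", topic 1 with "ramen".
prior = topic_word_prior(vocab, vectors, seeds=["shrine", "ramen"])

for seed, row in zip(["shrine", "ramen"], prior):
    print(seed, dict(zip(vocab, row)))
```

With real data, the vectors would come from a word2vec model trained on the review corpus, and a matrix of this shape (topics by vocabulary) could be supplied as the topic-word prior of an LDA implementation that accepts one, for example the `eta` argument of gensim's `LdaModel`.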

Author information

Corresponding author

Correspondence to Hiroshi Uehara.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Uehara, H., Ito, A., Saito, Y., Yoshida, K. (2019). Prior-Knowledge-Embedded LDA with Word2vec – for Detecting Specific Topics in Documents. In: Ohara, K., Bai, Q. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2019. Lecture Notes in Computer Science, vol 11669. Springer, Cham. https://doi.org/10.1007/978-3-030-30639-7_10

  • DOI: https://doi.org/10.1007/978-3-030-30639-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30638-0

  • Online ISBN: 978-3-030-30639-7

  • eBook Packages: Computer Science, Computer Science (R0)
