Bigram Anchor Words Topic Model

Conference paper. In: Analysis of Images, Social Networks and Texts (AIST 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 661)

Abstract

A probabilistic topic model is a statistical tool for document collection analysis that extracts a set of topics from the collection and describes each document as a discrete probability distribution over those topics. Classical approaches to statistical topic modeling can be quite effective in various tasks, but the generated topics may be too similar to each other or poorly interpretable. We hypothesize that the interpretability and distinctness of topics can be improved by using linguistic information, such as collocations, while building the topic model. In this paper we propose an approach that accounts for bigrams (two-word phrases) in the construction of the Anchor Words Topic Model.
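As a rough illustration of the idea described in the abstract, the sketch below shows one way bigram collocations could be surfaced as vocabulary items before building an anchor-words model: bigrams are scored by pointwise mutual information (PMI) and the highest-scoring ones are merged into single tokens, so a downstream topic model can treat them as candidate anchor words alongside unigrams. All function names, thresholds, and the PMI scoring choice are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log

def merge_top_bigrams(docs, top_k=100, min_count=5):
    """Hypothetical sketch: score bigrams by PMI and merge the top ones
    into single vocabulary tokens (e.g. 'topic_model'), so they can later
    serve as candidate anchor words alongside unigrams."""
    unigram, bigram = Counter(), Counter()
    for tokens in docs:
        unigram.update(tokens)
        bigram.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigram.values())
    n_bi = sum(bigram.values())

    def pmi(pair):
        # PMI = log p(w1, w2) / (p(w1) * p(w2)), estimated from corpus counts
        w1, w2 = pair
        return log((bigram[pair] / n_bi) / ((unigram[w1] / n_uni) * (unigram[w2] / n_uni)))

    candidates = [p for p, c in bigram.items() if c >= min_count]
    top = set(sorted(candidates, key=pmi, reverse=True)[:top_k])

    merged_docs = []
    for tokens in docs:
        out, i = [], 0
        while i < len(tokens):
            # Greedily join adjacent words that form a selected bigram
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in top:
                out.append(tokens[i] + "_" + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged_docs.append(out)
    return merged_docs

# Toy usage: merged tokens such as 'topic_model' then enter the vocabulary
docs = [["probabilistic", "topic", "model"], ["anchor", "words", "topic", "model"]]
print(merge_top_bigrams(docs, top_k=2, min_count=1))
```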

Acknowledgments

This work was supported by grant RFFI 14-07-00383A, "Research of methods of integration of linguistic knowledge into statistical topic models".

Author information

Corresponding author

Correspondence to Natalia Loukachevitch.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ashuha, A., Loukachevitch, N. (2017). Bigram Anchor Words Topic Model. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_12

  • DOI: https://doi.org/10.1007/978-3-319-52920-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52919-6

  • Online ISBN: 978-3-319-52920-2

  • eBook Packages: Computer Science (R0)
