Bigram Anchor Words Topic Model

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 661)


A probabilistic topic model is a statistical tool for analyzing document collections: it extracts a set of topics from the collection and describes each document as a discrete probability distribution over those topics. Classical approaches to statistical topic modeling can be quite effective in various tasks, but the topics they generate may be too similar to each other or poorly interpretable. We hypothesize that the interpretability and distinctness of topics can be improved by using linguistic information, such as collocations, while building the topic model. In this paper we propose an approach that accounts for bigrams (two-word phrases) in the construction of the Anchor Words topic model.


Topic model, Anchor words, Bigram



This work was supported by grant RFFI 14-07-00383A "Research of methods of integration of linguistic knowledge into statistical topic models".



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  2. Research Computing Center of Lomonosov Moscow State University, Moscow, Russia
