Bigram Anchor Words Topic Model

Conference paper. In: Analysis of Images, Social Networks and Texts (AIST 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 661)

Abstract

A probabilistic topic model is a statistical tool for document collection analysis that extracts a set of topics from the collection and describes each document as a discrete probability distribution over those topics. Classical approaches to statistical topic modeling can be quite effective in various tasks, but the generated topics may be too similar to each other or poorly interpretable. We hypothesize that the interpretability and distinctness of topics can be improved by using linguistic information, such as collocations, while building the topic model. In this paper we propose an approach that accounts for bigrams (two-word phrases) in the construction of the Anchor Words Topic Model.
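As a rough illustration of the idea described in the abstract, the sketch below shows one way bigram collocations could be surfaced as vocabulary items before building an anchor-words model: bigrams are scored by pointwise mutual information (PMI) and the highest-scoring ones are merged into single tokens, so a downstream topic model can treat them as candidate anchor words alongside unigrams. All function names, thresholds, and the PMI scoring choice are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log

def merge_top_bigrams(docs, top_k=100, min_count=5):
    """Hypothetical sketch: score bigrams by PMI and merge the top ones
    into single vocabulary tokens (e.g. 'topic_model'), so they can later
    serve as candidate anchor words alongside unigrams."""
    unigram, bigram = Counter(), Counter()
    for tokens in docs:
        unigram.update(tokens)
        bigram.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigram.values())
    n_bi = sum(bigram.values())

    def pmi(pair):
        # PMI = log p(w1, w2) / (p(w1) * p(w2)), estimated from corpus counts
        w1, w2 = pair
        return log((bigram[pair] / n_bi) / ((unigram[w1] / n_uni) * (unigram[w2] / n_uni)))

    candidates = [p for p, c in bigram.items() if c >= min_count]
    top = set(sorted(candidates, key=pmi, reverse=True)[:top_k])

    merged_docs = []
    for tokens in docs:
        out, i = [], 0
        while i < len(tokens):
            # Greedily join adjacent words that form a selected bigram
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in top:
                out.append(tokens[i] + "_" + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged_docs.append(out)
    return merged_docs

# Toy usage: merged tokens such as 'topic_model' then enter the vocabulary
docs = [["probabilistic", "topic", "model"], ["anchor", "words", "topic", "model"]]
print(merge_top_bigrams(docs, top_k=2, min_count=1))
```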

Acknowledgments

This work was supported by grant RFFI 14-07-00383A, "Research of methods of integration of linguistic knowledge into statistical topic models".

Author information

Corresponding author

Correspondence to Natalia Loukachevitch.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ashuha, A., Loukachevitch, N. (2017). Bigram Anchor Words Topic Model. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_12

  • DOI: https://doi.org/10.1007/978-3-319-52920-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52919-6

  • Online ISBN: 978-3-319-52920-2

  • eBook Packages: Computer Science (R0)
