Abstract
A probabilistic topic model is a modern statistical tool for document collection analysis that allows extracting a number of topics in the collection and describes each document as a discrete probability distribution over topics. Classical approaches to statistical topic modeling can be quite effective in various tasks, but the generated topics may be too similar to each other or poorly interpretable. We supposed that it is possible to improve the interpretability and differentiation of topics by using linguistic information such as collocations while building the topic model. In this paper we offer an approach to accounting bigrams (two-word phrases) for the construction of Anchor Words Topic Model.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Arora, S., Ge, R., Halpern, Y., Mimno, D., Moitra, A., Sontag, D., Wu, Y., Zhu, M.: A practical algorithm for topic modeling with provable guarantees. arXiv preprint arXiv:1212.4777 (2012)
Arora, S., Ge, R., Moitra, A.: Learning topic models - going beyond SVD. In: 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 1–10. IEEE (2012)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Machine Learn. Res. 3, 993–1022 (2003)
Cheng, X., Yan, X., Lan, Y., Guo, J.: BTM: topic modeling over short texts. IEEE Trans. Knowl. Data Eng. 26(12), 2928–2941 (2014)
Dobrov, B., Lukashevich, N., Siromytnikov, S.: Forming the base of terminological phrases in the texts of the subject area, pp. 201–210 (2003)
Gao, W., Li, P., Darwish, K.: Joint topic modeling for event summarization across news and social media streams. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1173–1182. ACM (2012)
Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57. ACM (1999)
Krestel, R., Fankhauser, P., Nejdl, W.: Latent dirichlet allocation for tag recommendation. In: Proceedings of the third ACM Conference on Recommender Systems, pp. 61–68. ACM (2009)
MacKay, D.J., Peto, L.C.B.: A hierarchical dirichlet language model. Nat. Lang. Eng. 1(03), 289–308 (1995)
Mei, Q., Cai, D., Zhang, D., Zhai, C.: Topic modeling with network regularization. In: Proceedings of the 17th International Conference on World Wide Web, pp. 101–110. ACM (2008)
Newman, D., Lau, J.H., Grieser, K., Baldwin, T.: Automatic evaluation of topic coherence. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 100–108. Association for Computational Linguistics (2010)
Nokel, M., Loukachevitch, N.: The method of accounting bigram structure in topical models. Computational Methods and Programming (2015)
Steyvers, M., Griffiths, T.: Matlab topic modeling toolbox 1.3 (2005)
Vorontsov, K.: Additive regularization for topic models of text collections. In: Doklady Mathematics, pp. 301–304. Pleiades Publishing (2014)
Wallach, H.M.: Topic modeling: beyond bag-of-words. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 977–984. ACM (2006)
Wang, X., McCallum, A., Wei, X.: Topical n-grams: phrase and topic discovery, with an application to information retrieval. In: Seventh IEEE International Conference on Data Mining, ICDM 2007, pp. 697–702. IEEE (2007)
Acknowledgments
This work was supported by grant RFFI 14-07-00383A Research of methods of integration of linguistic knowledge into statistical topic models.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ashuha, A., Loukachevitch, N. (2017). Bigram Anchor Words Topic Model. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-52920-2_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52919-6
Online ISBN: 978-3-319-52920-2
eBook Packages: Computer ScienceComputer Science (R0)