Abstract
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. Determining the optimal number of topics remains a challenging problem in topic modeling. We propose a simple entropy regularization for topic selection within the framework of Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining regularizers. The entropy regularization gradually eliminates insignificant and linearly dependent topics. On semi-real data this process converges to the true number of topics. On real text collections it can be combined with sparsing, smoothing, and decorrelation regularizers to produce a sequence of models with different numbers of well-interpretable topics.
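The exact regularizer and EM update formulas are given in the paper; the following NumPy sketch only illustrates the elimination mechanism described above. The penalty form, the names `select_topics`, `tau`, and `eps`, and the thresholds are illustrative assumptions, not the authors' formulation: a sparsing-style penalty is subtracted from the aggregate topic probabilities p(t), and topics whose mass drops below a small threshold are eliminated.

```python
import numpy as np

def select_topics(theta, doc_lengths, tau=0.05, eps=1e-3):
    """Illustrative topic-selection step (not the paper's exact update).

    theta: |T| x |D| matrix of p(t|d); doc_lengths: token counts per document.
    Penalizes the aggregate topic probabilities p(t) and drops topics
    whose penalized mass falls below `eps`.
    """
    n = doc_lengths.sum()
    p_t = theta @ doc_lengths / n              # aggregate topic probabilities p(t)
    p_t = np.maximum(p_t - tau, 0.0)           # sparsing-style penalty, clipped at zero
    keep = p_t > eps                           # insignificant topics are eliminated
    theta = theta[keep]
    theta = theta / theta.sum(axis=0, keepdims=True)  # renormalize p(t|d)
    return theta, keep

# Toy example: 4 topics, 3 documents; the last topic carries almost no mass.
theta = np.array([[0.50, 0.40, 0.60],
                  [0.30, 0.40, 0.20],
                  [0.19, 0.19, 0.19],
                  [0.01, 0.01, 0.01]])
lengths = np.array([100.0, 100.0, 100.0])
new_theta, keep = select_topics(theta, lengths, tau=0.05)
# keep -> [True, True, True, False]: the insignificant topic is removed
```

In the paper this pruning happens gradually over EM iterations rather than in a single pass, driving the model from a deliberately overestimated number of topics down to a compact set.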
© 2015 Springer International Publishing Switzerland
Cite this paper
Vorontsov, K., Potapenko, A., Plavin, A. (2015). Additive Regularization of Topic Models for Topic Selection and Sparse Factorization. In: Gammerman, A., Vovk, V., Papadopoulos, H. (eds) Statistical Learning and Data Sciences. SLDS 2015. Lecture Notes in Computer Science(), vol 9047. Springer, Cham. https://doi.org/10.1007/978-3-319-17091-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17090-9
Online ISBN: 978-3-319-17091-6