
Additive Regularization of Topic Models for Topic Selection and Sparse Factorization

  • Conference paper
Statistical Learning and Data Sciences (SLDS 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9047)


Abstract

Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. Determining the optimal number of topics remains a challenging problem in topic modeling. We propose a simple entropy regularization for topic selection in terms of Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining regularizers. The entropy regularization gradually eliminates insignificant and linearly dependent topics. This process converges to the true number of topics on semi-real data. On real text collections it can be combined with sparsing, smoothing, and decorrelation regularizers to produce a sequence of models with different numbers of well-interpretable topics.
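The mechanism the abstract describes — a regularized EM procedure that gradually zeroes out insignificant topics — can be sketched as follows. This is a minimal PLSA trainer with a generic ARTM-style sparsing update (subtract a constant tau from the expected topic counts and clip at zero), shown only to illustrate how an additive regularizer can eliminate topics during training; it is not the paper's exact entropy regularizer, and all names here (`plsa_with_topic_sparsing`, `tau`) are our own.

```python
import numpy as np

def plsa_with_topic_sparsing(ndw, T=10, tau=0.1, iters=50, seed=0):
    """EM for PLSA with an ARTM-style sparsing regularizer on theta.

    ndw  : (D, W) document-word count matrix.
    T    : initial (deliberately excessive) number of topics.
    tau  : regularization strength; larger values zero out more topics.
    Returns phi (W, T), theta (T, D), and indices of surviving topics.
    """
    rng = np.random.default_rng(seed)
    D, W = ndw.shape
    phi = rng.random((W, T)); phi /= phi.sum(axis=0)
    theta = rng.random((T, D)); theta /= theta.sum(axis=0)
    for _ in range(iters):
        # E-step: expected counts n_wt, n_td under p(t|d,w)
        pwd = np.maximum(phi @ theta, 1e-12)   # (W, D) model p(w|d)
        q = ndw.T / pwd                        # (W, D) ratio of data to model
        nwt = phi * (q @ theta.T)              # (W, T) expected word-topic counts
        ntd = theta * (phi.T @ q)              # (T, D) expected topic-doc counts
        # M-step for phi; reset dead topic columns to uniform to keep it stochastic
        dead_t = nwt.sum(axis=0) < 1e-12
        nwt[:, dead_t] = 1.0 / W
        phi = nwt / nwt.sum(axis=0)
        # M-step for theta with sparsing: subtract tau and clip at zero,
        # so weak topics lose probability mass and eventually vanish
        ntd = np.maximum(ntd - tau, 0.0)
        dead_d = ntd.sum(axis=0) < 1e-12
        ntd[:, dead_d] = 1.0 / T               # guard against fully zeroed documents
        theta = ntd / ntd.sum(axis=0)
    alive = np.where(theta.sum(axis=1) > 1e-6)[0]
    return phi, theta, alive
```

Starting from an excessive T, topics whose clipped counts reach zero stop receiving mass, so the effective number of topics shrinks as training proceeds — the same qualitative behaviour the abstract attributes to the entropy regularizer.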



Author information

Correspondence to Konstantin Vorontsov.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vorontsov, K., Potapenko, A., Plavin, A. (2015). Additive Regularization of Topic Models for Topic Selection and Sparse Factorization. In: Gammerman, A., Vovk, V., Papadopoulos, H. (eds) Statistical Learning and Data Sciences. SLDS 2015. Lecture Notes in Computer Science (LNAI), vol. 9047. Springer, Cham. https://doi.org/10.1007/978-3-319-17091-6_14


  • DOI: https://doi.org/10.1007/978-3-319-17091-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17090-9

  • Online ISBN: 978-3-319-17091-6

  • eBook Packages: Computer Science, Computer Science (R0)
