
Additive Regularization of Topic Models for Topic Selection and Sparse Factorization

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9047)

Abstract

Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. Determining the optimal number of topics remains a challenging problem in topic modeling. We propose a simple entropy regularization for topic selection within Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining regularizers. The entropy regularization gradually eliminates insignificant and linearly dependent topics. On semi-real data this process converges to the true number of topics. On real text collections it can be combined with sparsing, smoothing, and decorrelation regularizers to produce a sequence of models with different numbers of well-interpretable topics.
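The combination of regularizers described above rests on ARTM's regularized M-step, in which each regularizer R contributes an additive correction to the frequency estimates: φ_wt ∝ (n_wt + φ_wt ∂R/∂φ_wt)₊. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the function and parameter names (`artm_em`, `beta`) are hypothetical, and for simplicity it plugs in a standard sparsing regularizer, which reduces the update to φ_wt ∝ (n_wt − β)₊, rather than the entropy topic-selection regularizer the paper proposes.

```python
import numpy as np

def artm_em(ndw, T, iters=30, beta=0.0, seed=0):
    """Sketch of an EM loop with an additive ARTM-style regularizer.

    ndw  : (D, W) document-term count matrix
    T    : number of topics
    beta : strength of an illustrative sparsing regularizer
           (beta=0 reduces to plain PLSA)
    """
    rng = np.random.default_rng(seed)
    D, W = ndw.shape
    phi = rng.random((W, T)); phi /= phi.sum(axis=0)        # p(w|t), columns sum to 1
    theta = rng.random((T, D)); theta /= theta.sum(axis=0)  # p(t|d), columns sum to 1
    for _ in range(iters):
        nwt = np.zeros((W, T))
        ntd = np.zeros((T, D))
        for d in range(D):
            # E-step: p(t|d,w) ∝ p(w|t) p(t|d)
            p = phi * theta[:, d]                 # (W, T)
            z = p.sum(axis=1, keepdims=True)
            z[z == 0.0] = 1.0                     # guard words zeroed out by sparsing
            ptdw = p / z
            nwt += ndw[d][:, None] * ptdw         # expected counts n_wt
            ntd[:, d] = ndw[d] @ ptdw             # expected counts n_td
        # M-step with additive regularization: phi_wt ∝ (n_wt + phi_wt dR/dphi_wt)_+
        # Illustrative sparsing term gives phi_wt ∝ (n_wt - beta)_+  (an assumption
        # for demonstration; the paper's topic-selection regularizer differs).
        phi = np.maximum(nwt - beta, 0.0)
        s = phi.sum(axis=0)
        alive = s > 0                             # topics with zero mass stay eliminated
        phi[:, alive] /= s[alive]
        st = ntd.sum(axis=0)
        st[st == 0.0] = 1.0
        theta = ntd / st
    return phi, theta
```

Setting `beta > 0` drives small elements of Φ to exact zeros, so weak topics can lose all their mass across iterations; this is the mechanism, in simplified form, by which an additive regularizer can eliminate topics rather than merely shrink them.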

Keywords

  • Probabilistic topic modeling
  • Regularization
  • Probabilistic latent semantic analysis
  • Topic selection
  • EM-algorithm




Author information

Correspondence to Konstantin Vorontsov.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vorontsov, K., Potapenko, A., Plavin, A. (2015). Additive Regularization of Topic Models for Topic Selection and Sparse Factorization. In: Gammerman, A., Vovk, V., Papadopoulos, H. (eds) Statistical Learning and Data Sciences. SLDS 2015. Lecture Notes in Computer Science(), vol 9047. Springer, Cham. https://doi.org/10.1007/978-3-319-17091-6_14

  • DOI: https://doi.org/10.1007/978-3-319-17091-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17090-9

  • Online ISBN: 978-3-319-17091-6

  • eBook Packages: Computer Science (R0)