Supervised Topic Classification for Modeling a Hierarchical Conference Structure

  • Mikhail Kuznetsov
  • Marianne Clausel
  • Massih-Reza Amini
  • Eric Gaussier
  • Vadim Strijov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9489)

Abstract

In this paper we investigate the problem of supervised latent modeling for extracting topic hierarchies from data. The supervision is given as expert information on document-topic correspondence. To exploit this expert information, we use a regularization term that penalizes the difference between the predicted model and the expert-given one. We add this regularization term to the log-likelihood function and estimate the parameters with a stochastic EM-based algorithm. The proposed method is used to construct a topic hierarchy over the proceedings of the European Conference on Operational Research and helps automate the abstract submission system.
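
The idea of adding an expert-based regularization term to the log-likelihood can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a flat (non-hierarchical) PLSA model, plain batch EM instead of stochastic EM, and a penalty of the form tau * sum_{d,t} expert[d,t] * log(theta[d,t]), which pulls each document-topic distribution toward the expert one. The function name `regularized_plsa` and the parameters `counts`, `expert`, and `tau` are hypothetical and chosen only for illustration.

```python
import numpy as np


def regularized_plsa(counts, expert, n_iter=50, tau=1.0, seed=0):
    """Batch EM for PLSA with an expert-based penalty on theta (illustrative only).

    counts : (D, W) array of term counts per document.
    expert : (D, T) array, each row an expert-given topic distribution.
    tau    : assumed regularization weight; tau = 0 gives plain PLSA.
    Returns phi (T, W) = p(word | topic) and theta (D, T) = p(topic | doc).
    """
    counts = np.asarray(counts, dtype=float)
    expert = np.asarray(expert, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    n_topics = expert.shape[1]

    # Random initialization, normalized to valid distributions.
    phi = rng.random((n_topics, n_words))
    phi /= phi.sum(axis=1, keepdims=True)
    theta = rng.random((n_docs, n_topics))
    theta /= theta.sum(axis=1, keepdims=True)

    eps = 1e-12
    for _ in range(n_iter):
        # E-step: p(t | d, w) is proportional to theta[d, t] * phi[t, w].
        joint = theta[:, :, None] * phi[None, :, :]            # (D, T, W)
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), eps)

        # Expected counts of (document, topic, word) assignments.
        weighted = counts[:, None, :] * post                   # (D, T, W)
        n_tw = weighted.sum(axis=0)                            # (T, W)
        n_dt = weighted.sum(axis=2)                            # (D, T)

        # M-step for phi: unregularized maximum-likelihood update.
        phi = n_tw / np.maximum(n_tw.sum(axis=1, keepdims=True), eps)

        # M-step for theta: the assumed penalty adds tau * expert
        # as pseudo-counts before normalization.
        theta = n_dt + tau * expert
        theta /= theta.sum(axis=1, keepdims=True)

    return phi, theta
```

With `tau = 0` the update reduces to standard PLSA; larger values of `tau` move `theta` closer to the expert-given document-topic matrix, which is the trade-off the regularization term controls.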

Keywords

Hierarchical topic model · Labeled classification · Probabilistic latent semantic analysis · EM approach

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mikhail Kuznetsov (1), corresponding author
  • Marianne Clausel (2)
  • Massih-Reza Amini (3)
  • Eric Gaussier (3)
  • Vadim Strijov (1)

  1. Moscow Institute of Physics and Technology, Dolgoprudny, Moscow, Russia
  2. Laboratoire Jean Kuntzmann, Université de Grenoble Alpes, CNRS, Grenoble Cedex 9, France
  3. Laboratoire d’Informatique de Grenoble, Université de Grenoble Alpes, CNRS, Grenoble Cedex 9, France
