United We Stand: Using Multiple Strategies for Topic Labeling

  • Antoine Gourru
  • Julien Velcin
  • Mathieu Roche
  • Christophe Gravier
  • Pascal Poncelet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10859)

Abstract

Topic labeling aims at providing a sound, possibly multi-word, label that depicts a topic drawn from a topic model. This is of the utmost practical interest for quickly grasping a topic's informational content, since the usual ranked list of a topic's most probable words has limitations for this task. In this paper, we introduce three new unsupervised n-gram topic labelers that achieve results comparable to those of existing unsupervised topic labelers while following different assumptions. We demonstrate that combining topic labelers, even only two, makes it possible to reach a 64% improvement over single topic labeler approaches, and therefore opens research in that direction. Finally, we introduce a fourth topic labeler that extracts representative sentences, using Dirichlet smoothing to add contextual information. This sentence-based labeler provides strong surrogate candidates when n-gram topic labelers fall short of providing relevant labels, covering up to 94% of topics.
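To make the two ideas in the abstract concrete, here is a minimal Python sketch of (a) scoring candidate sentences for a topic under a Dirichlet-smoothed word distribution and (b) a naive rank-fusion combination of several labelers. All function names, the pseudo-count mass `n_pseudo`, the smoothing weight `mu`, and the rank-fusion rule are illustrative assumptions; the paper's exact scoring functions and combination strategy may differ.

```python
import numpy as np

def smoothed_topic_prob(phi, p_bg, n_pseudo=10000.0, mu=2000.0):
    # Treat the topic's word distribution phi as a pseudo-document of
    # n_pseudo tokens, then apply Dirichlet smoothing against the corpus
    # background distribution p_bg (Zhai & Lafferty style):
    #   p(w) = (n_pseudo * phi_w + mu * p_bg_w) / (n_pseudo + mu)
    return (n_pseudo * phi + mu * p_bg) / (n_pseudo + mu)

def sentence_score(tokens, p_smooth, vocab):
    # Length-normalized log-likelihood of a candidate sentence under the
    # smoothed topic distribution; smoothing keeps off-topic words from
    # driving the score to -inf.
    idx = [vocab[w] for w in tokens if w in vocab]
    if not idx:
        return float("-inf")
    return float(np.mean(np.log(p_smooth[idx])))

def best_sentence_label(sentences, phi, p_bg, vocab):
    # Pick the corpus sentence that best summarizes the topic.
    p_smooth = smoothed_topic_prob(np.asarray(phi), np.asarray(p_bg))
    return max(sentences, key=lambda s: sentence_score(s, p_smooth, vocab))

def combine_labelers(candidates, scorers):
    # Naive ensemble by average rank fusion: each scorer ranks the same
    # candidate labels; the label with the best mean rank wins.
    mean_rank = {c: 0.0 for c in candidates}
    for scorer in scorers:
        ordered = sorted(candidates, key=scorer, reverse=True)
        for rank, c in enumerate(ordered):
            mean_rank[c] += rank / len(scorers)
    return min(candidates, key=mean_rank.get)
```

As a usage sketch: given `phi` (one row of the topic-word matrix), `p_bg` (the empirical corpus unigram distribution), a vocabulary index `vocab`, and tokenized candidate sentences, `best_sentence_label` returns the surrogate sentence; `combine_labelers` would take n-gram candidate labels and the scoring functions of two or more labelers.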

Keywords

Topic model · Topic labeling · Topic coherence

Acknowledgments

This work is partially funded by the SONGES project (Occitanie and FEDER).


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Antoine Gourru (1)
  • Julien Velcin (1)
  • Mathieu Roche (2, 5)
  • Christophe Gravier (3)
  • Pascal Poncelet (4)

  1. Université de Lyon, Lyon 2, ERIC EA 3083, Lyon, France
  2. TETIS, Univ. Montpellier, APT, Cirad, CNRS, Irstea, Montpellier, France
  3. Université Jean Monnet, Laboratoire Hubert Curien UMR CNRS 5516, Saint-Étienne, France
  4. Univ. Montpellier, LIRMM, Montpellier, France
  5. Cirad, TETIS, Montpellier, France