Explicit Semantic Analysis as a Means for Topic Labelling

  • Anna Kriukova
  • Aliia Erofeeva
  • Olga Mitrofanova
  • Kirill Sukharev
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 930)

Abstract

This paper deals with a method for topic labelling that makes use of Explicit Semantic Analysis (ESA). The top words of a topic are given to ESA as input, and the algorithm yields the titles of the Wikipedia articles considered most relevant to that input. An alternative approach, which serves as a strong baseline, uses the titles of the top results returned by a search engine when the topic words are submitted as a query. In both methods, the obtained titles are then analysed automatically: phrases characterizing the topic are constructed from them with a graph algorithm and are assigned weights. In the proposed ESA-based method, post-processing then sorts the candidate labels according to empirically formulated rules. Experiments were conducted on a corpus of Russian encyclopaedic texts on linguistics. The results justify applying ESA to this task: although it performs slightly worse than the search-engine method in terms of label quality, it can serve as a reasonable alternative because it offers two advantages that the baseline method lacks.
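The ESA-plus-graph pipeline summarised above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `ARTICLES` is a hypothetical four-article stand-in for a Wikipedia dump (English toy text, whereas the paper works with Russian). ESA is approximated by summing each topic word's TF-IDF "interpretation vector" over articles. The graph algorithm is assumed to be a TextRank-style PageRank over adjacent words in the retrieved titles. Each title's weight is assumed to be the mean score of its words. The empirically formulated post-processing rules are not reproduced. All helper names (`tfidf_index`, `esa_top_titles`, `textrank_scores`, `rank_labels`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for a Wikipedia dump: article title -> article text.
# (English toy data for illustration; the paper targets Russian texts.)
ARTICLES = {
    "Russian phonology": "russian phonology describes vowels consonants and stress",
    "Vowel reduction in Russian": "unstressed vowels in russian undergo reduction",
    "Russian grammar": "russian grammar describes cases and verbs",
    "Word order": "word order in russian is relatively free",
}

STOPWORDS = {"in", "and", "is", "of", "the"}  # assumed minimal stopword list


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def tfidf_index(articles):
    """Inverted index word -> {title: tf-idf weight}, i.e. an ESA interpretation vector per word."""
    n = len(articles)
    tokens = {title: text.lower().split() for title, text in articles.items()}
    df = Counter(w for toks in tokens.values() for w in set(toks))
    index = {}
    for title, toks in tokens.items():
        for w, count in Counter(toks).items():
            index.setdefault(w, {})[title] = (1 + math.log(count)) * math.log(n / df[w])
    return index


def esa_top_titles(topic_words, index, k=3):
    """Sum the topic words' interpretation vectors and return the k best-scoring titles."""
    scores = Counter()
    for w in topic_words:
        for title, weight in index.get(w, {}).items():
            scores[title] += weight
    return [title for title, score in scores.most_common(k) if score > 0]


def textrank_scores(titles, d=0.85, iters=50):
    """TextRank-style PageRank over a graph linking adjacent content words in each title."""
    graph = {}
    for title in titles:
        toks = content_words(title)
        for w in toks:
            graph.setdefault(w, set())
        for a, b in zip(toks, toks[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    scores = {w: 1.0 for w in graph}
    for _ in range(iters):
        # Each word receives (1 - d) plus a damped share of its neighbours' scores.
        scores = {
            w: (1 - d) + d * sum(scores[v] / len(graph[v]) for v in graph[w])
            for w in graph
        }
    return scores


def rank_labels(titles, word_scores):
    """Weight each title by the mean score of its content words (an assumed scheme)."""
    weighted = []
    for title in titles:
        toks = content_words(title)
        if toks:
            weighted.append((title, sum(word_scores[w] for w in toks) / len(toks)))
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)


topic = ["vowels", "stress", "consonants", "unstressed"]
titles = esa_top_titles(topic, tfidf_index(ARTICLES))
print(titles)  # ['Russian phonology', 'Vowel reduction in Russian']
print(rank_labels(titles, textrank_scores(titles)))
```

Averaging word scores is only one plausible way to weight a candidate phrase; summing them, or mixing in the ESA score, would be equally reasonable sketches, and the abstract does not say which scheme the paper uses.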

Keywords

Topic labels, Topic modelling, Explicit Semantic Analysis, Russian

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Anna Kriukova (1)
  • Aliia Erofeeva (2)
  • Olga Mitrofanova (1)
  • Kirill Sukharev (3)

  1. St. Petersburg State University, St. Petersburg, Russia
  2. University of Trento, Trento, Italy
  3. St. Petersburg Electrotechnical University, St. Petersburg, Russia
