Abstract
This paper presents a method for topic labelling based on Explicit Semantic Analysis (ESA). The top words of a topic are passed to ESA as input, and the algorithm returns the titles of the Wikipedia articles it considers most relevant to that input. An alternative approach, which serves as a strong baseline, uses the titles of the top results returned by a search engine when the topic words are submitted as a query. In both methods, the obtained titles are then analysed automatically: a graph algorithm constructs phrases characterizing the topic from them and assigns weights to these phrases. In the proposed ESA-based method, a post-processing step then sorts the candidate labels according to empirically formulated rules. Experiments were conducted on a corpus of Russian encyclopaedic texts on linguistics. The results justify applying ESA to this task: although it performs slightly worse than the search-engine-based method in terms of label quality, it can be used as a reasonable alternative because it offers two advantages that the baseline method lacks.
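The ESA step described above, in which topic words go in and ranked article titles come out, can be illustrated with a minimal sketch. This is not the authors' implementation: the three toy articles stand in for Wikipedia, the function names `build_esa_index` and `rank_concepts` are hypothetical, and summing TF-IDF weights is a simplification of the usual ESA concept-vector comparison.

```python
import math
from collections import Counter, defaultdict

# Toy stand-in for Wikipedia: title -> article text (hypothetical data).
ARTICLES = {
    "Phonetics": "phonetics studies speech sounds vowel consonant articulation",
    "Syntax": "syntax studies sentence structure phrase clause word order",
    "Morphology": "morphology studies word structure morpheme affix root",
}


def build_esa_index(articles):
    """Map each word to {title: TF-IDF weight} over the concept articles."""
    n_docs = len(articles)
    term_counts = {title: Counter(text.split()) for title, text in articles.items()}
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)
    index = defaultdict(dict)
    for title, counts in term_counts.items():
        for word, tf in counts.items():
            idf = math.log(n_docs / doc_freq[word])
            if idf > 0:  # words found in every article carry no signal
                index[word][title] = tf * idf
    return index


def rank_concepts(topic_words, index, top_n=3):
    """Sum the concept vectors of the topic words and rank article titles."""
    scores = defaultdict(float)
    for word in topic_words:
        for title, weight in index.get(word, {}).items():
            scores[title] += weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


INDEX = build_esa_index(ARTICLES)
candidates = rank_concepts(["vowel", "consonant", "sounds"], INDEX)
```

Here `candidates` holds (title, score) pairs, best first. In the pipeline the abstract describes, such titles would then be passed to the graph-based phrase construction and weighting step, which this sketch does not cover.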
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kriukova, A., Erofeeva, A., Mitrofanova, O., Sukharev, K. (2018). Explicit Semantic Analysis as a Means for Topic Labelling. In: Ustalov, D., Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2018. Communications in Computer and Information Science, vol 930. Springer, Cham. https://doi.org/10.1007/978-3-030-01204-5_11
DOI: https://doi.org/10.1007/978-3-030-01204-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01203-8
Online ISBN: 978-3-030-01204-5
eBook Packages: Computer Science, Computer Science (R0)