Computing Classifier-Based Embeddings with the Help of Text2ddc

Uslu, Tolga; Mehler, Alexander; Baumartz, Daniel

doi:10.1007/978-3-031-24340-0_37

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13452)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

We introduce a method for computing classifier-based semantic spaces on top of text2ddc. To this end, we optimize text2ddc, a neural network-based classifier for the Dewey Decimal Classification (DDC). By using a wide range of linguistic features, including sense embeddings, we achieve an F-score of 87.4%. To show that our approach is language independent, we evaluate text2ddc by classifying texts in six different languages. Building on this classifier, we develop a topic model that generates probability distributions over topics for linguistic input at the word (sense), sentence, and text level. In contrast to related approaches, these probabilities are estimated with text2ddc, so that each dimension of the resulting embeddings corresponds to a separate DDC class. We finally evaluate this Classifier-based Semantic space (CaSe) in the context of text classification and show that it improves the classification results.
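
The core idea of a classifier-based embedding can be illustrated with a minimal sketch. It is not the authors' implementation and does not use text2ddc: the keyword-counting `classify` function, the three-class label set, and the helper names `embed` and `cosine` are all assumptions made up for illustration. The sketch only shows the shape of the approach, in which the classifier's probability distribution over DDC classes is itself used as the embedding, so that each dimension corresponds to one class.

```python
import math

# Hypothetical stand-in for a trained DDC classifier such as text2ddc.
# Three top-level DDC classes with toy keyword sets (illustrative only).
DDC_KEYWORDS = {
    "000 (computer science, information, general works)": {"computer", "software", "data"},
    "500 (science)": {"physics", "biology", "cell"},
    "700 (arts and recreation)": {"painting", "music", "film"},
}
CLASSES = list(DDC_KEYWORDS)


def classify(text):
    """Return a probability distribution over CLASSES.

    Assumption: keyword counts passed through a softmax replace the
    output layer of a real neural classifier.
    """
    tokens = text.lower().split()
    scores = [sum(tok in DDC_KEYWORDS[c] for tok in tokens) for c in CLASSES]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed(text):
    """Classifier-based embedding: dimension i is the probability of CLASSES[i]."""
    return classify(text)


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


if __name__ == "__main__":
    vec = embed("software and data on a computer")
    for label, p in zip(CLASSES, vec):
        print(f"{label}: {p:.3f}")
```

Because the input is just a string, the same `embed` call works for a single word, a sentence, or a whole text, and the resulting vectors can be compared with `cosine` or passed as features to a downstream classifier.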

Notes

1. This paper expands and details the work we have presented in [2], providing more information about our model and the data used. We elaborate on the experiments and evaluation of text2ddc and CaSe, and include an error analysis.
2. www.babelfy.org.
3. www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.
4. www.wiki.dbpedia.org/data-set-2014.

References

Bär, D., Biemann, C., Gurevych, I., Zesch, T.: UKP: Computing semantic textual similarity by combining multiple content similarity measures. In: Proceedings of SemEval ’12, pp. 435–440. Stroudsburg (2012)
Baumartz, D., Uslu, T., Mehler, A.: LTV: Labeled topic vector. In: Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics: System Demonstrations, August 20–26. The COLING 2018 Organizing Committee, Santa Fe, New Mexico, USA (2018)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
vor der Brück, T., Eger, S., Mehler, A.: Complex decomposition of the negative distance kernel. In: IEEE International Conference on Machine Learning and Applications (2015)
Hemati, W., Uslu, T., Mehler, A.: TextImager: a distributed UIMA-based system for NLP. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pp. 59–63 (2016)
Iacobacci, I., Pilehvar, M.T., Navigli, R.: Sensembed: Learning sense embeddings for word and relational similarity. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 95–105 (2015)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014)
Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211–240 (1997)
Leopold, E.: Models of semantic spaces. In: Mehler, A., Köhler, R. (eds.) Aspects of Automatic Text Analysis, Studies in Fuzziness and Soft Computing, vol. 209, pp. 117–137. Springer, Heidelberg (2007)
Li, J., Jurafsky, D.: Do multi-sense embeddings improve natural language understanding? arXiv preprint arXiv:1506.01070 (2015)
Li, Q., Li, T., Chang, B.: Learning word sense embeddings from word sense definitions. In: Lin, C.-Y., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds.) ICCPOL/NLPCC 2016. LNCS (LNAI), vol. 10102, pp. 224–235. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50496-4_19
Magatti, D., Calegari, S., Ciucci, D., Stella, F.: Automatic labeling of topics. In: Ninth International Conference on Intelligent Systems Design and Applications (ISDA ’09), pp. 1227–1232. IEEE (2009)
Mei, Q., Shen, X., Zhai, C.: Automatic labeling of multinomial topic models. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 490–499. KDD ’07, ACM, New York, NY, USA (2007). https://doi.org/10.1145/1281192.1281246, http://doi.acm.org/10.1145/1281192.1281246
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Pelevina, M., Arefyev, N., Biemann, C., Panchenko, A.: Making sense of word embeddings. arXiv preprint arXiv:1708.03390 (2017)
Pilehvar, M.T., Navigli, R.: From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artif. Intell. 228, 95–128 (2015)
Uslu, T., Mehler, A., Baumartz, D., Henlein, A., Hemati, W.: fastsense: An efficient word sense disambiguation classifier. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) (2018)
Uslu, T., Mehler, A., Niekler, A., Baumartz, D.: Towards a DDC-based topic network model of Wikipedia. In: Proceedings of the 2nd International Workshop on Modeling, Analysis, and Management of Social Networks and their Applications (SOCNET 2018), February 28, 2018 (2018)
Vial, L., Lecouteux, B., Schwab, D.: Sense embeddings in knowledge-based word sense disambiguation. In: 12th International Conference on Computational Semantics (2017)
Waltinger, U., Mehler, A., Lösch, M., Horstmann, W.: Hierarchical classification of OAI metadata using the DDC taxonomy. In: Bernardi, R., Chambers, S., Gottfried, B., Segond, F., Zaihrayeu, I. (eds.) Advanced Language Technologies for Digital Libraries (ALT4DL), pp. 29–40. Springer, LNCS (2011)
Wu, L., Fisch, A., Chopra, S., Adams, K., Bordes, A., Weston, J.: Starspace: Embed all the things! CoRR abs/1709.03856 (2017). http://arxiv.org/abs/1709.03856
Zhang, X., LeCun, Y.: Text understanding from scratch. CoRR abs/1502.01710 (2015). http://arxiv.org/abs/1502.01710

Author information

Authors and Affiliations

Goethe University, Frankfurt am Main, Germany
Tolga Uslu, Alexander Mehler & Daniel Baumartz

Corresponding author

Correspondence to Daniel Baumartz.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Uslu, T., Mehler, A., Baumartz, D. (2023). Computing Classifier-Based Embeddings with the Help of Text2ddc. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_37

DOI: https://doi.org/10.1007/978-3-031-24340-0_37
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24339-4
Online ISBN: 978-3-031-24340-0
eBook Packages: Computer Science, Computer Science (R0)
