Computing Classifier-Based Embeddings with the Help of Text2ddc

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13452)

Abstract

We introduce a method for computing classifier-based semantic spaces on top of text2ddc. To this end, we optimize text2ddc, a neural network-based classifier for the Dewey Decimal Classification (DDC). By using a wide range of linguistic features, including sense embeddings, we achieve an F-score of 87.4%. To show that our approach is language independent, we evaluate text2ddc by classifying texts in six different languages. On this basis, we develop a topic model that generates probability distributions over topics for linguistic input at the word (sense), sentence and text level. In contrast to related approaches, these probabilities are estimated with text2ddc, so that each dimension of the resulting embeddings corresponds to a separate DDC class. Finally, we evaluate this Classifier-based Semantic space (CaSe) in the context of text classification and show that it improves the classification results.
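The idea that each embedding dimension corresponds to one DDC class can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_classifier` is a hypothetical keyword-based stand-in for text2ddc (not its model or its API), the four DDC top-level classes are an arbitrary subset, and the keyword lists are invented. Only the general principle is taken from the abstract: the classifier's probability distribution over classes is used directly as the embedding of the input.

```python
import math
from typing import Callable, Dict, List

# Illustrative subset of DDC top-level classes (assumption: a real
# classifier would cover many more classes than these four).
DDC_CLASSES: List[str] = ["000", "500", "700", "800"]

# Hypothetical keyword lists that stand in for a trained classifier.
TOY_KEYWORDS: Dict[str, List[str]] = {
    "000": ["computer", "software", "data"],
    "500": ["physics", "biology", "chemistry"],
    "700": ["painting", "music", "sculpture"],
    "800": ["novel", "poem", "drama"],
}


def toy_classifier(text: str) -> List[float]:
    """Return a probability distribution over DDC_CLASSES (softmax of keyword counts)."""
    tokens = text.lower().split()
    scores = [
        float(sum(tokens.count(word) for word in TOY_KEYWORDS[ddc]))
        for ddc in DDC_CLASSES
    ]
    exps = [math.exp(score) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def case_embedding(
    text: str, classifier: Callable[[str], List[float]] = toy_classifier
) -> List[float]:
    """Use the classifier's class probabilities as the embedding of the text."""
    return classifier(text)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embeddings of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


if __name__ == "__main__":
    science_a = case_embedding("physics and chemistry data")
    science_b = case_embedding("biology physics")
    literature = case_embedding("a poem and a novel")
    # Each dimension is labeled by the DDC class it belongs to.
    print(dict(zip(DDC_CLASSES, science_a)))
    print(cosine(science_a, science_b), cosine(science_a, literature))
```

Because every dimension carries a class label, such an embedding can be read directly: the largest entry names the most probable DDC class, and two inputs are similar when their class distributions are similar.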

Notes

  1. This paper expands on and details the work we presented in [2], providing more information about our model and the data used. We elaborate on the experiments and the evaluation of text2ddc and CaSe, and include an error analysis.

  2. www.babelfy.org

  3. www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html

  4. www.wiki.dbpedia.org/data-set-2014

References

  1. Bär, D., Biemann, C., Gurevych, I., Zesch, T.: UKP: Computing semantic textual similarity by combining multiple content similarity measures. In: Proceedings of SemEval ’12, pp. 435–440. Stroudsburg (2012)

  2. Baumartz, D., Uslu, T., Mehler, A.: LTV: Labeled topic vector. In: Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics: System Demonstrations, August 20–26. The COLING 2018 Organizing Committee, Santa Fe, New Mexico, USA (2018)

  3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  4. vor der Brück, T., Eger, S., Mehler, A.: Complex decomposition of the negative distance kernel. In: IEEE International Conference on Machine Learning and Applications (2015)

  5. Hemati, W., Uslu, T., Mehler, A.: TextImager: a distributed UIMA-based system for NLP. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, pp. 59–63 (2016)

  6. Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: Learning sense embeddings for word and relational similarity. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 95–105 (2015)

  7. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)

  8. Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014)

  9. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)

  10. Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211–240 (1997)

  11. Leopold, E.: Models of semantic spaces. In: Mehler, A., Köhler, R. (eds.) Aspects of Automatic Text Analysis, Studies in Fuzziness and Soft Computing, vol. 209, pp. 117–137. Springer, Heidelberg (2007)

  12. Li, J., Jurafsky, D.: Do multi-sense embeddings improve natural language understanding? arXiv preprint arXiv:1506.01070 (2015)

  13. Li, Q., Li, T., Chang, B.: Learning word sense embeddings from word sense definitions. In: Lin, C.-Y., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds.) ICCPOL/NLPCC 2016. LNCS (LNAI), vol. 10102, pp. 224–235. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50496-4_19

  14. Magatti, D., Calegari, S., Ciucci, D., Stella, F.: Automatic labeling of topics. In: Ninth International Conference on Intelligent Systems Design and Applications (ISDA ’09), pp. 1227–1232. IEEE (2009)

  15. Mei, Q., Shen, X., Zhai, C.: Automatic labeling of multinomial topic models. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 490–499. ACM, New York, NY, USA (2007). https://doi.org/10.1145/1281192.1281246

  16. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  17. Pelevina, M., Arefyev, N., Biemann, C., Panchenko, A.: Making sense of word embeddings. arXiv preprint arXiv:1708.03390 (2017)

  18. Pilehvar, M.T., Navigli, R.: From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artif. Intell. 228, 95–128 (2015)

  19. Uslu, T., Mehler, A., Baumartz, D., Henlein, A., Hemati, W.: fastsense: An efficient word sense disambiguation classifier. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) (2018)

  20. Uslu, T., Mehler, A., Niekler, A., Baumartz, D.: Towards a DDC-based topic network model of Wikipedia. In: Proceedings of the 2nd International Workshop on Modeling, Analysis, and Management of Social Networks and their Applications (SOCNET 2018), February 28, 2018 (2018)

  21. Vial, L., Lecouteux, B., Schwab, D.: Sense embeddings in knowledge-based word sense disambiguation. In: 12th International Conference on Computational Semantics (2017)

  22. Waltinger, U., Mehler, A., Lösch, M., Horstmann, W.: Hierarchical classification of OAI metadata using the DDC taxonomy. In: Bernardi, R., Chambers, S., Gottfried, B., Segond, F., Zaihrayeu, I. (eds.) Advanced Language Technologies for Digital Libraries (ALT4DL), pp. 29–40. Springer, LNCS (2011)

  23. Wu, L., Fisch, A., Chopra, S., Adams, K., Bordes, A., Weston, J.: Starspace: Embed all the things! CoRR abs/1709.03856 (2017). http://arxiv.org/abs/1709.03856

  24. Zhang, X., LeCun, Y.: Text understanding from scratch. CoRR abs/1502.01710 (2015). http://arxiv.org/abs/1502.01710

Author information

Corresponding author

Correspondence to Daniel Baumartz.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Uslu, T., Mehler, A., Baumartz, D. (2023). Computing Classifier-Based Embeddings with the Help of Text2ddc. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_37

  • DOI: https://doi.org/10.1007/978-3-031-24340-0_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24339-4

  • Online ISBN: 978-3-031-24340-0

  • eBook Packages: Computer Science, Computer Science (R0)
