Turkish Document Classification with Coarse-Grained Semantic Matrix

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Abstract

In this paper, we present a novel method for Document Classification that uses a semantic matrix representation of Turkish sentences, concentrating on the phrases of each sentence and the concepts they express. Our model is designed to find the phrases in a sentence, identify their relations with specific concepts, and represent the sentence as a coarse-grained semantic matrix. Predicate features and semantic class type are also added to this representation. The highest success rate in Turkish Document Classification, 97.12, is obtained by adding the coarse-grained semantic matrix representation to the data that gave the best result in previous studies of Turkish Document Classification.
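
To make the idea of a phrase-by-concept matrix concrete, the toy sketch below builds one for a single pre-segmented sentence and flattens it into a feature vector that could be appended to an existing document representation. It is only an illustration under stated assumptions: the abstract does not give the paper's phrase types, concept inventory, or matrix layout, so the names PHRASE_ROLES, CONCEPTS, LEXICON, semantic_matrix and flatten are all hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical phrase roles (matrix rows); the paper's actual phrase types
# are not listed in the abstract.
PHRASE_ROLES = ["subject", "adverbial", "predicate"]

# Hypothetical coarse concept classes (matrix columns).
CONCEPTS = ["person", "place", "time", "action"]

# Hypothetical toy lexicon mapping lowercase Turkish word stems to a concept.
LEXICON = {"ali": "person", "ankara": "place", "dün": "time", "gitti": "action"}


def semantic_matrix(phrases: Dict[str, List[str]]) -> List[List[int]]:
    """Return a binary roles-by-concepts matrix for one sentence.

    `phrases` maps a phrase role to the (already lowercased) stems found
    in that phrase. A cell is 1 if the phrase contains a word whose
    concept matches the column, otherwise 0.
    """
    matrix = [[0] * len(CONCEPTS) for _ in PHRASE_ROLES]
    for row, role in enumerate(PHRASE_ROLES):
        for word in phrases.get(role, []):
            concept = LEXICON.get(word)
            if concept is not None:
                matrix[row][CONCEPTS.index(concept)] = 1
    return matrix


def flatten(matrix: List[List[int]]) -> List[int]:
    """Flatten the matrix row by row into one feature vector."""
    return [cell for row in matrix for cell in row]


# "Ali dün Ankara'ya gitti" (Ali went to Ankara yesterday), segmented by hand.
sentence = {
    "subject": ["ali"],
    "adverbial": ["dün", "ankara"],
    "predicate": ["gitti"],
}
features = flatten(semantic_matrix(sentence))
```

In this sketch each sentence yields a fixed-length vector of len(PHRASE_ROLES) * len(CONCEPTS) entries, so vectors from different sentences can be summed or averaged per document before being concatenated with other features.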

Author information

Corresponding author

Correspondence to İlknur Dönmez.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Dönmez, İ., Adalı, E. (2018). Turkish Document Classification with Coarse-Grained Semantic Matrix. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_37

  • DOI: https://doi.org/10.1007/978-3-319-75487-1_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75486-4

  • Online ISBN: 978-3-319-75487-1

  • eBook Packages: Computer Science, Computer Science (R0)
