Turkish Document Classification with Coarse-Grained Semantic Matrix

Dönmez, İlknur; Adalı, Eşref

doi:10.1007/978-3-319-75487-1_37

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

In this paper, we present a novel document classification method that uses a semantic matrix representation of Turkish sentences, concentrating on the phrases in each sentence and the concepts they express. Our model is designed to find the phrases in a sentence, identify their relations with specific concepts, and represent each sentence as a coarse-grained semantic matrix. Predicate features and semantic class types are also added to this coarse-grained semantic matrix representation. The highest success rate for Turkish document classification, 97.12, is obtained by adding the coarse-grained semantic matrix representation to the data set that gave the best result in previous studies on Turkish document classification.
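The pipeline outlined in the abstract (phrases → concepts → sentence matrix → document classifier) can be illustrated with a toy Python sketch. It is not the authors' implementation, and every concrete choice below is an assumption: the phrase roles (ROLES), the tiny hypothetical concept lexicon (CONCEPT_LEXICON), lemmatized input from an unspecified upstream Turkish phrase chunker, summing sentence matrices into a document vector, and a nearest-centroid cosine classifier standing in for whatever learner the paper actually uses. Predicate features and semantic class types are not modelled.

```python
import math
from collections import Counter

# Hypothetical coarse-grained concept lexicon (lemma -> concept class).
# The paper's real concept inventory is NOT reproduced here.
CONCEPT_LEXICON = {
    "takım": "GROUP", "maç": "EVENT", "gol": "EVENT",
    "at": "ACTION", "kazan": "ACTION",
    "borsa": "INSTITUTION", "hisse": "ARTIFACT",
    "düş": "CHANGE", "yüksel": "CHANGE",
}
# Assumed phrase roles; a sentence is a list of (role, [lemmas]) pairs.
ROLES = ["SUBJECT", "OBJECT", "PREDICATE", "ADJUNCT"]
CONCEPTS = sorted(set(CONCEPT_LEXICON.values()))


def sentence_matrix(phrases):
    """Build a len(ROLES) x len(CONCEPTS) count matrix for one sentence."""
    matrix = [[0] * len(CONCEPTS) for _ in ROLES]
    for role, lemmas in phrases:
        r = ROLES.index(role)  # raises ValueError for an unknown role
        for lemma in lemmas:
            concept = CONCEPT_LEXICON.get(lemma)
            if concept is not None:  # lemmas without a concept are ignored
                matrix[r][CONCEPTS.index(concept)] += 1
    return matrix


def document_features(sentences):
    """Sum the sentence matrices and flatten the result row by row."""
    total = [[0] * len(CONCEPTS) for _ in ROLES]
    for phrases in sentences:
        for i, row in enumerate(sentence_matrix(phrases)):
            for j, value in enumerate(row):
                total[i][j] += value
    return [value for row in total for value in row]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(labelled_docs):
    """Average the feature vectors of the training documents per label."""
    sums, counts = {}, Counter()
    for label, sentences in labelled_docs:
        feats = document_features(sentences)
        prev = sums.get(label, [0.0] * len(feats))
        sums[label] = [s + f for s, f in zip(prev, feats)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def classify(centroids, sentences):
    """Return the label whose centroid is most cosine-similar."""
    feats = document_features(sentences)
    return max(centroids, key=lambda lab: cosine(feats, centroids[lab]))


# Toy usage with hand-chunked, lemmatized sentences.
sport_doc = [[("SUBJECT", ["takım"]), ("OBJECT", ["maç"]), ("PREDICATE", ["kazan"])]]
econ_doc = [[("SUBJECT", ["borsa"]), ("PREDICATE", ["düş"])],
            [("SUBJECT", ["hisse"]), ("PREDICATE", ["yüksel"])]]
centroids = train_centroids([("spor", sport_doc), ("ekonomi", econ_doc)])
query = [[("SUBJECT", ["takım"]), ("OBJECT", ["gol"]), ("PREDICATE", ["at"])]]
print(classify(centroids, query))  # spor
```

Keeping the phrase role as the row index means the same concept contributes different features depending on whether it appears in a subject, object, or predicate phrase, which is one plausible reading of what a role-by-concept matrix adds over a plain bag of concepts.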

Author information

Authors and Affiliations

Computer Engineering Department, İstanbul Technical University, Maslak, 34369, İstanbul, Turkey
İlknur Dönmez & Eşref Adalı

Corresponding author

Correspondence to İlknur Dönmez.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Dönmez, İ., Adalı, E. (2018). Turkish Document Classification with Coarse-Grained Semantic Matrix. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_37

DOI: https://doi.org/10.1007/978-3-319-75487-1_37
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)