Abstract
In this paper, we present a novel method for document classification that uses a semantic matrix representation of Turkish sentences, concentrating on the phrases of a sentence and their concepts. Our model is designed to find the phrases in a sentence, identify their relations with specific concepts, and represent each sentence as a coarse-grained semantic matrix. Predicate features and semantic class type are also added to this representation. The highest success rate reported for Turkish document classification, 97.12, is obtained by adding the coarse-grained semantic matrix representation to the data that gave the best result in previous studies on Turkish document classification.
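To make the idea of a sentence-level semantic matrix concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The phrase roles, the concept classes, the binary cell values, and the helper names `semantic_matrix` and `document_features` are all hypothetical; the abstract does not specify the actual inventories or how the matrix is combined with existing features.

```python
from typing import Dict, List, Sequence

# Hypothetical inventories: the abstract does not list the phrase types or
# concept classes the authors actually use.
PHRASE_ROLES: List[str] = ["subject", "object", "adverbial", "predicate"]
CONCEPTS: List[str] = ["human", "place", "time", "action"]


def semantic_matrix(phrase_concepts: Dict[str, List[str]]) -> List[List[int]]:
    """Return a roles x concepts 0/1 matrix for one sentence.

    `phrase_concepts` maps a phrase role to the concepts found in that phrase.
    """
    matrix = [[0] * len(CONCEPTS) for _ in PHRASE_ROLES]
    for role, concepts in phrase_concepts.items():
        if role not in PHRASE_ROLES:
            continue  # ignore roles outside the assumed inventory
        row = PHRASE_ROLES.index(role)
        for concept in concepts:
            if concept in CONCEPTS:
                matrix[row][CONCEPTS.index(concept)] = 1
    return matrix


def document_features(
    sentences: Sequence[Dict[str, List[str]]],
    base_features: Sequence[float],
) -> List[float]:
    """Sum the sentence matrices, flatten them, and append the result to an
    existing feature vector (one assumed way of "adding" the representation)."""
    total = [[0] * len(CONCEPTS) for _ in PHRASE_ROLES]
    for sentence in sentences:
        m = semantic_matrix(sentence)
        for i in range(len(PHRASE_ROLES)):
            for j in range(len(CONCEPTS)):
                total[i][j] += m[i][j]
    flat = [float(v) for row in total for v in row]
    return list(base_features) + flat


# Toy, hand-annotated sentence: "Ali dün okula gitti" (Ali went to school yesterday).
toy_sentence = {
    "subject": ["human"],
    "adverbial": ["time", "place"],
    "predicate": ["action"],
}
toy_features = document_features([toy_sentence], base_features=[0.5])
```

In this sketch the resulting vector could be passed to any standard classifier alongside the original features; which classifier and which base features the paper uses is not stated in the abstract.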
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Dönmez, İ., Adalı, E. (2018). Turkish Document Classification with Coarse-Grained Semantic Matrix. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_37
DOI: https://doi.org/10.1007/978-3-319-75487-1_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)