Abstract
This article introduces a methodology for classifying the security level of confidential documents written in Turkish. Internal documents of TUBITAK UEKAE holding various security levels (unclassified, restricted, secret) were classified using Support Vector Machines (SVMs) [1] and naïve Bayes classifiers [3][9]. To represent term-document relations, the recommended metric "TF-IDF" [2] was chosen to construct a weight matrix. Compared with English, Turkic languages pose a particularly difficult natural language processing problem: stemming. A Turkish stemming tool, "Zemberek", was used to extract features without suffixes. Experimental results and success metrics are presented at the end of the article.
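To make the TF-IDF weight matrix mentioned in the abstract concrete, the following is a minimal sketch in plain Python. The abstract does not state which TF-IDF variant was used, so this assumes length-normalized term frequency multiplied by log(N/df). The function name `tfidf_matrix` and the toy documents are hypothetical, and the tokens are assumed to be already stemmed (for example by a tool such as Zemberek, which is not called here).

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, weight matrix) for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for term in vocab:
            tf = counts[term] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


# Hypothetical, already-stemmed toy documents (not from the paper's corpus).
docs = [
    ["gizli", "rapor", "proje"],
    ["rapor", "toplantı"],
    ["proje", "rapor", "bütçe", "gizli"],
]
vocab, weights = tfidf_matrix(docs)
```

Each row of `weights` is one document and each column one term in `vocab`. Under this variant, a term that occurs in every document (here "rapor") receives weight zero, which is why such terms contribute nothing to a classifier trained on the matrix.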
References
Cortes, C., Vapnik, V.: Support-vector Networks. Machine Learning 20, 273–297 (1995)
Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398. Springer, Heidelberg (1998)
Feldman, R., Sanger, J.: Text Mining Handbook. Cambridge University Press, Cambridge (2007)
Han, J.W., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. (2007)
Alparslan, E., Bahsi, B., Karahoca, A.: Classification of Turkish News Documents Using Support Vector Machines. INISTA (2009)
Cooley, R.: Classification of News Stories Using Support Vector Machines. In: IJCAI Workshop on Text Mining (1999)
Eyheramendy, S., Lewis, D., Madigan, D.: On the Naive Bayes Model for Text Categorization (2003)
Ageev, M., Dobrov, V.: Support Vector Machine Parameter Optimization for Text Categorization. In: International Conference on Information Systems Technology and its Applications (2003)
Copyright information
© 2010 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Alparslan, E., Bahsi, H. (2010). Security Level Classification of Confidential Documents Written in Turkish. In: Daras, P., Ibarra, O.M. (eds) User Centric Media. UCMEDIA 2009. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12630-7_41
DOI: https://doi.org/10.1007/978-3-642-12630-7_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12629-1
Online ISBN: 978-3-642-12630-7
eBook Packages: Computer Science, Computer Science (R0)