Abstract
Text classification is an important area of information retrieval. Text classification techniques are used to assign documents to a set of predefined categories. Several techniques and methods are used to classify data, and many studies address English text classification; unfortunately, few address Arabic text classification. This paper discusses three well-known classification techniques: Support Vector Machine, Naïve Bayes, and Neural Network. These techniques are applied to an Arabic data set, and a comparative study is made between them. The study uses a fixed number of documents for every category in both the training and testing phases. The results show that the Support Vector Machine gives the best results.
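The abstract notes that every category contributes the same number of documents to training and to testing. A minimal sketch of such a balanced split is given below; the function name `balanced_split`, the toy corpus, and the per-category counts are illustrative assumptions and are not taken from the paper.

```python
import random
from collections import defaultdict


def balanced_split(labeled_docs, n_train, n_test, seed=0):
    """Pick exactly n_train training and n_test testing documents per category.

    labeled_docs is an iterable of (text, category) pairs. A ValueError is
    raised if any category has fewer than n_train + n_test documents.
    """
    by_category = defaultdict(list)
    for text, category in labeled_docs:
        by_category[category].append(text)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for category in sorted(by_category):
        texts = by_category[category]
        if len(texts) < n_train + n_test:
            raise ValueError(
                f"category {category!r} has only {len(texts)} documents"
            )
        # Sample without replacement, then cut the sample into two parts.
        picked = rng.sample(texts, n_train + n_test)
        train += [(t, category) for t in picked[:n_train]]
        test += [(t, category) for t in picked[n_train:]]
    return train, test


# Hypothetical toy corpus: three categories with ten documents each.
corpus = [
    (f"doc {i} about {c}", c)
    for c in ("sports", "economy", "politics")
    for i in range(10)
]
train, test = balanced_split(corpus, n_train=6, n_test=3)
```

With a split like this, each of the three classifiers can be trained on `train` and scored on `test`, so differences in accuracy are not driven by uneven category sizes.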
Author information
Authors and Affiliations
Additional information
© The Author(s) 2016. This article is published with open access by the GSTF.
Adel Hamdan Mohammad, Computer Science Department, The World Islamic Sciences and Education University, Amman, Jordan
Tariq Alwada‘n, Computer Science Department, The World Islamic Sciences and Education University, Amman, Jordan
Omar Al-Momani, Network Department, The World Islamic Sciences and Education University, Amman, Jordan
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mohammad, A., Alwada‘n, T. & Al-Momani, O. Arabic Text Categorization Using Support vector machine, Naïve Bayes and Neural Network. GSTF J Comput 5, 16 (2016). https://doi.org/10.7603/s40601-016-0016-9
DOI: https://doi.org/10.7603/s40601-016-0016-9