Abstract
The quantity of textual content available online has grown enormously, so systems that can analyze this content are needed. Text Categorization (TC) offers many methods and techniques to analyze, explore, and classify various types of documents. This study consists of two main steps. First, we extract terms from text documents using Fuzzy Near Neighbors (FNN) combined with web-based mining techniques. Second, we group documents according to a similarity measure built by combining Arabic encyclopedic dictionaries, using clustering algorithms. In this article, Fuzzy C-Means (FCM) is the clustering algorithm used to improve the precision of document classification. The work proposes Arabic TC based on a multilingual encyclopedic dictionary (Arabic WordNet, OMW, Wikipedia, OmegaWiki, Wiktionary, and Wikidata). To evaluate the effectiveness of the TC approach with FNN and FCM, an experimental study is carried out on a real-world dataset. The results indicate that the proposed approach outperforms the traditional one.
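As a rough illustration of the clustering step, the following is a minimal sketch of the standard Fuzzy C-Means update rules in plain Python. It is not the chapter's implementation: the function name `fuzzy_c_means`, its parameters, and the toy two-dimensional vectors (standing in for document feature vectors) are all assumptions made for this example, and the FNN term extraction and dictionary-based similarity are not reproduced.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard FCM on a list of equal-length numeric vectors.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from squared Euclidean distances to the centers.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - c[d]) ** 2 for d in range(dim))
                  for c in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: assign it fully there.
                hits = [1.0 if x == 0.0 else 0.0 for x in d2]
                new_row = [h / sum(hits) for h in hits]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than the tolerance.
        if shift < tol:
            break
    return centers, u


# Toy usage: six 2-D vectors forming two well-separated groups.
toy_vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
               [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
toy_centers, toy_memberships = fuzzy_c_means(toy_vectors, n_clusters=2)
toy_labels = [row.index(max(row)) for row in toy_memberships]
print(toy_labels)
```

Unlike hard clustering, each document keeps a graded membership in every cluster; taking the largest membership per row, as in the usage lines, is one simple way to turn that into a single category.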
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gouiouez, M. (2021). A Fuzzy Near Neighbors Approach for Arabic Text Categorization Based on Web Mining Technique. In: Motahhir, S., Bossoufi, B. (eds) Digital Technologies and Applications. ICDTA 2021. Lecture Notes in Networks and Systems, vol 211. Springer, Cham. https://doi.org/10.1007/978-3-030-73882-2_52
DOI: https://doi.org/10.1007/978-3-030-73882-2_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73881-5
Online ISBN: 978-3-030-73882-2
eBook Packages: Intelligent Technologies and Robotics (R0)