Abstract
A huge number of web documents is now available on the Internet, which makes retrieving pages on a specific topic difficult, since irrelevant pages may be retrieved as well. The automatic classification of web documents and pages has essential applications in domains such as medicine, health, science, and information technology. Many web page classification methods have been proposed to improve search capabilities, mostly for the English language. These methods attempt to classify English web pages while also reducing the high dimensionality of the features extracted from them. Because classification methods for other languages are lacking, this paper focuses on Arabic web page classification, given both the scarcity of such work and the importance of the Arabic language. In particular, we propose an evolutionary model that combines binary particle swarm optimization (BPSO) with random weight networks (RWNs) as the induction algorithm, in order to reduce the high dimensionality of features in Arabic web pages and to classify documents automatically. We collected three datasets from popular Arabic websites, covering three fields: Computer Science, Science, and Health. Further, the Taguchi method is incorporated to find the best parameters of the proposed algorithm. The experimental results showed that the proposed model gives better classification performance on Arabic web pages. In addition, an analysis was conducted to identify the most important features learned by the proposed model, as well as the most important HTML tags. The results showed that the list tag obtained the highest percentage, which reflects its effectiveness in classifying Arabic web pages.
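The wrapper idea in the abstract (BPSO searches over binary feature masks, and an RWN trained on the selected features scores each mask) can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices here are assumptions: the S-shaped sigmoid transfer function, the fitness as a weighted sum of validation error and the fraction of selected features (weight `alpha=0.99`), the fixed RWN seed per evaluation, and the synthetic data. All function and parameter names (`train_rwn`, `bpso_select`, `n_hidden`, and so on) are hypothetical. The swarm parameters (`n_particles`, `w`, `c1`, `c2`) are the kind the paper tunes with the Taguchi method; here they are arbitrary defaults.

```python
import numpy as np

def train_rwn(X, y, n_hidden, rng):
    """Random weight network: random, untrained input weights and biases;
    output weights solved in closed form with the Moore-Penrose pseudo-inverse."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))             # hidden-layer activations
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    beta = np.linalg.pinv(H) @ T
    return W, b, beta, classes

def predict_rwn(model, X):
    W, b, beta, classes = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return classes[np.argmax(H @ beta, axis=1)]

def fitness(mask, X_tr, y_tr, X_val, y_val, n_hidden, alpha=0.99, rwn_seed=0):
    """Lower is better: weighted validation error plus selected-feature ratio (assumed form)."""
    if not mask.any():
        return 1.0  # empty subset gets the worst possible score
    rng = np.random.default_rng(rwn_seed)  # same random weights for a given mask
    model = train_rwn(X_tr[:, mask], y_tr, n_hidden, rng)
    error = np.mean(predict_rwn(model, X_val[:, mask]) != y_val)
    return alpha * error + (1.0 - alpha) * mask.sum() / mask.size

def bpso_select(X_tr, y_tr, X_val, y_val, n_particles=10, n_iter=20,
                w=0.9, c1=2.0, c2=2.0, v_max=6.0, n_hidden=30, seed=0):
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    evaluate = lambda m: fitness(m, X_tr, y_tr, X_val, y_val, n_hidden)
    pos = rng.random((n_particles, d)) < 0.5            # binary positions = feature masks
    vel = rng.uniform(-1.0, 1.0, (n_particles, d))
    pbest = pos.copy()
    pbest_fit = np.array([evaluate(p) for p in pos])
    g = np.argmin(pbest_fit)
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        x = pos.astype(float)
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -v_max, v_max)
        prob = 1.0 / (1.0 + np.exp(-vel))               # S-shaped transfer function
        pos = rng.random((n_particles, d)) < prob       # bit is 1 with probability prob
        fit = np.array([evaluate(p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = np.argmin(pbest_fit)
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, gbest_fit

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(300, 20))
    y = (X[:, 0] + X[:, 3] > 0).astype(int)  # synthetic: only features 0 and 3 matter
    mask, score = bpso_select(X[:200], y[:200], X[200:], y[200:])
    print("selected features:", np.flatnonzero(mask), "fitness:", round(float(score), 3))
```

Solving the RWN output weights in closed form keeps each fitness evaluation cheap, which matters because a wrapper method retrains the classifier for every candidate mask in every iteration.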
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Shawabkeh, A., Faris, H., Aljarah, I. et al. An Evolutionary-based Random Weight Networks with Taguchi Method for Arabic Web Pages Classification. Arab J Sci Eng 46, 3955–3980 (2021). https://doi.org/10.1007/s13369-020-05301-z
DOI: https://doi.org/10.1007/s13369-020-05301-z