An Evolutionary-based Random Weight Networks with Taguchi Method for Arabic Web Pages Classification

  • Research Article-Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Nowadays, a huge number of web documents are available on the Internet, which makes retrieving pages on a specific topic difficult, since irrelevant pages may be retrieved as well. The automatic classification of web documents and pages has essential applications in domains such as medicine, health, science, and information technology. Many web page classification methods have been proposed to improve search capabilities, especially for the English language. These methods aim both to classify English web pages and to reduce the high dimensionality of the features extracted from them. Because classification methods for other languages are lacking, this paper focuses on Arabic web page classification, motivated by the scarcity of such work and the importance of the Arabic language. In particular, we propose an evolutionary model based on binary particle swarm optimization (BPSO) combined with random weight networks (RWNs) as an induction algorithm, both to reduce the high dimensionality of features in Arabic web pages and to classify documents automatically. The datasets used in this paper were collected from popular Arabic websites: three datasets covering three different fields, namely Computer Science, Science, and Health. Further, the Taguchi method is incorporated to locate the best parameters of the proposed algorithm. The experimental results showed that the proposed model gives better performance for Arabic web page classification. In addition, an analysis was conducted to identify the most important features learned by the proposed model, as well as the most important tags. The results showed that the list tag obtained the highest percentage, which reflects its effectiveness in the classification of Arabic web pages.
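To make the wrapper idea in the abstract concrete, the following is a minimal sketch of BPSO-driven feature selection with an RWN as the induction algorithm. It is an illustration under stated assumptions, not the authors' implementation: the sigmoid transfer function, the fitness (plain validation accuracy), the parameter defaults, the synthetic data, and the names `rwn_accuracy` and `bpso_select` are all hypothetical choices made for this example.

```python
import numpy as np


def rwn_accuracy(X_train, y_train, X_test, y_test, n_hidden=20, seed=0):
    """Fit a random weight network and return its accuracy on the test split.

    Hidden weights are drawn at random and never trained; only the output
    weights are computed, via the Moore-Penrose pseudoinverse.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(max(y_train.max(), y_test.max())) + 1
    W = rng.uniform(-1.0, 1.0, size=(X_train.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)

    def hidden(X):
        return 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden layer

    targets = np.eye(n_classes)[y_train]            # one-hot class targets
    beta = np.linalg.pinv(hidden(X_train)) @ targets
    pred = np.argmax(hidden(X_test) @ beta, axis=1)
    return float(np.mean(pred == y_test))


def bpso_select(X_train, y_train, X_val, y_val, n_particles=10, n_iter=20,
                w=0.7, c1=2.0, c2=2.0, v_max=6.0, seed=0):
    """Search for a binary feature mask with binary PSO (sigmoid transfer)."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, d))   # 1 = feature kept
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, d))

    def fitness(mask):
        # Assumed fitness: validation accuracy of an RWN on the kept features.
        if mask.sum() == 0:
            return 0.0
        cols = mask.astype(bool)
        return rwn_accuracy(X_train[:, cols], y_train, X_val[:, cols], y_val)

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        prob = 1.0 / (1.0 + np.exp(-vel))             # velocity -> P(bit = 1)
        pos = (rng.random((n_particles, d)) < prob).astype(int)

        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return gbest, float(gbest_fit)


# Synthetic stand-in for a document-term matrix: only features 0 and 1 matter.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(120, 12))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mask, acc = bpso_select(X[:80], y[:80], X[80:], y[80:])
print("selected features:", np.flatnonzero(mask), "validation accuracy:", acc)
```

Settings such as `n_particles`, `w`, `c1`, and `c2` are the kind of parameters a design-of-experiments procedure like the Taguchi method could tune; which parameters the paper actually tunes is not stated in the abstract.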

Notes

  1. https://mawdoo3.com/

Author information

Corresponding author

Correspondence to Ibrahim Aljarah.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Shawabkeh, A., Faris, H., Aljarah, I. et al. An Evolutionary-based Random Weight Networks with Taguchi Method for Arabic Web Pages Classification. Arab J Sci Eng 46, 3955–3980 (2021). https://doi.org/10.1007/s13369-020-05301-z

  • DOI: https://doi.org/10.1007/s13369-020-05301-z
