Arabic News Articles Classification Using Different Word Embeddings

  • Conference paper
Emerging Trends and Applications in Artificial Intelligence (ICETAI 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 960)

Abstract

The accelerated growth of the internet has produced vast repositories of unstructured text, which calls for automated categorization algorithms to organize it and extract insight. Arabic poses particular challenges because of its inflectional morphology, large vocabulary, and varying forms. This study targets the development of robust automated classification systems for Arabic text, a language increasingly used online. We compare four prevalent pre-trained word embeddings: Word2Vec (represented by AraVec), GloVe, FastText, and BERT (represented by ARBERTv2), on the widely adopted SANAD dataset of Arabic news articles. To keep the comparison fair, we apply one fixed deep learning architecture across all four embeddings. The motivation is to bridge a knowledge gap in how popular word embeddings are used for Arabic news classification: although transformer models achieve state-of-the-art results, older methods remain widely preferred. We therefore aim to highlight the effectiveness of modern techniques. ARBERTv2 outperforms the other embeddings, achieving 95.81%, 98.68%, and 99.30% accuracy on the Akhbarona, Alkhaleej, and Alarabiya subsets of SANAD, respectively. Despite its large number of parameters, ARBERTv2’s contextual word embeddings appear to offer superior performance. Among the non-contextualized embeddings, FastText performed best, which we attribute to its ability to capture morphological similarity and handle out-of-vocabulary words; GloVe followed closely, and then AraVec.
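
The subword mechanism behind FastText's handling of morphology and out-of-vocabulary words can be illustrated with a minimal sketch. This is not the paper's code: the function names, the n-gram range of 3 to 6, and the example words are assumptions chosen for illustration. FastText represents a word through its character n-grams (with `<` and `>` marking word boundaries), so inflected forms of the same stem share many n-grams, and an unseen word can still be given a vector from the n-grams it contains.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of `word`, FastText-style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from word-internal n-grams.
    The range 3..6 is an illustrative assumption.
    """
    marked = f"<{word}>"
    return {
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    }


def ngram_overlap(a, b):
    """Jaccard overlap between the n-gram sets of two words."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


# Hypothetical example words (not taken from the SANAD dataset).
kitab = "كتاب"       # "book"
al_kitab = "الكتاب"   # "the book": same stem with the definite article
madrasa = "مدرسة"    # "school": unrelated stem

print(ngram_overlap(kitab, al_kitab))  # positive: shared stem n-grams
print(ngram_overlap(kitab, madrasa))   # zero: no n-grams in common
```

In the full model each n-gram has its own learned vector and a word's vector is built from the vectors of its n-grams, which is why the definite form above would land near the bare form even if it never appeared in the training corpus. The sketch only shows the n-gram decomposition, not the learned vectors.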

Author information

Corresponding author

Correspondence to Ashraf Elnagar.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Khaled, M.M., Al-Barham, M., Alomari, O.A., Elnagar, A. (2024). Arabic News Articles Classification Using Different Word Embeddings. In: García Márquez, F.P., Jamil, A., Hameed, A.A., Segovia Ramírez, I. (eds) Emerging Trends and Applications in Artificial Intelligence. ICETAI 2023. Lecture Notes in Networks and Systems, vol 960. Springer, Cham. https://doi.org/10.1007/978-3-031-56728-5_11

  • DOI: https://doi.org/10.1007/978-3-031-56728-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56727-8

  • Online ISBN: 978-3-031-56728-5

  • eBook Packages: Engineering, Engineering (R0)
