An Ensemble Multi-label Themes-Based Classification for Holy Qur’an Verses Using Word2Vec Embedding

  • Research Article-Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Automatic themes-based classification of Quran verses is the process of assigning verses to predefined categories or themes. It is an essential task for all Muslims and people interested in studying the Quran. Quran themes-based classification could be used in many natural language processing (NLP) applications such as search engines, data mining, question–answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically identifies and classifies Quran verses based on themes/topics. The model is composed of four phases: pre-processing, data vectorization, binary relevance classification, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and the topics of these verses are manually labeled based on the “Mushaf Al-Tajweed” classification. Second, the verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec techniques. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; the word2vec embeddings are trained on a collected classic Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied using three different classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and the ensemble model are compared against “Mushaf Al-Tajweed.” The ensemble model outperforms each of the three individual classifiers: its average Hamming loss, recall, precision, and F1-score are 0.224, 81%, 75%, and 77%, respectively.
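The voting phase and the Hamming loss metric described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each binary relevance classifier already outputs one probability per verse and tag, that "maximum prediction probability" means taking the highest probability any classifier assigns to a tag, and that a tag is kept when this value reaches a hypothetical threshold of 0.5. The function names, the threshold, and the toy probabilities are all illustrative.

```python
from typing import Dict, List

# Assumed layout: probs[verse][tag] = probability that the tag applies to the verse.
Probs = List[List[float]]


def max_probability_vote(per_classifier: Dict[str, Probs],
                         threshold: float = 0.5) -> List[List[int]]:
    """For each verse and tag, keep the highest probability assigned by
    any classifier, then threshold it (assumed voting rule)."""
    models = list(per_classifier.values())
    n_verses = len(models[0])
    n_tags = len(models[0][0])
    decisions = []
    for v in range(n_verses):
        row = []
        for t in range(n_tags):
            best = max(model[v][t] for model in models)
            row.append(1 if best >= threshold else 0)
        decisions.append(row)
    return decisions


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (verse, tag) positions where prediction and label differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(a != b
                for true_row, pred_row in zip(y_true, y_pred)
                for a, b in zip(true_row, pred_row))
    return wrong / total


# Toy data: 2 verses, 3 tags, 3 classifiers (made-up probabilities).
probs = {
    "logistic_regression": [[0.9, 0.2, 0.4], [0.1, 0.6, 0.3]],
    "svm":                 [[0.7, 0.1, 0.55], [0.2, 0.4, 0.2]],
    "random_forest":       [[0.8, 0.3, 0.3], [0.3, 0.7, 0.1]],
}
y_true = [[1, 0, 0], [0, 1, 0]]

y_pred = max_probability_vote(probs)
loss = hamming_loss(y_true, y_pred)
print(y_pred)  # [[1, 0, 1], [0, 1, 0]]
print(loss)    # one wrong position out of six
```

Under this reading the vote behaves like a logical OR over confident classifiers, so it tends to favor recall over precision. Other readings of the voting rule, such as following the single most confident classifier per tag, would give different decisions.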

Author information

Corresponding author

Correspondence to Ensaf Hussein Mohamed.

About this article

Cite this article

Mohamed, E.H., El-Behaidy, W.H. An Ensemble Multi-label Themes-Based Classification for Holy Qur’an Verses Using Word2Vec Embedding. Arab J Sci Eng 46, 3519–3529 (2021). https://doi.org/10.1007/s13369-020-05184-0

  • DOI: https://doi.org/10.1007/s13369-020-05184-0
