An Ensemble Multi-label Themes-Based Classification for Holy Qur’an Verses Using Word2Vec Embedding

Mohamed, Ensaf Hussein; El-Behaidy, Wessam H.

doi:10.1007/s13369-020-05184-0

Research Article-Computer Engineering and Computer Science
Published: 03 January 2021

Volume 46, pages 3519–3529, (2021)

Arabian Journal for Science and Engineering

Abstract

Automatic themes-based classification of Quran verses is the process of assigning verses to predefined categories, or themes. It is an essential task for Muslims and for anyone interested in studying the Quran. Themes-based classification of the Quran can support many natural language processing (NLP) applications, such as search engines, data mining, question-answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically classifies Quran verses by theme/topic. The model is composed of four phases: pre-processing, data vectorization, binary relevance classification, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and their topics are manually labeled based on the “Mushaf Al-Tajweed” classification. Second, the verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec techniques. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; the embeddings are trained on a collected classical Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied using three different classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and the ensemble model are compared against “Mushaf Al-Tajweed.” The ensemble model outperforms the three individual classifiers, with an average Hamming loss, recall, precision, and F1-score of 0.224, 81%, 75%, and 77%, respectively.
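The voting and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each of the three binary relevance classifiers outputs one probability per tag for every verse, that "maximum prediction probability" means keeping, per tag, the highest probability reported by any classifier, and that a tag is assigned when that probability reaches a 0.5 threshold. The function names, the threshold, the tag names, and the toy probabilities are all hypothetical.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical shape of one classifier's output: for each verse,
# a mapping from tag name to predicted probability.
Probs = List[Dict[str, float]]


def vote_max_probability(per_classifier: Sequence[Probs],
                         threshold: float = 0.5) -> List[Set[str]]:
    """For every verse and tag, keep the highest probability reported by any
    classifier, then assign the tag if that probability reaches the threshold.
    The threshold of 0.5 is an assumption, not a value from the paper."""
    n_verses = len(per_classifier[0])
    predictions: List[Set[str]] = []
    for i in range(n_verses):
        best: Dict[str, float] = {}
        for clf_probs in per_classifier:
            for tag, p in clf_probs[i].items():
                best[tag] = max(best.get(tag, 0.0), p)
        predictions.append({tag for tag, p in best.items() if p >= threshold})
    return predictions


def hamming_loss(y_true: List[Set[str]], y_pred: List[Set[str]],
                 all_tags: Sequence[str]) -> float:
    """Fraction of (verse, tag) pairs where prediction and gold label disagree."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(all_tags))


# Toy data: three made-up tags and two verses (the paper uses 393 tags).
tags = ["faith", "prayer", "charity"]
lr = [{"faith": 0.9, "prayer": 0.2, "charity": 0.4},
      {"faith": 0.1, "prayer": 0.6, "charity": 0.3}]
svm = [{"faith": 0.7, "prayer": 0.3, "charity": 0.55},
       {"faith": 0.2, "prayer": 0.4, "charity": 0.2}]
rf = [{"faith": 0.8, "prayer": 0.1, "charity": 0.3},
      {"faith": 0.3, "prayer": 0.7, "charity": 0.45}]
gold = [{"faith"}, {"prayer"}]

predicted = vote_max_probability([lr, svm, rf])
loss = hamming_loss(gold, predicted, tags)
print(predicted, round(loss, 3))
```

In this toy run the first verse receives one extra tag ("charity") because a single classifier is confident about it, so one of the six verse-tag decisions is wrong. Taking the maximum across classifiers tends to favour recall over precision, which is at least consistent with the recall (81%) exceeding the precision (75%) reported in the abstract, although the paper's exact voting rule may differ.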

Author information

Authors and Affiliations

Faculty of Computers and Artificial Intelligence, Helwan University, Cairo, Egypt
Ensaf Hussein Mohamed & Wessam H. El-Behaidy

Corresponding author

Correspondence to Ensaf Hussein Mohamed.

About this article

Cite this article

Mohamed, E.H., El-Behaidy, W.H. An Ensemble Multi-label Themes-Based Classification for Holy Qur’an Verses Using Word2Vec Embedding. Arab J Sci Eng 46, 3519–3529 (2021). https://doi.org/10.1007/s13369-020-05184-0

Received: 09 July 2020
Accepted: 24 November 2020
Published: 03 January 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s13369-020-05184-0
