Abstract
Automatic themes-based classification of Quran verses is the process of assigning verses to predefined categories or themes. It is an essential task for Muslims and for anyone interested in studying the Quran. Quran themes-based classification could be used in many natural language processing (NLP) applications, such as search engines, data mining, question-answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically identifies and classifies Quran verses based on themes/topics. The model is composed of four phases: pre-processing, data vectorization, binary relevance classification, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and their topics are manually labeled based on the “Mushaf Al-Tajweed” classification. Second, the verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec techniques. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; the word2vec vectors are trained on a collected Classical Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied using three different classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and the ensemble model are compared against “Mushaf Al-Tajweed.” The ensemble model outperforms the three individual classifiers. Its average Hamming loss, recall, precision, and F1-score are 0.224, 81%, 75%, and 77%, respectively.
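The voting phase and the Hamming loss metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "maximum prediction probability" means taking, for each verse and tag, the highest probability reported by any of the three classifiers and assigning the tag when that value reaches a threshold of 0.5. The tag names, the probabilities, the threshold, and the function names `vote_max_probability` and `hamming_loss` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical tag names; the paper uses 393 topics from "Mushaf Al-Tajweed".
TAGS = ["tag_a", "tag_b", "tag_c"]

# Made-up per-tag probabilities for two verses, one matrix per classifier.
# Rows are verses, columns follow the order of TAGS.
probabilities: Dict[str, List[List[float]]] = {
    "logistic_regression": [[0.9, 0.2, 0.40], [0.1, 0.6, 0.3]],
    "svm":                 [[0.7, 0.1, 0.55], [0.2, 0.4, 0.2]],
    "random_forest":       [[0.8, 0.3, 0.30], [0.3, 0.7, 0.1]],
}


def vote_max_probability(
    prob_by_model: Dict[str, List[List[float]]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[List[int]]:
    """Assign a tag when the highest probability across models reaches the threshold."""
    models = list(prob_by_model.values())
    n_verses = len(models[0])
    n_tags = len(models[0][0])
    predictions = []
    for v in range(n_verses):
        row = []
        for t in range(n_tags):
            best = max(model[v][t] for model in models)
            row.append(1 if best >= threshold else 0)
        predictions.append(row)
    return predictions


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of verse-tag pairs whose predicted value differs from the true one."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        a != b
        for true_row, pred_row in zip(y_true, y_pred)
        for a, b in zip(true_row, pred_row)
    )
    return wrong / total


# Made-up ground truth for the two verses.
y_true = [[1, 0, 0], [0, 1, 0]]

voted = vote_max_probability(probabilities)
loss = hamming_loss(y_true, voted)

for verse_index, row in enumerate(voted):
    assigned = [tag for tag, flag in zip(TAGS, row) if flag]
    print(f"verse {verse_index}: {assigned}")
print(f"Hamming loss: {loss:.3f}")
```

Under this reading, a tag is assigned whenever any single classifier is confident about it, so the ensemble's predicted tag set is the union of the three classifiers' sets at the same threshold. That would tend to favor recall over precision, although whether the authors' voting module behaves exactly this way is an assumption.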
Cite this article
Mohamed, E.H., El-Behaidy, W.H. An Ensemble Multi-label Themes-Based Classification for Holy Qur’an Verses Using Word2Vec Embedding. Arab J Sci Eng 46, 3519–3529 (2021). https://doi.org/10.1007/s13369-020-05184-0
DOI: https://doi.org/10.1007/s13369-020-05184-0