Abstract
Hadiths are important textual sources of law, tradition, and teaching in the Islamic world. Their distinctive linguistic features (e.g. the ancient Arabic language and story-like text) call for natural language processing methods that are compiled and adapted specifically for them. No study in the literature focuses solely on Hadith from an artificial intelligence perspective, and many new developments have been overlooked and need to be highlighted. This review therefore analyzes all academic journal and conference publications that apply two main artificial intelligence methods to Hadith text: Hadith classification and Hadith mining. All Hadith-relevant methods and algorithms from the literature are discussed and analyzed in terms of functionality, simplicity, F-score, and accuracy. Because the surveyed studies use different Hadith datasets, their evaluation results cannot be compared directly. We have therefore re-implemented the methods and evaluated them on a single dataset (i.e. 3150 Hadiths from the Sahih Al-Bukhari book). The evaluation of the classification methods reveals that neural networks classify the Hadiths with 94 % accuracy, which we attribute to their ability to handle complex (high-dimensional) input data. Among the re-evaluated Hadith mining methods, the one that combines the vector space model, cosine similarity, and enriched queries obtains the best accuracy (i.e. 88 %). The most important aspect of Hadith mining methods is query expansion, since the query must be fitted to the Hadith lingo. Knowledge-based methods are evidently lacking in Hadith classification and mining approaches, and future work can fill this gap using knowledge graphs.
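The combination of vector space model, cosine similarity, and enriched queries mentioned above can be illustrated with a minimal sketch. This is not the re-implementation evaluated in the review: the whitespace tokenizer, the smoothed tf-idf weighting, the English placeholder documents, and the hand-written `SYNONYMS` table are all assumptions chosen for illustration, and a real system would need Arabic preprocessing such as stemming.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization stands in for Arabic preprocessing.
    return text.lower().split()


def build_idf(docs_tokens):
    # Smoothed inverse document frequency for every term in the collection.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf):
    # Sparse tf-idf vector; terms unseen in the collection are dropped.
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def expand_query(tokens, synonyms):
    # Query enrichment: append related terms from a (hypothetical) synonym table.
    expanded = list(tokens)
    for t in tokens:
        expanded.extend(synonyms.get(t, []))
    return expanded


def rank(query, docs, synonyms=None):
    # Return (score, document index) pairs, best match first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    doc_vecs = [tfidf(t, idf) for t in docs_tokens]
    q_tokens = tokenize(query)
    if synonyms:
        q_tokens = expand_query(q_tokens, synonyms)
    q_vec = tfidf(q_tokens, idf)
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Placeholder documents and synonym table, for illustration only.
DOCS = [
    "fasting during the month of ramadan",
    "giving alms to the poor",
    "rules of prayer and ablution",
]
SYNONYMS = {"charity": ["alms"]}

plain = rank("charity", DOCS)
enriched = rank("charity", DOCS, SYNONYMS)
```

In this toy collection the query term "charity" appears in no document, so the plain query matches nothing, while the enriched query reaches the document that uses "alms". This is the sense in which the query has to be fitted to the vocabulary of the text collection.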
Acknowledgments
The research for this paper was financially supported by the University of Malaya UMRG Grant (RP003C-14HNE).
Cite this article
Saloot, M.A., Idris, N., Mahmud, R. et al. Hadith data mining and classification: a comparative analysis. Artif Intell Rev 46, 113–128 (2016). https://doi.org/10.1007/s10462-016-9458-x
DOI: https://doi.org/10.1007/s10462-016-9458-x