
Hadith data mining and classification: a comparative analysis


Hadiths are important textual sources of law, tradition, and teaching in the Islamic world. Their distinctive linguistic features (e.g., ancient Arabic language and story-like text) call for specialized natural language processing methods. No study in the literature focuses solely on Hadith from an artificial intelligence perspective, and many recent developments have been overlooked and need to be highlighted. This review therefore analyzes all academic journal and conference publications that apply the two main artificial intelligence approaches to Hadith text: Hadith classification and Hadith mining. All Hadith-related methods and algorithms in the literature are discussed and analyzed in terms of functionality, simplicity, F-score, and accuracy. Because the original studies use different Hadith datasets, their reported results cannot be compared directly. We have therefore re-implemented and evaluated the methods on a single dataset (3150 Hadiths from the Sahih Al-Bukhari book). The evaluation of the classification methods shows that neural networks classify Hadiths with 94% accuracy. This is because neural networks are capable of handling complex (high-dimensional) input data. Among the re-evaluated Hadith mining methods, the one that combines a vector space model, cosine similarity, and enriched queries obtains the best accuracy (88%). Query expansion is the most important aspect of Hadith mining methods, since the query must be fitted to the Hadith lingo. Knowledge-based methods are notably absent from both Hadith classification and mining approaches; future work could fill this gap using knowledge graphs.
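The best-performing mining method above combines a vector space model, cosine similarity, and enriched (expanded) queries. The following is a minimal Python sketch of that general pipeline. It is not the authors' re-implementation, and it rests on several assumptions:

- plain whitespace tokenization, with no Arabic normalization or stemming;
- raw term counts weighted by an unsmoothed log IDF;
- a hypothetical hand-built synonym map (`synonyms`) standing in for query enrichment.

The toy documents are illustrative English placeholders, not Hadith text.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization. A real Hadith
    # pipeline would add Arabic normalization and stemming here.
    return text.lower().split()


def build_index(docs):
    """Return one TF-IDF vector (term -> weight dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({term: c * idf[term] for term, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_query(query, synonyms):
    """Enrich the query by appending synonyms of each original term."""
    terms = tokenize(query)
    expanded = list(terms)
    for term in terms:
        expanded.extend(synonyms.get(term, []))
    return expanded


def search(query, docs, synonyms, top_k=3):
    """Rank documents by cosine similarity to the expanded query."""
    vectors, idf = build_index(docs)
    q_tf = Counter(expand_query(query, synonyms))
    # Query terms that never occur in the collection carry no weight.
    q_vec = {term: c * idf[term] for term, c in q_tf.items() if term in idf}
    scored = sorted(
        ((cosine(q_vec, v), i) for i, v in enumerate(vectors)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(i, score) for score, i in scored[:top_k] if score > 0]


# Toy collection (illustrative placeholders, not actual Hadith text).
docs = [
    "prayer at dawn is rewarded",
    "giving charity to the poor",
    "fasting during ramadan",
]
# Hypothetical expansion dictionary mapping a user's term to corpus vocabulary.
synonyms = {"salat": ["prayer"]}

print(search("salat", docs, {}))        # no expansion: no matches
print(search("salat", docs, synonyms))  # with expansion: document 0 is retrieved
```

Without the synonym map, the query `salat` shares no vocabulary with the collection and retrieves nothing. Expansion bridges that vocabulary gap, which illustrates why the query must be fitted to the Hadith lingo. The weighting scheme here is deliberately simple; the surveyed systems may use different TF-IDF variants and preprocessing.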


Fig. 1




The research for this paper was financially supported by the University of Malaya UMRG Grant (RP003C-14HNE).

Author information

Corresponding author

Correspondence to Mohammad Arshi Saloot.


About this article


Cite this article

Saloot, M.A., Idris, N., Mahmud, R. et al. Hadith data mining and classification: a comparative analysis. Artif Intell Rev 46, 113–128 (2016).



Keywords

  • Review
  • Comparison
  • Islamic knowledge
  • Hadith
  • Classification
  • Data mining