Comparative evaluation of tools for Arabic corpora search and analysis


As the number of Arabic corpora continues to grow, so does the need for concordancing software for corpus search and analysis that supports as many features of the Arabic language as possible and offers users a wide range of functions. This paper evaluates six existing corpus search and analysis tools against eight criteria that appear most essential for searching and analysing Arabic corpora, such as displaying Arabic text in its right-to-left direction, normalising diacritics and Hamza, and providing an Arabic user interface. The evaluation revealed that three tools, Khawas, Sketch Engine, and aConCorde, met most of the evaluation criteria and achieved the highest benchmark scores. The paper concludes that the developers’ deliberate consideration of the linguistic features of Arabic when designing these three tools was the most significant factor behind their superiority.
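To illustrate what the diacritic and Hamza normalisation criterion involves, the following is a minimal sketch of one common scheme. It is an assumption for illustration only, not the method used by any of the evaluated tools. It strips short-vowel marks (U+064B to U+0652), the superscript alef (U+0670), and tatweel (U+0640), and then maps the Hamza and Madda forms of alef to bare alef. This lets a search term match a corpus token regardless of vowelling or Hamza spelling. The names `normalise` and `matches` are hypothetical.

```python
import re

# Arabic short vowels, tanween, shadda and sukun (U+064B..U+0652),
# the superscript (dagger) alef U+0670, and tatweel U+0640.
# Assumption: this set is one common choice, not taken from any evaluated tool.
DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Hypothetical Hamza normalisation: collapse alef variants to bare alef (U+0627).
HAMZA_ALEF = str.maketrans({
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
})


def normalise(text: str) -> str:
    """Remove diacritics and tatweel, then unify alef/Hamza variants."""
    return DIACRITICS.sub("", text).translate(HAMZA_ALEF)


def matches(query: str, token: str) -> bool:
    """Compare a search term with a corpus token after normalisation."""
    return normalise(query) == normalise(token)


if __name__ == "__main__":
    # Fully vowelled "أَحْمَد" (Hamza on alef) vs. unvowelled bare form "احمد".
    print(matches("\u0623\u064e\u062d\u0652\u0645\u064e\u062f",
                  "\u0627\u062d\u0645\u062f"))  # True
```

Applying the same normalisation to both the query and the indexed corpus text is the simplest design. A real tool might instead keep the original forms and normalise only at match time, so that concordance lines still display the text as written.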

The authors would like to thank the developers Abdulmohsen Althubaity, Andrew Roberts, Laurence Anthony, Mike Scott, Adam Kilgarriff and James Wilson for their valuable comments and suggestions, which helped improve the quality of the paper.

  • Arabic
  • Tool
  • Corpus
  • Evaluation
  • Analysis