
Comparative evaluation of tools for Arabic corpora search and analysis


As the number of Arabic corpora continues to grow, so does the need for concordancing software that supports as many features of the Arabic language as possible and offers users a wide range of search and analysis functions. This paper evaluates six existing corpus search and analysis tools against eight criteria that appear most essential for searching and analysing Arabic corpora, such as displaying Arabic text in its right-to-left direction, normalising diacritics and Hamza, and providing an Arabic user interface. The evaluation shows that three tools, Khawas, Sketch Engine, and aConCorde, meet most of the criteria and achieve the highest benchmark scores. The paper concludes that the developers’ deliberate consideration of the linguistic features of Arabic when designing these three tools was the most significant factor behind their superiority.
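To make the criterion of normalising diacritics and Hamza concrete, the following is a minimal Python sketch of how a concordancer might fold these variants before matching a query. It is not taken from any of the evaluated tools: the function names `normalise` and `concordance`, the choice of code points treated as diacritics, and the decision to fold only Alef-based Hamza forms are all assumptions made for illustration.

```python
import re

# Assumption: treat fathatan..sukun (U+064B-U+0652) and superscript alef
# (U+0670) as the diacritics to strip. Real tools may use a different set.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Assumption: fold only alef carrying hamza or madda to a bare alef.
HAMZA_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
})


def normalise(text: str) -> str:
    """Strip diacritics, then fold the assumed Hamza variants."""
    return DIACRITICS.sub("", text).translate(HAMZA_MAP)


def concordance(tokens, query, width=2):
    """Return (left context, keyword, right context) for each token whose
    normalised form equals the normalised query (hypothetical helper)."""
    target = normalise(query)
    hits = []
    for i, tok in enumerate(tokens):
        if normalise(tok) == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits
```

With this kind of folding, a query typed without diacritics or Hamza would still match fully vocalised tokens in the corpus, while the original token is what gets returned for display. Whether a given tool normalises at query time, at indexing time, or not at all is exactly what such a criterion would check.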








The authors would like to thank the developers, Abdulmohsen Althubaity, Andrew Roberts, Laurence Anthony, Mike Scott, Adam Kilgarriff and James Wilson for their valuable comments and suggestions to improve the quality of the paper.

Author information


Corresponding author

Correspondence to Abdullah Alfaifi.


About this article


Cite this article

Alfaifi, A., Atwell, E. Comparative evaluation of tools for Arabic corpora search and analysis. Int J Speech Technol 19, 347–357 (2016).



  • Arabic
  • Tool
  • Corpus
  • Evaluation
  • Analysis