Error Annotation of the Arabic Learner Corpus

A New Error Tagset
  • Abdullah Alfaifi
  • Eric Atwell
  • Ghazi Abuhakema
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8105)


This paper introduces a new two-level error tagset, AALETA (Alfaifi Atwell Leeds Error Tagset for Arabic), for annotating the Arabic Learner Corpus (ALC). The tagset comprises six broad classes, subdivided into 37 more specific error types (subcategories), and is designed to be easy for Arabic corpus error annotators to understand. AALETA is based on ARIDA, an existing error tagset for Arabic corpora created by Abuhakema et al. [1], and on a number of other error-analysis studies. It was used to annotate texts of the Arabic Learner Corpus [2]. The paper presents the tagset's broad classes and subcategories, together with an annotation example. The understandability of AALETA was compared with that of ARIDA: preliminary results showed that AALETA achieved a slightly higher score, and annotators reported that they preferred AALETA to ARIDA.
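To make the idea of a two-level tagset concrete, here is a minimal Python sketch of how such a tagset and span-based error annotations might be represented, with each specific (level-2) error tag rolled up to its broad (level-1) class. This is a hypothetical illustration, not AALETA itself: the class codes "A" to "F", the tag format (class letter plus subtype number), the subtype labels, and the per-class subtype counts are all invented placeholders. Only the totals of six broad classes and 37 specific error types come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorType:
    code: str         # full two-level tag, e.g. "B3" (hypothetical format)
    broad_class: str  # first-level class code, e.g. "B" (hypothetical)
    label: str        # human-readable subtype name (placeholder)


class TwoLevelTagset:
    """A broad class (level 1) groups several specific error types (level 2)."""

    def __init__(self, classes: dict[str, list[str]]) -> None:
        self.types: dict[str, ErrorType] = {}
        for cls, labels in classes.items():
            # Number each subtype within its class: A1, A2, ..., B1, ...
            for i, label in enumerate(labels, start=1):
                code = f"{cls}{i}"
                self.types[code] = ErrorType(code, cls, label)

    def broad_class_of(self, tag: str) -> str:
        """Map a specific error tag to its broad class; reject unknown tags."""
        if tag not in self.types:
            raise KeyError(f"unknown error tag: {tag}")
        return self.types[tag].broad_class

    def size(self) -> tuple[int, int]:
        """Return (number of broad classes, number of specific error types)."""
        return len({t.broad_class for t in self.types.values()}), len(self.types)


@dataclass(frozen=True)
class Annotation:
    start: int       # index of the first erroneous token
    end: int         # index one past the last erroneous token
    tag: str         # specific (level-2) error tag
    correction: str  # annotator's suggested correction


def errors_per_class(tagset: TwoLevelTagset, annotations: list[Annotation]) -> Counter:
    """Count annotated errors per broad (level-1) class."""
    return Counter(tagset.broad_class_of(a.tag) for a in annotations)


# Placeholder tagset: 6 broad classes, 37 types in total (per-class sizes invented).
sizes = {"A": 6, "B": 7, "C": 6, "D": 6, "E": 6, "F": 6}
tagset = TwoLevelTagset(
    {cls: [f"class {cls} subtype {i}" for i in range(1, n + 1)] for cls, n in sizes.items()}
)

annotations = [
    Annotation(0, 1, "A2", "<correction>"),
    Annotation(3, 4, "B5", "<correction>"),
    Annotation(7, 9, "A4", "<correction>"),
]
print(tagset.size())
print(errors_per_class(tagset, annotations))
```

Keeping the broad class recoverable from every specific tag is what lets an annotated corpus be analysed at either level: fine-grained error types for detailed study, or the six broad classes for summary statistics.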


Keywords: error tagset · Arabic · corpus · learner




References

  1. Abuhakema, G., Feldman, A., Fitzpatrick, E.: ARIDA: An Arabic Interlanguage Database and Its Applications: A Pilot Study. Journal of the National Council of Less Commonly Taught Languages (JNCOLCTL) 7, 161–184 (2009)
  2. Alfaifi, A., Atwell, E.: Arabic Learner Corpora (ALC): A Taxonomy of Coding Errors (in Arabic: المدونات اللغوية لمتعلمي اللغة العربية: نظامٌ لتصنيف وترميز الأخطاء اللغوية). In: 8th International Computing Conference in Arabic (ICCA 2012), Cairo, Egypt, December 26–28 (2012)
  3. Granger, S.: The International Corpus of Learner English: A New Resource for Foreign Language Learning and Teaching and Second Language Acquisition Research. TESOL Quarterly 37(3), 538–546 (2003)
  4. Nesselhauf, N.: Learner Corpora and Their Potential in Language Teaching. In: Sinclair, J. (ed.) How to Use Corpora in Language Teaching, pp. 125–152. Benjamins, Amsterdam (2004)
  5. Buttery, P., Caines, A.: Normalising Frequency Counts to Account for ‘Opportunity of Use’ in Learner Corpora. In: Tono, Y., Kawaguchi, Y., Minegishi, M. (eds.) Developmental and Crosslinguistic Perspectives in Learner Corpus Research, pp. 187–204. John Benjamins, Amsterdam (2012)
  6. Meunier, F., et al.: The LONGDALE (Longitudinal Database of Learner English) (2010), cited September 14, 2012
  7. Diez-Bedmar, M.B.: Written Learner Corpora by Spanish Students of English: An Overview. In: Gómez, P.C., Pérez, A.S. (eds.) A Survey on Corpus-based Research, Proceedings of the AELINCO Conference, pp. 920–933. Asociación Española de Lingüística del Corpus, Murcia (2009)
  8. Hammarberg, B.: Introduction to the ASU Corpus, a Longitudinal Oral and Written Text Corpus of Adult Learners’ Swedish with a Corresponding Part from Native Swedes. Stockholm University, Department of Linguistics (2010)
  9. Dagneaux, E., et al.: Error Tagging Manual (1996)
  10. Granger, S.: Error-tagged Learner Corpora and CALL: A Promising Synergy. CALICO Journal 20(3), 465–480 (2003)
  11. Nicholls, D.: The Cambridge Learner Corpus: Error Coding and Analysis for Lexicography and ELT. In: Corpus Linguistics 2003 Conference (CL 2003), Lancaster, UK (2003)
  12. Izumi, E., Uchimoto, K., Isahara, H.: Error Annotation for Corpus of Japanese Learner English. In: Sixth International Workshop on Linguistically Interpreted Corpora (LINC 2005), Jeju Island, Korea, October 15 (2005)
  13. Alosaili, A.I.: Common Errors in Speech Production of Non-Native Arabic Learners: A Descriptive Analytical Study (in Arabic: الأخطاء الشائعة في الكلام لدى طلاب اللغة العربية الناطقين بلغات أخرى: دراسة وصفية تحليلية). Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1985)
  14. Alateeq, Z.M.: Semantic Errors Analysis of Non-Native Arabic Learners in Writing (in Arabic: تحليل الأخطاء الدلالية لدى دارسي اللغة العربية من غير الناطقين بها في مادة التعبير الكتابي). Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1992)
  15. Alhamad, M.M.: Writing Errors Analysis of Advanced-Level Arabic Learners at King Saud University (in Arabic: تحليل أخطاء التعبير الكتابي لدى المستوى المتقدم من دارسي العربية غير الناطقين بها في جامعة الملك سعود). Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1994)
  16. Alaqeeli, A.S.: Error Analysis in Some Verbal Sentence Patterns of Arabic in Writing Production of Advanced-Level Learners (in Arabic: تحليل الأخطاء في بعض أنماط الجملة الفعلية للغة العربية في الأداء الكتابي لدى دارسي المستوى المتقدم). Al Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia (1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Abdullah Alfaifi (1)
  • Eric Atwell (1)
  • Ghazi Abuhakema (2)
  1. University of Leeds, Leeds, UK
  2. College of Charleston, USA
