Learner Corpora

Chapter in: A Practical Handbook of Corpus Linguistics

Abstract

This chapter deals with learner corpora, that is, collections of (spoken and/or written) texts produced by learners of a language. It describes their main characteristics, with particular emphasis on those that are distinctive of learner corpora. Special types of corpora are introduced, such as longitudinal learner corpora and local learner corpora. The metadata accompanying learner corpora and the annotation of learner corpora are also discussed, and the challenges involved are highlighted. Several methods of analysis designed for learner corpora are presented, including Contrastive Interlanguage Analysis, Computer-aided Error Analysis and the Integrated Contrastive Model. The development of learner corpus research is sketched, and possible future directions are examined in terms of the size of learner corpora, their diversity, and the techniques used to compile and analyse them. The chapter also features representative corpus-based studies of learner language, representative learner corpora, tools and resources related to learner corpora, and annotated references for further reading.
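Contrastive Interlanguage Analysis typically compares learner language with native (or other learner) language to identify items that learners overuse or underuse. The following is a minimal sketch of one such frequency comparison. It assumes that word counts have already been extracted from both corpora and uses the log-likelihood (G2) statistic, which is a common choice in corpus linguistics but not necessarily the one the chapter adopts. The function names, the 3.84 cut-off (p < 0.05, 1 degree of freedom) and the counts for 'make' are illustrative assumptions, not data from the chapter.

```python
import math


# Hypothetical helper: log-likelihood (G2) for one item observed in two corpora.
def log_likelihood(freq_learner: int, size_learner: int,
                   freq_native: int, size_native: int) -> float:
    total_freq = freq_learner + freq_native
    total_size = size_learner + size_native
    # Expected frequencies if the item were equally common in both corpora.
    expected_learner = size_learner * total_freq / total_size
    expected_native = size_native * total_freq / total_size
    ll = 0.0
    # An observed frequency of 0 contributes nothing (0 * log 0 is taken as 0).
    if freq_learner > 0:
        ll += freq_learner * math.log(freq_learner / expected_learner)
    if freq_native > 0:
        ll += freq_native * math.log(freq_native / expected_native)
    return 2 * ll


# Hypothetical helper: label an item as overused, underused or not significant.
def compare_usage(freq_learner: int, size_learner: int,
                  freq_native: int, size_native: int,
                  critical: float = 3.84) -> tuple[str, float]:
    ll = log_likelihood(freq_learner, size_learner, freq_native, size_native)
    if ll < critical:
        return "no significant difference", ll
    # The direction is decided by relative (normalised) frequencies, not raw counts.
    if freq_learner / size_learner > freq_native / size_native:
        return "overuse", ll
    return "underuse", ll


if __name__ == "__main__":
    # Invented counts for illustration only: 'make' in a 200,000-word learner
    # corpus versus a 250,000-word native reference corpus.
    label, ll = compare_usage(400, 200_000, 300, 250_000)
    print(f"make: {label} (LL = {ll:.2f})")
```

The direction of the difference is read from normalised frequencies because learner and reference corpora rarely have the same size. A significant overall difference may still be driven by a handful of texts or learners, so a fuller analysis would usually also examine how the item is dispersed across the corpus.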

Notes

  1. http://merlin-platform.eu/. Accessed 22 May 2019.

  2. https://korpling.german.hu-berlin.de/falko-suche/. Accessed 22 May 2019.

  3. https://uclouvain.be/en/research-institutes/ilc/cecl/proceed.html. Accessed 22 May 2019.

  4. https://childes.talkbank.org/. Accessed 22 May 2019.

  5. www.flloc.soton.ac.uk/. Accessed 22 May 2019.

References

  • Alexopoulou, T., Geertzen, J., Korhonen, A., & Meurers, D. (2015). Exploring big educational learner corpora for SLA research: Perspectives on relative clauses. International Journal of Learner Corpus Research, 1(1), 96–129.

  • Altenberg, B., & Granger, S. (2001). The grammatical and lexical patterning of make in native and non-native student writing. Applied Linguistics, 22(2), 173–194.

  • Belz, J., & Vyatkina, N. (2005). Learner corpus analysis and the development of L2 pragmatic competence in networked inter-cultural language study: The case of German modal particles. The Canadian Modern Language Review/La revue canadienne des langues vivantes, 62(1), 17–48.

  • Blanchard, D., Tetreault, J., Higgins, D., Cahill, A., & Chodorow, M. (2013). TOEFL11: A corpus of non-native English. Princeton: Educational Testing Service.

  • Caines, A., & Buttery, P. (2014). The effect of disfluencies and learner errors on the parsing of spoken learner language. In First joint workshop on statistical parsing of morphologically rich languages and syntactic analysis of non-canonical languages (pp. 74–81). Dublin, Ireland, August 23–29, 2014.

  • Callies, M. (2015). Learner corpus methodology. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 35–55). Cambridge: Cambridge University Press.

  • Castagnoli, S., Ciobanu, D., Kunz, K., Kübler, N., & Volanschi, A. (2011). Designing a learner translator corpus for training purposes. In N. Kübler (Ed.), Corpora, language, teaching, and resources: From theory to practice (pp. 221–248). Bern: Peter Lang.

  • Crossley, S. A., & McNamara, D. S. (2012). Interlanguage talk: A computational analysis of non-native speakers’ lexical production and exposure. In P. M. McCarthy & C. Boonthum-Denecke (Eds.), Applied natural language processing: Identification, investigation and resolution (pp. 425–437). Hershey: IGI Global.

  • Dagneaux, E., Denness, S., & Granger, S. (1998). Computer-aided error analysis. System, 26(2), 163–174.

  • Dagneaux, E., Denness, S., Granger, S., Meunier, F., Neff, J. A., & Thewissen, J. (2008). Error tagging manual version 1.3. Louvain-la-Neuve: Centre for English Corpus Linguistics.

  • de Bot, K., Lowie, W., & Verspoor, M. (2007). A Dynamic Systems Theory approach to second language acquisition. Bilingualism: Language and Cognition, 10(1), 7–21.

  • De Felice, R., & Pulman, S. (2009). Automatic detection of preposition errors in learner writing. CALICO Journal, 26(3), 512–528.

  • de Haan, P. (1984). Problem-oriented tagging of English corpus data. In J. Aarts & W. Meijs (Eds.), Corpus linguistics: Recent developments in the use of computer corpora (pp. 123–139). Amsterdam: Rodopi.

  • de Haan, P. (2000). Tagging non-native English with the TOSCA–ICLE tagger. In C. Mair & M. Hundt (Eds.), Corpus linguistics and linguistic theory (pp. 69–79). Amsterdam: Rodopi.

  • De Knop, S., & Meunier, F. (2015). The ‘learner corpus research, cognitive linguistics and second language acquisition’ nexus: A SWOT analysis. Corpus Linguistics and Linguistic Theory, 11(1), 1–18.

  • Díaz-Negrillo, A., Meurers, D., Valera, S., & Wunsch, H. (2010). Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. Language Forum, 36(1–2), 139–154.

  • Ellis, N. C., Simpson-Vlach, R., Römer, U., O’Donnell, M. B., & Wulff, S. (2015). Learner corpora and formulaic language in second language acquisition research. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 357–378). Cambridge: Cambridge University Press.

  • Flowerdew, L. (2015). Learner corpora and language for academic and specific purposes. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 465–484). Cambridge: Cambridge University Press.

  • Gablasova, D., Brezina, V., McEnery, T., & Boyd, E. (2017). Epistemic stance in spoken L2 English: The effect of task and speaker style. Applied Linguistics, 38(5), 613–637.

  • García Lecumberri, M. L., Cooke, M., & Wester, M. (2017). A bi-directional task-based corpus of learners’ conversational speech. International Journal of Learner Corpus Research, 3(2), 175–195.

  • Geertzen, J., Alexopoulou, T., & Korhonen, A. (2014). Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge Open Language Database (EFCamDat). In R. T. I. Miller, K. Martin, C. M. Eddington, A. Henery, N. Marcos Miguel, A. M. Tseng, A. Tuninetti, & D. Walter (Eds.), Selected proceedings of the 2012 second language research forum: Building bridges between disciplines (pp. 240–254). Somerville: Cascadilla Proceedings Project.

  • Gilquin, G. (2000/2001). The integrated contrastive model: Spicing up your data. Languages in Contrast, 3(1), 95–123.

  • Gilquin, G. (2012). Lexical infelicity in English causative constructions. Comparing native and learner collostructions. In J. Leino & R. V. Waldenfels (Eds.), Analytical causatives. From ‘give’ and ‘come’ to ‘let’ and ‘make’ (pp. 41–63). München: Lincom Europa.

  • Gilquin, G. (2016). Discourse markers in L2 English: From classroom to naturalistic input. In O. Timofeeva, A.-C. Gardner, A. Honkapohja, & S. Chevalier (Eds.), New approaches to English linguistics: Building bridges (pp. 213–249). Amsterdam: John Benjamins.

  • Gilquin, G. (2017). POS tagging a spoken learner corpus: Testing accuracy testing. Paper presented at the 4th Learner Corpus Research Conference, Bolzano/Bozen, Italy, 5–7 October 2017.

  • Gilquin, G. (Forthcoming). Hic sunt dracones: Exploring some terra incognita in learner corpus research. In A. Čermáková & M. Malá (Eds.), Variation in time and space: Observing the world through corpora. Berlin: De Gruyter.

  • Gilquin, G., De Cock, S., & Granger, S. (2010). Louvain International Database of Spoken English Interlanguage. Louvain-la-Neuve: Presses universitaires de Louvain.

  • Golden, A., Jarvis, S., & Tenfjord, K. (2017). Crosslinguistic influence and distinctive patterns of language learning: Findings and insights from a learner corpus. Bristol: Multilingual Matters.

  • Granger, S. (1996). From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg, & M. Johansson (Eds.), Languages in contrast. Text-based cross-linguistic studies (pp. 37–51). Lund: Lund University Press.

  • Granger, S. (1998). The computer learner corpus: A testbed for electronic EFL tools. In J. Nerbonne (Ed.), Linguistic databases (pp. 175–188). Stanford: CSLI Publications.

  • Granger, S. (2004). Computer learner corpus research: Current status and future prospects. In U. Connor & T. Upton (Eds.), Applied corpus linguistics: A multidimensional perspective (pp. 123–145). Amsterdam: Rodopi.

  • Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign language teaching: A critical evaluation. In K. Aijmer (Ed.), Corpora and language teaching (pp. 13–32). Amsterdam: John Benjamins.

  • Granger, S. (2015). Contrastive interlanguage analysis: A reappraisal. International Journal of Learner Corpus Research, 1(1), 7–24.

  • Granger, S., Dagneaux, E., Meunier, F., & Paquot, M. (2009). The International Corpus of Learner English (Handbook and CD-ROM. Version 2). Louvain-la-Neuve: Presses universitaires de Louvain.

  • Gries, S. T., & Deshors, S. C. (2014). Using regressions to explore deviations between corpus data and a standard/target: Two suggestions. Corpora, 9(1), 109–136.

  • Higgins, D., Ramineni, C., & Zechner, K. (2015). Learner corpora and automated scoring. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 587–604). Cambridge: Cambridge University Press.

  • Hilton, H., Osborne, J., Derive, M.-J., Succo, N., O’Donnell, J., Billard, S., & Rutigliano-Daspet, S. (2008). Corpus PAROLE (Parallèle Oral en Langue Étrangère). Architecture du corpus & conventions de transcription. Chambéry: Laboratoire LLS – Équipe Langages, Université de Savoie. http://archive.sfl.cnrs.fr/sites/sfl/IMG/pdf/PAROLE_manual.pdf. Accessed 22 May 2019.

  • Hokamura, M. (2018). The dynamics of complexity, accuracy, and fluency: A longitudinal case study of Japanese learners’ English writing. JALT Journal, 40(1), 23–46.

  • Huang, Y., Murakami, A., Alexopoulou, T., & Korhonen, A. (2018). Dependency parsing of learner English. International Journal of Corpus Linguistics, 23(1), 28–54.

  • Hutchinson, J. (1996). Université Catholique de Louvain Error Editor. Louvain-la-Neuve: Centre for English Corpus Linguistics, Université catholique de Louvain.

  • Izumi, E., Uchimoto, K., & Isahara, H. (2004). The NICT JLE Corpus: Exploiting the language learners’ speech database for research and education. International Journal of the Computer, the Internet and Management, 12(2), 119–125.

  • James, C. (1998). Errors in language learning and use: Exploring error analysis. London/New York: Longman.

  • Jarvis, S., & Pavlenko, A. (2008). Crosslinguistic influence in language and cognition. New York/London: Routledge.

  • Jucker, A. H., Smith, S. W., & Lüdge, T. (2003). Interactive aspects of vagueness in conversation. Journal of Pragmatics, 35(12), 1737–1769.

  • Liu, E. T. K., & Shaw, P. M. (2001). Investigating learner vocabulary: A possible approach to looking at EFL/ESL learners’ qualitative knowledge of the word. International Review of Applied Linguistics in Language Teaching, 39(3), 171–194.

  • Lüdeling, A., Hirschmann, H., & Shadrova, A. (2017). Linguistic models, acquisition theories, and learner corpora: Morphological productivity in SLA research exemplified by complex verbs in German. Language Learning, 67(S1), 96–129.

  • Meunier, F. (1998). Computer tools for interlanguage analysis: A critical approach. In S. Granger (Ed.), Learner English on computer (pp. 19–37). London/New York: Addison Wesley Longman.

  • Meunier, F. (2016). Introduction to the LONGDALE Project. In E. Castello, K. Ackerley, & F. Coccetta (Eds.), Studies in learner corpus linguistics. Research and applications for foreign language teaching and assessment (pp. 123–126). Berlin: Peter Lang.

  • Meunier, F., & Littré, D. (2013). Tracking learners’ progress: Adopting a dual ‘corpus cum experimental data’ approach. The Modern Language Journal, 97(S1), 61–76.

  • Meurers, D. (2015). Learner corpora and natural language processing. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 537–566). Cambridge: Cambridge University Press.

  • Möller, V. (2017). Language acquisition in CLIL and non-CLIL settings: Learner corpus and experimental evidence on passive constructions. Amsterdam: John Benjamins.

  • Myles, F. (2015). Second language acquisition theory and learner corpus research. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 309–331). Cambridge: Cambridge University Press.

  • Nesselhauf, N. (2004). Learner corpora and their potential in language teaching. In J. Sinclair (Ed.), How to use corpora in language teaching (pp. 125–152). Amsterdam: John Benjamins.

  • Osborne, J. (2015). Transfer and learner corpus research. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 333–356). Cambridge: Cambridge University Press.

  • Paquot, M. (2014). Cross-linguistic influence and formulaic language: Recurrent word sequences in French learner writing. In L. Roberts, I. Vedder, & J. H. Hulstijn (Eds.), EUROSLA Yearbook 14 (pp. 240–261). Amsterdam: John Benjamins.

  • Pendar, N., & Chapelle, C. A. (2008). Investigating the promise of learner corpora: Methodological issues. CALICO Journal, 25(2), 189–206.

  • Rayson, P., & Baron, A. (2011). Automatic error tagging of spelling mistakes in learner corpora. In F. Meunier, S. De Cock, G. Gilquin, & M. Paquot (Eds.), A taste for corpora: In honour of Sylviane Granger (pp. 109–126). Amsterdam: John Benjamins.

  • Reder, S., Harris, K., & Setzler, K. (2003). The Multimedia Adult ESL Learner Corpus. TESOL Quarterly, 37(3), 546–557.

  • Reznicek, M., Lüdeling, A., & Hirschmann, H. (2013). Competing target hypotheses in the Falko corpus: A flexible multi-layer corpus architecture. In A. Díaz-Negrillo, N. Ballier, & P. Thompson (Eds.), Automatic treatment and analysis of learner corpus data (pp. 101–124). Amsterdam: John Benjamins.

  • Römer, U. (2004). Comparing real and ideal language learner input: The use of an EFL textbook corpus in corpus linguistics and language teaching. In G. Aston, S. Bernardini, & D. Stewart (Eds.), Corpora and language learners (pp. 152–168). Amsterdam: John Benjamins.

  • Rozovskaya, A., & Roth, D. (2010). Training paradigms for correcting errors in grammar and usage. In Human language technologies: The 2010 annual conference of the North American chapter of the association for computational linguistics (pp. 154–162). Los Angeles: Association for Computational Linguistics.

  • Seidlhofer, B. (2002). Pedagogy and local learner corpora: Working with learning-driven data. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign language teaching (pp. 213–234). Amsterdam: John Benjamins.

  • Sinclair, J. (1996). Preliminary recommendations on corpus typology (Technical report). EAGLES (Expert Advisory Group on Language Engineering Standards). www.ilc.cnr.it/EAGLES96/corpustyp/corpustyp.html. Accessed 22 May 2019.

  • Spoelman, M. (2013). The (under)use of partitive objects in Estonian, German and Dutch learners of Finnish. In S. Granger, G. Gilquin, & F. Meunier (Eds.), Twenty years of learner corpus research: Looking back, moving ahead (pp. 423–433). Louvain-la-Neuve: Presses universitaires de Louvain.

  • Tono, Y. (2012). International Corpus of Crosslinguistic Interlanguage: Project overview and a case study on the acquisition of new verb co-occurrence patterns. In Y. Tono, Y. Kawaguchi, & M. Minegishi (Eds.), Developmental and crosslinguistic perspectives in learner corpus research (pp. 27–46). Amsterdam: John Benjamins.

  • Van Rooy, B., & Schäfer, L. (2002). The effect of learner errors on POS tag errors during automatic POS tagging. Southern African Linguistics and Applied Language Studies, 20, 325–335.

  • Zechner, K., Higgins, D., Xi, X., & Williamson, D. M. (2009). Automatic scoring of non-native spontaneous speech in tests of spoken English. Speech Communication, 51(10), 883–895.

Author information

Corresponding author

Correspondence to Gaëtanelle Gilquin.

Further Reading

  • Granger, S. 2012. How to use foreign and second language learner corpora. In Research methods in second language acquisition: A practical guide, eds. Mackey, A., and Gass, S.M., 7–29. Chichester: Blackwell Publishing.

After briefly introducing learner corpora, this paper clearly presents the different stages that can be involved in a learner corpus study: choice of a methodological approach, selection and/or compilation of a learner corpus, data annotation, data extraction, data analysis, data interpretation and pedagogical implementation.

  • Díaz-Negrillo, A., Ballier, N., and Thompson, P., eds. 2013. Automatic treatment and analysis of learner corpus data. Amsterdam: John Benjamins.

This edited volume covers many important methodological issues related to learner corpora, such as the question of interoperability, multi-layer error annotation, automatic error detection and correction, or the use of statistics in learner corpus research.

  • Granger, S., Gilquin, G., and Meunier, F., eds. 2015. The Cambridge handbook of learner corpus research. Cambridge: Cambridge University Press.

This handbook provides a comprehensive overview of the different facets of learner corpus research, including the design of learner corpora, the methods that can be applied to study them, their use to investigate various aspects of language, and the link between learner corpus research and second language acquisition, language teaching and natural language processing.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Gilquin, G. (2020). Learner Corpora. In: Paquot, M., Gries, S.T. (eds) A Practical Handbook of Corpus Linguistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46216-1_13
