Abstract
This chapter deals with learner corpora, that is, collections of (spoken and/or written) texts produced by learners of a language. It describes their main characteristics, with particular emphasis on those that are distinctive of learner corpora. Special types of corpora are introduced, such as longitudinal learner corpora or local learner corpora. The issues of the metadata accompanying learner corpora and the annotation of learner corpora are also discussed, and the challenges they involve are highlighted. Several methods of analysis designed to deal with learner corpora are presented, including Contrastive Interlanguage Analysis, Computer-aided Error Analysis and the Integrated Contrastive Model. The development of the field of learner corpus research is sketched, and possible future directions are examined, in terms of the size of learner corpora, their diversity, or the techniques of compilation and analysis. The chapter also features representative corpus-based studies of learner language, representative learner corpora, tools and resources related to learner corpora, and annotated references for further reading.
Notes
- 1. http://merlin-platform.eu/. Accessed 22 May 2019.
- 2. https://korpling.german.hu-berlin.de/falko-suche/. Accessed 22 May 2019.
- 3. https://uclouvain.be/en/research-institutes/ilc/cecl/proceed.html. Accessed 22 May 2019.
- 4. https://childes.talkbank.org/. Accessed 22 May 2019.
- 5. www.flloc.soton.ac.uk/. Accessed 22 May 2019.
Further Reading
- Granger, S. 2012. How to use foreign and second language learner corpora. In Research methods in second language acquisition: A practical guide, eds. Mackey, A., and Gass, S.M., 7–29. Chichester: Blackwell Publishing.
  After briefly introducing learner corpora, this paper clearly presents the different stages that can be involved in a learner corpus study: choice of a methodological approach, selection and/or compilation of a learner corpus, data annotation, data extraction, data analysis, data interpretation and pedagogical implementation.
- Díaz-Negrillo, A., Ballier, N., and Thompson, P., eds. 2013. Automatic treatment and analysis of learner corpus data. Amsterdam: John Benjamins.
  This edited volume covers many important methodological issues related to learner corpora, such as the question of interoperability, multi-layer error annotation, automatic error detection and correction, or the use of statistics in learner corpus research.
- Granger, S., Gilquin, G., and Meunier, F., eds. 2015. The Cambridge handbook of learner corpus research. Cambridge: Cambridge University Press.
  This handbook provides a comprehensive overview of the different facets of learner corpus research, including the design of learner corpora, the methods that can be applied to study them, their use to investigate various aspects of language, and the link between learner corpus research and second language acquisition, language teaching and natural language processing.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Gilquin, G. (2020). Learner Corpora. In: Paquot, M., Gries, S.T. (eds) A Practical Handbook of Corpus Linguistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46216-1_13
DOI: https://doi.org/10.1007/978-3-030-46216-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46215-4
Online ISBN: 978-3-030-46216-1
eBook Packages: Religion and Philosophy, Philosophy and Religion (R0)