Learner Corpora

Gilquin, Gaëtanelle

doi:10.1007/978-3-030-46216-1_13

Abstract

This chapter deals with learner corpora, that is, collections of (spoken and/or written) texts produced by learners of a language. It describes their main characteristics, with particular emphasis on those that are distinctive of learner corpora. Special types of corpora are introduced, such as longitudinal learner corpora or local learner corpora. The issues of the metadata accompanying learner corpora and the annotation of learner corpora are also discussed, and the challenges they involve are highlighted. Several methods of analysis designed to deal with learner corpora are presented, including Contrastive Interlanguage Analysis, Computer-aided Error Analysis and the Integrated Contrastive Model. The development of the field of learner corpus research is sketched, and possible future directions are examined, in terms of the size of learner corpora, their diversity, or the techniques of compilation and analysis. The chapter also features representative corpus-based studies of learner language, representative learner corpora, tools and resources related to learner corpora, and annotated references for further reading.

Notes

1. http://merlin-platform.eu/. Accessed 22 May 2019.
2. https://korpling.german.hu-berlin.de/falko-suche/. Accessed 22 May 2019.
3. https://uclouvain.be/en/research-institutes/ilc/cecl/proceed.html. Accessed 22 May 2019.
4. https://childes.talkbank.org/. Accessed 22 May 2019.
5. www.flloc.soton.ac.uk/. Accessed 22 May 2019.

Author information

Authors and Affiliations

Université catholique de Louvain, Centre for English Corpus Linguistics, Louvain-la-Neuve, Belgium
Gaëtanelle Gilquin

Corresponding author

Correspondence to Gaëtanelle Gilquin.

Editor information

Editors and Affiliations

FNRS Centre for English Corpus Linguistics, Language and Communication Institute, UCLouvain, Louvain-la-Neuve, Belgium
Magali Paquot
Department of Linguistics, University of California, Santa Barbara, CA, USA
Stefan Th. Gries

About this chapter

Cite this chapter

Gilquin, G. (2020). Learner Corpora. In: Paquot, M., Gries, S.T. (eds) A Practical Handbook of Corpus Linguistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46216-1_13

DOI: https://doi.org/10.1007/978-3-030-46216-1_13
Published: 05 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46215-4
Online ISBN: 978-3-030-46216-1
eBook Packages: Religion and Philosophy, Philosophy and Religion (R0)
