Language Resources and Evaluation, Volume 48, Issue 1, pp 45–64

Towards advanced collocation error correction in Spanish learner corpora

  • Gabriela Ferraro
  • Rogelio Nazar
  • Margarita Alonso Ramos
  • Leo Wanner
SI: Resources for language learning


Collocations, in the sense of idiosyncratic binary lexical co-occurrences, are one of the biggest challenges for any language learner. Even advanced learners make collocation mistakes: they translate collocation elements literally from their native tongue, create new words as collocation elements, choose a wrong subcategorization for one of the elements, etc. Therefore, automatic collocation error detection and correction is increasingly in demand. However, while state-of-the-art models predict with reasonable accuracy whether a given co-occurrence is a valid collocation, only a few of them manage to suggest appropriate corrections with an acceptable hit rate. Most often, a ranked list of correction options is offered, from which the learner then has to choose. This is clearly unsatisfactory. Our proposal focuses on this critical part of the problem in the context of the acquisition of Spanish as a second language. For collocation error detection, we use a frequency-based technique. To improve collocation error correction, we discuss three different metrics with respect to their capability to select the most appropriate correction of miscollocations found in our learner corpus.
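As a rough illustration of what frequency-based detection can look like, the sketch below flags a learner's base–collocate pair when it is rare or unattested in a reference corpus and then lists attested collocates of the same base. The abstract does not specify the technique, so everything here is an assumption: the function names, the `min_freq` threshold, and the toy data are hypothetical, and ranking by raw frequency is only a stand-in, not one of the three metrics the paper discusses.

```python
from collections import Counter

def count_pairs(reference_pairs):
    """Count how often each (base, collocate) pair occurs in a reference corpus."""
    return Counter(reference_pairs)

def is_suspect(pair, pair_counts, min_freq=2):
    """Flag a pair whose reference frequency is below min_freq (assumed threshold)."""
    return pair_counts[pair] < min_freq

def rank_corrections(base, pair_counts):
    """List attested collocates of `base`, most frequent first (ties alphabetical)."""
    candidates = [(col, n) for (b, col), n in pair_counts.items() if b == base]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))

# Toy reference data, invented for illustration: (noun base, verb collocate).
reference = (
    [("paseo", "dar")] * 5
    + [("paseo", "hacer")] * 1
    + [("decisión", "tomar")] * 4
)
counts = count_pairs(reference)

# "tomar un paseo" (a literal rendering of "take a walk") is unattested here.
print(is_suspect(("paseo", "tomar"), counts))  # True
print(rank_corrections("paseo", counts))       # [('dar', 5), ('hacer', 1)]
```

Returning a ranked list like this is exactly the behaviour the abstract calls unsatisfactory; the point of a stronger metric would be to make the top-ranked candidate reliable enough to offer as the single correction.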


Keywords: Collocation, Collocation error, Miscollocation, CALL, Collocation error detection, Collocation error correction



Many thanks to Amaya Mendikoetxea and Cristóbal Lozano for making the CEDEL2 corpus available to us and to the two anonymous reviewers for their insightful comments, which considerably improved the final version of the paper. Our experiments were partially run on the Argo cluster of the Department of Communication and Information Technologies, UPF. We are grateful for this service and would especially like to thank Silvina Re and Iván Jiménez for their help. This work has been partially funded by the Spanish Ministry of Science and Innovation under the contract numbers FFI2008-06479-C02-01/02 and FFI2011-30219-C02-01/02.



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Gabriela Ferraro (1)
  • Rogelio Nazar (2)
  • Margarita Alonso Ramos (3)
  • Leo Wanner (4)

  1. Department of Information and Communication Technologies, Pompeu Fabra University, Barcelona, Spain
  2. Institute for Applied Linguistics, Pompeu Fabra University, Barcelona, Spain
  3. Faculty of Philology, University of La Coruña, La Coruña, Spain
  4. Department of Information and Communication Technologies, Catalan Institute for Research and Advanced Studies (ICREA), Pompeu Fabra University, Barcelona, Spain
