A Semi-automatic Error Retrieval Method for Uncovering Collocation Errors from a Large Learner Corpus

  • Christine Ting-Yu Yang
  • Howard Hao-Jan Chen
  • Chen-Yu Liu
  • Yu-Cheng Liu
Original Paper


Previous studies on ESL/EFL learners’ verb-noun (V-N) miscollocations have shed light on common miscollocation types and their possible causes. However, barriers to a fuller understanding of learners’ difficulties remain, notably the limited amount of learner data available in small corpora and the labor-intensive process of retrieving collocation errors manually. To provide researchers with a more efficient retrieval method, this study proposes using the Sketch-Diff function of the Sketch Engine (SKE) platform to semi-automatically retrieve collocation errors from large learner corpora. To test the feasibility of this semi-automatic method, a 7.4-million-word EFL learner corpus was examined with Sketch-Diff, and 4,541 tokens of common miscollocations were identified. Analysis of these miscollocations revealed that most errors were verb-based and were often caused by negative transfer from the learners’ L1, undergeneralization (e.g., ignorance of L2 syntactic rules), and approximation (e.g., the misuse of near-synonyms, hypernyms/hyponyms, antonyms, and lexemes with similar sound or form). This study demonstrates that using Sketch-Diff to retrieve V-N miscollocations from a large learner corpus is both feasible and efficient. The method can also be applied to other languages to further deepen our understanding of L2 learners’ difficulties in acquiring collocations.
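The abstract does not describe how Sketch-Diff works internally. The following is a rough illustration of the general idea behind contrastive collocation retrieval, not Sketch Engine's API or the authors' exact procedure. It assumes V-N pairs have already been extracted from a learner corpus and a reference corpus, and it flags pairs that recur in learner writing but are rare or absent in the reference data. The function name, thresholds, and toy data are all hypothetical. Flagged pairs are only candidates, and a human annotator would still need to confirm each one, which is what makes such a method semi-automatic rather than fully automatic.

```python
from collections import Counter

# Hypothetical representation: a V-N pair as (verb_lemma, noun_lemma).
# This is an illustrative assumption, not Sketch Engine's data format.
VNPair = tuple[str, str]


def candidate_miscollocations(
    learner_pairs: list[VNPair],
    reference_pairs: list[VNPair],
    min_learner_freq: int = 2,
    max_reference_freq: int = 0,
) -> list[tuple[VNPair, int]]:
    """Return V-N pairs that recur in learner data but are rare or absent
    in the reference corpus, sorted by learner frequency (highest first).

    The output is a candidate list for manual checking, not a final
    list of confirmed errors.
    """
    learner_counts = Counter(learner_pairs)
    reference_counts = Counter(reference_pairs)  # missing pairs count as 0
    candidates = [
        (pair, freq)
        for pair, freq in learner_counts.items()
        if freq >= min_learner_freq and reference_counts[pair] <= max_reference_freq
    ]
    # Descending learner frequency, then alphabetical order for stable output.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates


if __name__ == "__main__":
    # Toy data invented for illustration only.
    learner = (
        [("learn", "knowledge")] * 3
        + [("make", "homework")] * 2
        + [("do", "homework")] * 4
        + [("open", "light")] * 1
    )
    reference = (
        [("acquire", "knowledge")] * 5
        + [("do", "homework")] * 10
        + [("turn on", "light")] * 3
    )
    print(candidate_miscollocations(learner, reference))
    # [(('learn', 'knowledge'), 3), (('make', 'homework'), 2)]
```

With the default threshold, "open light" is dropped because it occurs only once in the toy learner data. "do homework" is dropped because it is well attested in the reference data. Lowering `min_learner_freq` trades precision for recall and increases the amount of manual checking required.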


Keywords: Collocation · Error extraction method · Learner corpus · Error analysis



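Chinese abstract (English translation): Previous research on verb-noun (V-N) miscollocations produced by learners of English as a second or foreign language has revealed common error types and their possible causes. However, most such studies have relied on relatively labor-intensive collocation retrieval methods to examine errors in small corpora, which makes it difficult for researchers to apply these retrieval methods to large corpora. To offer researchers a more efficient way of retrieving collocation errors, this paper advocates using the Sketch-Diff function of the Sketch Engine corpus platform to semi-automatically extract collocation errors from large learner corpora. To test the feasibility of this semi-automatic approach, the study used Sketch-Diff to examine miscollocations in a 7.4-million-word learner corpus and identified 4,541 common V-N miscollocations. The analysis shows that most errors involve verb misuse, attributable mainly to negative transfer from the learners’ L1, undergeneralization (e.g., ignoring L2/FL syntactic rules), and the misuse of similar expressions (e.g., near-synonyms, hypernyms/hyponyms, antonyms, and words with similar form or sound). The results indicate that Sketch-Diff can efficiently retrieve V-N collocation errors from large learner corpora in a semi-automatic way, and the paper recommends applying this method to other languages to further deepen our understanding of the difficulties L2/FL learners face in acquiring collocations.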


Keywords (Chinese abstract, translated): Collocation · Error extraction method · Learner corpus · Error analysis


Funding Information

This work was financially supported by the “Chinese Language and Technology Center” of National Taiwan Normal University (NTNU) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.


Copyright information

© National Taiwan Normal University 2019

Authors and Affiliations

  • Christine Ting-Yu Yang (1)
  • Howard Hao-Jan Chen (1), corresponding author
  • Chen-Yu Liu (1)
  • Yu-Cheng Liu (2)

  1. English Department, National Taiwan Normal University, Taipei City, Taiwan
  2. Taipei First Girls High School, Taipei City, Taiwan
