Analyzing Co-occurrence Data



In this chapter, we provide an overview of quantitative approaches to co-occurrence data. We begin with a brief terminological overview of the types of co-occurrence that are prominent in corpus-linguistic studies and then discuss how to compute some widely used measures of association that quantify co-occurrence. We present two representative case studies: one explores lexical collocation and learner proficiency, the other creative uses of verbs in argument structure constructions. In addition, we show how most widely used measures fall out from viewing corpus-linguistic association as an instance of regression modeling. Finally, we discuss newer developments and potential improvements in association-measure research, such as using directional measures of association, avoiding the uncritical conflation of frequency and association-strength information in a single measure, and taking type frequencies and entropies into account.
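Association measures of the kind mentioned above are usually computed from a 2×2 table of observed counts for a word pair (or a word and a construction). The following is a minimal sketch, not the chapter's own code. It assumes the conventional cell layout (a = both items together, b = first item without the second, c = second item without the first, d = neither) and uses a hypothetical class name, `Table2x2`. It computes pointwise mutual information (PMI), the log-likelihood statistic G², and the two directional ΔP values. The last two illustrate how a directional measure can differ depending on which item is treated as the cue. The counts in the usage lines are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Observed co-occurrence counts for a pair (w1, w2).

    a: w1 and w2 together   b: w1 without w2
    c: w2 without w1        d: neither w1 nor w2
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def expected(self) -> tuple[float, float, float, float]:
        """Expected cell counts under independence: row total * column total / N."""
        r1, r2 = self.a + self.b, self.c + self.d
        c1, c2 = self.a + self.c, self.b + self.d
        n = self.n
        return (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)


def pmi(t: Table2x2) -> float:
    """Pointwise mutual information: log2 of observed a over expected a (requires a > 0)."""
    return math.log2(t.a / t.expected()[0])


def log_likelihood(t: Table2x2) -> float:
    """G^2 = 2 * sum(O * ln(O / E)); cells with O == 0 contribute 0."""
    observed = (t.a, t.b, t.c, t.d)
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, t.expected()) if o > 0)


def delta_p_w2_given_w1(t: Table2x2) -> float:
    """Directional: P(w2 | w1) - P(w2 | not w1)."""
    return t.a / (t.a + t.b) - t.c / (t.c + t.d)


def delta_p_w1_given_w2(t: Table2x2) -> float:
    """Directional: P(w1 | w2) - P(w1 | not w2)."""
    return t.a / (t.a + t.c) - t.b / (t.b + t.d)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    t = Table2x2(a=10, b=90, c=40, d=9860)
    print(round(pmi(t), 3))                  # 4.322
    print(round(delta_p_w2_given_w1(t), 3))  # 0.096
    print(round(delta_p_w1_given_w2(t), 3))  # 0.191
    print(round(log_likelihood(t), 2))
```

In this made-up example, PMI and G² are symmetric and return one value for the pair. The two ΔP values differ by a factor of about two, so w2 is a better predictor of w1 than the reverse. A single symmetric score hides that kind of asymmetry.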

Supplementary material: 07_Gries_Durrant (ZIP archive)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of California, Santa Barbara, Santa Barbara, USA
  2. Justus Liebig University Giessen, Giessen, Germany