Abstract
In this chapter, we provide an overview of quantitative approaches to co-occurrence data. We begin with a brief terminological overview of the types of co-occurrence that are prominent in corpus-linguistic studies and then discuss how some widely used association measures for quantifying co-occurrence are computed. We present two representative case studies, one exploring lexical collocation and learner proficiency, the other creative uses of verbs with argument structure constructions. In addition, we show how most widely used measures fall out from viewing corpus-linguistic association as an instance of regression modeling, and we discuss newer developments and potential improvements in association-measure research: using directional measures of association, not uncritically conflating frequency and association-strength information in association measures, and taking type frequencies and entropies into account.
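As a rough illustration of the kind of computation the abstract refers to, the following sketch derives three association measures from a 2×2 co-occurrence table: pointwise MI, the log-likelihood ratio G², and the directional measure ΔP. The function names, the cell labels `a` to `d`, and the two sets of frequencies are assumptions made up for this example; they are not taken from the chapter or its case studies.

```python
import math

# Assumed cell layout of the 2x2 table (illustrative, not from the chapter):
#                 word2 present   word2 absent
# word1 present         a               b
# word1 absent          c               d

def expected(a, b, c, d):
    """Expected cell frequencies under independence of rows and columns."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return (row1 * col1 / n, row1 * col2 / n,
            row2 * col1 / n, row2 * col2 / n)

def pointwise_mi(a, b, c, d):
    """Pointwise MI: log2 of observed over expected co-occurrence (needs a > 0)."""
    return math.log2(a / expected(a, b, c, d)[0])

def g2(a, b, c, d):
    """Log-likelihood ratio G2 summed over all four cells (empty cells add 0)."""
    observed = (a, b, c, d)
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected(a, b, c, d)) if o > 0)

def delta_p(a, b, c, d):
    """Directional Delta P: (word1 -> word2, word2 -> word1)."""
    forward = a / (a + b) - c / (c + d)    # P(w2 | w1) - P(w2 | not w1)
    backward = a / (a + c) - b / (b + d)   # P(w1 | w2) - P(w1 | not w2)
    return forward, backward

# Hypothetical frequencies in a corpus of 1,000,000 word pairs.
rare = (10, 0, 0, 999_990)                # rare but perfectly predictive pair
frequent = (1_000, 9_000, 4_000, 986_000)  # frequent, less exclusive pair

for label, table in (("rare", rare), ("frequent", frequent)):
    fwd, bwd = delta_p(*table)
    print(f"{label:9s} MI={pointwise_mi(*table):6.2f} "
          f"G2={g2(*table):9.2f} dP(w1->w2)={fwd:.3f} dP(w2->w1)={bwd:.3f}")
```

With these invented counts, pointwise MI ranks the rare pair higher while G² ranks the frequent pair higher, and the two ΔP values for the frequent pair differ, which is what makes the measure directional.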
Notes
- 1. We are ignoring the lexico-textual co-occurrence sense of colligation here.
- 2. While AMs often agree fairly well in their assessment of the degree of attraction between two elements (or at least their overall ranking), their computation can lead to them having different ‘preferences’. For instance, pointwise MI is known to return low-frequency but perfectly predictive collocations (e.g. fixed expressions) whereas measures that are ultimately based on significance tests (such as G² or t) often rank more frequent items higher; see Evert (2009) for more discussion.
- 3. The Kullback-Leibler divergence is also already mentioned in Pecina (2010).
- 4.
References
Ackermann, K., & Chen, Y. H. (2013). Developing the academic collocation list (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12(4), 235–247.
Baayen, R. H. (2011). Corpus linguistics and naive discriminative learning. Brazilian Journal of Applied Linguistics, 11(2), 295–328.
Bartsch, S. (2004). Structural and functional properties of collocations in English. Tübingen: NARR.
Bestgen, Y., & Granger, S. (2014). Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26(4), 28–41.
Daudaravičius, V., & Marcinkevičienė, R. (2004). Gravity counts for the boundaries of collocations. International Journal of Corpus Linguistics, 9(2), 321–348.
Durrant, P. (2014). Corpus frequency and second language learners’ knowledge of collocations. International Journal of Corpus Linguistics, 19(4), 443–477.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics, 47(2), 157–177.
Ellis, N. C. (2007). Language acquisition as rational contingency learning. Applied Linguistics, 27(1), 1–24.
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second-language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42(3), 375–396.
Evert, S. (2009). Corpora and collocations. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 2, pp. 1212–1248). Berlin/New York: Mouton de Gruyter.
Firth, J. R. (1957). A synopsis of linguistic theory 1930–55. Reprinted in F. R. Palmer (Ed.) (1968), Selected papers of J. R. Firth, 1952–1959. London: Longman.
Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.
Goldberg, A. E., Casenhiser, D. M., & Sethuraman, N. (2004). Learning argument structure generalizations. Cognitive Linguistics, 15(3), 289–316.
Gries, S. Th. (2008a). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437.
Gries, S. Th. (2008b). Phraseology and linguistic theory: A brief survey. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 3–25). Amsterdam/Philadelphia: John Benjamins.
Gries, S. Th. (2012). Frequencies, probabilities, association measures in usage-/exemplar-based linguistics: Some necessary clarifications. Studies in Language, 36(3), 477–510.
Gries, S. Th. (2013a). 50-something years of work on collocations: What is or should be next …. International Journal of Corpus Linguistics, 18(1), 137–165.
Gries, S. Th. (2013b). Statistics for linguistics with R (2nd rev. & ext. ed.). Boston/New York: De Gruyter Mouton.
Gries, S. Th. (2015). More (old and new) misunderstandings of collostructional analysis: On Schmid & Küchenhoff (2013). Cognitive Linguistics, 26(3), 505–536.
Gries, S. Th. (2018). On over- and underuse in learner corpus research and multifactoriality in corpus linguistics more generally. Journal of Second Language Studies, 1(2), 276–308.
Gries, S. Th. (2019). 15 years of collostructions: Some long overdue additions/corrections (to/of actually all sorts of corpus-linguistics measures). International Journal of Corpus Linguistics, 24(3), 385–412.
Gries, S. Th., & Mukherjee, J. (2010). Lexical gravity across varieties of English: An ICE-based study of n-grams in Asian Englishes. International Journal of Corpus Linguistics, 15(4), 520–548.
Gries, S. Th., Hampe, B., & Schönefeld, D. (2005). Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics, 16(4), 635–676.
Hampe, B., & Schönefeld, D. (2006). Syntactic leaps or lexical variation? – More on “Creative Syntax”. In S. Th. Gries & A. Stefanowitsch (Eds.), Corpora in cognitive linguistics: Corpus-based approaches to syntax and lexis (pp. 127–157). Berlin/New York: Mouton de Gruyter.
Harris, Z. S. (1970). Papers in structural and transformational linguistics. Dordrecht: Reidel.
Lester, N. A., & Moscoso del Prado, M. F. (2016). Syntactic flexibility in the noun: Evidence from picture naming. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th annual conference of the cognitive science society (pp. 2585–2590). Austin: Cognitive Science Society.
Linzen, T., & Jaeger, T. F. (2015). Uncertainty and expectation in sentence processing: Evidence from subcategorization distributions. Cognitive Science, 40(6), 1382–1411.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. Oxon/New York: Routledge.
Michelbacher, L., Evert, S., & Schütze, H. (2007). Asymmetric association measures. International Conference on Recent Advances in Natural Language Processing.
Michelbacher, L., Evert, S., & Schütze, H. (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7(2), 245–276.
Mollin, S. (2009). Combining corpus linguistic and psychological data on word co-occurrences: Corpus collocates versus word associations. Corpus Linguistics and Linguistic Theory, 5(2), 175–200.
Pecina, P. (2010). Lexical association measures and collocation extraction. Language Resources and Evaluation, 44(1), 137–158.
Schneider, U. (to appear). Delta P as a measure of collocation strength. Corpus Linguistics and Linguistic Theory.
Siyanova-Chanturia, A. (2015). Collocation in beginner learner writing: A longitudinal study. System, 53(4), 148–160.
Stefanowitsch, A., & Gries, S. Th. (2003). Collostructions: Investigating the interaction between words and constructions. International Journal of Corpus Linguistics, 8(2), 209–243.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Gries, S.T., Durrant, P. (2020). Analyzing Co-occurrence Data. In: Paquot, M., Gries, S.T. (eds) A Practical Handbook of Corpus Linguistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46216-1_7
DOI: https://doi.org/10.1007/978-3-030-46216-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46215-4
Online ISBN: 978-3-030-46216-1
eBook Packages: Religion and Philosophy, Philosophy and Religion (R0)