Analyzing Co-occurrence Data

Gries, Stefan Th.; Durrant, Philip

doi:10.1007/978-3-030-46216-1_7

1913 Accesses
3 Citations
8 Altmetric

Abstract

In this chapter, we provide an overview of quantitative approaches to co-occurrence data. We begin with a brief terminological overview of different types of co-occurrence that are prominent in corpus-linguistic studies and then discuss the computation of some widely-used measures of association used to quantify co-occurrence. We present two representative case studies, one exploring lexical collocation and learner proficiency, the other creative uses of verbs with argument structure constructions. In addition, we highlight how most widely-used measures actually all fall out from viewing corpus-linguistic association as an instance of regression modeling and discuss newer developments and potential improvements of association measure research such as utilizing directional measures of association, not uncritically conflating frequency and association-strength information in association measures, type frequencies, and entropies.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Hardcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
We are ignoring the lexico-textual co-occurrence sense of colligation here.
2.
While AMs often agree fairly well in their assessment of the degree of attraction between two elements (or at least their overall ranking), their computation can lead to them having different ‘preferences’. For instance, pointwise MI is known to return low-frequency but perfectly predictive collocations (e.g. fixed expressions) whereas measures that are ultimately based on significance tests (such as G ² or t) often rank more frequent items higher; see Evert (2009) for more discussion.
3.
The Kullback-Leibler divergence is also already mentioned in Pecina (2010).
4.
See Michelbacher et al. (2007, 2011) and Gries (2013a) for further explorations of uni-directional/asymmetric measures.

References

Ackermann, K., & Chen, Y. H. (2013). Developing the academic collocation list (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12(4), 235–247.
Article Google Scholar
Baayen, R. H. (2011). Corpus linguistics and naive discriminative learning. Brazilian Journal of Applied Linguistics, 11(2), 295–328.
Google Scholar
Bartsch, S. (2004). Structural and functional properties of collocations in English. Tübingen: NARR.
Google Scholar
Bestgen, Y., & Granger, S. (2014). Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26(4), 28–41.
Article Google Scholar
Daudaravičius, V., & Marcinkevičienė, R. (2004). Gravity counts for the boundaries of collocations. International Journal of Corpus Linguistics, 9(2), 321–348.
Article Google Scholar
Durrant, P. (2014). Corpus frequency and second language learners’ knowledge of collocations. International Journal of Corpus Linguistics, 19(4), 443–477.
Article Google Scholar
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics, 47(2), 157–177.
Article Google Scholar
Ellis, N. C. (2007). Language acquisition as rational contingency learning. Applied Linguistics, 27(1), 1–24.
Article Google Scholar
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second-language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 1(3), 375–396.
Article Google Scholar
Evert, S. (2009). Corpora and collocations. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 2, pp. 1212–1248). Berlin/New York: Mouton De Gruyter.
Google Scholar
Firth, J. R. (1957). A synopsis of linguistic theory 1930–55. Reprinted in Palmer FR (Ed.), (1968) Selected papers of J.R. Firth, 1952–1959. Longman, London.
Google Scholar
Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.
Google Scholar
Goldberg, A. E., Casenhiser, D. M., & Sethuraman, N. (2004). Learning argument structure generalizations. Cognitive Linguistics, 15(3), 289–316.
Article Google Scholar
Gries, S. Th. (2008a). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437.
Google Scholar
Gries, S. Th. (2008b). Phraseology and linguistic theory: A brief survey. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 3–25). Amsterdam/Philadelphia: John Benjamins.
Google Scholar
Gries, S. Th. (2012). Frequencies, probabilities, association measures in usage−/exemplar-based linguistics: Some necessary clarifications. Studies in Language, 36(3), 477–510.
Google Scholar
Gries, S. Th. (2013a). 50-something years of work on collocations: What is or should be next …. International Journal of Corpus Linguistics, 18(1), 137–165.
Google Scholar
Gries, S. Th. (2013b). Statistics for linguistics with R (2nd rev. & ext. ed) De Gruyter Mouton: Boston/New York.
Google Scholar
Gries, S. Th. (2015). More (old and new) misunderstandings of collostructional analysis: On Schmid & Küchenhoff (2013). Cognitive Linguistics, 26(3), 505–536.
Google Scholar
Gries, S. Th. (2015). 15 years of collostructions: some long overdue additions/corrections (to/of actually all sorts of corpus-linguistics measures). International Journal of Corpus Linguistics, 24(3), 385–412.
Google Scholar
Gries, S. Th. (2018). On over- and underuse in learner corpus research and multifactoriality in corpus linguistics more generally. Journal of Second Language Studies, 1(2), 276–308.
Article Google Scholar
Gries, S. T. (2019). 15 years of collostructions: Some long overdue additions/corrections (to/of actually all sorts of corpus-linguistics measures). International Journal of Corpus Linguistics, 24, 385.
Google Scholar
Gries, S. Th., & Mukherjee, J. (2010). Lexical gravity across varieties of English: An ICE-based study of n-grams in Asian Englishes. International Journal of Corpus Linguistics, 15(4), 520–548.
Article Google Scholar
Gries, S. Th., Hampe, B., & Schönefeld, D. (2005). Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics, 16(4), 635–676.
Article Google Scholar
Hampe, B., & Schönefeld, D. (2006). Syntactic leaps or lexical variation? – More on “Creative Syntax”. In S. T. Gries & A. Stefanowitsch (Eds.), Corpora in cognitive linguistics: Corpus-based approaches to syntax and lexis (pp. 127–157). Berlin/New York: Mouton de Gruyter.
Google Scholar
Harris, Z. S. (1970). Papers in structural and transformational linguistics. Dordrecht: Reidel.
Book Google Scholar
Lester, N. A., & Moscoso del Prado, M. F. (2016). Syntactic flexibility in the noun: Evidence from picture naming. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th annual conference of the cognitive science society (pp. 2585–2590). Austin: Cognitive Science Society.
Google Scholar
Linzen, T., & Jaeger, T. F. (2015). Uncertainty and expectation in sentence processing: Evidence from subcategorization distributions. Cognitive Science, 40(6), 1382–1411.
Article Google Scholar
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. Oxon/New York: Routledge.
Google Scholar
Michelbacher, L., Evert, S., & Schütze, H. (2007). Asymmetric association measures. International Conference on Recent Advances in Natural Language Processing.
Google Scholar
Michelbacher, L., Evert, S., & Schütze, H. (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7(2), 245–276.
Article Google Scholar
Mollin, S. (2009). Combining corpus linguistic and psychological data on word co-occurrences: Corpus collocates versus word associations. Corpus Linguistics and Linguistic Theory, 5(2), 175–200.
Article Google Scholar
Pecina, P. (2010). Lexical association measures and collocation extraction. Language Resources and Evaluation, 44(1), 137–158.
Article Google Scholar
Schneider, U. (to appear). Delta P as a measure of collocation strength. Corpus Linguistics and Linguistic Theory.
Google Scholar
Siyanova-Chanturia, A. (2015). Collocation in beginner learner writing: A longitudinal study. System, 53(4), 148–160.
Article Google Scholar
Stefanowitsch, A., & Gries, S. T. (2003). Collostructions: Investigating the interaction between words and constructions. International Journal of Corpus Linguistics, 8(2), 209–243.
Article Google Scholar
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Book Google Scholar

Download references

Author information

Authors and Affiliations

University of California, Santa Barbara, Santa Barbara, CA, USA
Stefan Th. Gries & Philip Durrant
Justus Liebig University Giessen, Giessen, Germany
Stefan Th. Gries & Philip Durrant

Authors

Stefan Th. Gries
View author publications
You can also search for this author in PubMed Google Scholar
Philip Durrant
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Stefan Th. Gries .

Editor information

Editors and Affiliations

FNRS Centre for English Corpus Linguistics, Language and Communication Institute, UCLouvain, Louvain-la-Neuve, Belgium
Magali Paquot
Department of Linguistics, University of California, Santa Barbara, CA, USA
Stefan Th. Gries

1 Electronic Supplementary Materials

07_Gries_Durrant (ZIP 1 kb)

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Gries, S.T., Durrant, P. (2020). Analyzing Co-occurrence Data. In: Paquot, M., Gries, S.T. (eds) A Practical Handbook of Corpus Linguistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46216-1_7

Download citation

DOI: https://doi.org/10.1007/978-3-030-46216-1_7
Published: 05 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46215-4
Online ISBN: 978-3-030-46216-1
eBook Packages: Religion and PhilosophyPhilosophy and Religion (R0)

Publish with us

Policies and ethics