On Collocations

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 44))

Abstract

The primary goal of this chapter is to set the stage for the practical investigations described in the remainder of this book by providing an overall picture of the existing theoretical descriptions of the collocation phenomenon. Because the various approaches to collocation (motivated by linguistic, lexicographic, pedagogical or computational considerations) have often made use of vague and inconsistent terminology, we begin with an analytic review of the definitions of collocation available in the literature and then identify the most salient and uncontroversial features of the phenomenon, which will serve as the basis for the discussion throughout the rest of the book. After reviewing the definitions, we briefly discuss the main theoretical linguistic frameworks that have addressed collocation phenomena. We then consider characterizations proposed by various researchers in terms of semantic compositionality and morpho-syntactic behaviour. We conclude by giving the definition of collocation that we believe most adequately captures this phenomenon for our present purposes.

Notes

  1. Bally distinguishes between “groupements passagers” (free combinations), “groupements usuels” (collocations), and “séries phraséologiques” (idioms).

  2. For instance, Wanner et al. (2006, 611) note that the second notion “is different from the notion of collocation in the sense of Firth (1957) (…) who define a collocation as a high probability association of lexical items in the corpus”.

  3. A statistical approach, called distributional, and a phraseological (linguistic) approach, called intensional (Evert, 2004b).

  4. See also Manning and Schütze’s (1999) non-substitutability criterion mentioned in Section 2.5.

  5. Literally, “general validity have”.

  6. An analogy can be drawn between the idiom principle and the Elsewhere Principle in linguistics, which concerns the opposition between regularity and specificity. This principle states that whenever two rules can be applied, the more specific one overrides the more general one. An account of idioms based on this principle was proposed by Zeevat (1995).

  7. A linguistic operation is said to be performed in a regular way if it conforms to the lexicon and to the general rules of the grammar (general rules refer to rules involving classes of lexemes, as opposed to individual lexemes). It is performed in a non-restricted way if it can use any item of the lexicon and any grammar rule, without mentioning other lexical items (Mel’čuk, 2003).

  8. For example, Dico and LAF (Polguère, 2000), Papillon (Boitet et al., 2002; Sérasset, 2004; Mangeot, 2006), and DAFLES (Selva et al., 2002).

  9. Firth (1968, 183) describes the concept of colligation as “the interrelation of grammatical categories in syntactical structure”.

  10. Sinclair’s position on the syntax-lexicon continuum and the place of collocations is even more radical: “The decoupling of lexis and syntax leads to the creation of a rubbish dump that is called ‘idiom’, ‘phraseology’, ‘collocation’, and the like. (…) The evidence now becoming available casts grave doubts on the wisdom of postulating separate domains of lexis and syntax” (Sinclair, 1991, 104).

  11. Nevertheless, some studies have attempted to grade the compositionality of phraseological units, e.g., McCarthy et al. (2003), Baldwin et al. (2003), Venkatapathy and Joshi (2005), and Piao et al. (2006). A larger number of works have focused, more generally, on distinguishing between compositional and non-compositional units: see, among many others, Bannard (2005), Fazly and Stevenson (2007), McCarthy et al. (2007), Cook et al. (2008), and Diab and Bhutada (2009).

  12. Only the criterion of non-substitutability (17c) is commonly associated with collocations.

  13. The following abbreviations are used for part-of-speech categories: A – adjective, Adv – adverb, Conj – conjunction, D – determiner, Inter – interjection, N – noun, P – preposition, V – verb.

  14. These combinations correspond to the so-called lexical collocations in Benson et al.’s (1986a) description of collocations. Benson et al. divide collocations into two broad classes, one allowing exclusively open-class words (lexical collocations) and one involving function words as well (grammatical collocations).

  15. Along the same lines, Kjellmer (1990, 172) notes that functional categories such as articles and prepositions are collocational in nature.

  16. In related work, a frequency threshold is usually applied to ensure the reliability of statistical association measures, as will be seen in Chapter 3. We consider instead that the initial selection of collocation candidates must be exhaustive and independent of the statistical computation, and that specific restrictions may be applied later, depending on the subsequent processing.
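
The separation described in note 16, between exhaustive candidate selection and any later frequency restriction, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: adjacent word pairs stand in for the candidates (the book itself selects candidates syntactically), and the function names `candidate_pairs` and `apply_threshold`, the toy sentence, and the threshold value are all hypothetical.

```python
from collections import Counter


def candidate_pairs(tokens, window=1):
    """Count every ordered pair of tokens at most `window` positions apart.

    No frequency cut-off is applied here: selection is exhaustive.
    (Assumption: surface adjacency replaces syntactic selection.)
    """
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            pairs[(left, right)] += 1
    return pairs


def apply_threshold(pairs, min_freq):
    """Optional later restriction: keep pairs seen at least `min_freq` times."""
    return {pair: n for pair, n in pairs.items() if n >= min_freq}


# Toy corpus (hypothetical data, for illustration only).
tokens = "heavy rain fell and heavy rain stopped while light rain began".split()

all_candidates = candidate_pairs(tokens, window=1)    # exhaustive selection
frequent = apply_threshold(all_candidates, min_freq=2)  # restriction applied afterwards
```

Here `all_candidates` still contains the single occurrence of ("light", "rain"), whereas `frequent` keeps only ("heavy", "rain"). Applying the cut-off as a separate step means a rare candidate remains available to whatever processing follows.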

References

  • Bahns J (1993) Lexical collocations: a contrastive view. ELT Journal 47(1):56–63

  • Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the COLING-ACL, Montreal, Canada, pp 86–90

  • Baldwin T, Bannard C, Tanaka T, Widdows D (2003) An empirical model of multiword expression decomposability. In: Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan, pp 89–96

  • Bally C (1909) Traité de stylistique française. Klincksieck, Paris

  • Bally C (1951) Traité de stylistique française. Klincksieck, Paris

  • Bannard C (2005) Learning about the meaning of verb-particle constructions from corpora. Computer Speech and Language 19(4):467–478

  • Bartsch S (2004) Structural and Functional Properties of Collocations in English. A Corpus Study of Lexical and Pragmatic Constraints on Lexical Cooccurrence. Gunter Narr Verlag, Tübingen

  • Benson M (1990) Collocations and general-purpose dictionaries. International Journal of Lexicography 3(1):23–35

  • Benson M, Benson E, Ilson R (1986a) The BBI Dictionary of English Word Combinations. John Benjamins, Amsterdam/Philadelphia

  • Benson M, Benson E, Ilson R (1986b) Lexicographic Description of English. John Benjamins, Amsterdam/Philadelphia

  • Boitet C, Mangeot M, Sérasset G (2002) The PAPILLON Project: Cooperatively building a multilingual lexical database to derive open source dictionaries and lexicons. In: Proceedings of the 2nd Workshop on NLP and XML (NLPXML-2002), Taipei, Taiwan

  • Choueka Y (1988) Looking for needles in a haystack, or locating interesting collocational expressions in large textual databases. In: Proceedings of the International Conference on User-Oriented Content-Based Text and Image Handling, Cambridge, MA, USA, pp 609–623

  • Church K, Hanks P (1990) Word association norms, mutual information, and lexicography. Computational Linguistics 16(1):22–29

  • Cook P, Fazly A, Stevenson S (2008) The VNC-tokens dataset. In: Proceedings of the LREC Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp 19–22

  • Coseriu E (1967) Lexikalische Solidaritäten. Poetica (1):293–303

  • Cowie AP (1978) The place of illustrative material and collocations in the design of a learner’s dictionary. In: Strevens P (ed) In Honour of A.S. Hornby, Oxford University Press, Oxford, pp 127–139

  • Cruse DA (1986) Lexical Semantics. Cambridge University Press, Cambridge

  • Diab MT, Bhutada P (2009) Verb noun construction MWE token supervised classification. In: 2009 Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation, Applications, Suntec, Singapore, pp 17–22

  • Evert S (2004b) The statistics of word cooccurrences: Word pairs and collocations. PhD thesis, University of Stuttgart

  • Fazly A, Stevenson S (2007) Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In: Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, Prague, Czech Republic, pp 9–16

  • Fillmore C, Kay P, O’Connor C (1988) Regularity and idiomaticity in grammatical constructions: The case of let alone. Language 64(3):501–538

  • Fillmore CJ (1982) Frame semantics. In: Linguistics in the Morning Calm, Hanshin Publishing Co., Seoul, pp 111–137

  • Firth JR (1957) Papers in Linguistics 1934–1951. Oxford University Press, Oxford

  • Firth JR (1968) A synopsis of linguistic theory, 1930–1955. In: Palmer F (ed) Selected papers of J. R. Firth, 1952–1959, Indiana University Press, Bloomington, IN, pp 168–205

  • Fontenelle T (1992) Collocation acquisition from a corpus or from a dictionary: A comparison. Proceedings I-II Papers submitted to the 5th EURALEX International Congress on Lexicography in Tampere, Tampere, Finland, pp 221–228

  • Fontenelle T (1997a) Turning a Bilingual Dictionary into a Lexical-Semantic Database. Max Niemeyer Verlag, Tübingen

  • Fontenelle T (1997b) Using a bilingual dictionary to create semantic networks. International Journal of Lexicography 10(4):276–303

  • Fontenelle T (2001) Collocation modelling: From lexical functions to frame semantics. In: Proceedings of the ACL Workshop on Collocation: Computational Extraction, Analysis and Exploitation, Toulouse, France, pp 1–7

  • Francis G (1993) A corpus-driven approach to grammar: Principles, methods and examples. In: Baker M, Francis G, Tognini-Bonelli E (eds) Text and Technology: In Honour of John Sinclair, John Benjamins, Amsterdam, pp 137–156

  • Gitsaki C (1996) The development of ESL collocational knowledge. PhD thesis, University of Queensland

  • Gross G (1996) Les expressions figées en français. OPHRYS, Paris

  • Gross M (1984) Lexicon-grammar and the syntactic analysis of French. In: Proceedings of the 10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics, Morristown, NJ, USA, pp 275–282

  • Halliday MAK, Hasan R (1976) Cohesion in English. Longman, London

  • Hargreaves P (2000) Collocation and testing. In: Lewis M (ed) Teaching Collocations, Language Teaching Publications, Hove

  • Hausmann FJ (1979) Un dictionnaire des collocations est-il possible? Travaux de littérature et de linguistique de l’Université de Strasbourg 17(1):187–195

  • Hausmann FJ (1985) Kollokationen im deutschen Wörterbuch. Ein Beitrag zur Theorie des lexikographischen Beispiels. In: Bergenholtz H, Mugdan J (eds) Lexikographie und Grammatik. Akten des Essener Kolloquiums zur Grammatik im Wörterbuch, Lexicographica. Series Major 3, pp 118–129

  • Hausmann FJ (1989) Le dictionnaire de collocations. In: Hausmann F, Reichmann O, Wiegand H, Zgusta L (eds) Wörterbücher: Ein internationales Handbuch zur Lexicographie. Dictionaries, Dictionnaires, de Gruyter, Berlin, pp 1010–1019

  • Hausmann FJ (2004) Was sind eigentlich Kollokationen? In: Steyer K (ed) Wortverbindungen – mehr oder weniger fest. Jahrbuch des Instituts für Deutsche Sprache 2003, de Gruyter, Berlin, pp 309–334

  • Heid U (1994) On ways words work together – research topics in lexical combinatorics. In: Proceedings of the 6th Euralex International Congress on Lexicography (EURALEX ’94), Amsterdam, The Netherlands, pp 226–257

  • Heid U, Raab S (1989) Collocations in multilingual generation. In: Proceedings of the 4th Conference of the European Chapter of the Association for Computational Linguistics (EACL’89), Manchester, England, pp 130–136

  • Heylen D, Maxwell KG, Verhagen M (1994) Lexical functions and machine translation. In: Proceedings of the 15th International Conference on Computational Linguistics (COLING 1994), Kyoto, Japan, pp 1240–1244

  • Hoey M (1991) Patterns of Lexis in Text. Oxford University Press, Oxford

  • Hoey M (1997) From concordance to text structure: New uses for computer corpora. In: Melia J, Lewandowska B (eds) Proceedings of Practical Applications of Language Corpora (PALC 1997), Lodz, Poland, pp 2–23

  • Hoey M (2000) A world beyond collocation: New perspectives on vocabulary teaching. In: Lewis M (ed) Teaching Collocations, Language Teaching Publications, Hove

  • Hornby AS, Cowie AP, Lewis JW (1948a) Oxford Advanced Learner’s Dictionary of Current English. Oxford University Press, London

  • Hornby AS, Gatenby EV, Wakefield H (1948b) A Learner’s Dictionary of Current English. Oxford University Press, London

  • Hornby AS, Gatenby EV, Wakefield H (1952) The Advanced Learner’s Dictionary of Current English. Oxford University Press, London

  • Hunston S, Francis G (1998) Verbs observed: A corpus-driven pedagogic grammar. Applied Linguistics 19(1):45–72

  • Hunston S, Francis G, Manning E (1997) Grammar and vocabulary: Showing the connections. English Language Teaching Journal 51(3):208–215

  • Kahane S, Polguère A (2001) Formal foundations of lexical functions. In: Proceedings of the ACL Workshop on Collocation: Computational Extraction, Analysis and Exploitation, Toulouse, France, pp 8–15

  • Kjellmer G (1987) Aspects of English collocations. In: Meijs W (ed) Corpus Linguistics and Beyond, Rodopi, Amsterdam, pp 133–140

  • Kjellmer G (1990) Patterns of collocability. In: Aarts J, Meijs W (eds) Theory and practice in Corpus Linguistics, Rodopi B.V., Amsterdam, pp 163–178

  • Kjellmer G (1991) A mint of phrases. In: Aijmer K, Altenberg B (eds) English Corpus Linguistics. Studies in Honour of Jan Svartvik, Longman, London/New York, pp 111–127

  • Lehr A (1996) Germanistische Linguistik: Kollokationen und maschinenlesbare Korpora, vol 168. Niemeyer, Tübingen

  • Lewis M (2000) Teaching Collocations. Further Developments in the Lexical Approach. Language Teaching Publications, Hove

  • Louw B (1993) Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies. In: Baker M, Francis G, Tognini-Bonelli E (eds) Text and Technology: In Honour of John Sinclair, John Benjamins, Amsterdam, pp 157–176

  • Mangeot M (2006) Papillon project: Retrospective and perspectives. In: Proceedings of the LREC 2006 Workshop on Acquiring and Representing Multilingual, Specialized Lexicons: The Case of Biomedicine, Genoa, Italy

  • Manning CD, Schütze H (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA

  • McCarthy D, Keller B, Carroll J (2003) Detecting a continuum of compositionality in phrasal verbs. In: Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan, pp 73–80

  • McCarthy D, Venkatapathy S, Joshi A (2007) Detecting compositionality of verb-object combinations using selectional preferences. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp 369–379

  • McKeown KR, Radev DR (2000) Collocations. In: Dale R, Moisl H, Somers H (eds) A Handbook of Natural Language Processing, Marcel Dekker, New York, NY, pp 507–523

  • Mel’čuk I (1998) Collocations and lexical functions. In: Cowie AP (ed) Phraseology. Theory, Analysis, and Applications, Clarendon Press, Oxford, pp 23–53

  • Mel’čuk I (2003) Collocations: définition, rôle et utilité. In: Grossmann F, Tutin A (eds) Les collocations: analyse et traitement, Editions De Werelt, Amsterdam, pp 23–32

  • Mel’čuk I et al (1984, 1988, 1992, 1999) Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques. Presses de l’Université de Montréal, Montréal

  • Meunier F, Granger S (eds) (2008) Phraseology in Foreign Language and Teaching. John Benjamins, Amsterdam/Philadelphia

  • Mille S, Wanner L (2008) Making text resources accessible to the reader: The case of patent claims. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco

  • Moon R (1998) Fixed Expressions and Idioms in English: A Corpus-Based Approach. Clarendon Press, Oxford

  • Pawley A, Syder FH (1983) Two puzzles for linguistic theory: nativelike selection and nativelike fluency. In: Richards J, Schmidt R (eds) Language and Communication, Longman, London, pp 191–227

  • Piao SS, Rayson P, Mudraya O, Wilson A, Garside R (2006) Measuring MWE compositionality using semantic annotation. In: Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, Sydney, Australia, pp 2–11

  • Polguère A (2000) Towards a theoretically-motivated general public dictionary of semantic derivations and collocations for French. In: Proceedings of the 9th EURALEX International Congress, EURALEX 2000, Stuttgart, Germany, pp 517–527

  • Ramos MA, Rambow O, Wanner L (2008) Using semantically annotated corpora to build collocation resources. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco

  • Sag IA, Baldwin T, Bond F, Copestake A, Flickinger D (2002) Multiword expressions: A pain in the neck for NLP. In: Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2002), Mexico City, Mexico, pp 1–15

  • Selva T, Verlinde S, Binon J (2002) Le DAFLES, un nouveau dictionnaire électronique pour apprenants du français. In: Braasch A, Povlsen C (eds) Proceedings of the 10th Euralex International Congress (EURALEX 2002), Copenhagen, Denmark, pp 199–208

  • Silberztein M (1993) Dictionnaires électroniques et analyse automatique de textes. Le système INTEX. Masson, Paris

  • Sinclair J (1991) Corpus, Concordance, Collocation. Oxford University Press, Oxford

  • Smadja F (1993) Retrieving collocations from text: Xtract. Computational Linguistics 19(1):143–177

  • Stubbs M (1995) Corpus evidence for norms of lexical collocation. In: Cook G, Seidlhofer B (eds) Principle & Practice in Applied Linguistics. Studies in Honour of H.G. Widdowson, Oxford University Press, Oxford

  • Venkatapathy S, Joshi AK (2005) Relative compositionality of multi-word expressions: A study of verb-noun (V-N) collocations. In: Natural Language Processing - IJCNLP 2005, Lecture Notes in Computer Science, vol 3651, Springer, Berlin/Heidelberg, pp 553–564

  • Wanner L (1997) Exploring lexical resources for text generation in a systemic functional language model. PhD thesis, University of the Saarland, Saarbrücken

  • Wanner L, Bohnet B, Giereth M (2006) Making sense of collocations. Computer Speech & Language 20(4):609–624

  • Wehrli E (2000) Parsing and collocations. In: Christodoulakis D (ed) Natural Language Processing, Springer, Berlin, pp 272–282

  • van der Wouden T (1997) Negative Contexts. Collocation, Polarity, and Multiple Negation. Routledge, London, New York

  • van der Wouden T (2001) Collocational behaviour in non content words. In: Proceedings of the ACL Workshop on Collocation: Computational Extraction, Analysis and Exploitation, Toulouse, France, pp 16–23

  • Zeevat H (1995) Idiomatic blocking and the Elsewhere principle. In: Everaert M, van der Linden EJ, Schenk A, Schreuder R (eds) Idioms: Structural and Psychological Perspectives, Lawrence Erlbaum Associates, Hillsdale, NJ and Hove, UK, pp 301–316

Author information

Correspondence to Violeta Seretan.

Copyright information

© 2011 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Seretan, V. (2011). On Collocations. In: Syntax-Based Collocation Extraction. Text, Speech and Language Technology, vol 44. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0134-2_2

  • DOI: https://doi.org/10.1007/978-94-007-0134-2_2

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-0133-5

  • Online ISBN: 978-94-007-0134-2

  • eBook Packages: Computer Science (R0)
