Abstract
The primary goal of this chapter is to set the stage for the practical investigations described in the remainder of this book by providing an overall picture of the existing theoretical descriptions of the collocation phenomenon. Because the various approaches to collocation (motivated by linguistic, lexicographic, pedagogical or computational considerations) have often made use of vague and inconsistent terminology, we begin with an analytic review of the definitions of collocation available in the literature and then identify the most salient and uncontroversial features of the phenomenon that will serve as the basis for the discussion throughout the rest of the book. After reviewing the definitions, we briefly discuss the main theoretical linguistic frameworks that have addressed collocation phenomena. We then consider characterizations proposed by various researchers in terms of semantic compositionality and morpho-syntactic behaviour. We conclude by giving the definition of collocation that we believe most adequately captures this phenomenon for our present purposes.
Notes
- 1.
Bally distinguishes between “groupements passagers” (free combinations), “groupements usuels” (collocations), and “séries phraséologiques” (idioms).
- 2.
- 3.
A statistical approach, called distributional, and a phraseological (linguistic) approach, called intensional (Evert, 2004b).
- 4.
See also Manning and Schütze’s (1999) non-substitutability criterion mentioned in Section 2.5.
- 5.
Literally, “general validity have”.
- 6.
An analogy can be drawn between the idiom principle and the Elsewhere Principle in linguistics, which concerns the opposition between regularity and specificity. This principle states that whenever two rules can be applied, the more specific one overrides the more general one. An account of idioms based on this principle was proposed by Zeevat (1995).
- 7.
A linguistic operation is said to be performed in a regular way if it conforms to the lexicon and to the general rules of the grammar (general rules refer to rules involving classes of lexemes, as opposed to individual lexemes). It is performed in a non-restricted way if it can use any item of the lexicon and any grammar rule, without mentioning other lexical items (Mel’čuk, 2003).
- 8.
- 9.
Firth (1968, 183) describes the concept of colligation as “the interrelation of grammatical categories in syntactical structure”.
- 10.
Sinclair’s position on the syntax-lexicon continuum and the place of collocations is even more radical: “The decoupling of lexis and syntax leads to the creation of a rubbish dump that is called ‘idiom’, ‘phraseology’, ‘collocation’, and the like. (…) The evidence now becoming available casts grave doubts on the wisdom of postulating separate domains of lexis and syntax” (Sinclair, 1991, 104).
- 11.
Nevertheless, some studies have attempted to grade the compositionality of phraseological units, e.g., McCarthy et al. (2003), Baldwin et al. (2003), Venkatapathy and Joshi (2005), and Piao et al. (2006). A larger number of works focused, more generally, on distinguishing between compositional and non-compositional units: see, among many others, Bannard (2005), Fazly and Stevenson (2007), McCarthy et al. (2007), Cook et al. (2008), Diab and Bhutada (2009).
- 12.
Only the criterion of non-substitutability (17c) is commonly associated with collocations.
- 13.
The following abbreviations are used for part-of-speech categories: A – adjective, Adv – adverb, Conj – conjunction, D – determiner, Inter – interjection, N – noun, P – preposition, V – verb.
- 14.
These combinations correspond to the so-called lexical collocations in Benson et al.’s (1986a) description of collocations. Benson et al. divide collocations into two broad classes, one allowing exclusively open-class words (lexical collocations) and one involving function words as well (grammatical collocations).
- 15.
Along the same lines, Kjellmer (1990, 172) notes that functional categories such as articles and prepositions are collocational in nature.
- 16.
Usually, a frequency threshold is applied in related work to ensure the reliability of statistical association measures, as will be seen in Chapter 3. Instead, we consider that the initial selection of collocation candidates must be exhaustive, independent of the statistical computation, and that specific restrictions may be applied later depending on the subsequent processing.
References
Bahns J (1993) Lexical collocations: a contrastive view. ELT Journal 47(1):56–63
Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the COLING-ACL, Montreal, Canada, pp 86–90
Baldwin T, Bannard C, Tanaka T, Widdows D (2003) An empirical model of multiword expression decomposability. In: Bond F, Korhonen A, McCarthy D, Villavicencio A (eds) Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan, pp 89–96
Bally C (1909) Traité de stylistique française. Klincksieck, Paris
Bally C (1951) Traité de stylistique française. Klincksieck, Paris
Bannard C (2005) Learning about the meaning of verb-particle constructions from corpora. Computer Speech and Language 19(4):467–478
Bartsch S (2004) Structural and Functional Properties of Collocations in English. A Corpus Study of Lexical and Pragmatic Constraints on Lexical Cooccurrence. Gunter Narr Verlag, Tübingen
Benson M (1990) Collocations and general-purpose dictionaries. International Journal of Lexicography 3(1):23–35
Benson M, Benson E, Ilson R (1986a) The BBI Dictionary of English Word Combinations. John Benjamins, Amsterdam/Philadelphia
Benson M, Benson E, Ilson R (1986b) Lexicographic Description of English. John Benjamins, Amsterdam/Philadelphia
Boitet C, Mangeot M, Sérasset G (2002) The PAPILLON Project: Cooperatively building a multilingual lexical database to derive open source dictionaries and lexicons. In: Proceedings of the 2nd Workshop on NLP and XML (NLPXML-2002), Taipei, Taiwan
Choueka Y (1988) Looking for needles in a haystack, or locating interesting collocational expressions in large textual databases. In: Proceedings of the International Conference on User-Oriented Content-Based Text and Image Handling, Cambridge, MA, USA, pp 609–623
Church K, Hanks P (1990) Word association norms, mutual information, and lexicography. Computational Linguistics 16(1):22–29
Cook P, Fazly A, Stevenson S (2008) The VNC-tokens dataset. In: Proceedings of the LREC Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco, pp 19–22
Coseriu E (1967) Lexikalische Solidaritäten. Poetica 1:293–303
Cowie AP (1978) The place of illustrative material and collocations in the design of a learner’s dictionary. In: Strevens P (ed) In Honour of A.S. Hornby, Oxford University Press, Oxford, pp 127–139
Cruse DA (1986) Lexical Semantics. Cambridge University Press, Cambridge
Diab MT, Bhutada P (2009) Verb noun construction MWE token supervised classification. In: 2009 Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation, Applications, Suntec, Singapore, pp 17–22
Evert S (2004b) The statistics of word cooccurrences: Word pairs and collocations. PhD thesis, University of Stuttgart
Fazly A, Stevenson S (2007) Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In: Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, Prague, Czech Republic, pp 9–16
Fillmore C, Kay P, O’Connor C (1988) Regularity and idiomaticity in grammatical constructions: The case of let alone. Language 64(3):501–538
Fillmore CJ (1982) Frame semantics. In: Linguistics in the Morning Calm, Hanshin Publishing Co., Seoul, pp 111–137
Firth JR (1957) Papers in Linguistics 1934–1951. Oxford University Press, Oxford
Firth JR (1968) A synopsis of linguistic theory, 1930–1955. In: Palmer F (ed) Selected papers of J. R. Firth, 1952–1959, Indiana University Press, Bloomington, IN, pp 168–205
Fontenelle T (1992) Collocation acquisition from a corpus or from a dictionary: A comparison. Proceedings I-II Papers submitted to the 5th EURALEX International Congress on Lexicography in Tampere, Tampere, Finland, pp 221–228
Fontenelle T (1997a) Turning a Bilingual Dictionary into a Lexical-Semantic Database. Max Niemeyer Verlag, Tübingen
Fontenelle T (1997b) Using a bilingual dictionary to create semantic networks. International Journal of Lexicography 10(4):276–303
Fontenelle T (2001) Collocation modelling: From lexical functions to frame semantics. In: Proceedings of the ACL Workshop on Collocation: Computational Extraction, Analysis and Exploitation, Toulouse, France, pp 1–7
Francis G (1993) A corpus-driven approach to grammar: Principles, methods and examples. In: Baker M, Francis G, Tognini-Bonelli E (eds) Text and Technology: In Honour of John Sinclair, John Benjamins, Amsterdam, pp 137–156
Gitsaki C (1996) The development of ESL collocational knowledge. PhD thesis, University of Queensland
Gross G (1996) Les expressions figées en français. OPHRYS, Paris
Gross M (1984) Lexicon-grammar and the syntactic analysis of French. In: Proceedings of the 10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics, Morristown, NJ, USA, pp 275–282
Halliday MAK, Hasan R (1976) Cohesion in English. Longman, London
Hargreaves P (2000) Collocation and testing. In: Lewis M (ed) Teaching Collocations, Language Teaching Publications, Hove
Hausmann FJ (1979) Un dictionnaire des collocations est-il possible? Travaux de littérature et de linguistique de l’Université de Strasbourg 17(1):187–195
Hausmann FJ (1985) Kollokationen im deutschen Wörterbuch. Ein Beitrag zur Theorie des lexikographischen Beispiels. In: Bergenholtz H, Mugdan J (eds) Lexikographie und Grammatik. Akten des Essener Kolloquiums zur Grammatik im Wörterbuch, Lexicographica. Series Major 3, pp 118–129
Hausmann FJ (1989) Le dictionnaire de collocations. In: Hausmann F, Reichmann O, Wiegand H, Zgusta L (eds) Wörterbücher: Ein internationales Handbuch zur Lexicographie. Dictionaries, Dictionnaires, de Gruyter, Berlin, pp 1010–1019
Hausmann FJ (2004) Was sind eigentlich Kollokationen? In: Steyer K (ed) Wortverbindungen – mehr oder weniger fest. Jahrbuch des Instituts für Deutsche Sprache 2003, de Gruyter, Berlin, pp 309–334
Heid U (1994) On ways words work together – research topics in lexical combinatorics. In: Proceedings of the 6th Euralex International Congress on Lexicography (EURALEX ’94), Amsterdam, The Netherlands, pp 226–257
Heid U, Raab S (1989) Collocations in multilingual generation. In: Proceedings of the 4th Conference of the European Chapter of the Association for Computational Linguistics (EACL’89), Manchester, England, pp 130–136
Heylen D, Maxwell KG, Verhagen M (1994) Lexical functions and machine translation. In: Proceedings of the 15th International Conference on Computational Linguistics (COLING 1994), Kyoto, Japan, pp 1240–1244
Hoey M (1991) Patterns of Lexis in Text. Oxford University Press, Oxford
Hoey M (1997) From concordance to text structure: New uses for computer corpora. In: Melia J, Lewandoska B (eds) Proceedings of Practical Applications of Language Corpora (PALC 1997), Lodz, Poland, pp 2–23
Hoey M (2000) A world beyond collocation: New perspectives on vocabulary teaching. In: Lewis M (ed) Teaching Collocations, Language Teaching Publications, Hove
Hornby AS, Cowie AP, Lewis JW (1948a) Oxford Advanced Learner’s Dictionary of Current English. Oxford University Press, London
Hornby AS, Gatenby EV, Wakefield H (1948b) A Learner’s Dictionary of Current English. Oxford University Press, London
Hornby AS, Gatenby EV, Wakefield H (1952) The Advanced Learner’s Dictionary of Current English. Oxford University Press, London
Hunston S, Francis G (1998) Verbs observed: A corpus-driven pedagogic grammar. Applied Linguistics 19(1):45–72
Hunston S, Francis G, Manning E (1997) Grammar and vocabulary: Showing the connections. English Language Teaching Journal 51(3):208–215
Kahane S, Polguère A (2001) Formal foundations of lexical functions. In: Proceedings of the ACL Workshop on Collocation: Computational Extraction, Analysis and Exploitation, Toulouse, France, pp 8–15
Kjellmer G (1987) Aspects of English collocations. In: Meijs W (ed) Corpus Linguistics and Beyond, Rodopi, Amsterdam, pp 133–140
Kjellmer G (1990) Patterns of collocability. In: Aarts J, Meijs W (eds) Theory and Practice in Corpus Linguistics, Rodopi B.V., Amsterdam, pp 163–178
Kjellmer G (1991) A mint of phrases. In: Aijmer K, Altenberg B (eds) English Corpus Linguistics. Studies in Honour of Jan Svartvik, Longman, London/New York, pp 111–127
Lehr A (1996) Germanistische Linguistik: Kollokationen und maschinenlesbare Korpora, vol 168. Niemeyer, Tübingen
Lewis M (2000) Teaching Collocations. Further Developments in the Lexical Approach. Language Teaching Publications, Hove
Louw B (1993) Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies. In: Baker M, Francis G, Tognini-Bonelli E (eds) Text and Technology: In Honour of John Sinclair, John Benjamins, Amsterdam, pp 157–176
Mangeot M (2006) Papillon project: Retrospective and perspectives. In: Proceedings of the LREC 2006 Workshop on Acquiring and Representing Multilingual, Specialized Lexicons: The Case of Biomedicine, Genoa, Italy
Manning CD, Schütze H (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA
McCarthy D, Keller B, Carroll J (2003) Detecting a continuum of compositionality in phrasal verbs. In: Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan, pp 73–80
McCarthy D, Venkatapathy S, Joshi A (2007) Detecting compositionality of verb-object combinations using selectional preferences. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, pp 369–379
McKeown KR, Radev DR (2000) Collocations. In: Dale R, Moisl H, Somers H (eds) A Handbook of Natural Language Processing, Marcel Dekker, New York, NY, pp 507–523
Mel’čuk I (1998) Collocations and lexical functions. In: Cowie AP (ed) Phraseology. Theory, Analysis, and Applications, Clarendon Press, Oxford, pp 23–53
Mel’čuk I (2003) Collocations: définition, rôle et utilité. In: Grossmann F, Tutin A (eds) Les collocations: analyse et traitement, Editions De Werelt, Amsterdam, pp 23–32
Mel’čuk I et al. (1984, 1988, 1992, 1999) Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques. Presses de l’Université de Montréal, Montréal
Meunier F, Granger S (eds) (2008) Phraseology in Foreign Language and Teaching. John Benjamins, Amsterdam/Philadelphia
Mille S, Wanner L (2008) Making text resources accessible to the reader: The case of patent claims. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco
Moon R (1998) Fixed Expressions and Idioms in English: A Corpus-Based Approach. Clarendon Press, Oxford
Pawley A, Syder FH (1983) Two puzzles for linguistic theory: nativelike selection and nativelike fluency. In: Richards J, Schmidt R (eds) Language and Communication, Longman, London, pp 191–227
Piao SS, Rayson P, Mudraya O, Wilson A, Garside R (2006) Measuring MWE compositionality using semantic annotation. In: Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, Sydney, Australia, pp 2–11
Polguère A (2000) Towards a theoretically-motivated general public dictionary of semantic derivations and collocations for French. In: Proceedings of the 9th EURALEX International Congress, EURALEX 2000, Stuttgart, Germany, pp 517–527
Ramos MA, Rambow O, Wanner L (2008) Using semantically annotated corpora to build collocation resources. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco
Sag IA, Baldwin T, Bond F, Copestake A, Flickinger D (2002) Multiword expressions: A pain in the neck for NLP. In: Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2002), Mexico City, Mexico, pp 1–15
Selva T, Verlinde S, Binon J (2002) Le DAFLES, un nouveau dictionnaire électronique pour apprenants du français. In: Braasch A, Povlsen C (eds) Proceedings of the 10th Euralex International Congress (EURALEX 2002), Copenhagen, Denmark, pp 199–208
Silberztein M (1993) Dictionnaires électroniques et analyse automatique de textes. Le système INTEX. Masson, Paris
Sinclair J (1991) Corpus, Concordance, Collocation. Oxford University Press, Oxford
Smadja F (1993) Retrieving collocations from text: Xtract. Computational Linguistics 19(1):143–177
Stubbs M (1995) Corpus evidence for norms of lexical collocation. In: Cook G, Seidlhofer B (eds) Principle & Practice in Applied Linguistics. Studies in Honour of H.G. Widdowson, Oxford University Press, Oxford
Venkatapathy S, Joshi AK (2005) Relative compositionality of multi-word expressions: A study of verb-noun (V-N) collocations. In: Natural Language Processing - IJCNLP 2005, Lecture Notes in Computer Science, vol 3651, Springer, Berlin/Heidelberg, pp 553–564
Wanner L (1997) Exploring lexical resources for text generation in a systemic functional language model. PhD thesis, University of the Saarland, Saarbrücken
Wanner L, Bohnet B, Giereth M (2006) Making sense of collocations. Computer Speech & Language 20(4):609–624
Wehrli E (2000) Parsing and collocations. In: Christodoulakis D (ed) Natural Language Processing, Springer, Berlin, pp 272–282
van der Wouden T (1997) Negative Contexts. Collocation, Polarity, and Multiple Negation. Routledge, London, New York
van der Wouden T (2001) Collocational behaviour in non content words. In: Proceedings of the ACL Workshop on Collocation: Computational Extraction, Analysis and Exploitation, Toulouse, France, pp 16–23
Zeevat H (1995) Idiomatic blocking and the Elsewhere principle. In: Everaert M, van der Linden EJ, Schenk A, Schreuder R (eds) Idioms: Structural and Psychological Perspectives, Lawrence Erlbaum Associates, Hillsdale, NJ and Hove, UK, pp 301–316
© 2011 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Seretan, V. (2011). On Collocations. In: Syntax-Based Collocation Extraction. Text, Speech and Language Technology, vol 44. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0134-2_2
DOI: https://doi.org/10.1007/978-94-007-0134-2_2
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-0133-5
Online ISBN: 978-94-007-0134-2
eBook Packages: Computer Science, Computer Science (R0)