Heuristics-Based Replenishment of Collocation Databases

  • Igor A. Bolshakov
  • Alexander Gelbukh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2389)


Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations constitute the bulk of common texts and are language-specific, the creation of collocation databases (CDBs) is important. However, manual compilation of such databases is prohibitively expensive. We present heuristics for the automatic generation of new Spanish collocations from those already present in a CDB, with the help of a WordNet-like thesaurus: if a word A is semantically “similar” to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided certain conditions are met.
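
The inference rule stated in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `propose_collocations`, the triple layout (word, relation type, collocate), the `similar` mapping standing in for a WordNet-like thesaurus, the `is_plausible` hook standing in for the unspecified "certain conditions", and the toy Spanish data.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical representation: (word, relation type, collocate).
Collocation = Tuple[str, str, str]


def propose_collocations(
    known: Set[Collocation],
    similar: Dict[str, Iterable[str]],
    is_plausible: Callable[[str, str, str], bool] = lambda a, b, c: True,
) -> Set[Collocation]:
    """If A is 'similar' to B and B + C is known, propose A + C of the same type.

    `similar` maps a word B to the words A considered similar to it
    (an assumed stand-in for a thesaurus lookup). `is_plausible(a, b, c)`
    is an assumed placeholder for the extra conditions the abstract mentions.
    """
    proposed: Set[Collocation] = set()
    for b, relation, c in known:
        for a in similar.get(b, ()):
            candidate = (a, relation, c)
            # Skip collocations already in the database.
            if candidate not in known and is_plausible(a, b, c):
                proposed.add(candidate)
    return proposed


# Toy data, invented for this example only.
KNOWN = {("bebida", "modifier", "caliente"), ("café", "modifier", "caliente")}
SIMILAR = {"bebida": ["café", "té"]}

NEW = propose_collocations(KNOWN, SIMILAR)
```

With this toy input, `NEW` contains only the candidate for "té", because the candidate for "café" is already in the database. Passing a stricter `is_plausible` function is where any filtering conditions would plug in; the sketch leaves them open because the abstract does not spell them out.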


Natural Language Processing; Content Word; Semantic Link; Syntactic Dependency; Apply Natural Language Processing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Alexander Gelbukh (1)
  1. National Polytechnic Institute, Center of Computer Research, Mexico City, Mexico
