Heuristics-Based Replenishment of Collocation Databases

  • Igor A. Bolshakov
  • Alexander Gelbukh
Conference paper

DOI: 10.1007/3-540-45433-0_5

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2389)
Cite this paper as:
Bolshakov I.A., Gelbukh A. (2002) Heuristics-Based Replenishment of Collocation Databases. In: Ranchhod E., Mamede N.J. (eds) Advances in Natural Language Processing. Lecture Notes in Computer Science, vol 2389. Springer, Berlin, Heidelberg

Abstract

Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations constitute the bulk of common texts and are language-dependent, the creation of collocation databases (CBDs) is important. However, manual compilation of such databases is prohibitively expensive. We present heuristics for the automatic generation of new Spanish collocations from those already present in a CBD, with the help of a WordNet-like thesaurus: if a word A is semantically “similar” to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided certain conditions are met.
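
The inference rule stated in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the triple representation of a collocation, the function name `infer_collocations`, the `similar` mapping standing in for a WordNet-like thesaurus, the `accept` predicate standing in for the unspecified "certain conditions", and the toy Spanish data.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical representation: (headword, relation type, collocate).
Collocation = Tuple[str, str, str]


def infer_collocations(
    known: Set[Collocation],
    similar: Dict[str, Set[str]],
    accept: Callable[[str, Collocation], bool] = lambda a, source: True,
) -> Set[Collocation]:
    """Propose A + C for each known B + C where A is listed as similar to B.

    `similar` maps a word B to the words A considered similar to it.
    `accept` is a placeholder for the extra conditions the abstract mentions
    but does not spell out; by default it accepts every proposal.
    """
    candidates: Set[Collocation] = set()
    for b, relation, c in known:
        for a in similar.get(b, ()):
            proposal = (a, relation, c)  # same collocation type as the source
            if proposal not in known and accept(a, (b, relation, c)):
                candidates.add(proposal)
    return candidates


# Toy data, invented for illustration only.
known = {("café", "modified_by", "caliente")}
similar = {"café": {"té"}}
new = infer_collocations(known, similar)
```

With this toy input, `new` contains the single proposed triple `("té", "modified_by", "caliente")`. Keeping the filtering conditions in a separate `accept` predicate leaves the generation step unchanged when the conditions are refined.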

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Alexander Gelbukh (1)
  1. National Polytechnic Institute, Center of Computer Research, Mexico City, Mexico
