Heuristics-Based Replenishment of Collocation Databases
- Cite this paper as:
- Bolshakov I.A., Gelbukh A. (2002) Heuristics-Based Replenishment of Collocation Databases. In: Ranchhod E., Mamede N.J. (eds) Advances in Natural Language Processing. Lecture Notes in Computer Science, vol 2389. Springer, Berlin, Heidelberg
Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations constitute the bulk of common texts and depend on the language, the creation of collocation databases (CBDs) is important. However, manual compilation of such databases is prohibitively expensive. We present heuristics for the automatic generation of new Spanish collocations based on those already present in a CBD, with the help of a WordNet-like thesaurus: if a word A is semantically “similar” to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided certain conditions are met.
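The core inference rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the triple representation `(word, relation, collocate)`, the `similar_to` mapping standing in for the thesaurus, the function name `infer_collocations`, and the sample Spanish data are all assumptions made for illustration. The abstract does not spell out the "certain conditions", so they are left as an optional caller-supplied predicate `accept`.

```python
from collections import defaultdict


def infer_collocations(known, similar_to, accept=None):
    """Propose new collocations A + C from known collocations B + C.

    known      -- set of (word, relation, collocate) triples (assumed format)
    similar_to -- dict mapping a word A to words B it is "similar" to
                  (a stand-in for a WordNet-like thesaurus)
    accept     -- optional predicate on a candidate triple; a placeholder
                  for the unspecified filtering conditions
    """
    # Index known collocations by their first word for fast lookup.
    by_word = defaultdict(set)
    for word, relation, collocate in known:
        by_word[word].add((relation, collocate))

    inferred = set()
    for a, similar_words in similar_to.items():
        for b in similar_words:
            for relation, c in by_word.get(b, ()):
                candidate = (a, relation, c)
                if candidate in known:
                    continue  # already in the database
                if accept is not None and not accept(candidate):
                    continue  # rejected by the caller's conditions
                inferred.add(candidate)
    return inferred


# Illustrative data only: "leer libro" is known, and "novela" is
# assumed to be listed as similar to "libro" in the thesaurus.
known = {("libro", "object_of", "leer")}
similar_to = {"novela": ["libro"]}
result = infer_collocations(known, similar_to)
```

Here `result` contains the single candidate `("novela", "object_of", "leer")`, which keeps the relation type of the source collocation. In practice every candidate would still need to pass whatever conditions the heuristics impose before being added to the CBD.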