Automatic Enrichment of Very Large Dictionary of Word Combinations on the Basis of Dependency Formalism

  • Alexander Gelbukh
  • Grigori Sidorov
  • Sang-Yong Han
  • Erika Hernández-Rubio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2972)


The paper presents a method for automatically enriching a very large dictionary of word combinations. The method is based on the results of automatic syntactic analysis (parsing) of sentences. Syntactic trees are represented in the dependency formalism, which makes information about syntactic compatibility easier to handle. The method is evaluated for Spanish by comparing the automatically generated results with manually marked word combinations.
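
The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the token layout, the relation labels (`subj`, `obj`, `mod`, `det`), the example parse, and the function names are all assumptions made for illustration. It collects head-dependent pairs from a dependency tree as candidate word combinations and scores them against a manually marked set using precision and recall.

```python
from typing import NamedTuple


class Token(NamedTuple):
    lemma: str  # normalized word form
    head: int   # index of the governing token, -1 for the root
    rel: str    # dependency relation label (hypothetical label set)


def extract_combinations(tree, relations=("subj", "obj", "mod")):
    """Collect (head lemma, relation, dependent lemma) triples from one tree."""
    combos = set()
    for tok in tree:
        # Skip the root and relations that are not of interest (e.g. determiners).
        if tok.head >= 0 and tok.rel in relations:
            combos.add((tree[tok.head].lemma, tok.rel, tok.lemma))
    return combos


def precision_recall(found, gold):
    """Compare extracted combinations with a manually marked set."""
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical dependency parse of "El niño lee un libro interesante".
tree = [
    Token("el", 1, "det"),
    Token("niño", 2, "subj"),
    Token("leer", -1, "root"),
    Token("un", 4, "det"),
    Token("libro", 2, "obj"),
    Token("interesante", 4, "mod"),
]

found = extract_combinations(tree)
# Hypothetical manual markup for the same sentence.
gold = {("leer", "obj", "libro"), ("libro", "mod", "interesante")}

print(sorted(found))
print(precision_recall(found, gold))
```

Because each dependency arc directly links a head to its dependent, candidate combinations can be read off the tree without reconstructing them from word adjacency, which is the kind of convenience the abstract attributes to the dependency formalism.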


Collocations, parsing, dependency grammar, Spanish





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Alexander Gelbukh (1, 2)
  • Grigori Sidorov (1)
  • Sang-Yong Han (2)
  • Erika Hernández-Rubio (1)
  1. Center for Computing Research, National Polytechnic Institute, Natural Language and Text Processing Laboratory, Mexico City, Mexico
  2. Department of Computer Science and Engineering, Chung-Ang University, Seoul, Korea
