Automatic Syntactic Analysis for Detection of Word Combinations

  • Alexander Gelbukh
  • Grigori Sidorov
  • Sang-Yong Han
  • Erika Hernández-Rubio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2945)


The paper presents a method for automatic detection of “non-trivial” word combinations in text, based on automatic syntactic analysis. Tested on a Spanish text, the method shows better precision and recall than the baseline method (bigrams). It can be used to enrich very large dictionaries of word combinations.
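To make the comparison concrete, the bigram baseline mentioned above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the toy Spanish sentence, and the hand-made gold set are all assumptions introduced here, and the numbers it produces say nothing about the results reported in the paper.

```python
from collections import Counter


def bigram_candidates(tokens, min_count=1):
    """Return adjacent word pairs that occur at least min_count times."""
    counts = Counter(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n >= min_count}


def precision_recall(candidates, gold):
    """Precision and recall of a candidate set against a gold set of pairs."""
    hits = len(candidates & gold)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy sentence and gold pairs, both invented for illustration only.
tokens = "el niño lee un libro interesante".split()
gold = {("lee", "libro"), ("libro", "interesante")}

candidates = bigram_candidates(tokens)
precision, recall = precision_recall(candidates, gold)
print(f"candidates={len(candidates)} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the pair ("lee", "libro") is missed because the two words are not adjacent, while pairs such as ("el", "niño") are proposed only because they are adjacent. A method that follows syntactic links rather than adjacency could, in principle, address both kinds of error.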


Mutual Information; Natural Language Processing; Word Pair; Baseline Method; Morphological Filter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Alexander Gelbukh (1, 2)
  • Grigori Sidorov (1)
  • Sang-Yong Han (2)
  • Erika Hernández-Rubio (1)

  1. National Polytechnic Institute, Center for Computing Research, Mexico City, Mexico
  2. Department of Computer Science and Engineering, Chung-Ang University, Seoul, Korea
