Unsupervised Learning of P NP P Word Combinations

  • Sofía N. Galicia-Haro
  • Alexander Gelbukh
Conference paper

DOI: 10.1007/978-3-540-30586-6_37

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3406)
Cite this paper as:
Galicia-Haro S.N., Gelbukh A. (2005) Unsupervised Learning of P NP P Word Combinations. In: Gelbukh A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg


We evaluate the possibility of learning, in an unsupervised manner, a list of idiomatic word combinations of the type preposition + noun phrase + preposition (P NP P), that is, groups of three or more simple forms that behave as a single lexical unit and have semantic and syntactic properties not deducible from the corresponding properties of each simple form, e.g., by means of, in order to, in front of. We show that idiomatic P NP P combinations have some statistical properties that differ from those of ordinary idiomatic collocations. In particular, we found that the most frequent P NP P trigrams tend to be idiomatic. Among the other statistical measures, log-likelihood performs almost as well as frequency for detecting idiomatic expressions of this type, while chi-square and pointwise mutual information perform very poorly. Our experiments use Spanish material.
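The association measures named in the abstract can be illustrated with a minimal sketch. It uses the textbook 2x2 contingency-table definitions of pointwise mutual information, Pearson's chi-square, and the log-likelihood ratio for two co-occurring events. The abstract does not say how these measures were applied to three-part combinations, so the split of a P NP P candidate into two events (for example, the leading "P NP" part as x and the closing preposition as y) is an assumption made here for illustration, as are all function and parameter names.

```python
import math

def contingency(c_xy, c_x, c_y, n):
    """Observed 2x2 table for events x and y (hypothetical helper).
    c_xy: joint count; c_x, c_y: marginal counts; n: total observations."""
    o11 = c_xy                   # x and y together
    o12 = c_x - c_xy             # x without y
    o21 = c_y - c_xy             # y without x
    o22 = n - c_x - c_y + c_xy   # neither x nor y
    return o11, o12, o21, o22

def expected(o11, o12, o21, o22):
    """Expected counts under independence, from row and column totals."""
    n = o11 + o12 + o21 + o22
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    return r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information in bits; assumes all counts are positive."""
    return math.log2(c_xy * n / (c_x * c_y))

def chi_square(c_xy, c_x, c_y, n):
    """Pearson's chi-square statistic; assumes non-zero marginal totals."""
    obs = contingency(c_xy, c_x, c_y, n)
    exp = expected(*obs)
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def log_likelihood(c_xy, c_x, c_y, n):
    """Log-likelihood ratio statistic (G^2); empty cells contribute zero."""
    obs = contingency(c_xy, c_x, c_y, n)
    exp = expected(*obs)
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)
```

Ranking candidates by raw frequency, the measure the abstract reports as most effective for this type of expression, needs only the joint count c_xy. The three functions above additionally normalize by the marginal counts, which is where their rankings can diverge from a frequency ranking.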



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Sofía N. Galicia-Haro (1)
  • Alexander Gelbukh (2)
  1. Faculty of Sciences, UNAM, Universitary City, Mexico City, Mexico
  2. Center for Computing Research, National Polytechnic Institute, Mexico
