Various Criteria of Collocation Cohesion in Internet: Comparison of Resolving Power

  • Igor A. Bolshakov
  • Elena I. Bolshakova
  • Alexey P. Kotlyarov
  • Alexander Gelbukh
Conference paper

DOI: 10.1007/978-3-540-78135-6_6

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4919)
Cite this paper as:
Bolshakov I.A., Bolshakova E.I., Kotlyarov A.P., Gelbukh A. (2008) Various Criteria of Collocation Cohesion in Internet: Comparison of Resolving Power. In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg

Abstract

To extract collocations from the Internet, one must numerically estimate the cohesion between potential collocates. The Mutual Information (MI) cohesion measure, based on the numbers of collocates occurring closely together (N12) and apart (N1, N2), is well known, but Web page statistics deprive MI of its statistical validity. We propose a family of measures that depend on N1, N2, and N12 in a similarly monotonic way and possess the scalability feature of MI. We apply the new criteria to a collection of N1, N2, and N12 values obtained from AltaVista for links between a few tens of English nouns and several hundred of their modifiers taken from the Oxford Collocations Dictionary. The ‘noun–its own adjective’ pairs are true collocations, and their measure values form one distribution. The ‘noun–alien adjective’ pairs are false collocations, and their measure values form another distribution. A discriminating threshold is sought that minimizes the sum of the probabilities of the two possible types of error. The resolving power of a criterion is equal to the minimum of that sum. The best criterion, the one delivering the minimum minimorum, is found.
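
The threshold search described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: the MI formula is the standard pointwise form log2(N12 · N / (N1 · N2)), where N (here `n_total`) is a hypothetical total page count; the two error probabilities are estimated as empirical fractions; and the function names and score values are invented for illustration, not taken from the paper.

```python
import math


def mutual_information(n1, n2, n12, n_total):
    """Standard pointwise MI from counts (assumed form, not the paper's own).

    n_total is a hypothetical total number of pages in the index.
    """
    return math.log2(n12 * n_total / (n1 * n2))


def resolving_power(true_scores, false_scores):
    """Find the threshold minimizing the sum of the two error rates.

    A pair is accepted as a collocation when its score >= threshold.
    Returns (minimal error sum, threshold at which it is reached).
    """
    best_err, best_t = math.inf, None
    # Only observed score values need to be tried as thresholds.
    for t in sorted(set(true_scores) | set(false_scores)):
        # Error of the first type: true collocations rejected.
        miss = sum(s < t for s in true_scores) / len(true_scores)
        # Error of the second type: false collocations accepted.
        false_alarm = sum(s >= t for s in false_scores) / len(false_scores)
        if miss + false_alarm < best_err:
            best_err, best_t = miss + false_alarm, t
    return best_err, best_t


# Illustrative (made-up) measure values for the two kinds of pairs.
true_scores = [5.0, 6.0, 7.0, 8.0]
false_scores = [1.0, 2.0, 3.0, 6.5]
err, threshold = resolving_power(true_scores, false_scores)
```

Under this assumed form, MI is unchanged when N1, N2, N12, and N are all multiplied by the same factor, which is one reading of the scalability feature the abstract mentions. Comparing several criteria would then amount to calling `resolving_power` on the scores each criterion produces and keeping the criterion with the smallest error sum.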

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Elena I. Bolshakova (2)
  • Alexey P. Kotlyarov (1)
  • Alexander Gelbukh (2)
  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
  2. Faculty of Computational Mathematics and Cybernetics, Moscow State Lomonosov University, Moscow, Russia
