Measurements of Lexico-Syntactic Cohesion by Means of Internet

  • Conference paper
MICAI 2005: Advances in Artificial Intelligence (MICAI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3789)

Abstract

Syntactic links between content words in meaningful texts are intuitively perceived as ‘normal,’ and this is what ensures text cohesion. Nevertheless, we are not aware of a broadly accepted Internet-based measure of cohesion between words that are syntactically linked in terms of Dependency Grammars. We propose to measure lexico-syntactic cohesion between content words by means of the Internet, using a specially introduced Stable Connection Index (SCI). SCI is similar to Mutual Information as known in statistics, but it does not require iterative evaluation of the total number of Web pages under a search engine’s control and is insensitive to both fluctuations and slow growth of raw Web statistics. On Russian, Spanish, and English materials, SCI showed concentrated distributions for various types of word combinations; hence lexico-syntactic cohesion acquires a simple numeric measure. It is shown that SCI evaluations can be successfully used for semantic error detection and correction, as well as for information retrieval.
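
The abstract does not state the SCI formula, so the following is only a minimal sketch of the general idea it describes: a mutual-information-like score computed from page counts without the total number of indexed pages. The function names, the geometric-mean normalization, and the `offset` parameter are assumptions made for illustration, not the authors' definition. The sketch contrasts such a score with pointwise mutual information, which does need the total.

```python
import math


def pmi(n_pair, n_a, n_b, n_total):
    """Pointwise mutual information from page counts.

    Requires n_total, the total number of pages in the index,
    which is the quantity the abstract says SCI avoids.
    """
    return math.log2(n_pair * n_total / (n_a * n_b))


def cohesion_index(n_pair, n_a, n_b, offset=0.0):
    """Hypothetical total-free index (an assumption, not the paper's formula).

    Takes log2 of the pair count divided by the geometric mean of the
    two single-word counts, plus an arbitrary constant offset.
    """
    if n_pair <= 0 or n_a <= 0 or n_b <= 0:
        return float("-inf")  # no evidence of co-occurrence
    return offset + math.log2(n_pair / math.sqrt(n_a * n_b))


# Made-up counts for one word pair, then the same counts after the
# index has doubled in size.
before = cohesion_index(1000, 20000, 5000)
after = cohesion_index(2000, 40000, 10000)
print(before, after)
```

Because the pair count appears in the numerator and the square root of the product of the single counts in the denominator, multiplying all three counts by the same factor leaves this score unchanged. That is one way a measure could be insensitive to slow growth of raw Web statistics, whereas `pmi` evaluated with a stale `n_total` would drift.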

Work done under partial support of Mexican Government (CONACyT, SNI, CGEPI-IPN).

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bolshakov, I.A., Bolshakova, E.I. (2005). Measurements of Lexico-Syntactic Cohesion by Means of Internet. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science, vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_80

  • DOI: https://doi.org/10.1007/11579427_80

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29896-0

  • Online ISBN: 978-3-540-31653-4

  • eBook Packages: Computer Science, Computer Science (R0)
