Distributions of Functional and Content Words Differ Radically

  • Igor A. Bolshakov
  • Denis M. Filatov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)


We consider statistical properties of prepositions, the most numerous and important functional words in European languages. They usually link verbs and nouns syntactically to nouns. It is shown that their rank distributions in Russian differ radically from those of content words, being much more compact. The Zipf law, commonly used for content words, fails for them, so approximations that are flatter at the first ranks and steeper at higher ranks are applicable. For this purpose, the Mandelbrot family and an expo-logarithmic family of distributions are tested, and the difference between the two least-squares approximations is found to be insignificant. It is proved that the first dozen ranks cover more than 80% of all preposition occurrences in the DB of Russian collocations of the Verb-Preposition-Noun and Noun-Preposition-Noun types, hardly leaving room for the remaining two hundred or so available Russian prepositions.
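The abstract contrasts a plain Zipf fit with flatter-headed alternatives fitted by least squares, and measures how much of the total the top ranks cover. The sketch below illustrates both ideas under stated assumptions. It uses the Zipf-Mandelbrot form f(r) = C / (r + b)^a, where b = 0 reduces to plain Zipf and b > 0 flattens the first ranks. It assumes the fit is done in log space, which the abstract does not specify: for each b on a grid, ordinary least squares gives C and a, and the b with the smallest squared error is kept. The expo-logarithmic family is not reproduced because its formula is not given here. The function names and the synthetic frequencies are hypothetical illustrations, not the authors' method or data.

```python
import math


def mandelbrot(r, C, a, b):
    """Zipf-Mandelbrot rank law f(r) = C / (r + b)**a; b = 0 gives plain Zipf."""
    return C / (r + b) ** a


def ols(xs, ys):
    """Ordinary least squares for y = p + q*x; returns (p, q, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    q = sxy / sxx
    p = my - q * mx
    sse = sum((y - (p + q * x)) ** 2 for x, y in zip(xs, ys))
    return p, q, sse


def fit_mandelbrot(freqs, b_grid):
    """Fit log f(r) = log C - a*log(r + b) by least squares, scanning b over b_grid.

    freqs must be sorted by rank (rank 1 first). For fixed b the model is
    linear in log(r + b), so OLS yields C and a directly. (Hypothetical helper.)
    """
    ys = [math.log(f) for f in freqs]
    best = None
    for b in b_grid:
        xs = [math.log(r + b) for r in range(1, len(freqs) + 1)]
        p, q, sse = ols(xs, ys)
        if best is None or sse < best[3]:
            best = (math.exp(p), -q, b, sse)
    return best  # (C, a, b, sse in log space)


def coverage(freqs, k):
    """Share of all occurrences covered by the k most frequent items."""
    ordered = sorted(freqs, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    # Hypothetical noise-free frequencies for 200 ranks (NOT data from the paper).
    data = [mandelbrot(r, 1000.0, 2.0, 1.5) for r in range(1, 201)]
    C, a, b, sse = fit_mandelbrot(data, [i / 10 for i in range(51)])  # b = 0.0 ... 5.0
    zipf_sse = fit_mandelbrot(data, [0.0])[3]  # plain Zipf: b fixed at 0
    print(round(a, 3), round(b, 2))  # → 2.0 1.5
    print(zipf_sse > sse)  # → True
    print(coverage(data, 12))  # share of the top 12 ranks
```

Scanning b on a grid keeps each inner fit a closed-form linear regression, which avoids a general nonlinear optimizer; a finer grid or a nonlinear least-squares routine could replace it if more precision in b is needed.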


Keywords: Natural Language Processing; Content Word; Rank Distribution; Accusative Case; Page Statistic





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Denis M. Filatov (1)
  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
