Small Worlds of Natural Language

  • Chris Biemann
Part of the Theory and Applications of Natural Language Processing book series (NLP)


In this chapter, power-law distributions and Small World graphs originating from natural language data are examined in the fashion of Quantitative Linguistics. After several data sources that exhibit power-law rank-frequency distributions are presented in Section 3.1, graphs with Small World properties in language data are discussed in Section 3.2. We shall see that these characteristics are omnipresent in language data, and that we should be aware of them when designing Structure Discovery processes. Knowing, for example, that a few hundred distinct words make up the bulk of the running words in a text, it is safe to use only these as contextual features without losing much text coverage. Likewise, knowing that word co-occurrence networks possess the scale-free Small World property has implications for clustering these networks. An interesting question is whether these characteristics are inherent only to real natural language data, or whether they can be produced by generators of linear sequences in a much simpler way than our intuition about language complexity would suggest. In other words, we shall see how distinctive these characteristics are with respect to tests that decide whether a given sequence is natural language or not. Finally, an emergent random text generation model that captures many of the characteristics of natural language is defined and quantitatively verified in Section 3.3.
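The coverage argument in the abstract can be made concrete with a small sketch. The Python below is an illustration only and is not taken from the chapter: the function names (`rank_frequency`, `coverage`, `loglog_slope`) are hypothetical, the demo corpus is a synthetic toy rather than real language data, and the least-squares fit on log-log axes is only a crude way to estimate a power-law exponent.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return word frequencies in descending order (index 0 is rank 1)."""
    return sorted(Counter(tokens).values(), reverse=True)


def coverage(freqs, k):
    """Fraction of all running words covered by the k most frequent words."""
    total = sum(freqs)
    return sum(freqs[:k]) / total if total else 0.0


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 corresponds to a Zipf-like rank-frequency curve.
    This is a rough estimate, assumed here for illustration only.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic toy corpus: word i occurs roughly 1000 / i times.
    toy_tokens = []
    for i in range(1, 201):
        toy_tokens.extend([f"w{i}"] * max(1, round(1000 / i)))

    freqs = rank_frequency(toy_tokens)
    print("coverage of top 20 words:", coverage(freqs, 20))
    print("log-log slope:", loglog_slope(freqs))
```

On a real corpus, `tokens` would come from a tokenizer of one's choice, and `coverage(freqs, k)` for k in the low hundreds gives the share of text retained when only the top-k words are used as contextual features.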


Degree Distribution, Small World, Sentence Length, Word Generator, Random Text
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Chris Biemann (1)
  1. Computer Science Department, Technische Universität Darmstadt, Darmstadt, Germany
