Complex Networks Reveal a Glottochronological Classification of Natural Languages

  • Harith Hamoodat
  • Younis Al Rozz
  • Ronaldo Menezes
Conference paper
Part of the Springer Proceedings in Complexity book series (SPCOM)


The success of humans cannot be attributed to language alone, but language and modern humans are certainly inseparable. This work focuses on revealing the structure of 20 Indo-European languages belonging to three sub-families (Romance, Germanic, and Slavic) from a chronological perspective. To find the chronological characteristics of these languages, we use (1) Heaps’ law, which describes how the vocabulary (number of distinct words) of each language's corpus grows with the total number of words in that corpus, and (2) structural properties of networks built from word co-occurrence in corpora of the 20 written languages. Using clustering approaches and entanglement, we show that, despite the differences accumulated over years of separate use and despite differences in alphabets, one can find language characteristics that group the languages into clusters resembling their historical sub-families and chronological relations.
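Heaps’ law is usually written as V(N) = K·N^β, where N is the number of tokens read so far and V is the number of distinct words among them. The sketch below is only an illustration under stated assumptions, not the authors' pipeline: it assumes the corpus is already tokenized into a list of words, and it estimates K and β by ordinary least squares on log V = log K + β log N, which is one common fitting choice among several.

```python
import math
from typing import List, Sequence, Tuple


def vocabulary_growth(tokens: Sequence[str], step: int = 1) -> List[Tuple[int, int]]:
    """Return (N, V) pairs: tokens read so far vs. distinct words seen so far.

    `step` (an illustrative parameter) records a point every `step` tokens.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    seen = set()
    points = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        if n % step == 0:
            points.append((n, len(seen)))
    return points


def fit_heaps(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit V = K * N**beta by least squares in log-log space; returns (K, beta)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct values of N")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope = Heaps' exponent
    k = math.exp(mean_y - beta * mean_x)   # intercept = log K
    return k, beta
```

For a real corpus one would call `fit_heaps(vocabulary_growth(tokens, step=1000))`; the fitted β (typically below 1) is then one candidate per-language feature for comparison across languages.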


Keywords: Word co-occurrence networks · Language classification · Glottochronology
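The second ingredient of the abstract, structural properties of word co-occurrence networks, can be illustrated with a minimal sketch. The linking rule here is an assumption made for illustration (words are connected when they are adjacent in the same sentence, window of 1), and the function names and the particular features (node and edge counts, average degree, density, transitivity) are hypothetical choices rather than the paper's exact feature set.

```python
from itertools import combinations
from typing import Dict, List, Set


def cooccurrence_graph(sentences: List[List[str]]) -> Dict[str, Set[str]]:
    """Build an undirected word co-occurrence graph as an adjacency map.

    Assumption (illustrative only): two words are linked when they appear
    next to each other in the same sentence; self-loops from repeated
    words such as "very very" are dropped.
    """
    adj: Dict[str, Set[str]] = {}
    for sentence in sentences:
        for word in sentence:
            adj.setdefault(word, set())
        for a, b in zip(sentence, sentence[1:]):
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def structural_features(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Compute a few global structural properties of the graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    closed = 0    # linked neighbour pairs; each triangle is counted 3 times
    triplets = 0  # connected triplets (paths of length 2) centred on each node
    for nbrs in adj.values():
        k = len(nbrs)
        triplets += k * (k - 1) // 2
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1
    return {
        "nodes": float(n),
        "edges": float(m),
        "average_degree": 2 * m / n if n else 0.0,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "transitivity": closed / triplets if triplets else 0.0,
    }
```

Computing one such feature vector per language and passing the vectors to a hierarchical clustering routine (for example SciPy's `scipy.cluster.hierarchy.linkage`) yields dendrograms that can be compared with the known sub-family tree; which features, normalization, and linkage method the authors actually used is not stated on this page.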


Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Harith Hamoodat (1; corresponding author)
  • Younis Al Rozz (1)
  • Ronaldo Menezes (1)
  1. BioComplex Laboratory, Computer Science, Florida Institute of Technology, Melbourne, USA
