Levenshtein’s Distance for Measuring Lexical Evolution Rates

  • Filippo Petroni
  • Maurizio Serva
  • Dimitri Volchenkov
Part of the Nonlinear Systems and Complexity book series (NSCH, volume 12)


The relationships between languages, molded by extremely complex social, cultural and political factors, are assessed by an automated method in which the distance between two languages is estimated as the average normalized Levenshtein distance between words from a list of 200 meanings maximally resistant to change. A sequential process of language classification, described by random walks on the matrix of lexical distances, makes it possible to represent complex relationships between languages geometrically, in terms of distances and angles. We have tested the method on a sample of 50 Indo-European and 50 Austronesian languages. The geometric representations of language taxonomy allow accurate inferences to be drawn about the most significant events of human history by tracing changes in language families through time. The Anatolian and Kurgan hypotheses of Indo-European origins and the “express train” model of Polynesian origins are discussed in detail.
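The distance measure described in the abstract can be illustrated with a short Python sketch. It assumes that each word-pair edit distance is divided by the length of the longer of the two words, which keeps every term between 0 and 1, and that the language distance is the plain mean of these terms over the meanings attested in both languages. The function names, the `None` convention for missing entries, and the toy word lists are hypothetical choices for illustration, not the authors' code or data.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (dynamic programming, O(len(a) * len(b)))."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(
    words_x: Sequence[Optional[str]], words_y: Sequence[Optional[str]]
) -> float:
    """Mean normalized distance over meanings attested in both languages.

    words_x[i] and words_y[i] are the words for the i-th meaning of the list;
    None marks a missing entry (hypothetical convention for this sketch).
    """
    if len(words_x) != len(words_y):
        raise ValueError("word lists must be aligned by meaning")
    pairs = [(x, y) for x, y in zip(words_x, words_y)
             if x is not None and y is not None]
    if not pairs:
        raise ValueError("no meaning is attested in both languages")
    return sum(normalized_levenshtein(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy orthographic forms for three meanings (illustrative only).
    italian = ["mano", "notte", "acqua"]
    spanish = ["mano", "noche", "agua"]
    print(round(language_distance(italian, spanish), 3))  # → 0.267
```

Because the normalization bounds each term by 1, long and short words contribute on the same scale. Computing this value for every pair of languages yields the matrix of lexical distances on which the classification step operates.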


Keywords: Language group · Language family · Orthographic representation · Standard principal component analysis · Language classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We sincerely thank R.D. Gray for permission to use the Austronesian Basic Vocabulary Database [54], which contains lexical items from languages spoken throughout the Pacific region.


  1. D’Urville, D.: Sur les îles du Grand Océan. Bull. Soc. Géogr. 17, 1–21 (1832)
  2. Swadesh, M.: Lexicostatistic dating of prehistoric ethnic contacts. Proc. Am. Philos. Soc. 96, 452–463 (1952)
  3. Pagel, M., Atkinson, Q.D., Meade, A.: Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature 449, 717–720 (2007)
  4. Nichols, J., Warnow, T.: Tutorial on computational linguistic phylogeny. Lang. Linguist. Compass 2(5), 760–820 (2008)
  5. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 10, 707–710 (1966)
  6. Petroni, F., Serva, M.: Language distance and tree reconstruction. J. Stat. Mech. Theory Exp. 2008, P08012 (2008)
  7. Serva, M., Petroni, F.: Indo-European languages tree by Levenshtein distance. Europhys. Lett. 81, 68005 (2008)
  8. Petroni, F., Serva, M.: Lexical evolution rates derived from automated stability measures. J. Stat. Mech. 2010, P03015 (2010)
  9. Dyen, I., Kruskal, J., Black, P.: Comparative Indo-European Database collected by Isidore Dyen. Copyright (C) 1997 by Isidore Dyen, Joseph Kruskal, and Paul Black. The file was last modified on Feb 5, 1997. Redistributable for academic, non-commercial purposes (1997)
  10. McMahon, A., Heggarty, P., McMahon, R., Slaska, N.: Swadesh sublists and the benefits of borrowing: An Andean case study. Trans. Philol. Soc. 103(2), 147–170 (2005)
  11. Greenhill, S.J., Blust, R., Gray, R.D.: The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics. Evol. Bioinform. 4, 271 (2008)
  12. The database modified by the authors is publicly available on-line at ∼serva/languages/languages.html
  13. Petroni, F., Serva, M.: Automated words stability and languages phylogeny. J. Quant. Linguist. 18(1), 53–62 (2011)
  14. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)
  15. Jolliffe, I.T.: Principal Component Analysis, 2nd edn. Springer Series in Statistics, vol. XXIX. Springer, New York (2002)
  16. Blanchard, P., Volchenkov, D.: Intelligibility and first passage times in complex urban networks. Proc. R. Soc. A 464, 2153–2167 (2008)
  17. Blanchard, P., Volchenkov, D.: Mathematical Analysis of Urban Spatial Networks. Understanding Complex Systems, vol. XIV. Springer, Berlin (2009)
  18. Volchenkov, D.: Random walks and flights over connected graphs and complex networks. Commun. Nonlinear Sci. Numer. Simul. (2010)
  19. Schölkopf, B., Smola, A.J., Müller, K.-R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10, 1299–1319 (1998)
  20. Blanchard, P., Petroni, F., Serva, M., Volchenkov, D.: Geometric representations of language taxonomies. Comput. Speech Lang. 25(3), 679–699 (2011)
  21. Gamkrelidze, T.V., Ivanov, V.V.: Indo-European and the Indo-Europeans: A Reconstruction and Historical Analysis of a Proto-Language and a Proto-Culture. Trends in Linguistics: Studies and Monographs, vol. 80. de Gruyter, Berlin (1995)
  22. Renfrew, C.: Archaeology and Language: The Puzzle of Indo-European Origins. Cambridge University Press, New York (1987)
  23. Baldi, P.: The Foundations of Latin. Trends in Linguistics: Studies and Monographs, vol. 117. Mouton de Gruyter, Berlin (2002)
  24. Gamkrelidze, T.V., Ivanov, V.V.: The early history of Indo-European languages. Sci. Am. 262(3), 110–116 (1990)
  25. Embleton, S.M.: Statistics in Historical Linguistics. Brockmeyer, Bochum (1986)
  26. Heggarty, P.: Interdisciplinary indiscipline? Can phylogenetic methods meaningfully be applied to language data and to dating language? In: Forster, P., Renfrew, C. (eds.) Phylogenetic Methods and the Prehistory of Languages, pp. 18–3. McDonald Institute for Archaeological Research, Cambridge (2006)
  27. Fouracre, P.: The New Cambridge Medieval History. Cambridge University Press (1995–2007)
  28. Bryant, E.: The Quest for the Origins of Vedic Culture: The Indo-Aryan Migration Debate. Oxford University Press (2001)
  29. Novotná, P., Blažek, V.: Glottochronology and its application to the Balto-Slavic languages. Baltistica XLII(2), 185–210 (2007)
  30. McLeod, J.: The History of India. Greenwood Publishing Group (2002)
  31. Green, P.: The Greco-Persian Wars. University of California Press, Berkeley (1996)
  32. Gimbutas, M.: Old Europe in the fifth millennium B.C.: The European situation on the arrival of Indo-Europeans. In: Polomé, E.C. (ed.) The Indo-Europeans in the Fourth and Third Millennia. Karoma Publishers, Ann Arbor (1982)
  33. Renfrew, C.: Time depth, convergence theory, and innovation in proto-Indo-European. In: Proceedings of the Conference Languages in Prehistoric Europe, Eichstätt University, 4–6 October 1999, p. 227. Heidelberg (2003)
  34. Mallory, J.P.: In Search of the Indo-Europeans: Language, Archaeology, and Myth. Thames & Hudson, London (1991)
  35. Krell, K.S.: Gimbutas’ Kurgan-PIE homeland hypothesis: A linguistic critique. In: Blench, R., Spriggs, M. (eds.) Archaeology and Language, II, pp. 26–7. Routledge, London (1998)
  36. Dahl, O.C.: Avhandlinger utgitt av Egede-Instituttet 3, 408. Arne Gimnes Forlag (1951)
  37. Hurles, M.E., Sykes, B.C., Jobling, M.A., Forster, P.: The dual origins of the Malagasy in island Southeast Asia and East Africa: Evidence from maternal and paternal lineages. Am. J. Hum. Genet. 76, 89–4 (2005)
  38. Diamond, J.M.: Express train to Polynesia. Nature 336, 307–308 (1988)
  39. Su, B., et al.: Polynesian origins: Insights from the Y chromosome. Proc. Natl. Acad. Sci. USA 97(15), 8225–8228 (2000)
  40. Bellwood, P., Koon, P.: Lapita colonists leave boats unburned! Antiquity 63(240), 613–622 (1989)
  41. Kirch, P.V.: The Lapita Peoples: Ancestors of the Oceanic World. Blackwell, Cambridge (1997)
  42. Matisoo-Smith, E., Robins, J.H.: Origins and dispersals of Pacific peoples: Evidence from mtDNA phylogenies of the Pacific rat. Proc. Natl. Acad. Sci. USA 101(24), 9167–9172 (2004)
  43. Larson, G., et al.: Phylogeny and ancient DNA of Sus provides insights into neolithic expansion in island Southeast Asia and Oceania. Proc. Natl. Acad. Sci. USA 104(12), 4834–4839 (2007)
  44. Kirch, P.V.: On the Road of the Winds: An Archaeological History of the Pacific Islands before European Contact. University of California Press, Berkeley (2000)
  45. Anderson, A., Sinoto, Y.: New radiocarbon ages for colonization sites in East Polynesia. Asian Perspect. 41, 242–257 (2002)
  46. Hurles, M.E., et al.: Untangling Pacific settlement: The edge of the knowable. Trends Ecol. Evol. 18, 531–540 (2003)
  47. Lum, J.K., Jorde, L.B., Schiefenhovel, W.: Affinities among Melanesians, Micronesians, and Polynesians: A neutral, biparental genetic perspective. Hum. Biol. 74, 413–430 (2002)
  48. Kayser, M., et al.: Melanesian and Asian origins of Polynesians: mtDNA and Y chromosome gradients across the Pacific. Mol. Biol. Evol. 23, 2234–2244 (2006)
  49. Friedlaender, J.S., et al.: Genetic structure of Pacific islanders. PLoS Genet. 4(1), e1–9 (2008)
  50. Utsurikawa, N.: A Genealogical and Classificatory Study of the Formosan Native Tribes. Toko Shoin, Tokyo (1935)
  51. Li, P.J.: Types of lexical derivation of men’s speech in Mayrinax. Bull. Inst. Hist. Philol., Academia Sinica 54(3), 1–18 (1983)
  52. Li, P.J.: The dispersal of the Formosan aborigines in Taiwan. Lang. Linguist. 2(1), 271–278 (2001)
  53. Volchenkov, D., Petroni, F., Serva, M., Wichmann, S.: Malagasy dialects and the peopling of Madagascar. J. R. Soc. Interface, 1–14 (2011). doi:10.1098/rsif.2011.0228
  54. The Austronesian Basic Vocabulary Database by R.D. Gray is publicly available on-line at

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Filippo Petroni (3)
  • Maurizio Serva (2)
  • Dimitri Volchenkov (1)
  1. Cognitive Interaction Technology – Center of Excellence, Universität Bielefeld, Bielefeld, Germany
  2. Dipartimento di Matematica, Università dell’Aquila, L’Aquila, Italy
  3. Dipartimento di Scienze Economiche ed Aziendali, Università di Cagliari, V.le S. Ignazio, Cagliari, Italy
