The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages

  • Hans J. Holm
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


This work reveals the reason for the bias in the separation levels computed for natural languages with only a small number of residues, as opposed to stochastically normally distributed test cases like those presented in Holm (2007a). It shows how these biased data can be correctly projected to true separation levels. The result is a partly new chain of separation for the main Indo-European branches that fits well with the grammatical facts as well as with their geographical distribution. In particular, it strongly demonstrates that the Anatolian languages were not the first to split off, and thereby refutes the Indo-Hittite hypothesis.
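The abstract does not spell out how a separation level is computed. One common line of reasoning behind such estimates can be sketched as follows. Assume, as a simplification, that after two languages separate, each retains items of their common stock of size r independently of the other. The expected number of items retained by both is then k ≈ a·b/r, so the stock can be estimated as r ≈ a·b/k, where a and b are the items retained by each language. The Python sketch below illustrates only that assumption. It is not the projection method of this paper. The function names `separation_base` and `simulate_pair` and all parameter values are hypothetical, and the simulated counts are synthetic, not real word-list data.

```python
import random


def separation_base(a: int, b: int, k: int) -> float:
    """Estimate the size r of the stock two languages shared at separation.

    Hypothetical model: each language later retains items of that stock
    independently, so the expected overlap is k = a * b / r, i.e. r = a * b / k.
    a, b: items retained by language A and B; k: items retained by both.
    """
    if k <= 0:
        raise ValueError("no shared retentions: the estimate is undefined")
    return a * b / k


def simulate_pair(r: int, p: float, q: float, seed: int = 0) -> tuple[int, int, int]:
    """Draw synthetic counts (a, b, k) from a common stock of r items.

    Each item is kept by A with probability p and, independently, by B with
    probability q. These are invented test parameters, not real word-list data.
    """
    rng = random.Random(seed)
    a = b = k = 0
    for _ in range(r):
        in_a = rng.random() < p
        in_b = rng.random() < q
        a += in_a
        b += in_b
        k += in_a and in_b  # counts items kept by both languages
    return a, b, k


if __name__ == "__main__":
    # Compare the estimate with the known stock size r = 200 for a few seeds.
    for seed in range(3):
        a, b, k = simulate_pair(r=200, p=0.7, q=0.6, seed=seed)
        print(f"seed={seed}: a={a}, b={b}, k={k}, estimated r={separation_base(a, b, k):.1f}")
```

The script draws synthetic pairs in which the true stock size is known by construction. Varying the seed shows how far the estimate scatters around r under the independence assumption. Real word lists need not satisfy that assumption.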


Keywords: Word List · Separation Level · Subgrouping Problem · Linguistic Change · Traditional Cluster Method




  1. CYSOUW, M. (2004): cysouw/pdf/cysouwWIP.pdf
  2. CYSOUW, M., WICHMANN, S. and KAMHOLZ, D. (2006): A critique of the separation base method for genealogical subgrouping, with data from Mixe-Zoquean. Journal of Quantitative Linguistics, 13(2-3), 225-264.
  3. EMBLETON, S.M. (1986): Statistics in historical linguistics [Quantitative Linguistics 30]. Brockmeyer, Bochum.
  4. GRZYBEK, P. and KÖHLER, R. (Eds) (2007): Exact Methods in the Study of Language and Text [Quantitative Linguistics 62]. De Gruyter, Berlin.
  5. HAMP, E.P. (1998): Whose were the Tocharians? Linguistic subgrouping and Diagnostic Idiosyncrasy. In: MAIR, V.H. (Ed): The Bronze Age and Early Iron Age Peoples of Eastern Central Asia, Vol. 1, 307-346. Institute for the Study of Man, Washington DC.
  6. HOLM, H.J. (2000): Genealogy of the Main Indo-European Branches Applying the Separation Base Method. Journal of Quantitative Linguistics, 7(2), 73-95.
  7. HOLM, H.J. (2003): The proportionality trap; or: What is wrong with lexicostatistics? Indogermanische Forschungen 108, 38-46.
  8. HOLM, H.J. (2007a): Requirements and Limits of the Separation Level Recovery Method in Language Subgrouping. In: GRZYBEK, P. and KÖHLER, R. (Eds): Viribus Quantitatis. Exact Methods in the Study of Language and Text. Festschrift Gabriel Altmann zum 75. Geburtstag [Quantitative Linguistics 62]. De Gruyter, Berlin.
  9. HOLM, H.J. (2007b, to appear): The new Arboretum of Indo-European “Trees”. Journal of Quantitative Linguistics, 14(2).
  10. KENDALL, D.G. (1950): Discussion following ROSS, A.S.C., Philological Probability Problems. Journal of the Royal Statistical Society, Ser. B 12, 49f.
  11. POKORNY, J. (1959): Indogermanisches etymologisches Wörterbuch. Francke, Bern.
  12. RIX, H., KÜMMEL, M., ZEHNDER, Th., LIPP, R. and SCHIRMER, B. (2001): Lexikon der indogermanischen Verben. Die Wurzeln und ihre Primärstammbildungen. 2nd ed. Reichert, Wiesbaden.
  13. SWOFFORD, D.L., OLSEN, G.J., WADDELL, P.J. and HILLIS, D.M. (1996): Phylogenetic Inference. In: HILLIS, D.M., MORITZ, C. and MABLE, B.K. (Eds): Molecular Systematics, Second Edition. Sinauer Associates, Sunderland MA, Chapter 11.
  14. WALDE, A. and POKORNY, J. (Ed) (1926-1932): Vergleichendes Wörterbuch der indogermanischen Sprachen. De Gruyter, Berlin.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Hans J. Holm, Hannover, Germany
