Abstract
Using Phylogenetic Algebraic Geometry, we computationally analyze the phylogenetic trees of subfamilies of the Indo-European language family, based on data of syntactic structures. The two main sources of syntactic data are the SSWL database and Longobardi’s recent data on syntactic parameters. We compute phylogenetic invariants and estimates of the Euclidean distance functions for two sets of Germanic languages, a set of Romance languages, a set of Slavic languages, and a set of early Indo-European languages, and we compare the results with what is known from historical linguistics.
Acknowledgements
The first and second authors were partially supported by a Summer Undergraduate Research Fellowship at Caltech. The last author is partially supported by NSF Grant DMS-1707882, NSERC Discovery Grant RGPIN-2018-04937, Accelerator Supplement Grant RGPAS-2018-522593, and by the Perimeter Institute for Theoretical Physics. We are very grateful to the two anonymous referees for many very useful comments, corrections, and suggestions that greatly improved the paper.
Appendices
Appendix A: SSWL Syntactic Variables of the Set \(\mathcal {S}_1(G)\) of Germanic Languages
We list here the 90 binary syntactic variables of the SSWL database that are completely mapped for the six Germanic languages \(\ell _1\,=\,\)Dutch, \(\ell _2\,=\,\)German, \(\ell _3\,=\,\)English, \(\ell _4\,=\,\)Faroese, \(\ell _5\,=\,\)Icelandic, \(\ell _6\,=\,\)Swedish. The column on the left in the tables lists the SSWL parameters P as labeled in the database.
Appendix B: SSWL Syntactic Variables of the Set \(\mathcal {S}_2(G)\) of Germanic Languages
We list here the 90 binary syntactic variables of the SSWL database that are completely mapped for the seven Germanic languages \(\ell _1\,=\,\)Norwegian, \(\ell _2\,=\,\)Danish, \(\ell _3\,=\,\)Gothic, \(\ell _4\,=\,\)Old English, \(\ell _5\,=\,\)Icelandic, \(\ell _6\,=\,\)English, \(\ell _7\,=\,\)German. The column on the left in the tables lists the SSWL parameters P as labeled in the database.
Appendix C: Flattening Matrices \(F_5\) and \(F_6\)
The flattening matrices of (3.1) (written in transpose form for convenience) for the trees \(T_5\) and \(T_6\), in the case of the Longobardi data, are given by the following:
The same flattening matrices of (3.1) for the SSWL data are given by the following.
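As a purely illustrative aside (not the authors' code or data), the construction of a flattening can be sketched in Python. The sketch assumes that (3.1) denotes the usual flattening of the empirical joint distribution along a split of the languages, and that under a general Markov model with binary states such a flattening should have rank at most 2 when the split corresponds to an edge of the tree. The function names `flattening` and `distance_to_rank` and the toy data are hypothetical. The Euclidean distance to the rank-2 locus is computed from singular values (Eckart-Young).

```python
import numpy as np

def flattening(data, side_a):
    """Empirical flattening matrix for the split side_a | complement.

    data: (num_variables, num_languages) array of 0/1 values.
    Rows index binary patterns on side_a, columns index patterns on the
    remaining languages; entries are relative frequencies.
    Both sides of the split are assumed to be non-empty.
    """
    data = np.asarray(data, dtype=int)
    n_vars, n_langs = data.shape
    side_b = [j for j in range(n_langs) if j not in side_a]
    flat = np.zeros((2 ** len(side_a), 2 ** len(side_b)))
    for row in data:
        # Read the 0/1 values on each side as a binary number.
        r = int("".join(str(row[j]) for j in side_a), 2)
        c = int("".join(str(row[j]) for j in side_b), 2)
        flat[r, c] += 1
    return flat / n_vars

def distance_to_rank(flat, rank=2):
    """Frobenius distance from flat to the nearest matrix of rank <= rank."""
    s = np.linalg.svd(flat, compute_uv=False)
    return float(np.sqrt(np.sum(s[rank:] ** 2)))

# Hypothetical toy data: 6 binary variables (rows) for 4 languages (columns).
toy = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

d_01 = distance_to_rank(flattening(toy, [0, 1]))  # split {0,1} | {2,3}
d_02 = distance_to_rank(flattening(toy, [0, 2]))  # split {0,2} | {1,3}
print(d_01, d_02)
```

In this toy example the split {0,1} | {2,3} gives a flattening of rank 2, while {0,2} | {1,3} gives a strictly positive distance, so the first split would be preferred under the stated assumptions.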
Appendix D: List of LanGeLin Syntactic Parameters
FGP | Gramm. person | GSI | Grammaticalised inalienability |
FGM | Gramm. Case | ALP | Alienable possession |
FPC | Gramm. perception | GST | Grammaticalised Genitive |
FGT | Gramm. temporality | GEI | Genitive inversion |
FGN | Gramm. number | GNR | Non-referential head marking |
GCO | Gramm. collective number | STC | Structured cardinals |
PLS | Plurality spreading | GPC | Gender polarity cardinals |
FND | Number in D | PMN | Personal marking on numerals |
FSN | Feature spread on N | CQU | Cardinal quantifiers |
FNN | Number in N | PCA | Number spread through cardinal adjectives |
SGE | Semantic gender | PSC | Number spread from cardinal quantifiers |
FGG | Gramm. gender | RHM | Head-marking on Rel |
CGB | Unbounded sg N | FRC | Verbal relative clauses |
DGR | Gramm. amount | NRC | Nominalized relative clause |
DGP | Gramm. text anaphora | NOR | NP over verbal rel clauses/adpos gen |
CGR | Strong amount | AER | Relative extrap. |
NSD | Strong person | ARR | Free reduced rel |
FVP | Variable person | DOR | def on relatives |
DGD | Gramm. distality | NOD | NP over D |
DPQ | Free null partitive Q | NOP | NP over non-genitive arguments |
DCN | Article-checking N | PNP | P over complement |
DNN | Null-N-licensing art | NPP | N-raising with obl. pied-piping |
DIN | D-controlled infl. on N | NGO | N over GenO |
FGC | Gramm. classifier | NOA | N over As |
DBC | Strong classifier | NM2 | N over M2 As |
XCN | Conjugated nouns | NM1 | N over M1 As |
GSC | c-selection | EAF | Fronted high As |
NOE | N over ext. arg. | NON | N over numerals |
HMP | NP-heading modifier | FPO | Feature spread to genitive postpositions |
AST | Structured APs | ACM | Class MOD |
FFS | Feature spread to struct. APs | DOA | def on all +N |
ADI | D-controlled infl. on A | NEX | Gramm. expletive article |
DMP | def matching pron. poss. | NCL | Clitic poss. |
DMG | def matching genitives | PDC | Article-checking poss. |
GCN | Poss\(^o\)-checking N | ACL | Enclitic poss. on As |
GFN | Gen-feature spread to Poss\(^o\) | APO | Adjectival poss. |
GAL | Dependent Case in NP | WAP | Wackernagel adjectival poss. |
GUN | Uniform Gen | AGE | Adjectival Gen |
EZ1 | Generalized linker | OPK | Obligatory possessive with kinship noun |
EZ2 | Non-clausal linker | TSP | Split deictic demonstratives |
EZ3 | Non-genitive linker | TSD | Split demonstratives |
GAD | Adpositional Gen | TAD | Adjectival demonstratives |
GFO | GenO | TDC | Article-checking demonstratives |
PGO | Partial GenO | TLC | Loc-checking demonstratives |
GFS | GenS | TNL | NP over Loc |
GIT | Genitive-licensing iterator |
Cite this article
Shu, K., Ortegaray, A., Berwick, R.C. et al. Phylogenetics of Indo-European Language Families via an Algebro-Geometric Analysis of Their Syntactic Structures. Math. Comput. Sci. 15, 803–857 (2021). https://doi.org/10.1007/s11786-021-00507-2
DOI: https://doi.org/10.1007/s11786-021-00507-2
Keywords
- Phylogenetic algebraic geometry
- Syntactic parameters
- Historical linguistics
- Phylogenetic trees
- Indo-European languages