Persistent Topology of Syntax


We study the persistent homology of a data set of syntactic parameters of world languages. We show that, while homology generators behave erratically over the whole data set, non-trivial persistent homology appears when one restricts to specific language families. Different families exhibit different persistent homology. We focus on the cases of the Indo-European and the Niger–Congo families, for which we compare persistent homology over different cluster filtering values. The persistent components appear to correspond to linguistic subfamilies, while the meaning, in historical linguistic terms, of persistent generators of the first homology is more mysterious. We investigate the possible significance of the persistent first homology generator that we find in the Indo-European family. We show that it is not due to the Anglo-Norman bridge (which is a lexical, not a syntactic, phenomenon), but is related instead to the position of Ancient Greek and the Hellenic branch within the Indo-European phylogenetic network.
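To make the notion of persistent components concrete, the following is a minimal sketch and not the authors' pipeline (the computations in the paper rely on the Perseus package). It assumes each language is encoded as a binary vector of syntactic parameters, uses normalized Hamming distance, and tracks only the zeroth homology (connected components) of the resulting Vietoris-Rips filtration with a union-find structure. The language labels and parameter values in `toy` are hypothetical, and the function names are illustrative.

```python
from itertools import combinations


def hamming(u, v):
    """Normalized Hamming distance between two equal-length binary vectors."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def h0_persistence(points):
    """Return the scales at which connected components merge (H0 deaths).

    Every component is born at scale 0. Processing edges by increasing
    length, an edge that joins two distinct components kills one of them.
    One component never dies, so n points yield n - 1 finite deaths.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (hamming(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(dist)
    return deaths


# Hypothetical toy data: two tight pairs of "languages" that lie far apart.
toy = {
    "A1": (1, 1, 0, 0, 1, 0),
    "A2": (1, 1, 0, 0, 1, 1),
    "B1": (0, 0, 1, 1, 0, 1),
    "B2": (0, 0, 1, 1, 0, 0),
}

deaths = h0_persistence(list(toy.values()))
print(deaths)
```

In this toy example two merges occur at a small scale and the last one only at a large scale, so two components persist over a long range of the filtration. This is the kind of signature that the abstract relates to linguistic subfamilies. Detecting first homology generators requires a full persistent homology computation, which this sketch does not attempt.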

  1. SSWL Database of Syntactic Parameters:

  2. Perseus Software Package for Persistent Homology:

This work was performed within the activities of the last author’s Mathematical and Computational Linguistics lab and CS101/Ma191 class at Caltech. The last author was partially supported by NSF Grants DMS-1007207, DMS-1201512, DMS-1707882, and PHY-1205440.

Port, A., Gheorghita, I., Guth, D. et al. Persistent Topology of Syntax. Math. Comput. Sci. 12, 33–50 (2018).

