This study investigates two questions: whether complex networks can be applied to fine-grained language classification, and whether word co-occurrence networks built from parallel texts can substitute for syntactic dependency networks in complex-network-based language classification. Fourteen word co-occurrence networks were constructed from parallel texts of 12 Slavic languages and 2 non-Slavic languages, one network per language. With appropriate combinations of the major parameters of these networks, cluster analysis distinguished the Slavic languages from the non-Slavic ones and correctly grouped the Slavic languages into their respective sub-branches. The clustering also captured the genetic relationships of some of these Slavic languages within their sub-branches. The results show that word co-occurrence networks based on parallel texts are applicable to fine-grained language classification and that they constitute a more convenient substitute for syntactic dependency networks in complex-network-based language classification.
word co-occurrence network, Slavic languages, parallel texts, language classification, cluster analysis
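The pipeline summarized in the abstract (one co-occurrence network per language, a vector of network parameters, then clustering on those vectors) might be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: linking only adjacent words, using average degree and average clustering coefficient as the parameters, and the helper names `cooccurrence_network`, `network_parameters`, and `closest_pair` are all hypothetical choices. The last helper performs only the first merge step of an agglomerative clustering and stands in for a full cluster analysis.

```python
from collections import defaultdict
from itertools import combinations
import math


def cooccurrence_network(sentences):
    """Build an undirected network linking words adjacent within a sentence.

    Assumption: a co-occurrence window of two adjacent words.
    """
    adj = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for w in words:
            adj[w]  # touching the key registers isolated words as nodes
        for a, b in zip(words, words[1:]):
            if a != b:  # skip self-loops from repeated words
                adj[a].add(b)
                adj[b].add(a)
    return dict(adj)


def network_parameters(adj):
    """Return (average degree, average clustering coefficient).

    Assumption: these two parameters are illustrative stand-ins.
    """
    n = len(adj)
    avg_degree = sum(len(nb) for nb in adj.values()) / n
    total_cc = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
        total_cc += 2 * links / (k * (k - 1))
    return (avg_degree, total_cc / n)


def closest_pair(features):
    """Return the two labels whose parameter vectors are nearest (Euclidean).

    This is the first merge an agglomerative clustering would make.
    """
    return min(
        combinations(sorted(features), 2),
        key=lambda pair: math.dist(features[pair[0]], features[pair[1]]),
    )


# Toy usage with made-up "texts"; real input would be the parallel corpora.
toy_texts = {
    "lang_a": ["a b c", "c a"],
    "lang_b": ["x y z", "z x"],
    "lang_c": ["p q", "r s"],
}
features = {
    lang: network_parameters(cooccurrence_network(sents))
    for lang, sents in toy_texts.items()
}
pair = closest_pair(features)
```

In practice the parameter vectors would normally be standardized before distances are computed, since parameters on different scales would otherwise dominate the clustering.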