Abstract
This study investigates whether complex networks can be applied to fine-grained language classification, and whether word co-occurrence networks built from parallel texts can substitute for syntactic dependency networks in complex-network-based language classification. Fourteen word co-occurrence networks were constructed from parallel texts in 12 Slavic languages and 2 non-Slavic languages, one network per language. With appropriate combinations of the major parameters of these networks, cluster analysis distinguished the Slavic languages from the non-Slavic ones and correctly grouped the Slavic languages into their respective sub-branches. Moreover, the clustering also captured the genetic relationships among some of the Slavic languages within their sub-branches. The results show that word co-occurrence networks based on parallel texts are applicable to fine-grained language classification and constitute a more convenient substitute for syntactic dependency networks in complex-network-based language classification.
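The abstract does not specify how the networks are built, which parameters are combined, or which clustering method is used. As a rough illustration only, the sketch below assumes that (i) two word types are linked when they occur adjacently within a sentence, (ii) three common network parameters (average degree, average clustering coefficient, average shortest-path length) stand in for the "major parameters", and (iii) average-linkage agglomerative clustering on z-scored parameter vectors stands in for the cluster analysis. All function names and the toy texts are hypothetical; this is not the authors' pipeline.

```python
import math
from collections import deque
from itertools import combinations

def build_cooccurrence_network(sentences):
    """Undirected, unweighted network: word types linked when adjacent in a sentence."""
    # Assumption: whitespace tokenisation and lower-casing (hypothetical preprocessing).
    graph = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for token in tokens:
            graph.setdefault(token, set())
        for a, b in zip(tokens, tokens[1:]):
            if a != b:  # skip self-loops from immediately repeated words
                graph[a].add(b)
                graph[b].add(a)
    return graph

def average_degree(graph):
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

def clustering_coefficient(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)

def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def network_features(sentences):
    g = build_cooccurrence_network(sentences)
    return [average_degree(g), clustering_coefficient(g), average_path_length(g)]

def zscore_columns(rows):
    """Standardise each parameter across languages (constant columns become 0)."""
    scaled = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled)]

def average_linkage(labels, vectors):
    """Agglomerative clustering (average linkage, Euclidean); returns merge history."""
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(math.dist(vectors[i], vectors[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append(([labels[i] for i in clusters[a]],
                       [labels[i] for i in clusters[b]], d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

if __name__ == "__main__":
    # Toy stand-ins for parallel texts; real input would be one text per language.
    texts = {
        "A": ["the cat sat on the mat", "the dog sat on the rug"],
        "B": ["the cat sat on the mat", "the dog sat on the rug"],
        "C": ["a b c d e f g h"],
    }
    labels = list(texts)
    vectors = zscore_columns([network_features(texts[k]) for k in labels])
    for left, right, d in average_linkage(labels, vectors):
        print(left, "+", right, f"at distance {d:.3f}")
```

Standardising each parameter before clustering keeps a parameter with a large numeric range (such as average degree) from dominating the Euclidean distances; which parameter combinations actually separate the Slavic sub-branches is an empirical question the sketch does not answer.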
This article is published with open access at Springerlink.com
About this article
Cite this article
Liu, H., Cong, J. Language clustering with word co-occurrence networks based on parallel texts. Chin. Sci. Bull. 58, 1139–1144 (2013). https://doi.org/10.1007/s11434-013-5711-8
DOI: https://doi.org/10.1007/s11434-013-5711-8