Chinese Science Bulletin, Volume 58, Issue 10, pp 1139–1144

Language clustering with word co-occurrence networks based on parallel texts

Open Access
Article Applied Physics

Abstract

This study investigates whether complex networks can be applied to fine-grained language classification, and whether word co-occurrence networks based on parallel texts can substitute for syntactic dependency networks in complex-network-based language classification. Fourteen word co-occurrence networks were constructed from parallel texts of 12 Slavic languages and 2 non-Slavic languages, one network per language. With appropriate combinations of the major parameters of these networks, cluster analysis distinguished the Slavic languages from the non-Slavic ones and correctly grouped the Slavic languages into their respective sub-branches. The clustering also captured the genetic relationships of some of these Slavic languages within their sub-branches. The results show that word co-occurrence networks based on parallel texts are applicable to fine-grained language classification and constitute a more convenient substitute for syntactic dependency networks in complex-network-based language classification.
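As a rough illustration of the pipeline summarized above, the following Python sketch builds a word co-occurrence network from tokenized sentences and derives two network parameters from it. Several details are assumptions made for illustration and are not taken from the abstract: words are linked only when they are adjacent in a sentence, the network is undirected and unweighted, tokenization is a plain whitespace split, and the two parameters (average degree and mean clustering coefficient) are stand-ins for whatever "major parameters" the study actually combined. All function names are hypothetical.

```python
from itertools import combinations


def build_cooccurrence_network(sentences):
    """Return {word: set of neighbouring words}.

    Assumption: two words are linked if they appear next to each other
    in at least one sentence (undirected, unweighted edges).
    """
    neighbours = {}
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for left, right in zip(tokens, tokens[1:]):
            if left == right:
                continue  # skip self-loops from immediately repeated words
            neighbours.setdefault(left, set()).add(right)
            neighbours.setdefault(right, set()).add(left)
    return neighbours


def average_degree(neighbours):
    """Mean number of distinct neighbours per word."""
    return sum(len(adj) for adj in neighbours.values()) / len(neighbours)


def clustering_coefficient(neighbours):
    """Mean local clustering coefficient; words with degree < 2 count as 0."""
    total = 0.0
    for adj in neighbours.values():
        k = len(adj)
        if k < 2:
            continue
        # Count how many pairs of this word's neighbours are themselves linked.
        links = sum(1 for a, b in combinations(adj, 2) if b in neighbours[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbours)


def network_parameters(sentences):
    """Parameter vector (average degree, clustering coefficient) for one text."""
    net = build_cooccurrence_network(sentences)
    return (average_degree(net), clustering_coefficient(net))
```

Calling `network_parameters` once per language on the same parallel text would give one parameter vector per language. Those vectors could then be passed to a standard hierarchical clustering routine, which is not shown here.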

Keywords

word co-occurrence network, Slavic languages, parallel texts, language classification, cluster analysis

Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. School of International Studies, Zhejiang University, Hangzhou, China
