Chinese Science Bulletin, Volume 55, Issue 30, pp 3458–3465

Language clusters based on linguistic complex networks

Article Applied Physics

Abstract

To investigate the feasibility of using complex networks in the study of linguistic typology, this paper builds and explores 15 linguistic complex networks based on the dependency syntactic treebanks of 15 languages. The results show that human languages can be classified by means of the following main parameters of complex networks: (a) average node degree, (b) clustering coefficient, (c) average path length, (d) network centralization, (e) diameter, (f) power exponent of the degree distribution, and (g) the determination coefficient of the power-law distributions. The precision of this method is similar to that achieved by modern word order typology. The paper addresses two problems of current linguistic typology. First, the language samples used in typological studies are not drawn from real text; second, typological studies pay too much attention to local language structures when choosing typological parameters. The network approach better captures global typological features of language. It not only enhances typological methods but is also valuable for developing applications of complex networks in the humanities, social sciences, and life sciences.
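
As a rough illustration of parameters (a) to (e) listed in the abstract, the following minimal sketch computes them for a small undirected word network. It is not the authors' implementation: the word pairs in `DEPENDENCIES` are hypothetical toy data, the function names are invented for this example, the network is assumed to be connected with at least three nodes, and network centralization is assumed to mean Freeman's degree centralization. Parameters (f) and (g), which require fitting a power law to the degree distribution, are left out.

```python
from collections import defaultdict, deque

# Hypothetical (dependent, head) word pairs standing in for the dependency
# relations of a treebank. Illustrative only; not data from the paper.
DEPENDENCIES = [
    ("the", "cat"), ("cat", "sat"), ("on", "sat"), ("mat", "on"),
    ("the", "mat"), ("a", "dog"), ("dog", "sat"), ("on", "dog"),
]


def build_graph(pairs):
    """Build an undirected network: one node per word, one edge per distinct pair."""
    adj = defaultdict(set)
    for dependent, head in pairs:
        if dependent != head:  # ignore self-loops
            adj[dependent].add(head)
            adj[head].add(dependent)
    return dict(adj)


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_parameters(adj):
    """Compute parameters (a)-(e); assumes a connected graph with n >= 3 nodes."""
    n = len(adj)
    degrees = {u: len(nbrs) for u, nbrs in adj.items()}

    # (a) average node degree
    average_degree = sum(degrees.values()) / n

    # (b) clustering coefficient: mean of the local coefficients, where nodes
    # with fewer than two neighbours contribute 0
    local = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)
            continue
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        local.append(2 * links / (k * (k - 1)))
    clustering_coefficient = sum(local) / n

    # (c) average path length and (e) diameter, via breadth-first search
    total, diameter = 0, 0
    for u in adj:
        dist = bfs_distances(adj, u)
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    average_path_length = total / (n * (n - 1))

    # (d) degree centralization (Freeman's definition, an assumption here)
    k_max = max(degrees.values())
    centralization = sum(k_max - k for k in degrees.values()) / ((n - 1) * (n - 2))

    return {
        "average_degree": average_degree,
        "clustering_coefficient": clustering_coefficient,
        "average_path_length": average_path_length,
        "diameter": diameter,
        "centralization": centralization,
    }


params = network_parameters(build_graph(DEPENDENCIES))
for name, value in params.items():
    print(name, value)
```

Computing one such parameter vector per language would give the input for a cluster analysis over the 15 languages; the clustering step itself is not shown here.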

Keywords

complex networks, linguistic typology, language network, syntactic dependency network, cluster analysis, language classification

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. School of International Studies, Zhejiang University, Hangzhou, China
  2. Institute of Applied Linguistics, Communication University of China, Beijing, China
