Abstract
In this paper, co-occurrence language network measures from literary and legal texts are compared on the global and the local scale. Our dataset consists of four legal texts and four short novellas, all written in English. For each text we construct one directed and weighted network, in which the weight of a link between two nodes represents the overall co-occurrence frequency of the corresponding words. For comparison, we choose four literature-law pairs of texts with approximately the same number of different words. The aim of this experiment is to investigate how complex network measures behave on differently structured texts and which of them are sensitive to text type. Our results show that, on the global scale, average strength is the only measure that exhibits some uniform behaviour due to the differences in textual complexity. In general, global measures may not be well suited to discriminating between the two genres. However, from the local perspective, rank plots of in- and out-selectivity (average node strength) indicate more noticeable structural differences between legal texts and literature.
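The network construction and the selectivity measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes that two words co-occur when one directly follows the other (the abstract does not state the window size), and it takes selectivity to be a node's strength divided by its degree, computed separately for incoming and outgoing links. The function names and the toy sentence are hypothetical and are not taken from the paper or its toolkit.

```python
from collections import defaultdict


def build_cooccurrence_network(tokens):
    """Directed, weighted network as a dict: (u, v) -> number of times v directly follows u.

    Assumption: a co-occurrence window of adjacent words only.
    """
    weights = defaultdict(int)
    for u, v in zip(tokens, tokens[1:]):
        weights[(u, v)] += 1
    return dict(weights)


def selectivity(weights):
    """Return (in_selectivity, out_selectivity): strength / degree for each node."""
    in_strength, out_strength = defaultdict(int), defaultdict(int)
    in_degree, out_degree = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_strength[u] += w   # total weight of links leaving u
        out_degree[u] += 1     # number of distinct successors of u
        in_strength[v] += w    # total weight of links entering v
        in_degree[v] += 1      # number of distinct predecessors of v
    in_sel = {n: in_strength[n] / in_degree[n] for n in in_degree}
    out_sel = {n: out_strength[n] / out_degree[n] for n in out_degree}
    return in_sel, out_sel


# Hypothetical toy text, used only to exercise the functions.
tokens = "the court shall hear the case and the court shall decide".split()
weights = build_cooccurrence_network(tokens)
in_sel, out_sel = selectivity(weights)

# Values sorted in descending order give the data for a rank plot.
ranked_out = sorted(out_sel.values(), reverse=True)
```

In this toy example "the" is followed by "court" twice and by "case" once, so its out-strength is 3, its out-degree is 2, and its out-selectivity is 1.5. Nodes with no outgoing links, such as the final word, simply have no out-selectivity entry.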
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Martinčić-Ipšić, S., Miličić, T., Meštrović, A. (2016). Text Type Differentiation Based on the Structural Properties of Language Networks. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_43
DOI: https://doi.org/10.1007/978-3-319-46254-7_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46253-0
Online ISBN: 978-3-319-46254-7
eBook Packages: Computer Science (R0)