Text Type Differentiation Based on the Structural Properties of Language Networks

  • Sanda Martinčić-Ipšić
  • Tanja Miličić
  • Ana Meštrović
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 639)

Abstract

In this paper, co-occurrence language network measures from literature and legal texts are compared on the global and on the local scale. Our dataset consists of four legal texts and four short novellas, all written in English. For each text we construct one directed and weighted network, where the weight of a link between two nodes represents the overall co-occurrence frequency of the corresponding words. We choose four literature-law pairs of texts with approximately the same number of different words for comparison. The aim of this experiment was to investigate how complex network measures operate in different structures of texts and which of them are sensitive to different text types. Our results show that on the global scale average strength is the only measure that exhibits some uniform behaviour due to the differences in textual complexity. In general, global measures may not be well suited to discriminate between the mentioned genres of texts. However, from the local perspective, rank plots of in- and out-selectivity (average node strength) indicate that there are more noticeable structural differences between legal texts and literature.
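The network construction and the selectivity measure mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: words are lowercased and punctuation is dropped, only directly adjacent words count as co-occurring, and selectivity is taken as node strength divided by node degree, computed separately for incoming and outgoing links. The function names and the sample sentence are hypothetical and are not taken from the paper or its toolkit.

```python
import re
from collections import defaultdict


def build_cooccurrence_network(text):
    """Directed, weighted links: (u, v) -> how often word v directly follows word u."""
    # Assumption: lowercase tokens, punctuation ignored, window of one word.
    words = re.findall(r"[a-z']+", text.lower())
    weights = defaultdict(int)
    for u, v in zip(words, words[1:]):
        weights[(u, v)] += 1
    return dict(weights)


def selectivity(weights):
    """Return (in_selectivity, out_selectivity), each as strength / degree per node."""
    in_strength, out_strength = defaultdict(int), defaultdict(int)
    in_degree, out_degree = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_strength[u] += w   # total weight leaving u
        out_degree[u] += 1     # number of distinct successors of u
        in_strength[v] += w    # total weight entering v
        in_degree[v] += 1      # number of distinct predecessors of v
    in_sel = {n: in_strength[n] / in_degree[n] for n in in_degree}
    out_sel = {n: out_strength[n] / out_degree[n] for n in out_degree}
    return in_sel, out_sel


def rank_values(selectivity_by_node):
    """Selectivity values in descending order, the series a rank plot would show."""
    return sorted(selectivity_by_node.values(), reverse=True)


# Hypothetical sample text, for illustration only.
sample = "the court shall hear the case and the court shall decide the case"
network = build_cooccurrence_network(sample)
in_sel, out_sel = selectivity(network)
ranked_out = rank_values(out_sel)
```

Plotting `rank_values(in_sel)` and `rank_values(out_sel)` against rank for a legal text and a literary text of similar vocabulary size would give the kind of local comparison the abstract describes. A wider co-occurrence window or different preprocessing would change the numbers.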

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Sanda Martinčić-Ipšić (1)
  • Tanja Miličić (1)
  • Ana Meštrović (1)
  1. Department of Informatics, University of Rijeka, Rijeka, Croatia
