Text Type Differentiation Based on the Structural Properties of Language Networks

Part of the Communications in Computer and Information Science book series (CCIS, volume 639)

Abstract

In this paper, co-occurrence language network measures from literature and legal texts are compared on the global and on the local scale. Our dataset consists of four legal texts and four short novellas, all written in English. For each text we construct one directed and weighted network, where the weight of a link between two nodes represents the overall co-occurrence frequency of the corresponding words. We choose four literature-law pairs of texts with approximately the same number of different words for comparison. The aim of this experiment is to investigate how complex network measures behave on differently structured texts and which of them are sensitive to text type. Our results show that, on the global scale, average strength is the only measure that exhibits some uniform behaviour, owing to differences in textual complexity. In general, global measures may not be well suited to discriminating between these genres of text. From the local perspective, however, rank plots of in- and out-selectivity (average node strength) indicate more noticeable structural differences between legal texts and literature.
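The network construction and the local measure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the tokenizer, and the co-occurrence window of one word are hypothetical choices, and selectivity is assumed here to be node strength divided by node degree (the abstract glosses it only as "average node strength"). Average strength is likewise assumed to be total link weight divided by the number of nodes.

```python
import re
from collections import Counter


def build_cooccurrence_network(text, window=1):
    """Map each directed word pair (source, target) to its co-occurrence count.

    Assumption: words co-occur when the target follows the source within
    `window` positions; the abstract does not state the window size.
    """
    words = re.findall(r"[a-z']+", text.lower())
    edges = Counter()
    for i, source in enumerate(words):
        for target in words[i + 1 : i + 1 + window]:
            edges[(source, target)] += 1
    return edges


def selectivity(edges):
    """Return (in_selectivity, out_selectivity) dicts keyed by word.

    Assumption: selectivity = strength / degree, i.e. the average weight
    of a node's incoming (or outgoing) links.
    """
    in_strength, out_strength = Counter(), Counter()
    in_degree, out_degree = Counter(), Counter()
    for (source, target), weight in edges.items():
        out_strength[source] += weight
        out_degree[source] += 1
        in_strength[target] += weight
        in_degree[target] += 1
    in_sel = {n: in_strength[n] / in_degree[n] for n in in_degree}
    out_sel = {n: out_strength[n] / out_degree[n] for n in out_degree}
    return in_sel, out_sel


def average_strength(edges):
    """Global measure: total link weight divided by the number of nodes."""
    nodes = {word for pair in edges for word in pair}
    return sum(edges.values()) / len(nodes) if nodes else 0.0


def rank_values(values):
    """Sort values in descending order, as plotted on a rank plot."""
    return sorted(values.values(), reverse=True)


if __name__ == "__main__":
    sample = "The court shall hear the case and the court shall decide."
    network = build_cooccurrence_network(sample)
    in_sel, out_sel = selectivity(network)
    print("average strength:", average_strength(network))
    print("out-selectivity ranks:", rank_values(out_sel))
```

Plotting the ranked in- and out-selectivity values of a legal text against those of a literary text of similar vocabulary size would give the kind of local-scale comparison the abstract refers to.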

Keywords

  • Cluster Coefficient
  • Network Measure
  • Text Type
  • Legal Text
  • Literature Text

These keywords were added by machine and not by the authors.

Author information

Corresponding author

Correspondence to Sanda Martinčić-Ipšić.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Martinčić-Ipšić, S., Miličić, T., Meštrović, A. (2016). Text Type Differentiation Based on the Structural Properties of Language Networks. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_43

  • DOI: https://doi.org/10.1007/978-3-319-46254-7_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46253-0

  • Online ISBN: 978-3-319-46254-7

  • eBook Packages: Computer Science, Computer Science (R0)