From Texts to Networks: Detecting and Managing the Impact of Methodological Choices for Extracting Network Data from Text Data

Abstract

This thesis (Diesner in Technical Report CMU-ISR-12-101, 2012) addresses a series of methodological problems related to extracting information on socio-technical networks from natural language text data. Theories and models from the social sciences are combined with computational approaches to (a) construct, analyze and compare network data and (b) combine text data and network data for analysis. The thesis comprises several projects that serve three purposes. First, the impact of common coding choices, including reference resolution and co-occurrence-based link formation, on network data and analysis results is empirically identified across multiple types of text data and domains. Second, different relation extraction methods are compared across several longitudinal, open-source, large-scale datasets with respect to the resulting network data and analysis results; this comparison complements traditional strategies for accuracy assessment. The relation extraction methods considered include network data construction based on (a) manually versus automatically built thesauri, (b) meta-data, and (c) collaboration with subject matter experts. Third, the concepts of grouping and roles from network analysis are integrated with text mining methods to enable the theoretically grounded, joint consideration of text data and network data for real-world applications.
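To make the first kind of coding choice more concrete, the following is a minimal, hypothetical sketch in Python; it is not the implementation used in the thesis. It assumes that entity mentions have already been extracted in text order, links two mentions when they co-occur within a sliding window of `window` positions, and optionally resolves aliases through a small thesaurus before linking. The names `THESAURUS`, `build_network` and `summarize`, the window size, and the example mentions are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical thesaurus mapping surface forms (aliases) to one canonical node label.
# It stands in for a manually or automatically built thesaurus; the entries are invented.
THESAURUS = {
    "UN": "United Nations",
    "U.N.": "United Nations",
    "Smith": "John Smith",
    "J. Smith": "John Smith",
}


def build_network(mentions, window, thesaurus=None):
    """Build a weighted, undirected co-occurrence network from entity mentions.

    `mentions` is a list of already extracted entity mentions in text order.
    If `thesaurus` is given, aliases are first resolved to canonical labels.
    Two distinct entities are linked when they occur within `window` positions
    of each other; edge weights count how often that happens.
    """
    if thesaurus is not None:
        mentions = [thesaurus.get(m, m) for m in mentions]
    edges = Counter()
    for i, source in enumerate(mentions):
        for target in mentions[i + 1 : i + 1 + window]:
            if source != target:  # skip self-loops created by repeated mentions
                edges[tuple(sorted((source, target)))] += 1
    return edges


def summarize(edges):
    """Return (nodes appearing in at least one edge, distinct edges)."""
    nodes = {node for edge in edges for node in edge}
    return len(nodes), len(edges)


mentions = ["UN", "Smith", "Geneva", "U.N.", "J. Smith"]
raw = build_network(mentions, window=2)
resolved = build_network(mentions, window=2, thesaurus=THESAURUS)
print(summarize(raw))       # (5, 7)
print(summarize(resolved))  # (3, 3)
print(resolved[("John Smith", "United Nations")])  # 3
```

On this toy input, resolving four aliases turns a network of five nodes and seven edges into one of three nodes and three more heavily weighted edges, which illustrates how a single coding choice can change the network data before any analysis is run.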

Overall, this thesis takes an interdisciplinary and computationally rigorous approach, thereby advancing the intersection of network analysis, natural language processing and computing. The contributions of this work help people use text data for network analysis and collect, manage and interpret rich network data at any scale. These steps are preconditions for asking substantive and graph-theoretic questions, testing hypotheses, and advancing theories about networks.

Notes

  1. In Natural Language Processing (NLP) and Information Extraction (IE), this task is also known as Named Entity Recognition.

  2. In NLP and IE, this step, and sometimes all three steps together, is also referred to as Relation Extraction.

References

  1. Abello J, Broadwell P, Tangherlini TR (2012) Computational folkloristics. Commun ACM 55(7):60–70

  2. Alderson D (2008) Catching the ‘network science’ bug: insight and opportunity for the operations researcher. Oper Res 56(5):1047–1065

  3. Brin S (1999) Extracting patterns and relations from the World Wide Web. Paper presented at The World Wide Web and Databases, Valencia, Spain, March 27–28, 1998, pp 172–183

  4. Burt R, Lin N (1977) Network time series from archival records. In: Heise DR (ed) Sociological methodology, vol 1977. Jossey-Bass, San Francisco, pp 224–254

  5. Carley KM, Palmquist M (1991) Extracting, representing, and analyzing mental models. Soc Forces 70(3):601–636

  6. Danowski JA (1993) Network analysis of message content. Prog Commun Sci 12:198–221

  7. Diesner J (2012) Uncovering and managing the impact of methodological choices for the computational construction of socio-technical networks from texts. Technical report CMU-ISR-12-101

  8. Diesner J, Carley KM (2010) Relation extraction from texts (in German, title: Extraktion relationaler Daten aus Texten). In: Stegbauer C, Häußling R (eds) Handbook network research (Handbuch Netzwerkforschung). VS Verlag, Wiesbaden, pp 507–521

  9. Diesner J, Carley KM, Tambayong L (2012) Extracting socio-cultural networks of the Sudan from open-source, large-scale text data. Comput Math Organ Theory 18(3):328–339

  10. Gerner D, Schrodt P, Francisco R, Weddle J (1994) Machine coding of event data using regional and international sources. Int Stud Q 38(1):91–119

  11. Hämmerli A, Gattiker R, Weyermann R (2006) Conflict and cooperation in an actors’ network of Chechnya based on event data. J Confl Resolut 50(2):159–175

  12. Hartley R, Barnden J (1997) Semantic networks: visualizations of knowledge. Trends Cogn Sci 1(5):169–175

  13. Janas J, Schwind C (1979) Extensional semantic networks. In: Findler NV (ed) Associative networks. Representation and use of knowledge by computers. Academic Press, New York, pp 267–302

  14. Johnson JC, Krempel L (2004) Network visualization: The “Bush team” in Reuters news ticker, 9/11–11/15/01. J Soc Struct 5

  15. Parastatidis S, Viegas E, Hey T (2009) Viewpoint: smart cyberinfrastructure for research. A view of semantic computing and its role in research. Commun ACM 52(12):33–37

  16. Trigg R, Weiser M (1986) TEXTNET: a network-based approach to text handling. ACM Trans Inf Syst 4(1):1–23

Acknowledgements

This work was supported by the National Science Foundation (NSF) IGERT 9972762, the Army Research Institute (ARI) W91WAW07C0063, the Army Research Laboratory (ARL/CTA) DAAD19-01-2-0009, the Air Force Office of Scientific Research (AFOSR) MURI FA9550-05-1-0388, the Office of Naval Research (ONR) MURI N00014-08-11186, and a Siebel Scholarship. Additional support was provided by CASOS, the Center for Computational Analysis of Social and Organizational Systems at Carnegie Mellon University. The views and conclusions contained in this paper are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsor, including the NSF, ARI, ARL, AFOSR, ONR, or the United States Government. I am grateful to my dissertation committee, chaired by Dr. Kathleen M. Carley and including William Cohen, Carolyn Rosé and Jeffrey Johnson, for their comments on this work.

Author information

Correspondence to Jana Diesner.


About this article

Cite this article

Diesner, J. From Texts to Networks: Detecting and Managing the Impact of Methodological Choices for Extracting Network Data from Text Data. Künstl Intell 27, 75–78 (2013). https://doi.org/10.1007/s13218-012-0225-0

Keywords

  • Socio-technical networks
  • Semantic networks
  • Entity extraction
  • Relation extraction
  • Reference resolution
  • Network clustering