Computational and Mathematical Organization Theory, Volume 14, Issue 3, pp 248–262

Conditional random fields for entity extraction and ontological text coding

Abstract

Previous research suggests that one field with a strong yet unsatisfied need for automatically extracting instances of various entity classes from texts is the analysis of socio-technical systems (Feldstein in Media in Transition MiT5, 2007; Hampe et al. in Netzwerkanalyse und Netzwerktheorie, 2007; Weil et al. in Proceedings of the 2006 Command and Control Research and Technology Symposium, 2006; Diesner and Carley in XXV Sunbelt Social Network Conference, 2005). Traditional as well as non-traditional and customized sets of entity classes and the relationships between them are often specified in ontologies or taxonomies. We present a Conditional Random Fields (CRF)-based approach to distilling a set of entities that are defined in an ontology originating from organization science. CRF, a supervised sequential machine learning technique, facilitates the derivation of relational data from corpora by locating and classifying instances of various entity classes. The classified entities can be used as nodes for the construction of socio-technical networks. We find the outcome sufficiently accurate (82.7 percent accuracy of locating and classifying entities) for future application in the described problem domain. We propose using the presented methodology as a crucial step in the process of advanced modeling and analysis of complex and dynamic networks.
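To make the step from labeled text to network nodes concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a trained CRF has already assigned one label per token, and it assumes a BIO labeling scheme (B- begins an entity, I- continues it, O is outside any entity). The abstract specifies neither. The class names `agent`, `organization`, and `location`, the example sentence, and the function names are all hypothetical.

```python
from collections import Counter


def extract_entities(tokens, labels):
    """Group BIO-labeled tokens into (entity text, entity class) pairs.

    Assumption: labels look like "B-agent", "I-agent", or "O".
    """
    entities = []
    current_tokens, current_class = [], None
    for token, label in zip(tokens, labels):
        prefix, _, entity_class = label.partition("-")
        if prefix == "B" or (prefix == "I" and entity_class != current_class):
            # A new entity starts here; a stray "I-" is treated like "B-".
            if current_tokens:
                entities.append((" ".join(current_tokens), current_class))
            current_tokens, current_class = [token], entity_class
        elif prefix == "I":
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O": close any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_class))
            current_tokens, current_class = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_class))
    return entities


def collect_nodes(labeled_sentences):
    """Count how often each classified entity occurs across sentences.

    Each distinct (text, class) pair is a candidate network node.
    """
    nodes = Counter()
    for tokens, labels in labeled_sentences:
        nodes.update(extract_entities(tokens, labels))
    return nodes


# Hypothetical labeled sentence, standing in for CRF output.
tokens = ["Alice", "Smith", "joined", "Acme", "Corp", "in", "Pittsburgh", "."]
labels = ["B-agent", "I-agent", "O",
          "B-organization", "I-organization", "O", "B-location", "O"]

for (text, entity_class), count in collect_nodes([(tokens, labels)]).items():
    print(f"{entity_class}: {text} ({count})")
```

Keeping the entity class alongside the surface text means two mentions with the same wording but different classes remain distinct nodes, which matters when the classes come from an ontology.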

Keywords

Ontological Text Coding; Semantic networks; Entity Extraction; Supervised machine learning; Conditional models; Conditional Random Fields

References

  1. Bikel DM, Schwartz R, Weischedel RM (1999) An algorithm that learns what’s in a name. Mach Learn 34(1–3):211–231
  2. Borthwick A, Sterling J, Agichtein E, Grishman R (1998) Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In: Sixth workshop on very large corpora, Association for Computational Linguistics, Montreal, QC, Canada, August 1998, pp 152–160
  3. Carley KM (2002) Smart agents and organizations of the future. In: Lievrouw L, Livingstone S (eds) The handbook of new media. Sage, Thousand Oaks, pp 206–220
  4. Carley KM, Frantz T, Diesner J (2006) Social and knowledge networks from large scale databases. In: 56th annual conference of the international communication association (ICA), Dresden, Germany, June 2006
  5. CoNLL-2003 (2003) In: Proceedings of seventh conference on natural language learning (CoNLL-2003), Edmonton, Canada, May–June 2003
  6. Diesner J, Carley KM (2005) Revealing and comparing the organizational structure of covert networks with network text analysis. In: XXV Sunbelt social network conference, Redondo Beach, CA, February 2005
  7. Diesner J, Carley KM (2006) Revealing social structure from texts: meta-matrix text analysis as a novel method for network text analysis. In: Narayanan VK, Armstrong DJ (eds) Causal mapping for information systems and technology research: approaches, advances, and illustrations. Idea Group, Harrisburg, pp 81–108
  8. Dietterich TG (2002) Machine learning for sequential data: A review. In: Joint IAPR international workshops SSPR 2002 and SPR 2002, Windsor, ON, Canada, August 2002
  9. Feldstein A (2007) Brand communities in a world of knowledge-based products and common property. Media in Transition MiT5, Cambridge, MA, April 2007
  10. Freitag D (1997) Using grammatical inference to improve precision in information extraction. In: Fourteenth international conference on machine learning, workshop on automata induction, grammatical inference, and language acquisition, Nashville, TN
  11. Hampe P, Hatzel I, Höhnsch J, Ueschner P (2007) Forschungsgruppe Transparentes Parlament, Ein neues Paradigma in den Sozialwissenschaften. Netzwerkanalyse und Netzwerktheorie, Frankfurt, Germany, September 2007
  12. Krackhardt D, Carley KM (1998) A PCANS model of structure in organization. In: Proceedings of the 1998 international symposium on command and control, research and technology, Monterey, CA, June 1998, pp 113–119
  13. Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Eighteenth international conference on machine learning (ICML-01), San Francisco, pp 282–289
  14. Linguistic Data Consortium (LDC) Automatic Content Extraction (ACE) (2007) http://projects.ldc.upenn.edu/ace/ DARPA. Accessed 20 March 2007
  15. McCallum A (2005) Information extraction: distilling structured data from unstructured text. ACM Queue 3(9):48–57
  16. Message Understanding Conferences (MUC) 6 (2006) Named entity task definition 1995. http://cs.nyu.edu/cs/faculty/grishman/NEtask20.book_1.html. Accessed 11 March 2007
  17. Sarawagi S (n.d.) CRF project page. http://crf.sourceforge.net/. Accessed 20 March 2007
  18. Sha F, Pereira F (2003) Shallow parsing with conditional random fields. In: Proceedings of human language technology, HLT-NAACL 2003, pp 213–220
  19. Weil SA, Carley KM, Diesner J, Freeman J, Cooke NJ (2006) Measuring situational awareness through analysis of communications: A preliminary exercise. In: Proceedings of the 2006 command and control research and technology symposium, San Diego, CA
  20. Weischedel R, Brunstein A (2005) BBN pronoun coreference and entity type corpus. Linguistic Data Consortium, Philadelphia, LDC2005T33

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. School of Computer Science, Institute for Software Research, Center for Computational Analysis of Social and Organizational Systems (CASOS), Carnegie Mellon University, Pittsburgh, USA
