Automatic Extraction of Events from Open Source Text for Predictive Forecasting

Boschee, Elizabeth; Natarajan, Premkumar; Weischedel, Ralph

doi:10.1007/978-1-4614-5311-6_3

Chapter
First Online: 01 January 2012

Abstract

Automated analysis of news reports is a significant empowering technology for predictive models of political instability. To date, the standard approach to this analytic task has been embodied in systems such as KEDS/TABARI [1], which use manually generated rules and shallow parsing techniques to identify events and their participants in text. In this chapter we explore an alternative approach to event extraction based on BBN SERIF™ and BBN OnTopic™, two state-of-the-art statistical natural language processing engines. We empirically compare this new approach to existing event extraction techniques on five dimensions: (1) Accuracy: when an event is reported by the system, how often is it correct? (2) Coverage: how many events are correctly reported by the system? (3) Filtering of historical events: how well are historical events (e.g., 9/11) filtered out of the current event data stream? (4) Topic-based event filtering: how well do systems filter out red herrings based on document topic, such as sports documents mentioning “clashes” between two countries on the playing field? (5) Domain shift: how well do event extraction models perform on data originating from diverse sources? On all five dimensions we show significant improvement over the state of the art by applying statistical natural language processing techniques. We hope that these results will lead to greater acceptance of automated coding by creators and consumers of social science models that depend on event data, and that they will provide a new way to improve the accuracy of those predictive models.
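
The first two dimensions can be illustrated with a minimal sketch. It assumes that an event is a (source, event type, target) triple and that a human judge has marked which reported triples are correct. The names Event, accuracy, and coverage are hypothetical and chosen for illustration, and reading coverage as a count of distinct correct events is an assumption based on the wording of the abstract, not the chapter's evaluation code.

```python
from typing import Iterable, NamedTuple, Set


class Event(NamedTuple):
    """A hypothetical event triple: who did what to whom."""
    source: str
    event_type: str
    target: str


def accuracy(reported: Iterable[Event], judged_correct: Set[Event]) -> float:
    """Fraction of reported events that were judged correct."""
    reported = list(reported)
    if not reported:
        return 0.0  # convention chosen here: no output means no credit
    hits = sum(1 for event in reported if event in judged_correct)
    return hits / len(reported)


def coverage(reported: Iterable[Event], judged_correct: Set[Event]) -> int:
    """Number of distinct reported events that were judged correct."""
    return len(set(reported) & judged_correct)
```

With four reported events of which three are judged correct, accuracy is 0.75 and coverage is 3. A system can raise one figure at the expense of the other, which is why the two are reported separately.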

Notes

1. http://www.nist.gov/speech/tests/ace/
2. The evaluation corpus included approximately 250,000 documents. Documents judged to be near duplicates by BBN’s semantic de-duplication filter were removed before evaluation.
3. Specifically, the percentages of triples seen by two annotators were 100% for Violence, 100% for Provide Aid, and 69% for Disapprove.
4. King and Lowe report a suite of numbers for accuracy; this number assumes a constant weighting across event categories. In addition, King and Lowe report an 85% accuracy number for a very different metric: the probability of a correct event or non-event judgment on a given sentence. We did not compute this number. Given the high percentage of non-event sentences in our data, it would be meaninglessly high: the trivial baseline, where a system never returns an event, would achieve 96% accuracy on our data set. In contrast, the King and Lowe test set is specifically constructed to contain mostly sentences that have a valid event of some kind (and their raw data pool is also more event-heavy than ours), so that number has a very different meaning in their context.
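
The arithmetic behind the trivial baseline in note 4 can be sketched as follows. The function names and the toy data are hypothetical. The 4-in-100 event rate is chosen only to reproduce the 96% figure quoted in the note, and the 80-in-100 set is an invented stand-in for an event-heavy test set, not the King and Lowe data.

```python
from typing import Sequence


def sentence_level_accuracy(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Share of sentences whose event/non-event judgment matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one sentence")
    matches = sum(1 for g, p in zip(gold, predicted) if g == p)
    return matches / len(gold)


def never_event_baseline(gold: Sequence[bool]) -> float:
    """Accuracy of a system that labels every sentence as a non-event."""
    return sentence_level_accuracy(gold, [False] * len(gold))


# Toy data: True means the sentence contains a valid event.
sparse_events = [True] * 4 + [False] * 96   # mostly non-event sentences
event_heavy = [True] * 80 + [False] * 20    # mostly event sentences

print(never_event_baseline(sparse_events))
print(never_event_baseline(event_heavy))
```

On the sparse set the do-nothing system scores 0.96, while on the event-heavy set it scores 0.20. The same metric therefore means very different things depending on how many sentences in the test set contain events.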

References

[1] Schrodt P (2001) Automated coding of international event data using sparse parsing techniques. Paper presented at the International Studies Association, Chicago
[2] O’Brien S (2010) Crisis early warning and decision support: contemporary approaches and thoughts on future research. Int Stud Rev 12:87–104. doi:10.1111/j.1468-2486.2009.00914.x
[3] Ramshaw L, Boschee E, Freedman M, MacBride J, Weischedel R, Zamanian A (2011) SERIF language processing—effective trainable language understanding. Handbook of natural language processing and machine translation: DARPA Global Autonomous Language Exploitation. Springer, New York
[4] Gerner D, Schrodt P, Yilmaz O, Abu-Jabr R (2002) Conflict and Mediation Event Observations (CAMEO): a new event data framework for the analysis of foreign policy interactions. Paper presented at the International Studies Association, New Orleans, and American Political Science Association, Boston
[5] Schrodt P, Yilmaz O, Gerner D, Hermreck D (2008) The CAMEO (Conflict and Mediation Event Observations) actor coding framework. Paper presented at the International Studies Association, San Francisco
[6] Olive J, Christianson C, McCary J (2011) Handbook of natural language processing and machine translation: DARPA Global Autonomous Language Exploitation. Springer, New York
[7] Maybury M (2004) New directions in question answering. AAAI Press/The MIT Press, Menlo Park
[8] King G, Lowe W (2003) An automated information extraction tool for international conflict data with performance as good as human coders: a rare events evaluation design. Int Organ 57:617–642
[9] Schwartz R, Imai T, Kubala F, Nguyen L, Makhoul J (1997) A maximum likelihood model for topic classification of broadcast news. Proceedings of Eurospeech, Greece
[10] Prasad R, Natarajan P, Subramanian K, Saleem S, Schwartz R (2007) Finding structure in noisy text: topic classification and unsupervised clustering. Paper presented at IJCAI-2007 Workshop on Analytics for Noisy Unstructured Text Data, Hyderabad, India

Author information

Authors and Affiliations

Raytheon BBN Technologies, Cambridge, MA, USA
Elizabeth Boschee, Premkumar Natarajan & Ralph Weischedel

Corresponding author

Correspondence to Elizabeth Boschee.

Editor information

Editors and Affiliations

Computer Science Department, University of Maryland, AV Williams Building, College Park, 20854, Maryland, USA
V.S. Subrahmanian

About this chapter

Cite this chapter

Boschee, E., Natarajan, P., Weischedel, R. (2013). Automatic Extraction of Events from Open Source Text for Predictive Forecasting. In: Subrahmanian, V. (eds) Handbook of Computational Approaches to Counterterrorism. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5311-6_3

DOI: https://doi.org/10.1007/978-1-4614-5311-6_3
Published: 08 November 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5310-9
Online ISBN: 978-1-4614-5311-6
eBook Packages: Computer Science, Computer Science (R0)