Automating Event Extraction for the Security Domain

  • Clive Best
  • Jakub Piskorski
  • Bruno Pouliquen
  • Ralf Steinberger
  • Hristo Tanev
Part of the Studies in Computational Intelligence book series (SCI, volume 135)

Abstract

This chapter presents ongoing efforts at the Joint Research Center of the European Commission to automate event extraction from news articles collected from the Internet by the Europe Media Monitor system. Event extraction builds on techniques developed over several years in the field of information extraction, whose basic goal is to derive quantitative data from unstructured text. The motivation for automated event tracking is to provide objective incident data with broad coverage of terrorist incidents and violent conflicts around the world. These quantitative data then form the basis for populating incident databases and systems for trend analysis and risk assessment.

We discuss the technical requirements for information extraction and the approach we adopted. In particular, we deploy lightweight methods for entity extraction and a machine-learning technique for pattern-based event extraction. A preliminary evaluation shows that the accuracy is already acceptable. Future directions for improving the approach are also discussed.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Clive Best (1)
  • Jakub Piskorski (1)
  • Bruno Pouliquen (1)
  • Ralf Steinberger (1)
  • Hristo Tanev (1)

  1. Joint Research Center of the European Commission, Web and Language Technology Group of IPSC, Italy
