Online News Event Extraction for Global Crisis Surveillance

  • Jakub Piskorski
  • Hristo Tanev
  • Martin Atkinson
  • Eric van der Goot
  • Vanni Zavarella
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6910)

Abstract

This article presents a real-time, multilingual news event extraction system developed at the Joint Research Centre of the European Commission. It is capable of accurately and efficiently extracting violent and natural disaster events from online news. In particular, a linguistically relatively lightweight approach is deployed, in which clustered news articles are heavily exploited at all stages of processing. Furthermore, the technique applied for event extraction assumes the inverted-pyramid style of writing news articles, i.e., the most important parts of the story are placed at the beginning and the least important facts are left toward the end. The article focuses on the system’s architecture, real-time news clustering, geo-locating and geocoding clusters, event extraction grammar development, adapting the system to the processing of new languages, cluster-level information fusion, visual event tracking, event extraction accuracy evaluation, and detecting event reporting boundaries in news article streams. This article is an extended version of [20].

Keywords

event extraction, global crisis monitoring, shallow text processing, information aggregation

References

  1. Andrews, P.: Semantic topic extraction and segmentation for efficient document visualization. Master’s thesis, School of Computer & Communication Sciences, Swiss Federal Institute of Technology, Lausanne (2004)
  2. Aone, C., Santacruz, M.: REES: A Large-Scale Relation and Event Extraction System. In: Proceedings of ANLP 2000, 6th Applied Natural Language Processing Conference, Seattle, Washington, USA (2000)
  3. Appelt, D.: Introduction to Information Extraction Technology. In: Tutorial held at IJCAI 1999, Stockholm, Sweden (1999)
  4. Ashish, N., Appelt, D., Freitag, D., Zelenko, D.: Proceedings of the Workshop on Event Extraction and Synthesis, held in conjunction with the AAAI 2006 Conference, Menlo Park, California, USA (2006)
  5. Atkinson, M., Van der Goot, E.: Near Real Time Information Mining in Multilingual News. In: Proceedings of the 18th World Wide Web Conference, Madrid, Spain (2009)
  6. Best, C., Van der Goot, E., Blackler, K., Garcia, T., Horby, D.: Europe Media Monitor, Technical Report, EUR 22173 EN, European Commission (2005)
  7. Brants, T., Chen, F., Farahat, A.: A System for New Event Detection. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA (2003)
  8. Cox, T., Cox, M.: Multidimensional Scaling. In: Monographs on Statistics and Applied Probability, 2nd edn., Chapman & Hall, London (2001)
  9. Cunningham, H., Maynard, D., Tablan, V.: Jape: a Java Annotation Patterns Engine. Technical Report, CS–00–10, University of Sheffield, Department of Computer Science (2000)
  10. Drożdżyński, W., Krieger, H.-U., Piskorski, J., Schäfer, U., Xu, F.: Shallow Processing with Unification and Typed Feature Structures — Foundations and Applications. Künstliche Intelligenz 1 (2004)
  11. Gale, W., Church, K., Yarowsky, D.: One sense per discourse. In: HLT 1991: Proceedings of the workshop on Speech and Natural Language, Harriman, New York, USA (1992)
  12. Grishman, R., Huttunen, S., Yangarber, R.: Real-time Event Extraction for Infectious Disease Outbreaks. In: Proceedings of Human Language Technology Conference 2002, San Diego, USA (2002)
  13. Hearst, M.: Subtopic structuring for full-length document access. In: Post-proceedings of SIGIR (1993)
  14. Ji, H., Grishman, R.: Refining Event Extraction through Unsupervised Cross-document Inference. In: Proceedings of 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, Ohio, USA (2008)
  15. Jones, R., McCallum, A., Nigam, K., Riloff, E.: Bootstrapping for Text Learning Tasks. In: Proceedings of IJCAI 1999 Workshop on Text Mining, Stockholm, Sweden (1999)
  16. King, G., Lowe, W.: An Automated Information Extraction Tool For International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design. International Organization 57, 617–642 (2003)
  17. Naughton, M., Kushmerick, N., Carthy, J.: Event Extraction from Heterogeneous News Sources. In: Proceedings of the AAAI 2006 Workshop on Event Extraction and Synthesis, Menlo Park, California, USA (2006)
  18. Piskorski, J.: ExPRESS – Extraction Pattern Recognition Engine and Specification Suite. In: Proceedings of the International Workshop Finite-State Methods and Natural Language Processing 2007 (FSMNLP 2007), Potsdam, Germany (2007)
  19. Piskorski, J.: CORLEONE – Core Linguistic Entity Online Extraction, Technical Report, EN 23393, Joint Research Center of the European Commission, Ispra, Italy (2008)
  20. Piskorski, J., Tanev, H., Atkinson, M., Van der Goot, E.: Cluster-Centric Approach to News Event Extraction. In: Proceedings of the International Conference on Multimedia & Network Information Systems. IOS Press, Poland (2009)
  21. Piskorski, J.: Exploring Curvature-based Topic Development Analysis for Detecting Event Reporting Boundaries. In: Marciniak, M., Mykowiecka, A. (eds.) Aspects of Natural Language Processing. LNCS, vol. 5070, pp. 311–331. Springer, Heidelberg (2009)
  22. Piskorski, J., Atkinson, M., Belyaeva, J., Zavarella, V., Huttunen, S., Yangarber, R.: Real-Time Text Mining in Multilingual News for the Creation of a Pre-frontier Intelligence Picture. In: Proceedings of the 16th Conference on Knowledge Discovery and Data Mining (KDD 2010). ACM SIGKDD Workshop on Intelligence and Security Informatics, Washington DC, USA (2010)
  23. Popov, B., Kiryakov, A., Ognyanoff, D., Manov, D., Kirilov, A., Goranov, M.: Towards Semantic Web Information Extraction. In: Proceedings of International Semantic Web Conference, Sundial Resort, Florida, USA (2003)
  24. Pouliquen, B., Kimler, M., Steinberger, R., Ignat, C., Oellinger, T., Blackler, K., Fuart, F., Zaghouani, W., Widiger, A., Forslund, A., Best, C.: Geocoding multilingual texts: Recognition, Disambiguation and Visualisation. In: Proceedings of LREC 2006, Genoa, Italy, pp. 24–26 (2006)
  25. Qi, Y., Candan, K.-S.: CUTS: Curvature-based Development Pattern Analysis and Segmentation for Blogs and Other Text Streams. In: Proceedings of Hypertext 2006, Odense, Denmark (2006)
  26. Riloff, E.: Automatically Constructing a Dictionary for Information Extraction Tasks. In: Proceedings of the 11th National Conference on Artificial Intelligence (AAAI 1993). MIT Press, Cambridge (1993)
  27. Shannon, C.: A mathematical theory of communication. The Bell System Technical Journal 27 (1948)
  28. Tanev, T., Oezden-Wennerberg, P.: Learning to Populate an Ontology of Violent Events. In: Fogelman-Soulie, F., Perrotta, D., Piskorski, J., Steinberger, R. (eds.) NATO Security through Science Series: Mining Massive Datasets for Security. IOS Press, Amsterdam (2008)
  29. Tanev, H., Piskorski, J., Atkinson, M.: Real-Time News Event Extraction for Global Crisis Monitoring. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds.) NLDB 2008. LNCS, vol. 5039, pp. 207–218. Springer, Heidelberg (2008)
  30. Tanev, H., Zavarella, V., Linge, J., Kabadjov, M., Piskorski, J., Atkinson, M., Steinberger, R.: Exploiting Machine Learning Techniques to Build an Event Extraction System for Portuguese and Spanish. LINGUAMÁTICA Journal 2, 55–66 (2009)
  31. Wagner, E., Liu, J., Birnbaum, L., Forbus, K., Baker, J.: Using Explicit Semantic Models to Track Situations Across News Articles. In: Proceedings of the AAAI 2006 workshop on Event Extraction and Synthesis, Menlo Park, California, USA (2006)
  32. Yangarber, R., Grishman, R.: Machine Learning of Extraction Patterns from Un-annotated Corpora. In: Proceedings of the 14th European Conference on Artificial Intelligence: Workshop on Machine Learning for Information Extraction, Berlin, Germany (2000)
  33. Yangarber, R.: Counter-Training in Discovery of Semantic Patterns. In: Proceedings of the 41st Annual Meeting of the ACL (2003)
  34. Yangarber, R., Von Etter, P., Steinberger, R.: Content Collection and Analysis in the Domain of Epidemiology. In: Proceedings of DrMED 2008: International Workshop on Describing Medical Web Resources at MIE 2008: the 21st International Congress of the European Federation for Medical Informatics 2008, Goeteborg, Sweden (2008)
  35. Zavarella, V., Piskorski, J., Tanev, H.: Event Extraction for Italian using a Cascade of Finite-State Grammars. In: Post-Proceedings of the 7th International Workshop on Finite-State Machines and Natural Language Processing, Ispra, Italy (2008/2009)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jakub Piskorski (1)
  • Hristo Tanev (2)
  • Martin Atkinson (2)
  • Eric van der Goot (2)
  • Vanni Zavarella (2)

  1. Polish Academy of Sciences, Warszawa, Poland
  2. Joint Research Centre of the European Commission, Ispra, Italy
