Exploring Curvature-Based Topic Development Analysis for Detecting Event Reporting Boundaries

  • Jakub Piskorski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5070)

Abstract

In an era of proliferating electronic news media and an ever-growing demand for prompt and concise information, natural language processing technologies that map free text into structured data formats are becoming paramount. Recently, we have witnessed the emergence of publicly accessible news aggregation systems that facilitate navigation through news. This paper reports on explorations aimed at refining a real-time news event extraction system that runs on top of the Europe Media Monitor news aggregation system, developed at the Joint Research Centre of the European Commission. Our experiments focus on the task of detecting new events in a given news story, i.e. deciding which of the events extracted by the core event extraction system should be tagged as new. Several methods are explored, ranging from a simple similarity computation between the descriptions of adjacent events to more elaborate methods based on curvature-based topic development analysis, which utilize global knowledge. The paper first describes the particularities of the real-time news event extraction processing chain. Next, to gain better insight into how news stories evolve over time, some statistics on event dynamics are presented. Finally, the new event detection techniques are introduced and the evaluation results are given.
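To make the simplest method mentioned above concrete, the sketch below tags events in a time-ordered news story as new by comparing each event description with the one immediately preceding it. This is a minimal illustration under stated assumptions, not the author's implementation: the event descriptions are assumed to be plain strings, the bag-of-words representation, the cosine measure and the `threshold` value of 0.5 are all hypothetical choices, and the function names `cosine` and `tag_new_events` are introduced here for illustration only.

```python
from collections import Counter
from math import sqrt
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_new_events(descriptions: List[str], threshold: float = 0.5) -> List[bool]:
    """Tag each event of a time-ordered story as new (True) or not new (False).

    Hypothetical adjacent-event baseline: the first event is always new; every
    later event is new if the cosine similarity between its description and
    the description of the immediately preceding event is below `threshold`.
    """
    # Assumed representation: lower-cased whitespace tokens, raw counts.
    vectors = [Counter(d.lower().split()) for d in descriptions]
    tags: List[bool] = []
    for i, vec in enumerate(vectors):
        if i == 0:
            tags.append(True)
        else:
            tags.append(cosine(vectors[i - 1], vec) < threshold)
    return tags


if __name__ == "__main__":
    story = [
        "bombing kabul killed 12",
        "bombing kabul killed 14",
        "earthquake sichuan killed 300",
    ]
    print(tag_new_events(story))  # [True, False, True]
```

The second description shares three of four tokens with the first (similarity 0.75) and is therefore not tagged as new, while the third shares only one token with the second (similarity 0.25) and is. Because this baseline looks only at the adjacent pair, it cannot exploit the global knowledge that the abstract attributes to the more elaborate curvature-based methods.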

Keywords

event extraction, topic detection, security informatics, open source intelligence



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jakub Piskorski (1)
  1. Joint Research Centre of the European Commission, Web Mining and Intelligence of IPSC, Ispra (VA), Italy
