Chapter

Multi-source, Multilingual Information Extraction and Summarization

Part of the series Theory and Applications of Natural Language Processing pp 229-252

Multilingual Statistical News Summarization

  • Mijail Kabadjov (EC Joint Research Centre)
  • Josef Steinberger (EC Joint Research Centre)
  • Ralf Steinberger (EC Joint Research Centre)


Abstract

In this chapter we present a generic approach for summarizing clusters of multilingual news articles such as the ones produced by the Europe Media Monitor (EMM) system. Our approach uses robust statistical techniques as well as multilingual tools for named entity recognition and disambiguation to produce entity-centered summaries. We ran experiments with the TAC 2008 and 2009 data sets (English corpora for summarization research) and obtained very promising results: at TAC 2009 our runs ranked first for linguistic quality and second for overall responsiveness. We also ran a small-scale evaluation on languages other than English, thereby demonstrating the multilinguality of our approach and providing interesting evidence that contradicts the pervasive assumption “if it works for English, it works for any language”. Finally, we present an online system, currently under development, that will eventually incorporate all the elements of the summarization approach discussed here, and we show sample output summaries in various languages.