Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation

  • Marco Turchi
  • Josef Steinberger
  • Mijail Kabadjov
  • Ralf Steinberger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6360)


We present a method for evaluating multilingual multi-document summarisation that saves precious annotation time and makes evaluation results directly comparable across languages. The approach is based on manually selecting the most important sentences in a cluster of documents from a sentence-aligned parallel corpus and projecting that sentence selection to the various target languages. We also present two ways of exploiting inter-annotator agreement levels, apply both to a baseline sentence-extraction summariser in seven languages, and discuss the differences in results between the two evaluation versions, together with a preliminary cross-language analysis. In principle, the same method can be used to evaluate single-document summarisers or information extraction tools.
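
The projection step described above can be illustrated with a minimal sketch. It assumes the parallel corpus is stored as one sentence list per language, with sentence i in one language aligned to sentence i in every other. The toy corpus, the function names `project_selection` and `extract_recall`, and the recall-style overlap score are hypothetical illustrations, not the authors' implementation or the paper's evaluation measures.

```python
from typing import Dict, List, Set

# Hypothetical toy cluster: sentence i is aligned across all languages.
corpus: Dict[str, List[str]] = {
    "en": ["The council approves the budget.",
           "The vote took place on Monday.",
           "Rain is expected tomorrow."],
    "fr": ["Le conseil approuve le budget.",
           "Le vote a eu lieu lundi.",
           "La pluie est attendue demain."],
    "de": ["Der Rat billigt den Haushalt.",
           "Die Abstimmung fand am Montag statt.",
           "Morgen wird Regen erwartet."],
}


def project_selection(selected: Set[int],
                      aligned: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Project sentence indices chosen in one language to every language."""
    return {lang: [sentences[i] for i in sorted(selected)]
            for lang, sentences in aligned.items()}


def extract_recall(system: Set[int], gold: Set[int]) -> float:
    """Share of gold sentences that the system extract also selected
    (an assumed, simplified score for illustration only)."""
    return len(system & gold) / len(gold) if gold else 0.0


# An annotator marks sentences 0 and 1 as important in one language only.
gold = {0, 1}
gold_by_language = project_selection(gold, corpus)

# A summariser run on any language version can be scored against the
# same gold indices, so the scores are comparable across languages.
system_extract = {1, 2}
score = extract_recall(system_extract, gold)
```

Because the annotation is stored as sentence indices rather than as text, a single annotation pass yields a gold standard for every language in the aligned corpus.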


Keywords: Machine Translation, Binary Model, Statistical Machine Translation, Parallel Corpus, Language Pair





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Marco Turchi (1)
  • Josef Steinberger (1)
  • Mijail Kabadjov (1)
  • Ralf Steinberger (1)
  1. European Commission - Joint Research Centre (JRC), IPSC - GlobSec, Ispra (VA), Italy
