A Zipf-Like Distant Supervision Approach for Multi-document Summarization Using Wikinews Articles

  • Felipe Bravo-Marquez
  • Manuel Manriquez
Conference paper

DOI: 10.1007/978-3-642-34109-0_15

Volume 7608 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Bravo-Marquez F., Manriquez M. (2012) A Zipf-Like Distant Supervision Approach for Multi-document Summarization Using Wikinews Articles. In: Calderón-Benavides L., González-Caro C., Chávez E., Ziviani N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg

Abstract

This work presents a sentence ranking strategy based on distant supervision for the multi-document summarization problem. Because large training datasets of document clusters paired with human-written summaries are difficult to obtain, we propose building a training corpus and a test corpus from Wikinews. Wikinews articles are modeled as “distant” summaries of their cited sources, since the first sentences of a Wikinews article tend to summarize the event covered in the news story. Sentences from the cited sources are represented as tuples of numerical features and labeled according to a Zipf-law-based relationship with the corresponding distant summary. Ranking functions are trained using linear regression and ranking SVMs, and these rankers are then combined using Borda count. The top-ranked sentences are concatenated to build summaries, which are compared with the first sentences of the distant summary using ROUGE evaluation measures. Experimental results show the effectiveness of the proposed method and indicate that combining different ranking techniques improves the quality of the generated summaries.
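The abstract fuses the outputs of several trained rankers with Borda count and scores the resulting summaries with ROUGE. The following is a minimal Python sketch of those two steps only, under stated assumptions: each ranker is assumed to output a best-first list of sentence indices, Borda points follow the common "n - 1 - position" scheme (the paper may use a variant), and ROUGE-N recall is a simplified version (lowercased word tokens, no stemming or stopword removal) rather than the official ROUGE toolkit. The Zipf-based labeling and the regression/SVM training are not reproduced; the function names and toy sentences are hypothetical.

```python
import re
from collections import Counter


def borda_count(rankings):
    """Fuse best-first rankings of the same items with Borda count.

    An item at position i in a ranking of n items receives n - 1 - i points;
    points are summed across all rankings (assumed scoring scheme).
    """
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Highest total first; ties broken by item id so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


def build_summary(sentences, fused_ranking, k):
    """Concatenate the top-k sentences of the fused ranking (hypothetical helper)."""
    return " ".join(sentences[i] for i in fused_ranking[:k])


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap / reference n-grams."""
    def ngrams(text):
        tokens = re.findall(r"\w+", text.lower())
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram counts at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Toy, hypothetical data: three source sentences and two rankers' outputs.
sentences = [
    "The storm hit the coast on Monday.",
    "Officials reported no injuries.",
    "Weather was sunny last week.",
]
regression_ranking = [0, 2, 1]
svm_ranking = [0, 1, 2]

fused = borda_count([regression_ranking, svm_ranking])  # [0, 1, 2]
summary = build_summary(sentences, fused, k=2)
reference = "A storm hit the coast on Monday."  # stand-in for a distant summary's first sentence
print(fused, round(rouge_n_recall(summary, reference), 3))  # [0, 1, 2] 0.857
```

Because Borda count uses only the positions in each ranking, rankers whose raw scores live on different scales (regression outputs versus SVM decision values) can be combined without any score normalization.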

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Felipe Bravo-Marquez (1)
  • Manuel Manriquez (2)

  1. Department of Computer Science, University of Chile, Chile
  2. University of Santiago of Chile, Chile