A Zipf-Like Distant Supervision Approach for Multi-document Summarization Using Wikinews Articles
- Cite this paper as:
- Bravo-Marquez F., Manriquez M. (2012) A Zipf-Like Distant Supervision Approach for Multi-document Summarization Using Wikinews Articles. In: Calderón-Benavides L., González-Caro C., Chávez E., Ziviani N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg
This work presents a sentence ranking strategy based on distant supervision for the multi-document summarization problem. Due to the difficulty of obtaining large training datasets formed by document clusters and their respective human-made summaries, we propose building a training and a testing corpus from Wikinews. Wikinews articles are modeled as “distant” summaries of their cited sources, considering that first sentences of Wikinews articles tend to summarize the event covered in the news story. Sentences from cited sources are represented as tuples of numerical features and labeled according to a relationship with the given distant summary that is based on the Zipf law. Ranking functions are trained using linear regressions and ranking SVMs, which are also combined using Borda count. Top ranked sentences are concatenated and used to build summaries, which are compared with the first sentences of the distant summary using ROUGE evaluation measures. Experimental results obtained show the effectiveness of the proposed method and that the combination of different ranking techniques outperforms the quality of the generated summary.
Unable to display preview. Download preview PDF.