Knowledge and Information Systems, Volume 22, Issue 2, pp 245–259

A document-sensitive graph model for multi-document summarization

Regular Paper

Abstract

In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take sentence-level relations (e.g., sentence similarity) into account but neglect both the differences among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document-set information on local sentence evaluation. By exploiting document–document and document–sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In this way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm called DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.
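The abstract does not give the DsR formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm: a PageRank-style power iteration over a sentence similarity graph in which edges between sentences of different documents are scaled differently from edges within one document. The function names, the bag-of-words cosine similarity, and the `inter_weight`, `intra_weight`, and `damping` parameters are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_sentences(docs, inter_weight=1.5, intra_weight=1.0,
                   damping=0.85, tol=1e-8, max_iter=200):
    """Rank sentences with a PageRank-style iteration (illustrative only).

    docs: list of documents, each a list of sentence strings.
    inter_weight / intra_weight: hypothetical multipliers applied to edges
    that cross documents versus edges that stay inside one document.
    Returns (score, doc_index, sentence) tuples, highest score first.
    """
    sents = [(d, s) for d, doc in enumerate(docs) for s in doc]
    n = len(sents)
    if n == 0:
        return []
    bows = [Counter(s.lower().split()) for _, s in sents]

    # Weighted adjacency matrix: similarity scaled by the edge type.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                same_doc = sents[i][0] == sents[j][0]
                scale = intra_weight if same_doc else inter_weight
                w[i][j] = scale * cosine(bows[i], bows[j])

    # Row-normalize; a sentence with no edges spreads its score uniformly.
    for i in range(n):
        total = sum(w[i])
        w[i] = [x / total for x in w[i]] if total else [1.0 / n] * n

    # Power iteration until the scores stop changing.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n
               + damping * sum(scores[i] * w[i][j] for i in range(n))
               for j in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break

    return sorted(((scores[i], sents[i][0], sents[i][1]) for i in range(n)),
                  reverse=True)
```

Setting `inter_weight` above `intra_weight` favors sentences that are supported by other documents, which is one simple way to treat the input as a document set rather than a single concatenated text. The paper's model additionally uses document–document and document–sentence relations, which this sketch leaves out.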

Keywords

Graph-based summarization model; Graph-based ranking algorithm; Inter- and intra-document relation; Generic summarization; Query-oriented summarization

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  1. Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
  2. Department of Computer Science and Technology, Wuhan University, Wuhan, China