Information Retrieval, Volume 11, Issue 1, pp 25–49

Using only cross-document relationships for both generic and topic-focused multi-document summarizations

Abstract

In recent years, graph-ranking-based algorithms have been proposed for single-document summarization and generic multi-document summarization. These algorithms use the “votes” or “recommendations” between sentences to evaluate the importance of each sentence in the documents. This study differentiates the cross-document and within-document relationships between sentences for generic multi-document summarization and adapts the graph-ranking-based algorithm to topic-focused summarization. The contributions are two-fold: (1) For generic multi-document summarization, we apply the graph-based ranking algorithm to each kind of sentence relationship separately and explore the relative importance of each for summarization performance. (2) For topic-focused multi-document summarization, we propose integrating the relevance of each sentence to the specified topic into the graph-ranking-based method; each kind of sentence relationship is again differentiated and investigated within the algorithm. Experimental results on the DUC 2002–DUC 2005 data demonstrate the great importance of cross-document relationships between sentences for both generic and topic-focused multi-document summarization. Even the approach based only on cross-document relationships performs better than, or at least as well as, the approaches based on both kinds of relationships.
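The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation, and the abstract does not give their formulas. The sketch assumes a PageRank-style power iteration over a sentence affinity graph, cosine similarity on raw word counts as the edge weight, a damping factor of 0.85, and a topic-relevance prior in the spirit of topic-sensitive PageRank. The names `rank_sentences`, `cosine`, and the `mode` parameter are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, topic=None, mode="cross", damping=0.85, iters=100):
    """Score sentences by graph ranking (illustrative assumption, not the paper's exact method).

    sentences: list of (doc_id, text) pairs.
    topic:     optional topic description; None gives generic ranking.
    mode:      "cross"  keeps only edges between sentences of different documents,
               "within" keeps only edges inside one document,
               "both"   keeps every edge.
    """
    n = len(sentences)
    bows = [Counter(text.lower().split()) for _, text in sentences]

    # Affinity matrix restricted to the chosen kind of sentence relationship.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_doc = sentences[i][0] == sentences[j][0]
            if (mode == "cross" and same_doc) or (mode == "within" and not same_doc):
                continue
            w[i][j] = cosine(bows[i], bows[j])

    # Prior: uniform for generic ranking, topic relevance for topic-focused ranking.
    prior = [1.0 / n] * n
    if topic is not None:
        topic_bow = Counter(topic.lower().split())
        rel = [cosine(b, topic_bow) for b in bows]
        total = sum(rel)
        if total:
            prior = [r / total for r in rel]

    row_sums = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Sentences with no outgoing edges spread their score according to the prior.
        dangling = sum(scores[i] for i in range(n) if not row_sums[i])
        scores = [
            damping * (
                sum(scores[i] * w[i][j] / row_sums[i] for i in range(n) if row_sums[i])
                + dangling * prior[j]
            )
            + (1.0 - damping) * prior[j]
            for j in range(n)
        ]
    return scores
```

Calling `rank_sentences(sentences, mode="cross")` and `rank_sentences(sentences, mode="both")` on the same input gives a rough way to compare the two settings the abstract contrasts; passing `topic="..."` switches to the topic-focused variant.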

Keywords

Multi-document summarization, Topic-focused summarization, Cross-document relationship, Graph-ranking

Notes

Acknowledgement

Supported by the National Science Foundation of China (60703064).

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Institute of Computer Science & Technology, Peking University, Beijing, China
