Using only cross-document relationships for both generic and topic-focused multi-document summarizations
Abstract
In recent years, graph-ranking based algorithms have been proposed for single-document summarization and generic multi-document summarization. These algorithms use the "votes" or "recommendations" between sentences to evaluate the importance of each sentence in the documents. This study differentiates the cross-document and within-document relationships between sentences for generic multi-document summarization, and adapts the graph-ranking based algorithm for topic-focused summarization. The contributions of this study are two-fold: (1) For generic multi-document summarization, we apply the graph-based ranking algorithm to each kind of sentence relationship separately and explore their relative importance for summarization performance. (2) For topic-focused multi-document summarization, we propose to integrate the relevance of the sentences to the specified topic into the graph-ranking based method. Each kind of sentence relationship is also differentiated and investigated in this algorithm. Experimental results on the DUC 2002–DUC 2005 data demonstrate the great importance of the cross-document relationships between sentences for both generic and topic-focused multi-document summarization. An approach based only on the cross-document relationships performs better than, or at least as well as, approaches based on both kinds of relationships between sentences.
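The idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the similarity measure (cosine over raw word counts), the PageRank-style iteration, and the way topic relevance enters (as a biased jump distribution, in the spirit of topic-sensitive PageRank) are all assumptions made for illustration, and the names `rank_sentences`, `w_within`, and `w_cross` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences, doc_ids, topic=None,
                   w_within=0.0, w_cross=1.0, damping=0.85, iters=100):
    """Score sentences with a PageRank-style walk over a similarity graph.

    w_within and w_cross (hypothetical names) scale edges that join
    sentences from the same document and from different documents.
    The defaults keep only cross-document edges. If `topic` is given,
    the random jump is biased towards topic-relevant sentences
    (an assumed way of integrating topic relevance).
    """
    n = len(sentences)
    bags = [Counter(s.lower().split()) for s in sentences]

    # Weighted affinity matrix without self-loops.
    aff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w = w_within if doc_ids[i] == doc_ids[j] else w_cross
                aff[i][j] = w * cosine(bags[i], bags[j])

    # Row-normalise; a sentence with no edges jumps uniformly.
    trans = []
    for row in aff:
        total = sum(row)
        trans.append([x / total for x in row] if total > 0 else [1.0 / n] * n)

    # Jump distribution: uniform (generic) or topic-biased.
    jump = [1.0 / n] * n
    if topic is not None:
        topic_bag = Counter(topic.lower().split())
        rel = [cosine(b, topic_bag) for b in bags]
        total = sum(rel)
        if total > 0:
            jump = [r / total for r in rel]

    # Fixed number of power-iteration steps.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            damping * sum(scores[i] * trans[i][j] for i in range(n))
            + (1 - damping) * jump[j]
            for j in range(n)
        ]
    return scores
```

Calling `rank_sentences(sentences, doc_ids)` gives a generic ranking from cross-document edges only; setting `w_within=1.0` uses both kinds of relationship, and passing `topic="..."` gives a topic-focused ranking. Comparing such settings mirrors, in a very reduced form, the comparison the study reports.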
Keywords
Multi-document summarization, Topic-focused summarization, Cross-document relationship, Graph-ranking
Acknowledgement
Supported by the National Science Foundation of China (60703064).