Learning Similarity Functions in Graph-Based Document Summarization

  • You Ouyang
  • Wenjie Li
  • Furu Wei
  • Qin Lu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5459)


Graph-based models have been extensively explored in document summarization in recent years. Compared with traditional feature-based models, graph-based models incorporate interrelated information into the ranking process. Thus, they can potentially do a better job of retrieving the important content from documents. In this paper, we investigate how to measure sentence similarity, a crucial issue in graph-based summarization models that, in our view, has not been well defined in the past. We propose a supervised learning approach that brings together multiple similarity measures and uses human-generated summaries to guide the combination process. It can therefore be expected to provide a more accurate estimate than a single cosine similarity measure. Experiments conducted on the DUC2005 and DUC2006 data sets show that the proposed learning approach is successful in measuring similarity. Its competitiveness and adaptability are also demonstrated.
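To make the idea of combining similarity measures inside a graph-based ranker concrete, the following is a minimal sketch and not the method of the paper. It assumes two illustrative measures (term-frequency cosine and Jaccard word overlap), a fixed and purely hypothetical weight pair standing in for the combination that the paper learns from human-generated summaries, and a PageRank-style power iteration over the sentence graph. All function names and parameter values are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_sim(a, b):
    """Cosine similarity between term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_sim(a, b):
    """Jaccard overlap between the word sets of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined_sim(a, b, weights=(0.6, 0.4)):
    """Weighted sum of the two measures.

    The weights are hypothetical placeholders; a supervised approach would
    learn the combination from training data instead of fixing it by hand.
    """
    return weights[0] * cosine_sim(a, b) + weights[1] * jaccard_sim(a, b)


def rank_sentences(sentences, weights=(0.6, 0.4), damping=0.85, iters=50):
    """Score sentences with a PageRank-style iteration on the similarity graph."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    if n == 0:
        return []
    # Edge weights: combined similarity, with no self-loops.
    sim = [
        [combined_sim(tokens[i], tokens[j], weights) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                scores[j] * sim[j][i] / row_sums[j]
                for j in range(n)
                if row_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat lay on the mat",
        "stock prices fell sharply today",
    ]
    for sentence, score in zip(docs, rank_sentences(docs)):
        print(f"{score:.3f}  {sentence}")
```

In this sketch, a sentence that shares no words with the others receives only the baseline score, while mutually similar sentences reinforce each other. Replacing the fixed weights with a learned model would change only `combined_sim`; the ranking loop stays the same.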


Document summarization, graph-based ranking, sentence similarity calculation, support vector machine





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • You Ouyang (1)
  • Wenjie Li (1)
  • Furu Wei (1)
  • Qin Lu (1)
  1. Department of Computing, The Hong Kong Polytechnic University, Hong Kong
