Multi-document Summarization Using Weighted Similarity Between Topic and Clustering-Based Non-negative Semantic Feature

  • Sun Park
  • Ju-Hong Lee
  • Deok-Hwan Kim
  • Chan-Min Ahn
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4505)


This paper presents a new multi-document summarization method that uses a weighted similarity between the topic and non-negative semantic features to extract meaningful sentences relevant to a given topic. The proposed method decomposes a sentence into a linear combination of sparse non-negative semantic features, so that a sentence is represented as the sum of a few intuitively comprehensible semantic features. By using the weighted similarity measure between the topic and the semantic features, it avoids extracting sentences that are highly similar to the topic but are meaningless. Clustering the sentences removes noise, which keeps biased semantics of the documents from being reflected in the summaries. In addition, the method enhances the coherence of document summaries by arranging the extracted sentences in order of their rank. Experimental results on DUC data show that the proposed method achieves better performance than the other methods.
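
The decomposition and ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything specific here is an assumption: the term-by-sentence matrix layout, the use of Lee and Seung's multiplicative updates for the factorization, cosine similarity between the topic and each semantic feature, and the scoring rule (topic-feature similarity used as a weight on each sentence's feature coefficients). The function names `nmf` and `rank_sentences` are hypothetical, and the clustering step is left out.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(a, r, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative A (terms x sentences) as W @ H.

    W (terms x r) holds r semantic features; H (r x sentences) holds the
    non-negative coefficients of each sentence. Uses multiplicative updates
    for the squared-error objective (an assumed choice of algorithm).
    """
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return w, h


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_sentences(a, topic, r=2):
    """Return (sentence indices from best to worst, per-sentence scores)."""
    w, h = nmf(a, r)
    # Similarity between the topic vector and each semantic feature (column of W).
    feature_sim = [cosine(topic, col) for col in zip(*w)]
    n = len(a[0])
    # Assumed scoring rule: weight each feature coefficient by its topic similarity.
    scores = [sum(feature_sim[k] * h[k][j] for k in range(r)) for j in range(n)]
    order = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy term-by-sentence counts: 4 terms (rows) x 4 sentences (columns).
A = [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 3, 0],
     [0, 1, 2, 0]]
TOPIC = [1, 1, 0, 0]  # topic expressed over the same 4 terms
order, scores = rank_sentences(A, TOPIC)
```

Because every factor stays non-negative, a sentence's score is a sum of non-negative contributions from the features it uses, which is what makes the weighting interpretable. Note that this sketch does not normalize the columns of W, so the scale of the coefficients in H is arbitrary; a real implementation would need to fix that scale before comparing scores.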


multi-document summarization, non-negative matrix factorization, clustering, topic-based summarization, weighted similarity measure





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Sun Park (1)
  • Ju-Hong Lee (1)
  • Deok-Hwan Kim (2)
  • Chan-Min Ahn (1)
  1. Dept. of Computer Science & Information Engineering, Inha University, Incheon, Korea
  2. Dept. of Electronics Engineering, Inha University
