Multi-document Summarization Based on BE-Vector Clustering

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)

Abstract

In this paper, we propose a novel multi-document summarization strategy based on Basic Element (BE) vector clustering. In this strategy, sentences are represented by BE vectors, rather than word or term vectors, before clustering. A BE is a head-modifier-relation triple that represents sentence content, and it serves as a more precise semantic unit than a word. BE-vector clustering is carried out with the k-means method, and a novel cluster analysis method is employed to detect the number of clusters, K, automatically. Experimental results indicate that the proposed strategy outperforms the traditional summarization strategy based on word-vector clustering: on DUC04 task-2, summaries generated by the proposed strategy achieve a ROUGE-1 score of 0.37291, compared with 0.36936 for the traditional strategy.
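
The idea of clustering sentences by their BE vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the BE triples below are hand-written hypothetical examples rather than parser output, the similarity measure (cosine) and the farthest-first seeding are assumptions, and K is fixed by hand because the abstract does not describe how the automatic detection of K works.

```python
from collections import Counter
from math import sqrt

# Hypothetical BE triples (head, modifier, relation), one list per sentence.
# A real system would extract these with a parser; none is invoked here.
SENTENCES = [
    [("bomb", "car", "nn"), ("explode", "bomb", "subj")],
    [("bomb", "car", "nn"), ("explode", "bomb", "subj"), ("explode", "market", "in")],
    [("minister", "prime", "mod"), ("visit", "minister", "subj")],
    [("visit", "minister", "subj"), ("visit", "hospital", "obj")],
]


def be_vector(triples):
    """Sparse BE vector: each distinct triple is one dimension, weighted by count."""
    return Counter(triples)


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {key: w / len(vectors) for key, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """K-means over sparse vectors; returns one cluster label per vector."""
    # Farthest-first seeding (an assumption): start from vector 0, then keep
    # adding the vector that is least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[s])
                                                 for s in seeds)))
    centroids = [dict(vectors[s]) for s in seeds]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels


labels = kmeans([be_vector(s) for s in SENTENCES], k=2)
print(labels)
```

Because each triple is its own dimension, two sentences count as similar only when they share a head, modifier, and relation together, which is stricter than sharing individual words.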

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, D., He, Y., Ji, D., Yang, H. (2006). Multi-document Summarization Based on BE-Vector Clustering. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_49

  • DOI: https://doi.org/10.1007/11671299_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32205-4

  • Online ISBN: 978-3-540-32206-1

  • eBook Packages: Computer Science, Computer Science (R0)