Multi-document Summarization Based on BE-Vector Clustering

Liu, Dexi; He, Yanxiang; Ji, Donghong; Yang, Hua

doi:10.1007/11671299_49

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

In this paper, we propose a novel multi-document summarization strategy based on Basic Element (BE) vector clustering. Before clustering, sentences are represented by BE vectors instead of word or term vectors. A BE is a head-modifier-relation triple that represents sentence content, and it is a more precise semantic unit than a word. BE-vector clustering is performed with the k-means method, and a novel cluster analysis method automatically detects the number of clusters, K. Experimental results indicate that the proposed strategy outperforms the traditional summarization strategy based on word-vector clustering: on DUC04 task 2, summaries generated by the proposed strategy achieve a ROUGE-1 score of 0.37291, compared with 0.36936 for the traditional strategy.
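
The idea of clustering sentences by their BE vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the BE triples below are hand-written and hypothetical (a real system would obtain them from a parser-based BE extractor), the helper names be_vector and kmeans are made up for this example, the vectors hold raw BE counts, and K is fixed by hand because the abstract does not describe the automatic detection of K.

```python
from collections import Counter

# Hypothetical BE triples (head, modifier, relation), one list per sentence.
# These are illustrative assumptions, not output of a real BE extractor.
sentence_bes = [
    [("bomb", "car", "nn"), ("explode", "bomb", "subj"), ("kill", "people", "obj")],
    [("election", "presidential", "mod"), ("win", "candidate", "subj")],
    [("explode", "bomb", "subj"), ("kill", "people", "obj"), ("injure", "people", "obj")],
    [("win", "candidate", "subj"), ("election", "presidential", "mod"),
     ("vote", "citizen", "subj")],
]

# The vector space has one dimension per distinct BE in the collection.
vocab = sorted({be for bes in sentence_bes for be in bes})


def be_vector(bes, vocab):
    """Count how often each vocabulary BE occurs in one sentence."""
    counts = Counter(bes)
    return [float(counts[be]) for be in vocab]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors, k, max_iter=100):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sentence joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels


vectors = [be_vector(bes, vocab) for bes in sentence_bes]
labels = kmeans(vectors, k=2)  # K chosen by hand for this sketch
print(labels)  # prints [0, 1, 0, 1]
```

In this toy collection the two bombing sentences share BEs such as (explode, bomb, subj) and fall into one cluster, while the two election sentences fall into the other. A word-vector variant would use the same code with words in place of triples as the vocabulary.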

Author information

Authors and Affiliations

School of Computer, Wuhan University, Wuhan, 430079, P.R. China
Dexi Liu, Yanxiang He & Hua Yang
School of Physics, Xiangfan University, Xiangfan, 441053, P.R. China
Dexi Liu
Center for Study of Language and Information, Wuhan University, Wuhan, 430079, P.R. China
Dexi Liu, Yanxiang He, Donghong Ji & Hua Yang
Institute for Infocomm Research, Heng Mui Keng Terrace, 119613, Singapore
Donghong Ji

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Liu, D., He, Y., Ji, D., Yang, H. (2006). Multi-document Summarization Based on BE-Vector Clustering. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_49

DOI: https://doi.org/10.1007/11671299_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science, Computer Science (R0)