Abstract
Multi-document summarization produces a condensed representation of the contents of multiple related text documents. With such a facility, web users can rapidly judge the relevance of a group of documents returned by a search engine and decide whether to discard them, which reduces their total search cost. This paper presents a multi-document summarization system with two components: (1) a sentence extraction component that produces draft summaries and (2) a sentence-trimming component that removes low-content and redundant elements from the sentences in the draft summaries to improve summarization performance. We also introduce several new local and global sentence-trimming rules. Our experiment on the DUC 2004 data set shows that local and global trimming can improve extractive multi-document summarization performance in many cases.
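The two-stage design described in the abstract (extract a draft summary, then trim it) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not specify the scoring method or the trimming rules: the word-frequency sentence score, the parenthetical-removal rule standing in for a local trimming rule, the overlap filter standing in for a global trimming rule, the `max_overlap` threshold, and all function names are hypothetical and are not the paper's method.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is",
             "was", "were", "by", "for", "with", "that", "it"}


def tokenize(text):
    """Lowercase content words of a text, with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract(documents, k):
    """Stage 1: build a draft summary from the k highest-scoring sentences.

    Score = mean corpus frequency of a sentence's content words
    (a simple stand-in for the paper's extraction component).
    """
    sentences = [s for doc in documents for s in split_sentences(doc)]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return sorted(sentences, key=score, reverse=True)[:k]


def local_trim(sentence):
    """Hypothetical local rule: drop parenthetical asides in one sentence."""
    trimmed = re.sub(r"\s*\([^)]*\)", "", sentence)
    return re.sub(r"\s+", " ", trimmed).strip()


def global_trim(sentences, max_overlap=0.8):
    """Hypothetical global rule: drop a sentence when most of its content
    words already appear in sentences kept earlier in the summary."""
    kept, seen = [], set()
    for sentence in sentences:
        words = set(tokenize(sentence))
        if words and len(words & seen) / len(words) >= max_overlap:
            continue
        kept.append(sentence)
        seen |= words
    return kept


def summarize(documents, k=3):
    """Extract a draft summary, then apply local and global trimming."""
    draft = extract(documents, k)
    return global_trim([local_trim(s) for s in draft])
```

Calling `summarize(docs, k=3)` on a list of document strings returns at most three trimmed sentences. The split between a per-sentence rule and a rule that looks across the whole draft mirrors the local/global distinction named in the abstract, but the actual rules in the paper may differ substantially.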
Copyright information
© 2009 Indian Institute of Information Technology, India
About this paper
Cite this paper
Sarkar, K. (2009). Improving Multi-document Text Summarization Performance using Local and Global Trimming. In: Tiwary, U.S., Siddiqui, T.J., Radhakrishna, M., Tiwari, M.D. (eds) Proceedings of the First International Conference on Intelligent Human Computer Interaction. Springer, New Delhi. https://doi.org/10.1007/978-81-8489-203-1_27
DOI: https://doi.org/10.1007/978-81-8489-203-1_27
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-8489-404-2
Online ISBN: 978-81-8489-203-1
eBook Packages: Computer Science, Computer Science (R0)