Estimating Risk of Picking a Sentence for Document Summarization

Kumar, Chandan; Pingali, Prasad; Varma, Vasudeva

doi:10.1007/978-3-642-00382-0_46

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Automatic document summarization is proving to be an increasingly important task for overcoming information overload. The primary task of the summarization process is to pick a subset of sentences as a representative of the whole document set. We treat this as a decision-making problem and estimate the risk involved in making this decision. We calculate the risk of information loss associated with each sentence and extract sentences in ascending order of their risk. Experimental results show that the proposed approach performs better than various state-of-the-art approaches.
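The abstract does not say how the risk of information loss is computed, so the following is only a minimal sketch under an assumed formulation, not the authors' method. It assumes the risk of picking a sentence s is the information lost when the whole document set D is represented by s alone, measured as the KL divergence KL(P_D || P_s) between the unigram distribution of D and a smoothed unigram distribution of s. The helpers `tokenize`, `sentence_risk` and `pick_sentences`, and the smoothing parameter `alpha`, are hypothetical names introduced for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_risk(sentence_tokens, doc_counts, alpha=0.1):
    """Hypothetical risk of information loss: KL(P_D || P_s).

    P_D is the maximum-likelihood unigram distribution of the document set.
    P_s is the sentence's unigram distribution with additive (Lidstone)
    smoothing over the document vocabulary, so every term has non-zero mass.
    """
    doc_total = sum(doc_counts.values())
    sent_counts = Counter(sentence_tokens)
    sent_total = len(sentence_tokens)
    vocab_size = len(doc_counts)
    risk = 0.0
    for word, count in doc_counts.items():
        p_doc = count / doc_total
        p_sent = (sent_counts[word] + alpha) / (sent_total + alpha * vocab_size)
        risk += p_doc * math.log(p_doc / p_sent)
    return risk


def pick_sentences(sentences, k, alpha=0.1):
    # The document-set distribution is built from all candidate sentences.
    tokenized = [tokenize(s) for s in sentences]
    doc_counts = Counter(tok for toks in tokenized for tok in toks)
    scored = [(sentence_risk(toks, doc_counts, alpha), i)
              for i, toks in enumerate(tokenized)]
    # Ascending order of risk: the least risky sentence is picked first
    # (ties are broken by original position).
    scored.sort()
    return [sentences[i] for _, i in scored[:k]]


if __name__ == "__main__":
    docs = [
        "The storm hit the coast on Monday.",
        "The storm caused flooding along the coast and power outages.",
        "Officials said schools will reopen.",
    ]
    # Under this sketch, the flooding sentence is picked first, then the Monday one.
    for sentence in pick_sentences(docs, k=2):
        print(sentence)
```

Because each sentence is scored independently, this sketch favours sentences that cover the frequent terms of the document set and does nothing to avoid redundancy between the sentences it picks.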

Author information

Authors and Affiliations

Language Technologies Research Centre, International Institute of Information Technology, Hyderabad, India
Chandan Kumar, Prasad Pingali & Vasudeva Varma

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Kumar, C., Pingali, P., Varma, V. (2009). Estimating Risk of Picking a Sentence for Document Summarization. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_46

DOI: https://doi.org/10.1007/978-3-642-00382-0_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00381-3
Online ISBN: 978-3-642-00382-0
eBook Packages: Computer Science, Computer Science (R0)
