Estimating Risk of Picking a Sentence for Document Summarization

  • Chandan Kumar
  • Prasad Pingali
  • Vasudeva Varma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5449)

Abstract

Automatic document summarization is an increasingly important task for coping with information overload. The primary task of the summarization process is to pick a subset of sentences that represents the whole document set. We treat this as a decision-making problem and estimate the risk involved in making this decision. We calculate the risk of information loss associated with each sentence and extract sentences in ascending order of their risk. The experimental results show that the proposed approach performs better than various state-of-the-art approaches.
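The extraction step described in the abstract (score every sentence by its risk of information loss, then pick sentences in ascending order of risk) can be illustrated with a minimal sketch. The abstract does not give the risk function, so the one used here is an assumption for illustration only: the KL divergence between a smoothed unigram distribution of the whole document set and that of a single sentence. The function names (`unigram_dist`, `kl_divergence`, `rank_by_risk`, `summarize`), the whitespace tokenizer, and the smoothing constant `alpha` are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_by_risk(sentences):
    """Return (risk, sentence) pairs sorted by ascending risk.

    ASSUMPTION: risk is approximated as KL(document || sentence); the
    paper's actual risk estimate is not stated in the abstract.
    """
    tokenized = [s.lower().split() for s in sentences]
    all_tokens = [t for toks in tokenized for t in toks]
    vocab = sorted(set(all_tokens))
    doc = unigram_dist(all_tokens, vocab)
    scored = [
        (kl_divergence(doc, unigram_dist(toks, vocab)), s)
        for toks, s in zip(tokenized, sentences)
    ]
    return sorted(scored, key=lambda pair: pair[0])


def summarize(sentences, k):
    """Pick the k sentences with the lowest estimated risk."""
    return [s for _, s in rank_by_risk(sentences)[:k]]


if __name__ == "__main__":
    docs = [
        "the storm hit the coast",
        "the storm damaged the coast",
        "bananas are yellow",
    ]
    for risk, sentence in rank_by_risk(docs):
        print(f"{risk:.3f}  {sentence}")
```

Under this stand-in measure, a sentence whose word distribution is close to that of the whole document set receives a low risk and is extracted early, while an off-topic sentence is ranked last. Any other risk estimate can be substituted in `rank_by_risk` without changing the ascending-order selection.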



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Chandan Kumar (1)
  • Prasad Pingali (1)
  • Vasudeva Varma (1)
  1. Language Technologies Research Centre, International Institute of Information Technology, Hyderabad, India
