Estimating Risk of Picking a Sentence for Document Summarization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5449)


Automatic document summarization is becoming an increasingly important tool for coping with information overload. The primary task in document summarization is to pick a subset of sentences that represents the whole document set. We treat this as a decision-making problem and estimate the risk involved in making this decision. We calculate the risk of information loss associated with each sentence and extract sentences in ascending order of risk. Experimental results show that the proposed approach performs better than various state-of-the-art approaches.
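
The ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the risk function, so the sketch assumes one possible choice suggested by the keyword list: the relative entropy (KL divergence) between the unigram distribution of the whole document set and that of a single sentence. The function names, the whitespace tokenizer, and the smoothing constant `alpha` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    alpha is an illustrative smoothing constant (an assumption, not from the paper).
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; q must be non-zero wherever p is."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def rank_by_risk(sentences):
    """Return (risk, sentence) pairs sorted by ascending risk.

    Risk here is assumed to be D(document || sentence): the information lost
    when the sentence's word distribution stands in for the document's.
    """
    tokenized = [s.lower().split() for s in sentences]
    doc_tokens = [t for sent in tokenized for t in sent]
    vocab = sorted(set(doc_tokens))
    doc_dist = unigram_dist(doc_tokens, vocab)
    scored = [
        (kl_divergence(doc_dist, unigram_dist(sent, vocab)), original)
        for sent, original in zip(tokenized, sentences)
    ]
    return sorted(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat",
        "stocks fell sharply today",
    ]
    for risk, sentence in rank_by_risk(docs):
        print(f"{risk:.3f}  {sentence}")
```

Under this assumed risk, a summary would be built by taking sentences from the top of the ranked list until a length budget is reached. The smoothing step matters: without it, any document word missing from a sentence would make the divergence infinite.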


Relative Entropy, Information Loss, Source Document, Coverage Baseline, Document Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Language Technologies Research Centre, International Institute of Information Technology, Hyderabad, India
