Multi-document Automatic Text Summarization Using Entropy Estimates

  • G. Ravindra
  • N. Balakrishnan
  • K. R. Ramakrishnan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2932)


This paper describes a sentence-ranking technique that uses entropy measures in a multi-document, unstructured text summarization application. The method is topic-specific and uses a simple, language-independent training framework to calculate the entropies of symbol units. The document set is summarized by assigning entropy-based scores to a reduced set of sentences obtained using a graph representation of sentence similarity. The method is seen to perform better than some common statistical techniques when applied to the same data set. Commonly used measures such as precision, recall and F-score have been modified and used as a new set of measures for comparing the performance of summarizers. The rationale behind this modification is also presented. Experimental results illustrate the relevance of the method in cases where it is difficult to obtain language-specific dictionaries, translators and document-summary pairs for training.
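The abstract does not give the scoring formula, so the following is only a minimal Python sketch of one way entropy-based sentence scoring could look, not the authors' method. It assumes that the "symbol units" are lowercase word tokens, that probabilities come from add-one-smoothed unigram counts over a topic-specific training corpus, and that a sentence's score is the sum of the entropy contributions -p log2 p of its words. The function names (`tokenize`, `train_unigram_probs`, `sentence_score`, `rank_sentences`) are hypothetical, and the graph-based sentence reduction step is not modelled.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (assumed symbol unit: the word)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_unigram_probs(training_texts):
    """Return a function mapping a word to its add-one-smoothed unigram probability.

    One extra slot is reserved in the denominator for all unseen words.
    """
    counts = Counter()
    for text in training_texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    vocab_size = len(counts)

    def prob(word):
        return (counts[word] + 1) / (total + vocab_size + 1)

    return prob


def sentence_score(sentence, prob):
    """Sum the entropy contribution -p * log2(p) of every token in the sentence."""
    return sum(-prob(w) * math.log2(prob(w)) for w in tokenize(sentence))


def rank_sentences(sentences, prob):
    """Order sentences from highest to lowest score (illustrative ranking only)."""
    return sorted(sentences, key=lambda s: sentence_score(s, prob), reverse=True)


if __name__ == "__main__":
    # Toy topic-specific training corpus (hypothetical data).
    prob = train_unigram_probs([
        "entropy summarization entropy sentence",
        "entropy ranking sentence",
    ])
    for s in rank_sentences(["unrelated words", "Entropy sentence"], prob):
        print(round(sentence_score(s, prob), 3), s)
```

Because training needs only raw topic text and a tokenizer, a sketch of this kind requires no dictionaries, translators or document-summary pairs, which is the setting the abstract targets. Summing rather than averaging favours longer sentences; that choice is an assumption of this sketch.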


Singular Value Decomposition; Latent Semantic Analysis; Membership Grade; Latent Semantic Indexing; Entropy Estimate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • G. Ravindra (1)
  • N. Balakrishnan (1)
  • K. R. Ramakrishnan (2)
  1. Institute of Science, Supercomputer Education and Research Center, Bangalore, India
  2. Dept. of Electrical Engineering, Institute of Science, Bangalore, India
