A Method for the Automatic Summarization of Topic-Based Clusters of Documents

  • Aurora Pons-Porrata
  • José Ruiz-Shulcloper
  • Rafael Berlanga-Llavori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2905)


In this paper we propose an effective method for summarizing document clusters. The method is based on Testor Theory and is applied to groups of newspaper articles in order to summarize the events they describe. It can also be applied to a very large document collection, or to a single very large document, to identify its main themes (topics) and summarize them. The experimental results demonstrate the usefulness of the proposed method.
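As background on the Testor Theory concept named above, the following is a minimal sketch of what a typical testor is, not the authors' summarization method. It assumes the standard definition: given a Boolean matrix, a testor is a set of columns such that no row is all zeros when restricted to those columns, and a typical testor is a testor with no proper subset that is also a testor. The function names, the brute-force search, and the example matrix are illustrative assumptions; the abstract does not say how the matrix is built from the documents of a cluster.

```python
from itertools import combinations


def is_testor(matrix, cols):
    """Return True if every row has at least one 1 among the chosen columns."""
    return all(any(row[c] for c in cols) for row in matrix)


def typical_testors(matrix):
    """Enumerate typical (irreducible) testors by brute force.

    Column subsets are visited in order of increasing size, so any subset
    that contains an already found typical testor cannot be minimal and
    is skipped. This is exponential in the number of columns and is only
    meant to illustrate the definition (hypothetical helper, not from the paper).
    """
    n_cols = len(matrix[0])
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            if any(set(t) <= set(cols) for t in found):
                continue  # superset of a known typical testor
            if is_testor(matrix, cols):
                found.append(cols)
    return found


if __name__ == "__main__":
    # Hypothetical Boolean matrix: rows are comparisons, columns are features.
    example = [
        [1, 0, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(typical_testors(example))
```

In this example the third row can only be covered by column 2, so every testor must contain it; the first two rows are then covered either by column 3 alone or by columns 0 and 1 together.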


Keywords: Typical Testors, Compression Rate, Testor Theory, Newspaper Article, Document Cluster



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Aurora Pons-Porrata (1)
  • José Ruiz-Shulcloper (2)
  • Rafael Berlanga-Llavori (3)
  1. Universidad de Oriente, Santiago de Cuba, Cuba
  2. Institute of Cybernetics, Mathematics and Physics, La Habana, Cuba
  3. Universitat Jaume I, Castellón, Spain
