Fuzzy Clustering for Topic Analysis and Summarization of Document Collections

  • René Witte
  • Sabine Bergler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4509)

Abstract

Large document collections, such as those delivered by Internet search engines, are difficult and time-consuming for users to read and analyse. The detection of common and distinctive topics within a document set, together with the generation of multi-document summaries, can greatly ease the burden of information management. We show how this can be achieved with a clustering algorithm based on fuzzy set theory, which (i) is easy to implement and integrate into a personal information system, (ii) generates a highly flexible data structure for topic analysis and summarization, and (iii) also delivers excellent performance.

Keywords

Topic Analysis; Noun Phrase; Fuzzy Cluster; Cluster Graph; Distinctive Topic
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • René Witte (1)
  • Sabine Bergler (2)
  1. Institut für Programmstrukturen und Datenorganisation (IPD), Universität Karlsruhe (TH), Germany
  2. Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada