Explaining Text Clustering Results Using Semantic Structures

  • Andreas Hotho
  • Steffen Staab
  • Gerd Stumme
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2838)


Common text clustering techniques offer rather poor capabilities for explaining to their users why a particular result has been achieved. They do not relate semantically nearby terms, and they cannot explain how the resulting clusters are related to each other. In this paper, we discuss a way of integrating a large thesaurus and the computation of lattices of resulting clusters into common text clustering in order to overcome these two problems. As its major result, our approach yields explanations with an appropriate level of granularity at the concept level and an explaining lattice of resulting clusters of appropriate size and complexity.
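The abstract combines two steps: mapping terms to thesaurus concepts so that semantically nearby terms become related, and computing a lattice over the resulting clusters. The following is a minimal sketch of both steps on toy data. It is not the authors' implementation. The small `HYPERNYM` map stands in for a large thesaurus such as WordNet, and the `CLUSTERS` data are invented. The lattice is built with standard Formal Concept Analysis by naively closing every subset of clusters, which is exponential and suitable only for illustration. All names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy thesaurus: term -> direct hypernym (None marks a root).
# Assumption: stands in for a large thesaurus such as WordNet.
HYPERNYM = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": None,
    "car": "vehicle", "vehicle": None,
}

# Hypothetical clusters, each summarized by a few dominant terms.
CLUSTERS = {
    "C1": {"dog"},
    "C2": {"cat"},
    "C3": {"sparrow"},
    "C4": {"car"},
}


def with_hypernyms(terms):
    """Return the terms plus all their hypernyms up to a thesaurus root."""
    concepts = set()
    for term in terms:
        while term is not None:
            concepts.add(term)
            term = HYPERNYM.get(term)  # unknown terms have no hypernym
    return frozenset(concepts)


# Formal context: clusters (objects) x thesaurus concepts (attributes).
CONTEXT = {name: with_hypernyms(terms) for name, terms in CLUSTERS.items()}
ATTRS = frozenset().union(*CONTEXT.values())


def common_attrs(objs):
    """Derivation operator: attributes shared by every object in objs."""
    result = ATTRS
    for obj in objs:
        result = result & CONTEXT[obj]
    return result


def common_objs(attrs):
    """Derivation operator: objects that carry every attribute in attrs."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def formal_concepts():
    """All (extent, intent) pairs, found by closing every subset of objects.

    Exponential in the number of clusters: adequate for a toy context only.
    """
    concepts = set()
    objs = sorted(CONTEXT)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attrs(subset)
            concepts.add((common_objs(intent), intent))
    return concepts


def cover_edges(concepts):
    """Edges of the Hasse diagram: (lower, upper) with nothing in between."""
    edges = []
    for low in concepts:
        for high in concepts:
            if low[0] < high[0] and not any(
                low[0] < mid[0] < high[0] for mid in concepts
            ):
                edges.append((low, high))
    return edges


if __name__ == "__main__":
    concepts = formal_concepts()
    for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(extent), sorted(intent))
    print("cover edges:", len(cover_edges(concepts)))
```

In this toy context, clusters C1 (dog) and C2 (cat) share no original term. After hypernym expansion, however, they form a formal concept whose intent is {animal, mammal}. That shared concept is the kind of explanation of how clusters relate that the abstract aims for. Without the expansion, the two clusters would meet only at the top of the lattice, where the intent is empty.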


Keywords (machine-generated, not assigned by the authors): Formal Concept; Concept Lattice; Semantic Structure; Text Representation; Formal Context


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Andreas Hotho (1)
  • Steffen Staab (1)
  • Gerd Stumme (1)

  (1) Institute of Applied Informatics and Formal Description Methods (AIFB), University of Karlsruhe, Karlsruhe, Germany