Dynamic Hierarchical Compact Clustering Algorithm

  • Reynaldo Gil-García
  • José M. Badía-Contelles
  • Aurora Pons-Porrata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3773)

Abstract

In this paper we introduce a general framework for hierarchical clustering that handles both static and dynamic data sets. Different hierarchical agglomerative algorithms can be obtained from this framework by specifying an inter-cluster similarity measure, a subgraph of the β-similarity graph, and a cover algorithm. We present a new clustering algorithm, the Hierarchical Compact Algorithm, and its dynamic version, both of which are specific instances of the proposed framework. Our evaluation experiments on several standard document collections show that this algorithm requires less computational time than standard methods on dynamic data sets while achieving comparable or even better clustering quality. We therefore advocate its use for tasks that require dynamic clustering, such as information organization, creation of document taxonomies, and hierarchical topic detection.
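
The three ingredients named in the abstract can be pictured with a minimal static sketch. The abstract does not say which measure, subgraph, or cover the authors choose, so everything specific here is an assumption made for illustration: group-average similarity as the inter-cluster measure, a maximum-similarity subgraph (each cluster linked to its most similar neighbours, keeping only similarities of at least β), and connected components as the cover. The function names are hypothetical, and the dynamic (incremental) version is not covered.

```python
def group_average(a, b, sim):
    """Assumed inter-cluster measure: mean pairwise similarity of members."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))


def max_similarity_edges(clusters, sim, beta):
    """Assumed subgraph of the beta-similarity graph: each cluster is linked
    to its most similar neighbour(s), if that similarity is at least beta."""
    n = len(clusters)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s[i][j] = s[j][i] = group_average(clusters[i], clusters[j], sim)
    edges = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            continue
        top = max(s[i][j] for j in others)
        if top >= beta:
            edges.update((min(i, j), max(i, j)) for j in others if s[i][j] == top)
    return edges


def connected_components(n, edges):
    """Assumed cover algorithm: connected components, via union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def build_hierarchy(items, sim, beta):
    """Repeat graph -> subgraph -> cover until no cluster pair is linked."""
    clusters = [[x] for x in items]
    levels = [clusters]
    while len(clusters) > 1:
        edges = max_similarity_edges(clusters, sim, beta)
        if not edges:
            break  # no similarity reaches beta: the hierarchy stops here
        comps = connected_components(len(clusters), edges)
        clusters = [[x for i in comp for x in clusters[i]] for comp in comps]
        levels.append(clusters)
    return levels
```

For example, calling `build_hierarchy([0.0, 0.1, 5.0, 5.2, 9.0], lambda x, y: 1 / (1 + abs(x - y)), beta=0.3)` yields the singleton level followed by one merged level; swapping in a different similarity measure, subgraph, or cover function gives a different algorithm within the same loop, which is the sense in which the abstract speaks of a framework.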

Keywords

Cluster Algorithm, Document Cluster, Hierarchical Cluster Algorithm, Hierarchical Cluster Method, Cover Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Reynaldo Gil-García (1)
  • José M. Badía-Contelles (2)
  • Aurora Pons-Porrata (1)
  1. Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba
  2. Universitat Jaume I, Castellón, Spain
