Hierarchical Star Clustering Algorithm for Dynamic Document Collections

  • Reynaldo Gil-García
  • Aurora Pons-Porrata
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5197)

Abstract

In this paper, a new clustering algorithm called Dynamic Hierarchical Star is introduced. Our approach aims to construct a hierarchy of overlapped clusters while handling dynamic data sets. Experimental results on several benchmark text collections show that this method obtains smaller hierarchies than traditional algorithms while achieving similar clustering quality. We therefore advocate its use for tasks that require dynamic overlapped clustering, such as information organization, creation of document taxonomies, and hierarchical topic detection.

Keywords

hierarchical clustering, dynamic clustering, overlapped clusters

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Reynaldo Gil-García (1)
  • Aurora Pons-Porrata (1)
  1. Center for Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba
