ACONS: A New Algorithm for Clustering Documents

  • Andrés Gago Alonso
  • Airel Pérez Suárez
  • José E. Medina Pagola
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4756)

Abstract

In this paper we present a new algorithm for document clustering called Condensed Star (ACONS). This algorithm is a natural evolution of the Star algorithm proposed by Aslam et al. and later improved by them and by other researchers. In this method we introduce a new concept of star that allows a different star-shaped form. In this way we retain the strengths of the previous algorithms while addressing their shortcomings. Evaluation experiments on standard document collections show that the proposed algorithm outperforms previously defined methods and produces a smaller number of clusters. Since ACONS is relatively simple to implement and is also efficient, we advocate its use for tasks that require clustering, such as information organization, browsing, topic tracking, and new topic detection.
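The abstract does not describe the ACONS star concept itself, so it is not reproduced here. As background only, the following is a minimal Python sketch of the original offline Star algorithm of Aslam et al., which ACONS builds on. It builds a threshold graph in which two documents are linked when their similarity is at least σ. It then repeatedly takes the highest-degree unmarked vertex as a star centre and forms its cluster from the centre together with all of its neighbours (the satellites). Several details are assumptions made for illustration: documents are dense vectors, similarity is cosine, ties are broken by index, and the names `cosine`, `threshold_graph`, `star_clusters` and `sigma` are hypothetical. This is not the authors' ACONS method.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def threshold_graph(docs: Sequence[Sequence[float]], sigma: float) -> Dict[int, Set[int]]:
    """Adjacency sets of the sigma-similarity graph: an edge joins i and j
    whenever cosine(docs[i], docs[j]) >= sigma."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(docs))}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(docs[i], docs[j]) >= sigma:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def star_clusters(docs: Sequence[Sequence[float]], sigma: float) -> List[Set[int]]:
    """Greedy Star cover (hypothetical helper): repeatedly take the
    highest-degree unmarked vertex as a star centre; its cluster is the
    centre plus all of its neighbours (satellites). Satellites may be
    shared between stars, so clusters can overlap."""
    adj = threshold_graph(docs, sigma)
    # Degrees in the threshold graph never change, so one sort up front
    # (descending degree, ties broken by index) gives the greedy order.
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    marked: Set[int] = set()
    clusters: List[Set[int]] = []
    for v in order:
        if v in marked:
            continue  # already a centre or a satellite of an earlier star
        clusters.append({v} | adj[v])
        marked.add(v)
        marked |= adj[v]
    return clusters
```

Consider four unit vectors at 0°, 30°, 60° and 90°, with σ = 0.8. Only vectors 30° apart are linked (cos 30° ≈ 0.866), so the sketch returns `[{0, 1, 2}, {2, 3}]`. Vertex 2 is a satellite of both stars, which illustrates the overlapping clusters that the Star family produces.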

Keywords

Clustering · Document processing

References

  1. Aslam, J., Pelekhov, K., Rus, D.: Static and Dynamic Information Organization with Star Clusters. In: Proceedings of the 1998 Conference on Information and Knowledge Management, Baltimore (1998)
  2. Aslam, J., Pelekhov, K., Rus, D.: Using Star Clusters for Filtering. In: Proceedings of the Ninth International Conference on Information and Knowledge Management, USA (2000)
  3. Aslam, J., Pelekhov, K., Rus, D.: The Star Clustering Algorithm for Static and Dynamic Information Organization. Journal of Graph Algorithms and Applications 8(1), 95–129 (2004)
  4. Banerjee, A., Krumpelman, C., Basu, S., Mooney, R., Ghosh, J.: Model Based Overlapping Clustering. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD) (2005)
  5. Duda, R., Hart, P., Stork, D.: Pattern Classification. John Wiley & Sons, Chichester (2001)
  6. Gil-García, R.J., Badía-Contelles, J.M., Pons-Porrata, A.: Extended Star Clustering Algorithm. In: Sanfeliu, A., Ruiz-Shulcloper, J. (eds.) CIARP 2003. LNCS, vol. 2905, pp. 480–487. Springer, Heidelberg (2003)
  7. Gil-García, R.J., Badía-Contelles, J.M., Pons-Porrata, A.: Parallel Algorithm for Extended Star Clustering. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds.) CIARP 2004. LNCS, vol. 3287, p. 402. Springer, Heidelberg (2004)
  8. Kuncheva, L., Hadjitodorov, S.: Using Diversity in Cluster Ensembles. In: Proceedings of IEEE SMC 2004, The Netherlands (2004)
  9. van Rijsbergen, C.J.: Information Retrieval, 2nd edn. Butterworths, London (1979)
  10. Zhong, S., Ghosh, J.: A Comparative Study of Generative Models for Document Clustering. In: Proceedings of SDM Workshop on Clustering High Dimensional Data and Its Applications (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Andrés Gago Alonso (1)
  • Airel Pérez Suárez (1)
  • José E. Medina Pagola (1)

  1. Advanced Technologies Application Center (CENATAV), 7a # 21812 e/ 218 y 222, Rpto. Siboney, Playa, C.P. 12200, La Habana, Cuba
