Parallel Algorithm for Extended Star Clustering

  • Reynaldo Gil-García
  • José M. Badía-Contelles
  • Aurora Pons-Porrata
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3287)

Abstract

In this paper we present a new parallel clustering algorithm based on the extended star clustering method. The algorithm can be used, for example, to cluster massive document collections on distributed memory multiprocessors. It exploits the data parallelism inherent in the extended star clustering algorithm. We implemented the algorithm on a cluster of personal computers connected through a Myrinet network. The code uses the MPI message-passing library and is portable to different architectures. The experimental results show that the parallel algorithm clearly outperforms its sequential version on large data sets, and that its speedup approaches the optimum as the number of objects increases.
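The abstract does not spell out how the work is divided, so the following is only an illustrative sketch of what data parallelism can look like for a star-type clustering method, not the authors' implementation. It assumes that objects are split into contiguous blocks, one per process, and that each process builds the thresholded similarity rows for the objects it owns. Cosine similarity, the threshold value, and all function names (`block_ranges`, `local_adjacency`, `speedup`, `efficiency`) are assumptions introduced here. The loop over blocks stands in for separate processes, and no MPI calls are made.

```python
import math


def block_ranges(n, p):
    """Split n objects into p contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for rank in range(p):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def local_adjacency(vectors, start, stop, threshold):
    """Neighbour lists for the objects owned by one process (rows start..stop-1).

    Each owned object is compared against every object in the data set, so the
    rows can be computed independently of the other processes.
    """
    return {
        i: [j for j in range(len(vectors))
            if j != i and cosine(vectors[i], vectors[j]) >= threshold]
        for i in range(start, stop)
    }


def speedup(t_sequential, t_parallel):
    """Speedup S = T_sequential / T_parallel."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, p):
    """Efficiency E = S / p; E = 1 corresponds to optimal (linear) speedup."""
    return speedup(t_sequential, t_parallel) / p


if __name__ == "__main__":
    # Toy term vectors; hypothetical data chosen only for illustration.
    docs = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1)]
    adjacency = {}
    # Each loop iteration stands in for the work of one process.
    for start, stop in block_ranges(len(docs), 2):
        adjacency.update(local_adjacency(docs, start, stop, threshold=0.7))
    print(adjacency)
```

In a message-passing version, each process would run `local_adjacency` on its own block and the partial results would then be combined through communication. The per-object work grows with the size of the data set while the number of processes stays fixed, which is consistent with speedup improving as the number of objects increases.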

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Reynaldo Gil-García (1)
  • José M. Badía-Contelles (2)
  • Aurora Pons-Porrata (1)
  1. Universidad de Oriente, Santiago de Cuba, Cuba
  2. Universitat Jaume I, Castellón, Spain
