High performance data clustering: a comparative analysis of performance for GPU, RASC, MPI, and OpenMP implementations

The Journal of Supercomputing

Abstract

Compared to Beowulf clusters and shared-memory machines, GPUs and FPGAs are emerging alternative architectures that offer massive parallelism and high computational capability. They can run compute-intensive algorithms that analyze ever-growing datasets, and they scale as those datasets grow.

In this paper, we present four implementations of the K-means data clustering algorithm for different high performance computing platforms: a CUDA implementation for GPUs, a Mitrion C implementation for FPGAs, an MPI implementation for Beowulf compute clusters, and an OpenMP implementation for shared-memory machines. We present comparative analyses of the cost of each platform, the difficulty of programming for each platform, and the performance of each implementation.
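As a rough illustration of why K-means maps onto all four platforms, the sketch below splits the points into chunks, computes per-chunk partial sums independently, and then reduces them into new centroids. It is a minimal serial Python sketch and not the authors' code: the function names (`nearest`, `partial_sums`, `kmeans_step`, `kmeans`) and the `n_chunks` parameter are hypothetical, and each chunk merely stands in for what would be an MPI rank, an OpenMP thread, or a GPU thread block.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    best, best_d = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = k, d
    return best


def partial_sums(chunk: Sequence[Point],
                 centroids: Sequence[Point]) -> Tuple[List[List[float]], List[int]]:
    """Assignment step for one chunk: per-cluster coordinate sums and counts.

    Chunks do not depend on each other, so this is the part a parallel
    implementation would hand to separate workers.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for pt in chunk:
        k = nearest(pt, centroids)
        counts[k] += 1
        for j, x in enumerate(pt):
            sums[k][j] += x
    return sums, counts


def kmeans_step(chunks: Sequence[Sequence[Point]],
                centroids: Sequence[Point]) -> List[List[float]]:
    """One iteration: reduce the per-chunk partial sums into new centroids."""
    dim = len(centroids[0])
    total = [[0.0] * dim for _ in centroids]
    total_n = [0] * len(centroids)
    for chunk in chunks:  # serial here; conceptually one worker per chunk
        sums, counts = partial_sums(chunk, centroids)
        for k in range(len(centroids)):
            total_n[k] += counts[k]
            for j in range(dim):
                total[k][j] += sums[k][j]
    # An empty cluster keeps its previous centroid.
    return [
        [s / total_n[k] for s in total[k]] if total_n[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           n_chunks: int = 2, max_iter: int = 100) -> List[List[float]]:
    """Iterate until the centroids stop changing or `max_iter` is reached."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    current = [list(c) for c in centroids]
    for _ in range(max_iter):
        new = kmeans_step(chunks, current)
        if new == current:
            break
        current = new
    return current
```

For example, `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` settles on one centroid per pair of nearby points. The reduction inside `kmeans_step` is the step where a distributed or accelerator version would need communication or synchronization, which is one plausible source of the performance differences such a comparison would measure.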

Acknowledgements

The authors would like to acknowledge the use of the SGI Altix 4700 located at Idaho National Laboratory for the work performed in this paper, and consultation with Dr. Charles Tolle for the data analysis of this project. The work is part of INL Subcontract/ISU No. 125-229-59.

This work was also made possible by NIH Grant #P20 RR016454 from the INBRE Program of the National Center for Research Resources.

Author information

Correspondence to Luobin Yang.

About this article

Cite this article

Yang, L., Chiu, S.C., Liao, WK. et al. High performance data clustering: a comparative analysis of performance for GPU, RASC, MPI, and OpenMP implementations. J Supercomput 70, 284–300 (2014). https://doi.org/10.1007/s11227-013-0906-y
