Abstract
Compared to Beowulf clusters and shared-memory machines, GPUs and FPGAs are emerging alternative architectures that provide massive parallelism and substantial computational capability. These architectures can be used to run compute-intensive algorithms on ever-growing datasets while providing scalability.
In this paper, we present four implementations of the K-means data clustering algorithm for different high-performance computing platforms: a CUDA implementation for GPUs, a Mitrion C implementation for FPGAs, an MPI implementation for Beowulf compute clusters, and an OpenMP implementation for shared-memory machines. We present a comparative analysis of the cost of each platform, the difficulty of programming each platform, and the performance of each implementation.
Acknowledgements
The authors would like to acknowledge the use of the SGI Altix 4700 located at Idaho National Laboratory for the work performed in this paper, and consultation with Dr. Charles Tolle for the data analysis of this project. The work is part of INL Subcontract/ISU No. 125-229-59.
This work was also made possible by NIH Grant #P20 RR016454 from the INBRE Program of the National Center for Research Resources.
Yang, L., Chiu, S.C., Liao, WK. et al. High performance data clustering: a comparative analysis of performance for GPU, RASC, MPI, and OpenMP implementations. J Supercomput 70, 284–300 (2014). https://doi.org/10.1007/s11227-013-0906-y