High performance data clustering: a comparative analysis of performance for GPU, RASC, MPI, and OpenMP implementations
Compared to Beowulf clusters and shared-memory machines, GPUs and FPGAs are emerging alternative architectures that provide massive parallelism and great computational capabilities. These architectures can be used to run compute-intensive algorithms that analyze ever-growing datasets, and they provide scalability.
In this paper, we present four implementations of the K-means data clustering algorithm for different high performance computing platforms: a CUDA implementation for GPUs, a Mitrion C implementation for FPGAs, an MPI implementation for Beowulf compute clusters, and an OpenMP implementation for shared-memory machines. We present comparative analyses of the cost of each platform, the difficulty of programming for each platform, and the performance of each implementation.
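For readers unfamiliar with the algorithm, the following is a minimal serial sketch of standard K-means (Lloyd's iteration) in plain Python. It is not one of the four implementations described in the paper; the function names `assign_points`, `update_centroids`, and `kmeans`, the squared Euclidean distance, and the stopping rule are assumptions made for illustration. The assignment step treats every point independently, which is the part that data-parallel platforms would typically distribute across threads or processes.

```python
def assign_points(points, centroids):
    """Assignment step: label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        # Squared Euclidean distance to every centroid (assumed metric).
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update_centroids(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_centroids


def kmeans(points, centroids, max_iters=100):
    """Alternate the two steps until the labels stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iters):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    start = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical initial centroids
    print(kmeans(data, start))
```

In this sketch, each loop iteration in `assign_points` reads only the shared centroids and writes one label, so the points can be partitioned freely; the update step then needs a reduction (per-cluster sums and counts) across those partitions.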
Keywords: Parallel data clustering, K-means clustering, Scalability, Reconfigurable computing, HPC
The authors would like to acknowledge the use of the SGI Altix 4700 located at Idaho National Laboratory for the work performed in this paper, and consultation with Dr. Charles Tolle for the data analysis of this project. The work is part of INL Subcontract/ISU No. 125-229-59.
This work was also made possible by NIH Grant #P20 RR016454 from the INBRE Program of the National Center for Research Resources.