High performance data clustering: a comparative analysis of performance for GPU, RASC, MPI, and OpenMP implementations

Yang, Luobin; Chiu, Steve C.; Liao, Wei-Keng; Thomas, Michael A.

doi:10.1007/s11227-013-0906-y

Published: 30 March 2013

The Journal of Supercomputing
Volume 70, pages 284–300 (2014)

Luobin Yang¹, Steve C. Chiu², Wei-Keng Liao³ & Michael A. Thomas¹

Abstract

Compared with Beowulf clusters and shared-memory machines, GPUs and FPGAs are emerging alternative architectures that offer massive parallelism and high computational capability. These architectures can be used to run compute-intensive algorithms that analyze ever-growing datasets while providing scalability.

In this paper, we present four implementations of the K-means data clustering algorithm for different high-performance computing platforms: a CUDA implementation for GPUs, a Mitrion C implementation for FPGAs, an MPI implementation for Beowulf compute clusters, and an OpenMP implementation for shared-memory machines. We present comparative analyses of the cost of each platform, the difficulty of programming each platform, and the performance of each implementation.
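For readers unfamiliar with K-means, the following is a minimal serial sketch of the standard Lloyd iteration in Python. It is not the authors' code and does not reproduce any of the four implementations; the function names (`assign`, `update`, `kmeans`) and the toy data are assumptions made for illustration. The assignment step treats every point independently, which is the part a parallel implementation would typically distribute across threads, processes, or hardware pipelines.

```python
import math

def assign(points, centroids):
    """Assignment step: label each point with the index of its nearest centroid.
    Each point is handled independently, so this loop is the data-parallel part."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
        else:
            new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate the two steps until the labels stop changing or max_iter is reached."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids

# Toy data (hypothetical): two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The update step, by contrast, needs a reduction over all points in a cluster, so a parallel version would have to combine partial sums from each worker before computing the new centroids.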

Acknowledgements

The authors would like to acknowledge the use of the SGI Altix 4700 located at Idaho National Laboratory for the work performed in this paper, and consultation with Dr. Charles Tolle for the data analysis of this project. The work is part of INL Subcontract/ISU No. 125-229-59.

This work was also made possible by NIH Grant #P20 RR016454 from the INBRE Program of the National Center for Research Resources.

Author information

Authors and Affiliations

Department of Biological Sciences, Idaho State University, Pocatello, ID, USA
Luobin Yang & Michael A. Thomas
Department of Electrical Engineering and Computer Science, Idaho State University, Pocatello, ID, USA
Steve C. Chiu
Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
Wei-Keng Liao

Corresponding author

Correspondence to Luobin Yang.

About this article

Cite this article

Yang, L., Chiu, S.C., Liao, WK. et al. High performance data clustering: a comparative analysis of performance for GPU, RASC, MPI, and OpenMP implementations. J Supercomput 70, 284–300 (2014). https://doi.org/10.1007/s11227-013-0906-y

Published: 30 March 2013
Issue Date: October 2014
DOI: https://doi.org/10.1007/s11227-013-0906-y
