
CPU and GPU parallelized kernel K-means


K-means is one of the most commonly used clustering algorithms, with applications in signal processing, artificial intelligence and image processing, among other fields. Many variations and improvements of K-means exist, with kernel K-means being the best known. K-means has been the subject of many studies aiming to improve its hardware and software implementations, and several of these studies have focused on its parallelization. Kernelization implicitly maps the data into a high-dimensional feature space by computing the inner product between each possible pair of data points. Kernel K-means therefore involves additional computational steps and has higher computational requirements than standard K-means. Consequently, it has not received the same attention, and much can still be done to develop parallelized and robust implementations. This original research studies and develops different parallel implementations of kernel K-means on both the CPU and the GPU. The proposed CPU implementations use OpenMP and BLAS, while the GPU implementation focuses on CUDA, which is available on Nvidia GPUs. Several datasets with varying numbers of features and patterns are used. The results show that CUDA generally provides the best run-times, with speedups ranging from two to more than two hundred times over a single-core CPU implementation, depending on the dataset.
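To make the kernelization step concrete, the following is a minimal serial sketch of kernel K-means in plain Python. It is not the authors' implementation: the RBF kernel, the `gamma` parameter, and the function names `rbf_gram` and `kernel_kmeans` are assumptions chosen for illustration. The distance from a point to a cluster centre in feature space is evaluated using kernel values only, as K(i,i) - (2/|C|) Σ_{j∈C} K(i,j) + (1/|C|²) Σ_{j,l∈C} K(j,l).

```python
import math


def rbf_gram(points, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for every data pair.

    The RBF kernel is an assumed choice; any positive-definite kernel works.
    """
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # the matrix is symmetric, so fill both halves
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * sq_dist)
    return K


def kernel_kmeans(K, initial_labels, n_clusters, max_iter=100):
    """Reassign points to the nearest feature-space centroid until stable."""
    n = len(K)
    labels = list(initial_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(n_clusters)]
        # Third term of the distance: mean kernel value inside each cluster.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:  # skip empty clusters
                    continue
                cross = sum(K[i][j] for j in m) / len(m)
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_d:
                    best_c, best_d = c, dist
            new_labels.append(best_c)
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
    return labels


# Two well-separated groups, deliberately started from a mixed labelling.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = rbf_gram(points, gamma=0.1)
labels = kernel_kmeans(K, [0, 1, 0, 1, 0, 1], n_clusters=2)
print(labels)
```

The Gram matrix needs one kernel evaluation per data pair, and each iteration sums kernel entries per point and per cluster. These independent loops are the kind of work that the abstract's OpenMP, BLAS and CUDA implementations would distribute across cores or threads.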





Author information

Correspondence to Mohammed Baydoun.


About this article

Cite this article

Baydoun, M., Ghaziri, H. & Al-Husseini, M. CPU and GPU parallelized kernel K-means. J Supercomput 74, 3975–3998 (2018).


  • Kernel K-means
  • Clustering
  • CUDA
  • OpenMP
  • BLAS