
A vectorized k-means algorithm for compressed datasets: design and experimental analysis

Abstract

Clustering algorithms (e.g., Gaussian mixture models, k-means) tackle the problem of grouping a set of elements so that elements from the same group (or cluster) are more similar to each other than to elements in other clusters. This simple concept underlies complex algorithms in many application areas, including sequence analysis and genotyping in bioinformatics, medical imaging, antimicrobial activity, market research and social networking. However, as data volumes continue to increase, the performance of clustering algorithms is heavily influenced by the memory subsystem. In this paper, we propose a novel and efficient implementation of Lloyd’s k-means clustering algorithm that substantially reduces data movement along the memory hierarchy. Our contributions build on the fact that the vast majority of processors are equipped with powerful Single Instruction Multiple Data (SIMD) instructions that are, in most cases, underused. SIMD increases the CPU’s computational power and, used wisely, offers an opportunity to reduce application data transfers by compressing and decompressing the data, especially for memory-bound applications. Our contributions include a SIMD-friendly data layout, in-register implementations of key functions and SIMD-based compression. We demonstrate that, using our optimized SIMD-based compression method, it is possible to improve the performance and energy consumption of k-means by factors of 4.5x and 8.7x, respectively, on an Intel i7 Haswell machine, and by 22x and 22.2x on an Intel Xeon Phi Knights Landing (KNL), running a single thread.
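To make the vectorization idea concrete, below is a minimal sketch of the nearest-centroid assignment step at the heart of Lloyd’s k-means, written with AVX2/FMA intrinsics so that eight single-precision features are processed per instruction. This is an illustrative example under stated assumptions, not the authors’ implementation: the function name `nearest_centroid`, the row-major layout, and the requirement that the dimensionality be a multiple of 8 are ours.

```c
#include <immintrin.h>
#include <float.h>
#include <stddef.h>

/* Illustrative sketch (not the paper's code): assign one point to its
 * nearest centroid using AVX2/FMA. Assumes dim is a multiple of 8 and
 * that points and centroids are stored row-major as 32-bit floats. */
static size_t nearest_centroid(const float *point, const float *centroids,
                               size_t k, size_t dim)
{
    size_t best = 0;
    float best_dist = FLT_MAX;

    for (size_t c = 0; c < k; ++c) {
        __m256 acc = _mm256_setzero_ps();        /* 8 partial squared sums */
        for (size_t d = 0; d < dim; d += 8) {
            __m256 p = _mm256_loadu_ps(point + d);
            __m256 q = _mm256_loadu_ps(centroids + c * dim + d);
            __m256 diff = _mm256_sub_ps(p, q);
            acc = _mm256_fmadd_ps(diff, diff, acc);  /* acc += diff*diff */
        }
        /* Horizontal sum of the 8 lanes (see note 3 below). */
        __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                              _mm256_extractf128_ps(acc, 1));
        s = _mm_hadd_ps(s, s);
        s = _mm_hadd_ps(s, s);
        float dist = _mm_cvtss_f32(s);

        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}
```

Compile with, e.g., `gcc -O2 -mavx2 -mfma`. In the setting the abstract describes, compressed blocks of the dataset would be decompressed into vector registers ahead of this distance loop, trading cheap SIMD compute for reduced memory traffic.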

Notes

  1. Positron emission tomography.

  2. Store instructions that skip the first level of the cache hierarchy (non-temporal stores); see the sketch after this list.

  3. The addition of all the data values within a vector register (a horizontal sum; also shown in the sketch below).

  4. Network on Chip.
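For concreteness, here is a minimal, hypothetical C sketch of the two SIMD idioms behind notes 2 and 3. The intrinsics are real AVX intrinsics, but the helper names (`stream_copy`, `horizontal_sum`) and the alignment/size assumptions are ours, not the authors’.

```c
#include <immintrin.h>
#include <stddef.h>

/* Note 2 (illustrative): non-temporal stores write a vector register to
 * memory while bypassing the cache, avoiding pollution by write-once data.
 * Assumes dst is 32-byte aligned and n is a multiple of 8. */
void stream_copy(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        __m256 v = _mm256_loadu_ps(src + i);
        _mm256_stream_ps(dst + i, v);   /* store bypasses the cache */
    }
    _mm_sfence();  /* order the weakly-ordered streaming stores */
}

/* Note 3 (illustrative): add all eight lanes of a vector register. */
float horizontal_sum(__m256 v)
{
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(v),
                          _mm256_extractf128_ps(v, 1));
    s = _mm_hadd_ps(s, s);  /* pairwise adds collapse 4 lanes to 2 */
    s = _mm_hadd_ps(s, s);  /* then 2 lanes to 1 */
    return _mm_cvtss_f32(s);
}
```

Note that `_mm256_stream_ps` requires a 32-byte-aligned destination, and the trailing `_mm_sfence()` makes the streaming stores visible before subsequent memory operations.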

Author information

Correspondence to Abdullah Al Hasib.

About this article

Cite this article

Al Hasib, A., Cebrian, J.M. & Natvig, L. A vectorized k-means algorithm for compressed datasets: design and experimental analysis. J Supercomput 74, 2705–2728 (2018). https://doi.org/10.1007/s11227-018-2310-0
