An OpenMP Parallelization of the K-means Algorithm Accelerated Using KD-trees

  • Wojciech Kwedlo
  • Michał Łubowicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12043)


This paper considers a KD-tree based filtering algorithm for K-means clustering. We propose a parallel version of the algorithm for shared-memory systems that uses OpenMP tasks both for KD-tree construction and for filtering in the assignment step of K-means. In our approach, an OpenMP task is created for a recursive call performed by the tree construction and filtering procedures. The data partitioning step during tree construction is also parallelized with OpenMP tasks. In computational experiments, we measured the runtimes of the parallel and serial versions of the filtering algorithm and of a parallel version of the classical Lloyd’s algorithm on six datasets sampled from two distributions. The results of the experiments, performed on a 24-core system, indicate that our version of the filtering algorithm has very good parallel efficiency. Its runtime is up to four orders of magnitude shorter than the runtime of the parallel Lloyd’s algorithm.
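The task-per-recursive-call idea for tree construction can be illustrated with a minimal sketch. It is not the authors' implementation: the `Point` and `Node` types, the helper names, the leaf size, the task cutoff, and the 2-D points are all assumptions made for illustration. The partitioning step uses a serial `std::nth_element` here, whereas the paper also parallelizes that step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical types for illustration; not taken from the paper.
struct Point { double x[2]; };

struct Node {
    std::size_t begin = 0, end = 0;   // half-open range of points in this cell
    int axis = -1;                    // splitting dimension, -1 marks a leaf
    std::unique_ptr<Node> left, right;
};

constexpr std::size_t kLeafSize = 8;       // assumed leaf capacity
constexpr std::size_t kTaskCutoff = 1024;  // assumed: smaller ranges run undeferred

// Recursively build a KD-tree over pts[begin, end); each recursive call
// is wrapped in an OpenMP task (ignored if compiled without OpenMP).
std::unique_ptr<Node> build(Point* pts, std::size_t begin, std::size_t end, int depth) {
    assert(begin <= end);
    std::unique_ptr<Node> node(new Node());
    node->begin = begin;
    node->end = end;
    if (end - begin <= kLeafSize) return node;  // leaf: no further splitting

    const int axis = depth % 2;  // cycle through the two dimensions
    node->axis = axis;
    const std::size_t mid = begin + (end - begin) / 2;
    // Serial median partition (the paper parallelizes this step as well).
    std::nth_element(pts + begin, pts + mid, pts + end,
                     [axis](const Point& a, const Point& b) {
                         return a.x[axis] < b.x[axis];
                     });

    Node* raw = node.get();
    const bool spawn = (end - begin) > kTaskCutoff;
    // Locals are firstprivate in the tasks; the two tasks write to
    // different members of *raw, so they do not race.
    #pragma omp task if(spawn)
    raw->left = build(pts, begin, mid, depth + 1);
    #pragma omp task if(spawn)
    raw->right = build(pts, mid, end, depth + 1);
    #pragma omp taskwait  // children must finish before node is returned
    return node;
}

// One thread starts the recursion; the rest of the team executes tasks.
std::unique_ptr<Node> build_tree(std::vector<Point>& pts) {
    std::unique_ptr<Node> root;
    Point* data = pts.data();
    const std::size_t n = pts.size();
    #pragma omp parallel
    #pragma omp single
    root = build(data, 0, n, 0);
    return root;
}

// Sum of leaf sizes; equals the number of input points for a valid tree.
std::size_t count_points(const Node* n) {
    if (n->axis < 0) return n->end - n->begin;
    return count_points(n->left.get()) + count_points(n->right.get());
}
```

Calling `build_tree(points)` reorders `points` in place and returns the root. The `if(spawn)` clause is a common way to limit task overhead on small subtrees; whether the paper uses such a cutoff is not stated in the abstract.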


K-means clustering, OpenMP tasks, KD-trees



This work was supported by Białystok University of Technology grant S/WI/2/2018, funded by the Polish Ministry of Science and Higher Education. The calculations were carried out at the Academic Computer Centre in Gdańsk, Poland.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Faculty of Computer Science, Białystok University of Technology, Białystok, Poland
