Abstract
This paper considers a KD-tree-based filtering algorithm for K-means clustering. We propose a parallel version of the algorithm for shared-memory systems that uses OpenMP tasks both for KD-tree construction and for filtering in the assignment step of K-means. In our approach, an OpenMP task is created for a recursive call performed by the tree construction and filtering procedures. The data partitioning step during tree construction is also parallelized with OpenMP tasks. In computational experiments, we measured the runtimes of the parallel and serial versions of the filtering algorithm and of a parallel version of the classical Lloyd’s algorithm on six datasets sampled from two distributions. The results of the experiments, performed on a 24-core system, indicate that our version of the filtering algorithm has very good parallel efficiency. Its runtime is up to four orders of magnitude shorter than that of the parallel Lloyd’s algorithm.
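To illustrate the task-per-recursive-call idea described in the abstract, the following is a minimal sketch of KD-tree construction with OpenMP tasks. It is not the authors' implementation: the names `Point`, `Node`, `build`, `build_tree`, `kLeafSize` and `kTaskCutoff` are hypothetical, the median split via `std::nth_element` is an assumed splitting rule, and the partitioning here is serial, whereas the paper also parallelizes the partitioning step. If compiled without OpenMP support, the pragmas are ignored and the code runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical types and constants for illustration; not taken from the paper.
using Point = std::vector<double>;

struct Node {
    std::size_t lo = 0, hi = 0;          // half-open range [lo, hi) of points in this subtree
    std::unique_ptr<Node> left, right;   // both null for a leaf
};

constexpr std::size_t kLeafSize = 8;       // assumed leaf capacity
constexpr std::size_t kTaskCutoff = 1024;  // assumed: smaller ranges are not deferred as tasks

// Recursively builds the subtree over pts[lo, hi), reordering points in place.
std::unique_ptr<Node> build(std::vector<Point>& pts, std::size_t lo,
                            std::size_t hi, std::size_t depth) {
    assert(lo < hi);
    auto node = std::make_unique<Node>();
    node->lo = lo;
    node->hi = hi;
    if (hi - lo <= kLeafSize) return node;  // leaf: stop recursion

    // Assumed splitting rule: median along an axis that cycles with depth.
    const std::size_t axis = depth % pts[lo].size();
    const std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Point& a, const Point& b) {
                         return a[axis] < b[axis];
                     });

    // One task per recursive call; the two subtrees touch disjoint ranges.
    const bool spawn = (hi - lo) > kTaskCutoff;
    #pragma omp task shared(pts, node) if(spawn)
    node->left = build(pts, lo, mid, depth + 1);
    #pragma omp task shared(pts, node) if(spawn)
    node->right = build(pts, mid, hi, depth + 1);
    #pragma omp taskwait  // both children must exist before returning
    return node;
}

// Entry point: one thread starts the recursion, the team executes the tasks.
std::unique_ptr<Node> build_tree(std::vector<Point>& pts) {
    std::unique_ptr<Node> root;
    if (pts.empty()) return root;
    #pragma omp parallel
    #pragma omp single
    root = build(pts, 0, pts.size(), 0);
    return root;
}
```

The `if(spawn)` clause is one common way to limit task-creation overhead on small subtrees; the cutoff value above is an arbitrary placeholder and would need tuning for a given system.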
Acknowledgments
This work was supported by the Białystok University of Technology grant S/WI/2/2018, funded by the Polish Ministry of Science and Higher Education. The calculations were carried out at the Academic Computer Centre in Gdańsk, Poland.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kwedlo, W., Łubowicz, M. (2020). An OpenMP Parallelization of the K-means Algorithm Accelerated Using KD-trees. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science(), vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_39
DOI: https://doi.org/10.1007/978-3-030-43229-4_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43228-7
Online ISBN: 978-3-030-43229-4
eBook Packages: Computer Science, Computer Science (R0)