Abstract
Computation of a Euclidean distance matrix (EDM) is a common task across a wide range of data analysis problems. Many parallel algorithms for this task have been developed for GPUs. However, these algorithms cannot be directly applied to the Intel Xeon Phi many-core processor. In this paper, we address the task of accelerating EDM computation on Intel Xeon Phi in the case when the input data fit into the main memory. We present a parallel algorithm based on a novel block-oriented computation scheme that makes efficient use of the vectorization capabilities of Intel Xeon Phi. Experimental evaluation of the algorithm on real-world and synthetic datasets shows that it is highly scalable and outperforms comparable algorithms in the case of rectangular matrices with low-dimensional data points.
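As a rough illustration of the task itself, the following sketch computes a rectangular EDM between two point sets by visiting the output matrix in fixed-size tiles. It is a generic tiled loop written for this summary, not the algorithm presented in the paper. The names `squared_edm` and `block` are hypothetical, and the matrix holds squared distances, following the convention described in the notes.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_edm(a: Sequence[Point], b: Sequence[Point], block: int = 2) -> List[List[float]]:
    """Return the len(a) x len(b) matrix of squared Euclidean distances.

    The output is filled tile by tile (block x block entries at a time).
    This only illustrates a block-wise traversal; it is an assumption made
    for illustration, not the scheme described in the paper.
    """
    n, m = len(a), len(b)
    d = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):              # tile row origin
        for j0 in range(0, m, block):          # tile column origin
            for i in range(i0, min(i0 + block, n)):
                for j in range(j0, min(j0 + block, m)):
                    # Squared distance between point i of a and point j of b.
                    d[i][j] = sum((x - y) ** 2 for x, y in zip(a[i], b[j]))
    return d


points_a = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
points_b = [(0.0, 0.0), (6.0, 8.0)]
edm = squared_edm(points_a, points_b)
```

Passing the same set as both arguments gives the usual square, symmetric EDM with a zero diagonal.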
Notes
- 1. Strictly speaking, an EDM should contain Euclidean distances, and not the squares thereof. However, we adhere to this ambiguous convention in order to ensure compatibility with most papers related to EDMs [5].
- 2. Note that this definition also covers the case \(\mathbf {A} \equiv \mathbf {B}\).
- 3.
- 4.
References
Arefin, A.S., Riveros, C., Berretta, R., Moscato, P.: Computing large-scale distance matrices on GPU. In: The 7th International Conference on Computer Science and Education, ICCSE 2012, Melbourne, Australia, 14–17 July 2012, pp. 576–580. IEEE Computer Society (2012). https://doi.org/10.1109/ICCSE.2012.6295141
Chang, D., Jones, N.A., Li, D., Ouyang, M., Ragade, R.K.: Compute pairwise Euclidean distances of data points with GPUs. In: Proceedings of the IASTED International Symposium on Computational Biology and Bioinformatics, CBB’2008, Orlando, Florida, USA, 16–18 November 2008, pp. 278–283. IASTED (2008)
Chrysos, G.: Intel® Xeon Phi coprocessor (codename Knights Corner). In: 2012 IEEE Hot Chips 24th Symposium (HCS), Cupertino, CA, USA, 27–29 August 2012, pp. 1–31 (2012). https://doi.org/10.1109/HOTCHIPS.2012.7476487
Dembélé, D., Kastner, P.: Fuzzy c-means method for clustering microarray data. Bioinformatics 19(8), 973–980 (2003)
Dokmanic, I., Parhizkar, R., Ranieri, J., Vetterli, M.: Euclidean distance matrices: essential theory, algorithms, and applications. IEEE Sig. Process. Mag. 32(6), 12–30 (2015)
Engreitz, J.M., Daigle Jr., B.J., Marshall, J.J., Altman, R.B.: Independent component analysis: mining microarray data for fundamental human gene expression modules. J. Biomed. Inform. 43(6), 932–944 (2010)
Foote, J.: An overview of audio information retrieval. Multimed. Syst. 7(1), 2–10 (1999)
Hassan, Q.F.: Innovative Research and Applications in Next-Generation High Performance Computing. IGI Global, Hershey (2016). https://doi.org/10.4018/978-1-5225-0287-6
Jaros, M., et al.: Implementation of k-means segmentation algorithm on Intel Xeon Phi and GPU: application in medical imaging. Adv. Eng. Softw. 103, 21–28 (2017)
Kim, S., Ouyang, M.: Compute distance matrices with GPU. In: Proceedings of the 3rd Annual International Conference on Advances in Distributed and Parallel Computing, ADPC’2012, Bali, Indonesia, 17–18 September 2012 (2012). https://doi.org/10.5176/2251-1652_ADPC12.07
Kostenetskiy, P., Safonov, A.: SUSU supercomputer resources. In: Sokolinsky, L., Starodubov, I., (eds.) PCT’2016, International Scientific Conference on Parallel Computational Technologies, Arkhangelsk, Russia, 29–31 March 2016. CEUR Workshop Proceedings, vol. 1576, pp. 561–573 (2016)
Lee, S., Liao, W., Agrawal, A., Hardavellas, N., Choudhary, A.N.: Evaluation of K-means data clustering algorithm on Intel Xeon Phi. In: Joshi, J., et al. (eds.) 2016 IEEE International Conference on Big Data, BigData 2016, Washington DC, USA, 5–8 December 2016, pp. 2251–2260. IEEE (2016)
Li, Q., Kecman, V., Salman, R.: A chunking method for Euclidean distance matrix calculation on large dataset using multi-GPU. In: Draghici, S., Khoshgoftaar, T.M., Palade, V., Pedrycz, W., Wani, M.A., Zhu, X. (eds.) The 9th International Conference on Machine Learning and Applications, ICMLA 2010, Washington, DC, USA, 12–14 December 2010, pp. 208–213. IEEE Computer Society (2010). https://doi.org/10.1109/ICMLA.2010.38
Meek, C., Thiesson, B., Heckerman, D.: The learning-curve sampling method applied to model-based clustering. J. Mach. Learn. Res. 2, 397–418 (2002)
Melnykov, V., Chen, W.C., Maitra, R.: MixSim: an R package for simulating data to study performance of clustering algorithms. J. Stat. Softw. 51(12), 1–25 (2012). https://doi.org/10.18637/jss.v051.i12
Narayanan, R., Özisikyilmaz, B., Zambreno, J., Memik, G., Choudhary, A.N.: MineBench: a benchmark suite for data mining workloads. In: Proceedings of the 2006 IEEE International Symposium on Workload Characterization, IISWC 2006, San Jose, California, USA, 25–27 October 2006, pp. 182–188. IEEE Computer Society (2006)
Rechkalov, T., Zymbler, M.: Accelerating medoids-based clustering with the Intel Many Integrated Core architecture. In: 9th International Conference on Application of Information and Communication Technologies, AICT 2015, 14–16 October 2015, Rostov-on-Don, Russia - Proceedings, pp. 413–417 (2015). https://doi.org/10.1109/ICAICT.2015.7338591
Sodani, A.: Knights Landing (KNL): 2nd generation Intel® Xeon Phi processor. In: 2015 IEEE Hot Chips 27th Symposium (HCS), Cupertino, CA, USA, 22–25 August 2015, pp. 1–24. IEEE (2015)
Valenzise, G., Gerosa, L., Tagliasacchi, M., Antonacci, F., Sarti, A.: Scream and gunshot detection and localization for audio-surveillance systems. In: Fourth IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2007, Queen Mary, University of London, London, United Kingdom, September 5–7 2007, pp. 21–26. IEEE Computer Society (2007)
Wu, F., Wu, Q., Tan, Y., Wei, L., Shao, L., Gao, L.: A vectorized K-means algorithm for Intel many integrated core architecture. In: Wu, C., Cohen, A. (eds.) APPT 2013. LNCS, vol. 8299, pp. 277–294. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45293-2_21
Zou, J., Chen, L., Chen, C.L.P.: Ensemble fuzzy c-means clustering algorithms based on KL-Divergence for medical image segmentation. In: Li, G., et al. (eds.) 2013 IEEE International Conference on Bioinformatics and Biomedicine, Shanghai, China, 18–21 December 2013, pp. 291–296. IEEE Computer Society (2013)
Acknowledgments
This work was financially supported by the Russian Foundation for Basic Research (grant No. 17-07-00463), by Act 211 of the Government of the Russian Federation (contract No. 02.A03.21.0011) and by the Ministry of Education and Science of the Russian Federation (government order 2.7905.2017/8.9).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Rechkalov, T., Zymbler, M. (2018). A Study of Euclidean Distance Matrix Computation on Intel Many-Core Processors. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2018. Communications in Computer and Information Science, vol 910. Springer, Cham. https://doi.org/10.1007/978-3-319-99673-8_15
DOI: https://doi.org/10.1007/978-3-319-99673-8_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99672-1
Online ISBN: 978-3-319-99673-8
eBook Packages: Computer Science, Computer Science (R0)