A Study of Euclidean Distance Matrix Computation on Intel Many-Core Processors

  • Timofey Rechkalov
  • Mikhail Zymbler
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 910)


Computing a Euclidean distance matrix (EDM) is a common task across a wide range of data analysis problems. Many parallel algorithms for this task have been developed for GPUs, but they cannot be applied directly to the Intel Xeon Phi many-core processor. In this paper, we address the acceleration of EDM computation on Intel Xeon Phi for the case in which the input data fit into main memory. We present a parallel algorithm based on a novel block-oriented computation scheme that makes efficient use of the vectorization capabilities of Intel Xeon Phi. Experimental evaluation on real-world and synthetic datasets shows that the algorithm is highly scalable and outperforms comparable algorithms on rectangular matrices with low-dimensional data points.
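
To make the task concrete: for two point sets A (n points) and B (m points), a rectangular EDM stores the distance between every point of A and every point of B. The sketch below is a minimal pure-Python illustration of computing such a matrix tile by tile. It is not the algorithm of the paper, whose block-oriented scheme is not detailed in this abstract. The function name `edm_blocked` and the `block` parameter are hypothetical names introduced only for this example.

```python
import math


def edm_blocked(a, b, block=4):
    """Return the len(a) x len(b) matrix of Euclidean distances.

    Illustrative assumption: the matrix is filled one block x block tile
    at a time, so each tile reuses a small subset of points from a and b.
    """
    n, m = len(a), len(b)
    d = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows
        for j0 in range(0, m, block):      # tile columns
            for i in range(i0, min(i0 + block, n)):
                for j in range(j0, min(j0 + block, m)):
                    # Sum of squared coordinate differences, then square root.
                    sq = sum((x - y) ** 2 for x, y in zip(a[i], b[j]))
                    d[i][j] = math.sqrt(sq)
    return d


# Two small point sets of different sizes give a rectangular 2 x 3 matrix.
points_a = [(0.0, 0.0), (3.0, 4.0)]
points_b = [(0.0, 0.0), (6.0, 8.0), (3.0, 0.0)]
dist = edm_blocked(points_a, points_b, block=2)
```

The tile size changes only the order in which entries are computed, not the result. On real many-core hardware, the choice of tile size and data layout would be tuned to cache and vector register widths, which this sketch does not attempt to model.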


Keywords: Euclidean distance matrix, OpenMP, Intel Xeon Phi, Data layout, Vectorization



This work was financially supported by the Russian Foundation for Basic Research (grant No. 17-07-00463), by Act 211 of the Government of the Russian Federation (contract No. 02.A03.21.0011) and by the Ministry of Education and Science of the Russian Federation (government order 2.7905.2017/8.9).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. South Ural State University, Chelyabinsk, Russia
