Skip to main content

Comparison of HPC Architectures for Computing All-Pairs Shortest Paths. Intel Xeon Phi KNL vs NVIDIA Pascal

Part of the Communications in Computer and Information Science book series (CCIS,volume 1409)


Today, one of the main challenges for high-performance computing systems is to improve their performance by keeping energy consumption at acceptable levels. In this context, a consolidated strategy consists of using accelerators such as GPUs or many-core Intel Xeon Phi processors. In this work, devices of the NVIDIA Pascal and Intel Xeon Phi Knights Landing architectures are described and compared. Selecting the Floyd-Warshall algorithm as a representative case of graph and memory-bound applications, optimized implementations were developed to analyze and compare performance and energy efficiency on both devices. As it was expected, Xeon Phi showed superior when considering double-precision data. However, contrary to what was considered in our preliminary analysis, it was found that the performance and energy efficiency of both devices were comparable using single-precision datatype.


  • Shortest paths
  • Floyd-Warshall
  • Xeon Phi
  • Knights landing
  • NVIDIA Pascal
  • Titan X

This is a preview of subscription content, access via your institution.

Buying options

USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-030-75836-3_3
  • Chapter length: 13 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
USD   84.99
Price excludes VAT (USA)
  • ISBN: 978-3-030-75836-3
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   109.99
Price excludes VAT (USA)
Fig. 1.
Fig. 2.


  1. 1.

  2. 2.


  3. 3.

    If there is no path between nodes i and j, their distance is considered to be infinite (usually represented as the largest positive value).

  4. 4.

    The characteristics of each platform were described at the end of Sect. 2.1.

  5. 5.

    Intel Performance Counter Monitor:

  6. 6.

    NVIDIA System Management Interface:

  7. 7.

    Naturally, it is also possible to develop an implementation that processes the matrix in parts and does not have this memory limitation. However, the need to run I/O operations for each round would significantly degrade performance.


  1. Codreanu, V., Rodríguez, J., Saastad, O.W.: Best practice guide - knights landing (2017).

  2. Costanzo, M., Rucci, E., Costi, U., Chichizola, F., Naiouf, M.: Comparación de Arquitecturas HPC para Computar Caminos Mínimos en Grafos. Intel Xeon Phi KNL vs NVIDIA Pascal. In: Actas del XXVI Congreso Argentino de Ciencias de la Computación (CACIC 2020), pp. 82–92 (2020)

    Google Scholar 

  3. Deng, L., Bai, H., Zhao, D., Wang, F.: Kepler GPU vs. Xeon phi: performance case study with a high-order CFD application. In: 2015 IEEE International Conference on Computer and Communications (ICCC), pp. 87–94 (2015)

    Google Scholar 

  4. Deveci, M., Trott, C., Rajamanickam, S.: Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures. Parallel Comput. 78, 33–46 (2018).

  5. Foley, D., Danskin, J.: Ultra-performance pascal GPU and NVLINK interconnect. IEEE Micro 37(2), 7–17 (2017)

    CrossRef  Google Scholar 

  6. Gawande, N.A., Daily, J.A., Siegel, C., Tallent, N.R., Vishnu, A.: Scaling deep learning workloads: Nvidia DGX-1/pascal and intel knights landing. Futur. Gener. Comput. Syst. 108, 1162–1172 (2020)

    CrossRef  Google Scholar 

  7. Giefers, H., Staar, P., Bekas, C., Hagleitner, C.: Analyzing the energy-efficiency of sparse matrix multiplication on heterogeneous systems: a comparative study of GPU, Xeon Phi and FPGA. In: 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 46–56 (2016)

    Google Scholar 

  8. Hashemi, S., Anthony, N., Tann, H., Bahar, R.I., Reda, S.: Understanding the impact of precision quantization on the accuracy and energy of neural networks. In: Design, Automation Test in Europe Conference Exhibition (DATE), pp. 1474–1479 (2017).

  9. Igual, F.D., García, C., Botella, G., Piñuel, L., Prieto-Matías, M., Tirado, F.: Non-negative matrix factorization on low-power architectures and accelerators. Comput. Electr. Eng. 46(C), 139–156 (2015).

  10. Katz, G.J., Kider, Jr, J.T.: All-pairs shortest-paths for large graphs on the GPU. In: Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, GH 2008, pp. 47–55. Eurographics Association, Aire-la-Ville (2008)

    Google Scholar 

  11. Lund, B.D., Smith, J.W.: A multi-stage CUDA kernel for Floyd-Warshall. CoRR abs/1001.4108 (2010).

  12. Morgan, T.P.: The end of Xeon Phi - It’s Xeon and Maybe GPUs from here (2018).

  13. NVIDIA: NVIDIA Tesla P100.

  14. Reinders, J., Jeffers, J., Sodani, A.: Intel Xeon Phi Processor High Performance Programming Knights, Landing edn. Morgan Kaufmann Publishers Inc., Boston (2016)

    Google Scholar 

  15. Robertsén, F., Mattila, K., Westerholm, J.: High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor. Concurr. Comput. Pract. Exp. 31(13), e5072 (2019).

    CrossRef  Google Scholar 

  16. Rucci, E., De Giusti, A., Naiouf, M.: Blocked all-pairs shortest paths algorithm on Intel Xeon Phi KNL processor: a case study. In: De Giusti, A.E. (ed.) CACIC 2017. CCIS, vol. 790, pp. 47–57. Springer, Cham (2018).

    CrossRef  Google Scholar 

  17. Rucci, E., Garcia, C., Botella, G., De Giusti, A., Naiouf, M., Prieto-Matias, M.: SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences. BMC Syst. Biol. 12(5), 96 (2018).

    CrossRef  Google Scholar 

  18. Sakamoto, R., Kondo, M., Fujita, K., Ichimura, T., Nakajima, K.: The effectiveness of low-precision floating arithmetic on numerical codes: a case study on power consumption. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, HPCAsia2020, pp. 199–206. Association for Computing Machinery, New York (2020).

  19. Scheidegger, S., Mikushin, D., Kubler, F., Schenk, O.: Rethinking large-scale economic modeling for efficiency: optimizations for GPU and Xeon Phi clusters. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 610–619 (2018)

    Google Scholar 

  20. Trader, T.: Requiem for a Phi: knights landing discontinued (2018).

  21. Venkataraman, G., Sahni, S., Mukhopadhyaya, S.: A blocked all-pairs shortest-paths algorithm. SWAT 2000. LNCS, vol. 1851, pp. 419–432. Springer, Heidelberg (2000).

    MATH  CrossRef  Google Scholar 

  22. Véstias, M., Neto, H.: Trends of CPU, GPU and FPGA for high-performance computing. In: 2014 24th International Conference on Field Programmable Logic and Applications (FPL), pp. 1–6 (2014).

Download references


The authors are grateful for the support of NVIDIA through the donation of the Titan X GPU used in this research.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Enzo Rucci .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Verify currency and authenticity via CrossMark

Cite this paper

Costanzo, M., Rucci, E., Costi, U., Chichizola, F., Naiouf, M. (2021). Comparison of HPC Architectures for Computing All-Pairs Shortest Paths. Intel Xeon Phi KNL vs NVIDIA Pascal. In: Pesado, P., Eterovic, J. (eds) Computer Science – CACIC 2020. CACIC 2020. Communications in Computer and Information Science, vol 1409. Springer, Cham.

Download citation

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75835-6

  • Online ISBN: 978-3-030-75836-3

  • eBook Packages: Computer ScienceComputer Science (R0)