Analyzing Performance of Selected NESAP Applications on the Cori HPC System

  • Thorsten Kurth (corresponding author)
  • William Arndt
  • Taylor Barnes
  • Brandon Cook
  • Jack Deslippe
  • Doug Doerfler
  • Brian Friesen
  • Yun (Helen) He
  • Tuomas Koskela
  • Mathieu Lobet
  • Tareq Malas
  • Leonid Oliker
  • Andrey Ovsyannikov
  • Samuel Williams
  • Woo-Sun Yang
  • Zhengji Zhao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10524)


NERSC has partnered with over 20 representative application developer teams to evaluate and optimize their workloads on the Intel® Xeon Phi™ Knights Landing (KNL) processor. In this paper, we summarize this two-year effort and the lessons learned in the process. We analyze the overall performance improvements of these codes, quantifying the impact of both Xeon Phi™ architectural features and code optimization on application performance. We show that the architectural advantage, i.e. the average speedup of optimized code on KNL vs. optimized code on Haswell, is about 1.1\(\times \). The average speedup obtained through application optimization, i.e. comparing optimized vs. original codes on KNL, is about 5\(\times \).
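The two reported ratios measure different things and can be illustrated with a short sketch. The timings below are hypothetical placeholders, not measurements from the paper; they are chosen only so the two ratios come out near the reported averages:

```python
# Hypothetical wall-clock times (seconds) for one application benchmark.
# Illustrative values only -- not data from the paper.
t_knl_original = 100.0   # unoptimized code on Knights Landing
t_knl_optimized = 20.0   # optimized code on Knights Landing
t_hsw_optimized = 22.0   # optimized code on Haswell

# Architectural advantage: optimized code on KNL vs. optimized code on Haswell.
arch_speedup = t_hsw_optimized / t_knl_optimized

# Optimization speedup: original vs. optimized code, both on KNL.
opt_speedup = t_knl_original / t_knl_optimized

print(f"architectural advantage: {arch_speedup:.1f}x")  # ~1.1x
print(f"optimization speedup:    {opt_speedup:.1f}x")   # ~5.0x
```

Note that the two ratios share the optimized-KNL baseline, so they factor a single end-to-end comparison (original code on Haswell-class hardware vs. optimized code on KNL) into a hardware term and a software term.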



Research used resources of NERSC, a DOE Office of Science User Facility supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231. This article has been authored at Lawrence Berkeley National Lab under Contract No. DE-AC02-05CH11231 and UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the United States Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan [3].


References

  1. BerkeleyGW Website.
  2. CESM Web Site.
  3. DOE Public Access Plan.
  4. GROMACS Web Site.
  5. HMMER Web Site.
  6.
  7. NERSC and DOE Requirements Reviews Series.
  8.
  9.
  10. NERSC Web Site.
  11. QBox Web Site.
  12. Warp Web Site.
  13.
  14. Almgren, A.S., Bell, J.B., Lijewski, M.J., Lukić, Z., Van Andel, E.: Nyx: a massively parallel AMR code for computational cosmology. Astrophys. J. 765, 39 (2013)
  15. Barnes, T., Cook, B., Deslippe, J., Doerfler, D., Friesen, B., He, Y.H., Kurth, T., Koskela, T., Lobet, M., Malas, T., Oliker, L., Ovsyannikov, A., Sarje, A., Vay, J.L., Vincenti, H., Williams, S., Carrier, P., Wichmann, N., Wagner, M., Kent, P., Kerr, C., Dennis, J.: Evaluating and optimizing the NERSC workload on Knights Landing. In: Proceedings of the 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, PMBS 2016, pp. 43–53. IEEE Press (2016)
  16. Barnes, T.A., Kurth, T., Carrier, P., Wichmann, N., Prendergast, D., Kent, P.R., Deslippe, J.: Improved treatment of exact exchange in Quantum ESPRESSO. Comput. Phys. Commun. 214, 52–58 (2017)
  17. Bauer, B., Gottlieb, S., Hoefler, T.: Performance modeling and comparative analysis of the MILC Lattice QCD application su3_rmd. In: Proceedings of CCGRID 2012: IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (2012)
  18. Binder, S., Calci, A., Epelbaum, E., Furnstahl, R.J., Golak, J., Hebeler, K., Kamada, H., Krebs, H., Langhammer, J., Liebig, S., Maris, P., Meißner, U.G., Minossi, D., Nogga, A., Potter, H., Roth, R., Skibiński, R., Topolnicki, K., Vary, J.P., Witała, H.: Few-nucleon systems with state-of-the-art chiral nucleon-nucleon forces. Phys. Rev. C 93(4), 044002 (2016)
  19. Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: BerkeleyGW: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. 183(6), 1269–1289 (2012)
  20. Doerfler, D., Austin, B., Cook, B., Deslippe, J., Kandalla, K., Mendygral, P.: Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing based Cori supercomputer at NERSC. In: Cray User Group Meeting (CUG), May 2017
  21. Doerfler, D., Deslippe, J., Williams, S., Oliker, L., Cook, B., Kurth, T., Lobet, M., Malas, T., Vay, J.-L., Vincenti, H.: Applying the roofline performance model to the Intel Xeon Phi Knights Landing processor. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) ISC High Performance 2016. LNCS, vol. 9945, pp. 339–353. Springer, Cham (2016). doi: 10.1007/978-3-319-46079-6_24
  22. Eddy, S.R.: Accelerated profile HMM searches. PLOS Comput. Biol. 7(10), 1–16 (2011)
  23. Edwards, R.G., Joó, B.: The Chroma software system for lattice QCD. Nucl. Phys. Proc. Suppl. 140, 832 (2005)
  24. Friesen, B., Almgren, A., Lukić, Z., Weber, G., Morozov, D., Beckner, V., Day, M.: In situ and in-transit analysis of cosmological simulations. Comput. Astrophys. Cosmol. 3, 4 (2016)
  25. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., Corso, A.D., de Gironcoli, S., Fabris, S., Fratesi, G., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L., Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C., Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.: Quantum ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys. Condens. Matter 21(39), 395502 (2009)
  26. Gygi, F.: Architecture of Qbox: a scalable first-principles molecular dynamics code. IBM J. Res. Dev. 52(1/2), 137–144 (2008)
  27. Hager, R., Yoon, E., Ku, S., D'Azevedo, E., Worley, P., Chang, C.: A fully non-linear multi-species Fokker-Planck-Landau collision operator for simulation of fusion plasma. J. Comput. Phys. 315, 644–660 (2016)
  28. Hurrell, J., Holland, M., Gent, P., Ghan, S., Kay, J., Kushner, P., Lamarque, J.F., Large, W., Lawrence, D., Lindsay, K., Lipscomb, W., Long, M., Mahowald, N., Marsh, D., Neale, R., Rasch, P., Vavrus, S., Vertenstein, M., Bader, D., Collins, W., Hack, J., Kiehl, J., Marshall, S.: The Community Earth System Model: a framework for collaborative research. Bull. Am. Meteorol. Soc. 94, 1339–1360 (2013)
  29. Joó, B.: QPhiX package web page.
  30. Joó, B.: QPhiX-codegen package web page.
  31. Kresse, G., Furthmueller, J.: Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6(1), 15–50 (1996)
  32. Ku, S., Chang, C., Diamond, P.: Full-f gyrokinetic particle simulation of centrally heated global ITG turbulence from magnetic axis to edge pedestal top in a realistic tokamak geometry. Nucl. Fusion 49(11), 115021 (2009)
  33. Lukić, Z., Stark, C.W., Nugent, P., White, M., Meiksin, A.A., Almgren, A.: The Lyman \(\alpha \) forest in optically thin hydrodynamical simulations. Mon. Not. R. Astron. Soc. 446, 3697–3724 (2015)
  34. Maris, P., Caprio, M.A., Vary, J.P.: Emergence of rotational bands in ab initio no-core configuration interaction calculations of the Be isotopes. Phys. Rev. C 91(1), 014310 (2015)
  35. Maris, P., Vary, J.P., Navratil, P., Ormand, W.E., Nam, H., Dean, D.J.: Origin of the anomalous long lifetime of \(^{14}\)C. Phys. Rev. Lett. 106(20), 202502 (2011)
  36. Maris, P., Vary, J.P., Gandolfi, S., Carlson, J., Pieper, S.C.: Properties of trapped neutrons interacting with realistic nuclear Hamiltonians. Phys. Rev. C 87(5), 054318 (2013)
  37. Petersen, M.R., Jacobsen, D.W., Ringler, T.D., Hecht, M.W., Maltrud, M.E.: Evaluation of the arbitrary Lagrangian-Eulerian vertical coordinate method in the MPAS-Ocean model. Ocean Model. 86, 93–113 (2015)
  38. Petrov, P.V., Newman, G.A.: Three-dimensional inverse modelling of damped elastic wave propagation in the Fourier domain. Geophys. J. Int. 198(3), 1599–1617 (2014)
  39. Petrov, P.V., Newman, G.A.: 3D finite-difference modeling of elastic wave propagation in the Laplace-Fourier domain. Geophysics 77(4), T137–T155 (2012)
  40. Pronk, S., Páll, S., Schulz, R., Larsson, P., Bjelkmar, P., Apostolov, R., Shirts, M.R., Smith, J.C., Kasson, P.M., van der Spoel, D., Hess, B., Lindahl, E.: GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29(7), 845 (2013)
  41. Ringler, T., Petersen, M., Higdon, R.L., Jacobsen, D., Jones, P.W., Maltrud, M.: A multi-resolution approach to global ocean modeling. Ocean Model. 69, 211–232 (2013)
  42. Straalen, B.V., Trebotich, D., Ovsyannikov, A., Graves, D.T.: Scalable structured adaptive mesh refinement with complex geometry. In: Exascale Scientific Applications: Programming Approaches for Scalability, Performance and Portability. CRC Press (in press)
  43. Trebotich, D., Adams, M.F., Molins, S., Steefel, C.I., Chaopeng, S.: High-resolution simulation of pore-scale reactive transport processes associated with carbon sequestration. Comput. Sci. Eng. 16(6), 22–31 (2014)
  44. Trebotich, D., Graves, D.: An adaptive finite volume method for the incompressible Navier-Stokes equations in complex geometries. Commun. Appl. Math. Comput. Sci. 10(1), 43–82 (2015)
  45. Vincenti, H., Lobet, M., Lehe, R., Sasanka, R., Vay, J.L.: An efficient and portable SIMD algorithm for charge/current deposition in particle-in-cell codes. Comput. Phys. Commun. 210, 145–154 (2017)
  46. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
  47. Williams, S.W.: Auto-tuning Performance on Multicore Computers. Ph.D. thesis, EECS Department, University of California, Berkeley, December 2008

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Thorsten Kurth (1, corresponding author)
  • William Arndt (1)
  • Taylor Barnes (1)
  • Brandon Cook (1)
  • Jack Deslippe (1)
  • Doug Doerfler (1)
  • Brian Friesen (1)
  • Yun (Helen) He (1)
  • Tuomas Koskela (1)
  • Mathieu Lobet (1)
  • Tareq Malas (1)
  • Leonid Oliker (2)
  • Andrey Ovsyannikov (1)
  • Samuel Williams (2)
  • Woo-Sun Yang (1)
  • Zhengji Zhao (1)

  1. National Energy Research Scientific Computing Center, Berkeley, USA
  2. Computational Research Division, Lawrence Berkeley National Lab, Berkeley, USA
