Abstract
NERSC has partnered with over 20 representative application developer teams to evaluate and optimize their workloads on the Intel® Xeon Phi™ Knights Landing (KNL) processor. In this paper, we summarize this two-year effort and present the lessons we learned in the process. We analyze the overall performance improvements of these codes, quantifying the impact of both Xeon Phi™ architectural features and code optimization on application performance. We show that the architectural advantage, i.e., the average speedup of optimized code on KNL vs. optimized code on Haswell, is about 1.1\(\times\). The average speedup obtained through application optimization, i.e., comparing optimized vs. original codes on KNL, is about 5\(\times\).
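The two ratios quoted in the abstract can be written out explicitly. The following is a minimal sketch for a single application, assuming that speedup is measured as a ratio of wall-clock times; the symbols \(T\), \(S_{\mathrm{arch}}\), and \(S_{\mathrm{opt}}\) are notation introduced here for illustration and are not taken from the paper.

\[
S_{\mathrm{arch}} = \frac{T^{\mathrm{opt}}_{\mathrm{Haswell}}}{T^{\mathrm{opt}}_{\mathrm{KNL}}},
\qquad
S_{\mathrm{opt}} = \frac{T^{\mathrm{orig}}_{\mathrm{KNL}}}{T^{\mathrm{opt}}_{\mathrm{KNL}}}
\]

Under this reading, the reported values of about 1.1\(\times\) and 5\(\times\) are averages of \(S_{\mathrm{arch}}\) and \(S_{\mathrm{opt}}\), respectively, over the applications studied. How the average is formed is not stated in the abstract.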
Notes
1. This includes the AVX-512 subsets F, CD, ER, and PF, but not VL, BW, DQ, IFMA, or VBMI.
2. numactl -p 1 mimics the behavior of numactl -m 1 but is safer: it does not abort execution when no free space remains in MCDRAM.
References
BerkeleyGW Web Site. http://www.berkeleygw.org
CESM Web Site. http://www.cesm.ucar.edu
DOE Public Access Plan. https://energy.gov/downloads/doe-public-access-plan
GROMACS Web Site. http://www.gromacs.org
HMMER Web Site. http://hmmer.org/
MILC Web Site. http://physics.indiana.edu/~sg/milc.html
NERSC and DOE Requirements Reviews Series. http://www.nersc.gov/science/hpc-requirements-reviews/
NERSC NESAP applications. http://www.nersc.gov/users/computational-systems/cori/nesap/nesap-projects/
NERSC NESAP case studies. http://www.nersc.gov/users/computational-systems/cori/application-porting-and-performance/application-case-studies/
NERSC Web Site. http://www.nersc.gov
QBox Web Site. http://qboxcode.org
Warp Web Site. http://warp.lbl.gov
XGC1 Web Site. http://epsi.pppl.gov/computing/xgc-1
Almgren, A.S., Bell, J.B., Lijewski, M.J., Lukić, Z., Van Andel, E.: Nyx: a massively parallel AMR code for computational cosmology. Astrophys. J. 765, 39 (2013)
Barnes, T., Cook, B., Deslippe, J., Doerfler, D., Friesen, B., He, Y.H., Kurth, T., Koskela, T., Lobet, M., Malas, T., Oliker, L., Ovsyannikov, A., Sarje, A., Vay, J.L., Vincenti, H., Williams, S., Carrier, P., Wichmann, N., Wagner, M., Kent, P., Kerr, C., Dennis, J.: Evaluating and optimizing the NERSC workload on Knights Landing. In: Proceedings of the 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, PMBS 2016, pp. 43–53. IEEE Press (2016)
Barnes, T.A., Kurth, T., Carrier, P., Wichmann, N., Prendergast, D., Kent, P.R., Deslippe, J.: Improved treatment of exact exchange in Quantum ESPRESSO. Comput. Phys. Commun. 214, 52–58 (2017)
Bauer, B., Gottlieb, S., Hoefler, T.: Performance modeling and comparative analysis of the MILC Lattice QCD application su3_rmd. In: Proceedings of CCGRID 2012: IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (2012)
Binder, S., Calci, A., Epelbaum, E., Furnstahl, R.J., Golak, J., Hebeler, K., Kamada, H., Krebs, H., Langhammer, J., Liebig, S., Maris, P., Meißner, U.G., Minossi, D., Nogga, A., Potter, H., Roth, R., Skibiński, R., Topolnicki, K., Vary, J.P., Witała, H.: Few-nucleon systems with state-of-the-art chiral nucleon-nucleon forces. Phys. Rev. C 93(4), 044002 (2016)
Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: BerkeleyGW: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. 183(6), 1269–1289 (2012). http://www.sciencedirect.com/science/article/pii/S0010465511003912
Doerfler, D., Austin, B., Cook, B., Deslippe, J., Kandalla, K., Mendygral, P.: Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing based Cori supercomputer at NERSC. In: Cray User Group Meeting (CUG), May 2017
Doerfler, D., Deslippe, J., Williams, S., Oliker, L., Cook, B., Kurth, T., Lobet, M., Malas, T., Vay, J.-L., Vincenti, H.: Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor. In: Taufer, M., Mohr, B., Kunkel, J.M. (eds.) ISC High Performance 2016. LNCS, vol. 9945, pp. 339–353. Springer, Cham (2016). doi:10.1007/978-3-319-46079-6_24
Eddy, S.R.: Accelerated profile HMM searches. PLOS Comput. Biol. 7(10), 1–16 (2011). https://doi.org/10.1371/journal.pcbi.1002195
Edwards, R.G., Joó, B.: The Chroma software system for lattice QCD. Nucl. Phys. Proc. Suppl. 140, 832 (2005)
Friesen, B., Almgren, A., Lukić, Z., Weber, G., Morozov, D., Beckner, V., Day, M.: In situ and in-transit analysis of cosmological simulations. Comput. Astrophys. Cosmol. 3, 4 (2016)
Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., Corso, A.D., de Gironcoli, S., Fabris, S., Fratesi, G., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L., Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C., Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.: QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys. Condens. Matter 21(39), 395502 (2009). http://stacks.iop.org/0953-8984/21/i=39/a=395502
Gygi, F.: Architecture of Qbox: a scalable first-principles molecular dynamics code. IBM J. Res. Dev. 52(1/2), 137–144 (2008). http://dl.acm.org/citation.cfm?id=1375990.1376003
Hager, R., Yoon, E., Ku, S., D’Azevedo, E., Worley, P., Chang, C.: A fully non-linear multi-species Fokker-Planck-Landau collision operator for simulation of fusion plasma. J. Comput. Phys. 315, 644–660 (2016). http://www.sciencedirect.com/science/article/pii/S0021999116300298
Hurrell, J., Holland, M., Gent, P., Ghan, S., Kay, J., Kushner, P., Lamarque, J.F., Large, W., Lawrence, D., Lindsay, K., Lipscomb, W., Long, M., Mahowald, N., Marsh, D., Neale, R., Rasch, P., Vavrus, S., Vertenstein, M., Bader, D., Collins, W., Hack, J., Kiehl, J., Marshall, S.: The Community Earth System Model: a framework for collaborative research. Bull. Am. Meteorol. Soc. 94, 1339–1360 (2013)
Joó, B.: qphix package web page. http://jeffersonlab.github.io/qphix
Joó, B.: qphix-codegen package web page. http://jeffersonlab.github.io/qphix-codegen
Kresse, G., Furthmüller, J.: Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6(1), 15–50 (1996). http://www.sciencedirect.com/science/article/pii/0927025696000080
Ku, S., Chang, C., Diamond, P.: Full-f gyrokinetic particle simulation of centrally heated global ITG turbulence from magnetic axis to edge pedestal top in a realistic tokamak geometry. Nucl. Fusion 49(11), 115021 (2009)
Lukić, Z., Stark, C.W., Nugent, P., White, M., Meiksin, A.A., Almgren, A.: The Lyman \(\alpha \) forest in optically thin hydrodynamical simulations. Mon. Not. R. Astron. Soc. 446, 3697–3724 (2015)
Maris, P., Caprio, M.A., Vary, J.P.: Emergence of rotational bands in ab initio no-core configuration interaction calculations of the Be isotopes. Phys. Rev. C 91(1), 014310 (2015)
Maris, P., Vary, J.P., Navratil, P., Ormand, W.E., Nam, H., Dean, D.J.: Origin of the anomalous long lifetime of \(^{14}\)C. Phys. Rev. Lett. 106(20), 202502 (2011)
Maris, P., Vary, J.P., Gandolfi, S., Carlson, J., Pieper, S.C.: Properties of trapped neutrons interacting with realistic nuclear Hamiltonians. Phys. Rev. C 87(5), 054318 (2013)
Petersen, M.R., Jacobsen, D.W., Ringler, T.D., Hecht, M.W., Maltrud, M.E.: Evaluation of the arbitrary Lagrangian-Eulerian vertical coordinate method in the MPAS-Ocean model. Ocean Model. 86, 93–113 (2015). http://www.sciencedirect.com/science/article/pii/S1463500314001796
Petrov, P.V., Newman, G.A.: Three-dimensional inverse modelling of damped elastic wave propagation in the Fourier domain. Geophys. J. Int. 198(3), 1599–1617 (2014)
Petrov, P.V., Newman, G.A.: 3D finite-difference modeling of elastic wave propagation in the Laplace-Fourier domain. Geophysics 77(4), T137–T155 (2012). http://dx.doi.org/10.1190/geo2011-0238.1
Pronk, S., Páll, S., Schulz, R., Larsson, P., Bjelkmar, P., Apostolov, R., Shirts, M.R., Smith, J.C., Kasson, P.M., van der Spoel, D., Hess, B., Lindahl, E.: GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29(7), 845 (2013). http://dx.doi.org/10.1093/bioinformatics/btt055
Ringler, T., Petersen, M., Higdon, R.L., Jacobsen, D., Jones, P.W., Maltrud, M.: A multi-resolution approach to global ocean modeling. Ocean Model. 69, 211–232 (2013). http://www.sciencedirect.com/science/article/pii/S1463500313000760
Straalen, B.V., Trebotich, D., Ovsyannikov, A., Graves, D.T.: Scalable structured adaptive mesh refinement with complex geometry. In: Exascale Scientific Applications: Programming Approaches for Scalability Performance and Portability. CRC Press (in press)
Trebotich, D., Adams, M.F., Molins, S., Steefel, C.I., Chaopeng, S.: High-resolution simulation of pore-scale reactive transport processes associated with carbon sequestration. Comput. Sci. Eng. 16(6), 22–31 (2014)
Trebotich, D., Graves, D.: An adaptive finite volume method for the incompressible Navier-Stokes equations in complex geometries. Commun. Appl. Math. Comput. Sci. 10(1), 43–82 (2015)
Vincenti, H., Lobet, M., Lehe, R., Sasanka, R., Vay, J.L.: An efficient and portable SIMD algorithm for charge/current deposition in particle-in-cell codes. Comput. Phys. Commun. 210, 145–154 (2017). http://www.sciencedirect.com/science/article/pii/S0010465516302764
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009). http://doi.acm.org/10.1145/1498765.1498785
Williams, S.W.: Auto-tuning Performance on Multicore Computers. Ph.D. thesis, EECS Department, University of California, Berkeley, December 2008. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-164.html
Acknowledgement
This research used resources of NERSC, a DOE Office of Science User Facility supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231. This article has been authored at Lawrence Berkeley National Lab under Contract No. DE-AC02-05CH11231 and UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the United States Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan [3].
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kurth, T. et al. (2017). Analyzing Performance of Selected NESAP Applications on the Cori HPC System. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_25
DOI: https://doi.org/10.1007/978-3-319-67630-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67629-6
Online ISBN: 978-3-319-67630-2
eBook Packages: Computer Science, Computer Science (R0)