A Performance Study of Quantum ESPRESSO’s PWscf Code on Multi-core and GPU Systems

  • Joshua Romero
  • Everett Phillips
  • Gregory Ruetsch
  • Massimiliano Fatica
  • Filippo Spiga
  • Paolo Giannozzi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10724)


We describe the porting of PWscf (Plane-Wave Self-Consistent Field), a key component of the Quantum ESPRESSO open-source suite of codes for materials modeling, to GPU systems using CUDA Fortran. Kernel loop directives (CUF kernels) have been used extensively to maintain a single source code for both the CPU and GPU implementations. The results of the GPU version have been carefully validated, and the performance of the code on several GPU systems (both x86 and POWER8 based) has been compared with that of traditional Intel multi-core (CPU-only) systems. The current GPU version reduces the time-to-solution by an average factor of 2–3 when running two input cases widely used as benchmarks, on both small and large high-performance computing systems.
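To illustrate the single-source approach mentioned above, the sketch below shows a generic CUF kernel in CUDA Fortran. It is a minimal, hypothetical example (an AXPY-style loop), not code taken from the PWscf source: the directive generates a GPU kernel when CUDA Fortran compilation is enabled, and is ignored when the same file is built for the CPU.

```fortran
! Minimal sketch of a CUF kernel (kernel loop directive), assuming the
! nvfortran compiler. With -cuda the compiler defines _CUDA, places the
! arrays in GPU memory, and turns the loop into a GPU kernel; without
! it, the same source compiles as ordinary host Fortran.
module axpy_mod
contains
  subroutine axpy(n, a, x, y)
    integer, intent(in) :: n
    real(8), intent(in) :: a
#ifdef _CUDA
    real(8), device :: x(n), y(n)   ! arrays resident in GPU memory
#else
    real(8) :: x(n), y(n)           ! ordinary host arrays on the CPU path
#endif
    integer :: i
    !$cuf kernel do(1) <<<*,*>>>    ! directive is ignored on the CPU path
    do i = 1, n
      y(i) = y(i) + a * x(i)
    end do
  end subroutine axpy
end module axpy_mod
```

The `<<<*,*>>>` launch configuration asks the compiler to choose the grid and block sizes; this single-source pattern is what allows one code base to target both architectures.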


Keywords: DFT · Materials science · Eigensolver · GPU computing · CUDA Fortran



Acknowledgements

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was also supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID g33. Wilkes-2 is part of the Cambridge Service for Data Driven Discovery (CSD3) system operated by the University of Cambridge Research Computing Service, funded by EPSRC Tier-2 capital grant EP/P020259/1, the STFC DiRAC HPC Facility (BIS National E-infrastructure capital grant ST/K001590/1, STFC capital grants ST/H008861/1 and ST/H00887X/1, Operations grant ST/K00333X/1), and the University of Cambridge. CSD3 and DiRAC are part of the UK National e-Infrastructure. Paolo Giannozzi also acknowledges support from the European Union through the MaX Centre of Excellence (Grant No. 676598).



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Joshua Romero (1)
  • Everett Phillips (1)
  • Gregory Ruetsch (1)
  • Massimiliano Fatica (1)
  • Filippo Spiga (2)
  • Paolo Giannozzi (3)

  1. NVIDIA Corporation, Santa Clara, USA
  2. Research Computing Service, University of Cambridge, Cambridge, UK
  3. Dip. Scienze Matematiche Informatiche e Fisiche, University of Udine, Udine, Italy
