Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software

  • Jack Deslippe
  • Felipe H. da Jornada
  • Derek Vigil-Fowler
  • Taylor Barnes
  • Nathan Wichmann
  • Karthik Raman
  • Ruchira Sasanka
  • Steven G. Louie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)

Abstract

We profile and optimize calculations performed with the BerkeleyGW [2, 3] code on the Xeon Phi architecture. BerkeleyGW depends on both hand-tuned critical kernels and BLAS and FFT libraries. We describe the optimization process and the performance improvements achieved. We discuss a layered parallelization strategy that takes advantage of vector-, thread-, and node-level parallelism. We also discuss locality changes (including the consequences of the lack of an L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights Landing, including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well suited to many-core architectures because it exposes a large amount of parallelism over plane-wave components, band pairs, and frequencies.
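
For readers unfamiliar with the roofline study mentioned above: the roofline model [18] bounds a kernel's attainable performance by the smaller of the machine's peak floating-point rate and the product of the kernel's arithmetic intensity (flops per byte moved) and the memory bandwidth. The following is a minimal Python sketch of that bound only. The function names and all numeric values are illustrative assumptions; they are not measurements from this paper and not Knights Landing specifications.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound in GFLOP/s: min(peak, intensity * bandwidth)."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flops/byte) at which a kernel becomes compute bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters, chosen only to make the arithmetic easy.
PEAK_GFLOPS = 2000.0   # assumed peak floating-point rate
FAST_MEM_GBS = 400.0   # assumed bandwidth of a fast (on-package) memory
SLOW_MEM_GBS = 100.0   # assumed bandwidth of a slower (off-package) memory

if __name__ == "__main__":
    for intensity in (0.5, 2.0, 8.0, 32.0):
        fast = attainable_gflops(intensity, PEAK_GFLOPS, FAST_MEM_GBS)
        slow = attainable_gflops(intensity, PEAK_GFLOPS, SLOW_MEM_GBS)
        print(f"AI={intensity:5.1f} flops/byte  fast={fast:7.1f}  slow={slow:7.1f} GFLOP/s")
```

In this picture, an optimization that improves data reuse raises a kernel's arithmetic intensity and moves it to the right, while placing data in higher-bandwidth memory raises the sloped part of the roof.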

References

  1.
  2. Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: BerkeleyGW: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. 183(6), 1269–1289 (2012)
  3.
  4. Frigo, M., Johnson, S.G.: FFTW: an adaptive software architecture for the FFT. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 3, pp. 1381–1384. IEEE (1998)
  5. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., Dal Corso, A., Fabris, S., Fratesi, G., de Gironcoli, S., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L., Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C., Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.: J. Phys.: Condens. Matter 21, 395502 (2009). http://dx.doi.org/10.1088/0953-8984/21/39/395502
  6. Hybertsen, M.S., Louie, S.G.: Electron correlation in semiconductors and insulators: band gaps and quasiparticle energies. Phys. Rev. B 34(8), 5390 (1986)
  7. Hybertsen, M.S., Louie, S.G.: First-principles theory of quasiparticles: calculation of band gaps in semiconductors and insulators. Phys. Rev. Lett. 55(13), 1418 (1985)
  8.
  9. Kronik, L., Makmal, A., Tiago, M.L., Alemany, M.M.G., Jain, M., Huang, X., Saad, Y., Chelikowsky, J.R.: PARSEC: the pseudopotential algorithm for real-space electronic structure calculations: recent advances and novel applications to nanostructures. Phys. Status Solidi (b) 243(5), 1063–1079 (2006)
  10.
  11.
  12.
  13. Pfrommer, B., Raczkowski, D., Canning, A., Louie, S.G.: PARATEC (PARAllel Total Energy Code), Lawrence Berkeley National Laboratory (with contributions from Mauri, F., Cote, M., Yoon, Y., Pickard, C., Heynes, P.). For more information see www.nersc.gov/projects/paratec
  14. Raman, K.: Calculating “flop” using Intel Software Development Emulator (Intel SDE), March 2015. https://software.intel.com/en-us/articles/calculating-flop-using-intel-software-development-emulator-intel-sde
  15. Soler, J.M., Artacho, E., Gale, J.D., García, A., Junquera, J., Ordejón, P., Sánchez-Portal, D.: The SIESTA method for ab initio order-N materials simulation. J. Phys.: Condens. Matter 14(11), 2745 (2002)
  16.
  17. Williams, S.: Auto-tuning Performance on Multicore Computers. Ph.D. thesis, EECS Department, University of California, Berkeley, December 2008
  18. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for floating-point programs and multicore architectures. Commun. ACM 52(4), April 2009
  19.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Jack Deslippe (1)
  • Felipe H. da Jornada (2)
  • Derek Vigil-Fowler (3)
  • Taylor Barnes (1)
  • Nathan Wichmann (4)
  • Karthik Raman (5)
  • Ruchira Sasanka (5)
  • Steven G. Louie (2)

  1. NERSC, Lawrence Berkeley National Laboratory, Berkeley, USA
  2. Department of Physics, University of California at Berkeley, and Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, USA
  3. National Renewable Energy Laboratory, Golden, USA
  4. Cray, Saint Paul, USA
  5. Intel, Hillsboro, USA
