Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software
- Cite this paper as:
- Deslippe J. et al. (2016) Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software. In: Taufer M., Mohr B., Kunkel J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham
We profile and optimize calculations performed with the BerkeleyGW [2, 3] code on the Xeon-Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights-Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.