Williams, S., et al.: CACM 52(4), 65–76 (2009)
Ilic, A., et al.: IEEE Comput. Architect. Lett. 12(1), 21–24 (2013)
Marques, D., et al.: Performance analysis with cache-aware roofline model in Intel Advisor. In: 2017 International Conference on High Performance Computing & Simulation (HPCS), pp. 898–907. IEEE, 17 July 2017
Doerfler, D., et al.: Applying the roofline performance model to the Intel Xeon Phi Knights Landing processor. In: ISC Workshops (2016)
Intel Advisor Roofline. https://software.intel.com/en-us/articles/intel-advisor-roofline
Intel® Advisor Roofline Analysis. CodeProject, February 2017. https://www.codeproject.com/Articles/1169323/Intel-Advisor-Roofline-Analysis
How to use Intel Advisor Python. Intel Developer Zone, June 2017. https://software.intel.com/en-us/articles/how-to-use-the-intel-advisor-python-api
Koskela, T., et al.: Performance tuning of scientific codes with the roofline model. Tutorial in SC 2017 (2017). http://bit.ly/tut160, https://sc17.supercomputing.org/full-program/
Koskela, T., et al.: A practical approach to application performance tuning with the Roofline Model, Tutorial submitted to ISC 2018 (2018)
Classical molecular dynamics proxy application, Exascale Co-Design Center for Materials in Extreme Environments. exmatex.org, https://github.com/ECP-copa/CoMD.git
Ku, S., et al.: Nucl. Fusion 49(11), 115021 (2009)
Koskela, T., Deslippe, J.: Optimizing fusion PIC code performance at scale on Cori Phase Two. In: Kunkel, J.M., Yokota, R., Taufer, M., Shalf, J. (eds.) ISC High Performance 2017. LNCS, vol. 10524, pp. 430–440. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67630-2_32
https://software.intel.com/en-us/articles/intel-xeon-processor-scalable-family-technical-overview
Kresse, G., Furthmüller, J.: Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mat. Sci. 6, 15 (1996)
http://www.vasp.at/
Wende, F., Marsman, M., Zhao, Z., Kim, J.: Porting VASP from MPI to MPI+OpenMP [SIMD]. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 107–122. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_8
Shan, H., et al.: Parallel implementation and performance optimization of the configuration-interaction method. In: Supercomputing (SC) (2015)
Johansen, H., et al.: Toward exascale earthquake ground motion simulations for near-fault engineering analysis. Comput. Sci. Eng. 19(5), 27 (2017)
Mohd-Yusof, J.: CoDesign Molecular Dynamics (CoMD) Proxy App, LA-UR-12-21782, Los Alamos National Lab (2012)
Cicotti, P., et al.: An evaluation of threaded models for a classical MD proxy application. In: 2014 Hardware-Software Co-Design for High Performance Computing, New Orleans, LA, pp. 41–48 (2014). https://doi.org/10.1109/Co-HPC.2014.6
Adedoyin, A.: A Case Study on Software Modernization using CoMD - A Molecular Dynamics Proxy Application, LA-UR-17-22676, Los Alamos National Lab (2017)
Gunter, D., Adedoyin, A.: Kokkos Port of CoMD Mini-App, DOE COE Performance Portability Meeting (2017)
Germann, T.C., et al.: 369 Tflop/s molecular dynamics simulations on the petaflop hybrid supercomputer ‘Roadrunner’. Concurrency Comput. Pract. Experience 21(17), 2143–2159 (2009)
https://berkeleygw.org
https://github.com/cyanguwa/BerkeleyGW-GPP
Soininen, J.A., et al.: Electron self-energy calculation using a general multi-pole approximation. J. Phys. Condensed Matter 15(17), 2573 (2003)
Treibig, J., Hager, G.: Introducing a performance model for bandwidth-limited loop kernels. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2009. LNCS, vol. 6067, pp. 615–624. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14390-8_64
http://icl.cs.utk.edu/papi
https://github.com/RRZE-HPC/likwid
Culler, D., et al.: LogP: towards a realistic model of parallel computation. In: PPoPP (1993)
Alexandrov, A., et al.: LogGP: incorporating long messages into the LogP model. JPDC 44(1), 71–79 (1997)
Altaf, M.B., Wood, D.A.: LogCA: a performance model for hardware accelerators. In: ISCA (2017)
Shende, S., Malony, A.: The TAU parallel performance system. IJHPCA 20(2), 287–311 (2005)
Adhianto, L., et al.: HPCToolkit: performance measurement and analysis for supercomputers with node-level parallelism. In: Workshop on Node Level Parallelism for Large Scale Supercomputers (2008)
http://docs.cray.com