Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor

  • Douglas Doerfler
  • Jack Deslippe
  • Samuel Williams
  • Leonid Oliker
  • Brandon Cook
  • Thorsten Kurth
  • Mathieu Lobet
  • Tareq Malas
  • Jean-Luc Vay
  • Henri Vincenti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)

Abstract

The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we determine the Roofline for the Intel Knights Landing (KNL) processor, determining the sustained peak memory bandwidth and floating-point performance for all levels of the memory hierarchy, in all the different KNL cluster modes. We then determine arithmetic intensity and performance for a suite of application kernels being targeted for the KNL based supercomputer Cori, and make comparisons to current Intel Xeon processors. Cori is the National Energy Research Scientific Computing Center’s (NERSC) next generation supercomputer. Scheduled for deployment mid-2016, it will be one of the earliest and largest KNL deployments in the world.

Notes

Acknowledgments

This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

This material is based upon work supported by the Advanced Scientific Computing Research Program in the U.S. Department of Energy, Office of Science, under Award Number DE-AC02-05CH11231.

J.D. was supported by the SciDAC Program on Excited State Phenomena in Energy Materials funded by the U. S. Department of Energy, Office of Basic Energy Sciences and of Advanced Scientific Computing Research, under Contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory

References

  1. 1.
    Aktulga, H.M., Buluc, A., Williams, S., Yang, C.: Optimizing sparse matrix-multiple vector multiplication for nuclear configuration interaction calculations. In: International Parallel and Distributed Processing Symposium (IPDPS 2014), May 2014Google Scholar
  2. 2.
    Aktulga, H.M., Yang, C., Ng, E.G., Maris, P., Vary, J.P.: Improving the scalability of a symmetric iterative eigensolver for multi-core platforms. Concurrency Comput. Pract. Exp. 26(16), 2631–2651 (2014). doi: 10.1002/cpe.3129 CrossRefGoogle Scholar
  3. 3.
    Carr, S., Kennedy, K.: Improving the ratio of memory operations to floating-point operations in loops. ACM Trans. Program. Lang. Syst. 16(6), 1768–1810 (1994). http://doi.acm.org/10.1145/197320.197366 CrossRefGoogle Scholar
  4. 4.
    Birdsall, C.K., Langdon, A.B.: Plasma Physics Via Computer Simulation. Series in Plasma Physics. CRC Press, Boca Raton (2005)Google Scholar
  5. 5.
    Cray xc series supercomputers. http://www.cray.com/products/computing/xc-series
  6. 6.
    Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: Berkeleygw: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. 183(6), 1269–1289 (2012). http://www.sciencedirect.com/science/article/pii/S0010465511003912 CrossRefGoogle Scholar
  7. 7.
    Doerfler, D.: Understanding application data movement characteristics using intel vtune amplifier and software development emulator tools. In: IXPUG 2015, Berkeley, CA, September 28 - October 2 2015Google Scholar
  8. 8.
    Hill, M.D., Smith, A.J.: Evaluating associativity in CPU caches. IEEE Trans. Comput. 38(12), 1612–1630 (1989)CrossRefGoogle Scholar
  9. 9.
    Lawrence Berkeley National Laboratory.: Warp website. http://warp.lbl.gov
  10. 10.
  11. 11.
    Malas, T., Kurth, T., Deslippe, J.: Optimization of the sparse matrix-vector products of an idr krylov iterative solver for the intel knl manycore processor (in preparation)Google Scholar
  12. 12.
    Maris, P., Aktulga, H.M., Caprio, M.A., Çatalyürek, Ü.V., Ng, E.G., Oryspayev, D., Potter, H., Saule, E., Sosonkina, M., Vary, J.P., Yang, C., Zhou, Z.: Large-scale ab initio configuration interaction calculations for light nuclei. J. Phys. Conf. Ser. 403(1), 012019 (2012). http://stacks.iop.org/1742-6596/403/i=1/a=012019 CrossRefGoogle Scholar
  13. 13.
  14. 14.
  15. 15.
  16. 16.
    Petrov, P.V., Newman, G.A.: 3d finite-difference modeling of elasticwave propagation in the laplace-fourier domain. Geophysics 77(4), T137–T155 (2012). doi: 10.1190/geo2011-0238.1 CrossRefGoogle Scholar
  17. 17.
    Raman, K.: Calculating “flop” using intel software developmentemulator (intelsde) (March 2015). https://software.intel.com/en-us/articles/calculating-flop-using-intel-software-development-emulator-intel-sde
  18. 18.
    Sodani, A.: Knights landing (knl): 2nd generation intel xeon phiprocessor. In: Hot Chips 27. Flint Center, Cupertino, August 23rd-25th 2015. http://www.hotchips.org/wp-content/uploads/hc_archives/hc27/HC27.25-Tuesday-Epub/HC27.25.70-Processors-Epub/HC27.25.710-Knights-Landing-Sodani-Intel.pdf
  19. 19.
  20. 20.
    Vincenti, H., Lehe, R., Sasanka, R., Vay, J.: An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes. ArXiv e-prints, January 2016Google Scholar
  21. 21.
    Williams, S.: Auto-tuning Performance on Multicore Computers. Ph.D. thesis, EECS Department, University of California, Berkeley, December 2008Google Scholar
  22. 22.
    Williams, S., Watterman, A., Patterson, D.: Roofline: an insightful visual performance model for floating-point programs and multicore architectures. Commun. ACM 52(4), 65–76 (2009)CrossRefGoogle Scholar
  23. 23.
    Williams, S., Stralen, B.V., Ligocki, T., Oliker, L., Cordery, M., Lo, L.: Roofline performance model. http://crd.lbl.gov/departments/computer-science/PAR/research/roofline/

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Douglas Doerfler
    • 1
  • Jack Deslippe
    • 1
  • Samuel Williams
    • 1
  • Leonid Oliker
    • 1
  • Brandon Cook
    • 1
  • Thorsten Kurth
    • 1
  • Mathieu Lobet
    • 1
  • Tareq Malas
    • 1
  • Jean-Luc Vay
    • 1
  • Henri Vincenti
    • 1
  1. 1.Lawrence Berkeley National LaboratoryBerkeleyUSA

Personalised recommendations