Performance and Energy Usage of Workloads on KNL and Haswell Architectures

  • Tyler Allen
  • Christopher S. Daley
  • Douglas Doerfler
  • Brian Austin
  • Nicholas J. Wright
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10724)

Abstract

Manycore architectures are an energy-efficient step towards exascale computing within a constrained power budget. The Intel Knights Landing (KNL) manycore chip is a specific example of this and has seen early adoption by a number of HPC facilities. It is therefore important to understand the performance and energy usage characteristics of KNL. In this paper, we evaluate the performance and energy efficiency of KNL in contrast to the Xeon (Haswell) architecture for applications representative of the workload of users at NERSC. We consider the optimal MPI/OpenMP configuration of each application and use the results to characterize KNL in contrast to Haswell. In addition to traditional DDR memory, KNL contains MCDRAM, and we also evaluate its efficacy. Our results show that, averaged over our benchmarks, KNL is 1.84\(\times\) more energy efficient than Haswell and has 1.27\(\times\) greater performance.
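As a rough illustration of how headline ratios of this kind can be computed, the sketch below derives per-application performance and energy-efficiency ratios from runtime and energy measurements and then averages them. The application names and all numbers are hypothetical and are not taken from the paper. The use of an arithmetic mean over per-application ratios is also an assumption, since the abstract does not state which averaging method was used.

```python
from statistics import mean


def ratios(haswell, knl):
    """Return (performance ratio, energy-efficiency ratio) of KNL vs. Haswell.

    Each argument is a (runtime_seconds, energy_joules) pair measured for the
    same workload. A ratio above 1.0 means KNL is faster or uses less energy.
    """
    t_h, e_h = haswell
    t_k, e_k = knl
    return t_h / t_k, e_h / e_k


# Hypothetical measurements for illustration only, NOT data from the paper.
measurements = {
    "app_a": {"haswell": (100.0, 30000.0), "knl": (80.0, 16000.0)},
    "app_b": {"haswell": (200.0, 60000.0), "knl": (160.0, 40000.0)},
}

per_app = {name: ratios(m["haswell"], m["knl"]) for name, m in measurements.items()}

# Assumption: arithmetic mean of per-application ratios.
avg_perf = mean(p for p, _ in per_app.values())
avg_energy = mean(e for _, e in per_app.values())

print(f"average performance ratio: {avg_perf:.2f}x")
print(f"average energy-efficiency ratio: {avg_energy:.2f}x")
```

Because energy is the integral of power over time, a processor can be more energy efficient than it is fast whenever it also draws less average power during the run, which is why the two ratios are reported separately.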

Keywords

Benchmarking, Power consumption, Energy, Hyperthreads, Manycore architecture, Intel Knights Landing, Haswell

Notes

Acknowledgment

This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Tyler Allen (1, email author)
  • Christopher S. Daley (2)
  • Douglas Doerfler (2)
  • Brian Austin (2)
  • Nicholas J. Wright (2)

  1. Clemson University, Clemson, USA
  2. Lawrence Berkeley National Laboratory, Berkeley, USA
