A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 10876)

Abstract

With energy-efficient architectures, including accelerators and many-core processors, gaining traction, application developers face the challenge of optimizing their applications for multiple hardware features, including many-core parallelism, wide vector processing units, and on-chip high-bandwidth memory. In this paper, we discuss the development and use of a new application performance tool based on an extension of the classical Roofline model that profiles multiple levels of the cache-memory hierarchy simultaneously. This tool provides a powerful visual aid for developers and can be used to frame the many-dimensional optimization problem in a tractable way. We present case studies of real scientific applications that have gained insights from the Integrated Roofline Model.
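The idea of profiling several memory levels at once can be illustrated with a small sketch. In the classical Roofline model, attainable performance is bounded by min(P_peak, I × B), where I is arithmetic intensity (FLOPs per byte moved) and B is a bandwidth. A multi-level view, as sketched here, computes a separate arithmetic intensity for each level of the cache-memory hierarchy from the bytes moved at that level, then compares the single measured FLOP rate against each level's own bandwidth roof. The Python below is a minimal sketch under those assumptions only; the `Level` class, the function names, and every number are hypothetical and are neither the authors' tool nor measured values for any real processor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Level:
    """Traffic and bandwidth ceiling for one level of the memory hierarchy."""
    name: str
    bytes_moved: float    # bytes transferred at this level by the kernel
    bandwidth_gbs: float  # bandwidth roof for this level, in GB/s


def attainable_gflops(ai: float, bandwidth_gbs: float, peak_gflops: float) -> float:
    """Classical roofline bound: min(compute peak, AI * bandwidth)."""
    return min(peak_gflops, ai * bandwidth_gbs)


def integrated_roofline(flops: float, seconds: float, peak_gflops: float,
                        levels: List[Level]) -> Tuple[float, List[Tuple[str, float, float, float]]]:
    """Return measured GFLOP/s and one (name, AI, roof, fraction-of-roof) point per level.

    Every level shares the same measured performance; only the arithmetic
    intensity (FLOPs per byte moved at that level) and its roof differ.
    """
    achieved = flops / seconds / 1e9
    points = []
    for lvl in levels:
        ai = flops / lvl.bytes_moved
        roof = attainable_gflops(ai, lvl.bandwidth_gbs, peak_gflops)
        points.append((lvl.name, ai, roof, achieved / roof))
    return achieved, points


def likely_bottleneck(points) -> str:
    """Heuristic: the level whose point sits closest to its own roof."""
    return max(points, key=lambda p: p[3])[0]


# Hypothetical kernel and machine numbers, for illustration only.
levels = [
    Level("L1", 4e10, 6000.0),
    Level("L2", 2e10, 2000.0),
    Level("MCDRAM", 1e10, 400.0),
    Level("DRAM", 5e9, 90.0),
]
achieved, points = integrated_roofline(1e10, 0.5, 2000.0, levels)
for name, ai, roof, frac in points:
    print(f"{name:7s} AI={ai:5.2f} FLOP/B  roof={roof:7.1f} GFLOP/s  at {frac:.1%} of roof")
print("likely limiting level:", likely_bottleneck(points))
```

With these made-up inputs, the kernel runs at 20 GFLOP/s and sits closest to its DRAM roof, so DRAM would be the first level to investigate. Treating the level nearest its roof as the limiter is one common reading heuristic for multi-level roofline charts, not a rule stated in this abstract.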

Keywords

  • Performance models
  • Application performance measurement
  • Roofline
  • Knights Landing

Chapter length: 20 pages

Figs. 1–10 (images not reproduced in this text).

Author information

Correspondence to Tuomas Koskela.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Koskela, T. et al. (2018). A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization. In: Yokota, R., Weiland, M., Keyes, D., Trinitis, C. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 10876. Springer, Cham. https://doi.org/10.1007/978-3-319-92040-5_12

  • DOI: https://doi.org/10.1007/978-3-319-92040-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92039-9

  • Online ISBN: 978-3-319-92040-5

  • eBook Packages: Computer Science, Computer Science (R0)