Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis

  • Yu Jung Lo
  • Samuel Williams
  • Brian Van Straalen
  • Terry J. Ligocki
  • Matthew J. Cordery
  • Nicholas J. Wright
  • Mary W. Hall
  • Leonid Oliker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8966)

Abstract

We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable, instrumented microbenchmarks implemented with the Message Passing Interface (MPI), with OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared with previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism, and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput under four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resulting performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.

Keywords

Roofline · Memory bandwidth · CUDA unified memory


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yu Jung Lo (1)
  • Samuel Williams (1)
  • Brian Van Straalen (1)
  • Terry J. Ligocki (1)
  • Matthew J. Cordery (1)
  • Nicholas J. Wright (1)
  • Mary W. Hall (1)
  • Leonid Oliker (1)
  1. Lawrence Berkeley National Laboratory; University of Utah, Salt Lake City, USA
