The Journal of Supercomputing, Volume 70, Issue 2, pp 696–708

3DyRM: a dynamic roofline model including memory latency information

  • O. G. Lorenzo
  • T. F. Pena
  • J. C. Cabaleiro
  • J. C. Pichel
  • F. F. Rivera


Modern systems present complex memory hierarchies and heterogeneity among cores and processors. As a consequence, efficient programming is challenging. An easy-to-understand performance model, offering guidelines and information about the behaviour of a code, may help to alleviate these issues. In this paper, we present two extensions of the well-known Berkeley Roofline Model. The first of these extensions, the Dynamic Roofline Model (DyRM), takes into consideration the complexities of multicore and heterogeneous systems, offering a more detailed view of how the execution of a code evolves. The second, the 3DyRM, also adds information about the latency of memory accesses to better represent the behaviour on systems with complex memory hierarchies. A set of tools to obtain and represent the models has been implemented. These tools obtain the needed data from hardware counters, with low overhead. The tools display different views of the models, which can be used to extract the main features of a code. Results obtained with these tools for the NAS Parallel Benchmarks for OpenMP on two different systems are presented.
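
As a rough illustration of the ideas summarised above, the following Python sketch computes the classic roofline bound (attainable performance is the minimum of peak compute throughput and peak memory bandwidth times operational intensity) and then turns a sequence of sampled intervals into per-interval points carrying a third, latency, coordinate. The abstract does not describe how the tools derive these points from hardware counters, so the `Sample` fields, the function names, the machine limits, and all numbers below are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampling interval; all field names are hypothetical."""
    seconds: float       # length of the interval
    flops: float         # floating-point operations counted in the interval
    bytes_moved: float   # bytes transferred to/from memory in the interval
    mean_latency: float  # mean memory access latency, in cycles


def roofline_bound(intensity, peak_gflops, peak_gbs):
    """Classic roofline: attainable GFLOPS at a given intensity (FLOP/byte)."""
    return min(peak_gflops, peak_gbs * intensity)


def dynamic_points(samples):
    """Turn each interval into an (intensity, GFLOPS, latency) point."""
    points = []
    for s in samples:
        if s.bytes_moved == 0 or s.seconds == 0:
            continue  # intensity or rate is undefined for this interval
        intensity = s.flops / s.bytes_moved
        gflops = s.flops / s.seconds / 1e9
        points.append((intensity, gflops, s.mean_latency))
    return points


if __name__ == "__main__":
    # Made-up machine limits and samples, for illustration only.
    PEAK_GFLOPS, PEAK_GBS = 100.0, 25.0
    samples = [
        Sample(1.0, 2.0e9, 8.0e9, 300.0),   # memory-bound phase
        Sample(1.0, 4.0e10, 4.0e9, 60.0),   # compute-heavier phase
    ]
    for intensity, gflops, latency in dynamic_points(samples):
        bound = roofline_bound(intensity, PEAK_GFLOPS, PEAK_GBS)
        print(f"I={intensity:.2f} FLOP/B  perf={gflops:.1f} GFLOPS  "
              f"bound={bound:.1f} GFLOPS  latency={latency:.0f} cycles")
```

Plotting the first two coordinates of each point over time would give a dynamic view in the spirit of the DyRM, while the third coordinate corresponds to the latency information that the 3DyRM adds; the actual views produced by the authors' tools may differ.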


Roofline model, Performance, Hardware counters, PEBS, NPB, Multicore



This work has been partially supported by the Ministry of Education and Science of Spain, FEDER funds under contract TIN 2010-17541, and Xunta de Galicia, EM2013/041. It has been developed in the framework of the European network HiPEAC-2 and the Spanish network CAPAP-H4 (TIN2011-15734-E).



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • O. G. Lorenzo (1)
  • T. F. Pena (1)
  • J. C. Cabaleiro (1)
  • J. C. Pichel (1)
  • F. F. Rivera (1)

  1. Centro de Investigación en Tecnoloxías da Información, CITIUS, University of Santiago de Compostela, Santiago de Compostela, Spain
