Accurate and Complete Hardware Profiling for OpenMP

Multiplexing Hardware Events Across Executions

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 10468)

Abstract

Analyzing the behavior of OpenMP programs and their interaction with the hardware is essential for locating performance bottlenecks and identifying performance optimization opportunities. However, current architectures only provide a small number of dedicated registers to quantify hardware events, which strongly limits the scope of performance analyses. Hardware event multiplexing can help cover more events, but incurs a significant loss of accuracy and introduces overheads that change the behavior of program execution significantly. In this paper, we present an implementation of our technique for building a unique, coherent profile that contains all available hardware events from multiple executions of the same OpenMP program, each monitoring only a subset of the available hardware events. Reconciliation of the execution profiles relies on a new labeling scheme for OpenMP that uniquely identifies each dynamic unit of work across executions under dynamic scheduling across processing units. We show that our approach yields significantly better accuracy and lower monitoring overhead per execution than hardware event multiplexing.

Keywords

  • Performance analysis
  • Hardware events
  • Performance monitoring counters
  • OpenMP profiling
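The reconciliation step described in the abstract can be illustrated with a minimal sketch. It assumes each execution yields a profile mapping a work-unit label to the counts of the events monitored in that run. The tuple labels, the event names, and the `merge_profiles` helper are hypothetical placeholders chosen for illustration; they are not the paper's labeling scheme or implementation.

```python
from collections import defaultdict


def merge_profiles(runs):
    """Combine per-execution profiles into one profile keyed by work-unit label.

    Each run maps label -> {event name: count} for the subset of events
    monitored in that run (assumed layout, for illustration only).
    """
    merged = defaultdict(dict)
    for run in runs:
        for label, counts in run.items():
            for event, value in counts.items():
                # If an event was measured in several runs, keep the first value.
                merged[label].setdefault(event, value)
    return dict(merged)


# Two executions of the same program, each monitoring a different event subset.
# Labels are hypothetical tuples that stay stable across executions, even if
# the units of work are scheduled in a different order or on different cores.
run_a = {
    (0, 0): {"CYCLES": 1200, "L1D_MISSES": 35},
    (0, 1): {"CYCLES": 900, "L1D_MISSES": 12},
}
run_b = {
    (0, 1): {"BRANCH_MISSES": 4, "TLB_MISSES": 1},
    (0, 0): {"BRANCH_MISSES": 7, "TLB_MISSES": 2},
}

combined = merge_profiles([run_a, run_b])
```

Because the join key is the label rather than the thread or the time of execution, the order in which units of work appear in each run does not affect the merged result.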

Author information

Corresponding author

Correspondence to Richard Neill.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Neill, R., Drebes, A., Pop, A. (2017). Accurate and Complete Hardware Profiling for OpenMP. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_18

  • DOI: https://doi.org/10.1007/978-3-319-65578-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65577-2

  • Online ISBN: 978-3-319-65578-9

  • eBook Packages: Computer Science, Computer Science (R0)