Abstract
Analyzing the behavior of OpenMP programs and their interaction with the hardware is essential for locating performance bottlenecks and identifying optimization opportunities. However, current architectures provide only a small number of dedicated registers for counting hardware events, which strongly limits the scope of performance analyses. Hardware event multiplexing can cover more events, but it incurs a significant loss of accuracy and introduces overheads that significantly alter program execution. In this paper, we present an implementation of our technique for building a single, coherent profile that contains all available hardware events from multiple executions of the same OpenMP program, each monitoring only a subset of those events. Reconciliation of the execution profiles relies on a new labeling scheme for OpenMP that uniquely identifies each dynamic unit of work across executions, even under dynamic scheduling across processing units. We show that our approach yields significantly better accuracy and lower monitoring overhead per execution than hardware event multiplexing.
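The reconciliation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the label format (a tuple of region, loop, and chunk index), the event names, and the `merge_profiles` helper are all assumptions made for illustration. The sketch only shows how profiles from separate executions, each covering a different subset of events, could be joined once every unit of work carries a label that is stable across executions.

```python
from collections import defaultdict


def merge_profiles(runs):
    """Combine per-run profiles into one profile keyed by work-unit label.

    Each run maps a label (hypothetical format: (region, loop, chunk))
    to the hardware event counts measured for that unit of work.
    """
    merged = defaultdict(dict)
    for run in runs:
        for label, events in run.items():
            for event, count in events.items():
                # If two runs record the same event, keep the first value seen.
                merged[label].setdefault(event, count)
    return dict(merged)


# Two executions of the same program, each monitoring a different event subset.
run_a = {
    ("region0", "loop0", 0): {"cycles": 1200, "instructions": 900},
    ("region0", "loop0", 1): {"cycles": 1100, "instructions": 880},
}
run_b = {
    # Under dynamic scheduling the chunks may be recorded in a different
    # order, but the labels still identify the same units of work.
    ("region0", "loop0", 1): {"l1d_misses": 35, "branch_misses": 4},
    ("region0", "loop0", 0): {"l1d_misses": 40, "branch_misses": 7},
}

profile = merge_profiles([run_a, run_b])
```

In this sketch the merge is a plain join on the label, so its correctness rests entirely on the labels being identical across executions, which is the property the labeling scheme described above is meant to provide.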
Keywords
- Performance analysis
- Hardware events
- Performance monitoring counters
- OpenMP profiling
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Neill, R., Drebes, A., Pop, A. (2017). Accurate and Complete Hardware Profiling for OpenMP. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_18
DOI: https://doi.org/10.1007/978-3-319-65578-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)