Abstract
The design of future HPC systems is trending towards greater heterogeneity, with different types of accelerators, special-purpose instruction sets, system-on-chip designs, complex memory hierarchies, and multiple memory coherence domains. This complexity makes it harder to design and test programming models that aim to provide a high-level interface while still producing high-performance programs. In this paper we describe how full-system architectural simulation of OpenMP applications can provide a platform for experimenting with OpenMP extensions on future architecture designs. Furthermore, we propose integrating the OpenMP Tools API (OMPT) with other tools (performance tools, emulators, etc.) to speed up the use of architectural simulators with new OpenMP implementations. We evaluate an initial implementation of this simulation testbed using gem5, an open-source full-system simulator, with the EPCC OpenMP micro-benchmarks instrumented with OMPT. We show that OMPT can be a powerful tool for the codesign of future system models and programming model features.
Acknowledgments
This research was funded by the Laboratory Directed Research and Development (LDRD) Program of the Oak Ridge National Laboratory managed by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725. This research also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Baker, M., Hernandez, O., Young, J. (2020). Co-designing OpenMP Features Using OMPT and Simulation Tools. In: Milfeld, K., de Supinski, B., Koesterke, L., Klinkenberg, J. (eds) OpenMP: Portable Multi-Level Parallelism on Modern Systems. IWOMP 2020. Lecture Notes in Computer Science, vol 12295. Springer, Cham. https://doi.org/10.1007/978-3-030-58144-2_12
DOI: https://doi.org/10.1007/978-3-030-58144-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58143-5
Online ISBN: 978-3-030-58144-2
eBook Packages: Computer Science, Computer Science (R0)