Abstract
Remote memory access (RMA) describes the ability of a process to access all or part of the memory belonging to a remote process directly, without explicit participation of the remote side. A number of parallel programming models based on RMA operations are relevant for High Performance Computing (HPC). On the one hand, Partitioned Global Address Space (PGAS) language extensions such as Co-Array Fortran and UPC use RMA operations as their underlying communication substrate. On the other hand, RMA programming APIs provide so-called one-sided data transfer primitives as an alternative to classic two-sided message passing. In this paper, we describe how Score-P, a scalable performance measurement infrastructure for parallel applications, is extended to support trace-based performance analysis of RMA parallelization models. Emphasis is given to the generic event model we designed to record RMA operations in the OTF2 trace format across a range of one-sided APIs and libraries.
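The one-sided idea can be illustrated with a minimal single-program sketch. It is a toy model only: the names `Process`, `RmaEvent`, `put`, `get`, and `trace` are hypothetical and chosen for illustration; they are neither the Score-P API nor the OTF2 record definitions. The sketch assumes that a generic event needs at least the operation kind, the origin, the target, and the amount of data moved.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Toy 'process': a rank plus a memory window exposed for remote access."""
    rank: int
    window: list = field(default_factory=lambda: [0] * 4)


@dataclass
class RmaEvent:
    """Hypothetical generic trace record (not the OTF2 event definition)."""
    kind: str    # "put" or "get"
    origin: int  # rank that issues the operation
    target: int  # rank whose memory is accessed
    offset: int  # start position inside the target window
    count: int   # number of elements transferred


trace = []  # events recorded in the order the operations are issued


def put(origin, target, offset, values):
    """One-sided write: only the origin runs code, the target stays passive."""
    target.window[offset:offset + len(values)] = values
    trace.append(RmaEvent("put", origin.rank, target.rank, offset, len(values)))


def get(origin, target, offset, count):
    """One-sided read of the target window, again without target involvement."""
    data = target.window[offset:offset + count]
    trace.append(RmaEvent("get", origin.rank, target.rank, offset, count))
    return data


p0, p1 = Process(0), Process(1)
put(p0, p1, 1, [7, 8])       # rank 0 writes into the window of rank 1
fetched = get(p0, p1, 0, 3)  # rank 0 reads from the window of rank 1
p0.window[0] = fetched[1]    # local store: not an RMA operation, not traced
```

Note that no method of the target is ever called; in two-sided message passing the target would have to post a matching receive. Using one record type for both operations mirrors the goal of a generic event model, although the real model may carry different or additional fields.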
Notes
1. The one-sided operations are sometimes called read and write, but get and put are the typical terms. To avoid confusion, load and store are used explicitly for local memory accesses.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Knüpfer, A. et al. (2013). Generic Support for Remote Memory Access Operations in Score-P and OTF2. In: Cheptsov, A., Brinkmann, S., Gracia, J., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37349-7_5
DOI: https://doi.org/10.1007/978-3-642-37349-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37348-0
Online ISBN: 978-3-642-37349-7
eBook Packages: Mathematics and Statistics (R0)