Abstract
CFL (Communication Fusion Library) is an experimental C++ library which supports shared reduction variables in MPI programs. It uses overloading to distinguish private variables from replicated, shared variables, and automatically introduces MPI communication to keep replicated data consistent. This paper concerns a simple but surprisingly effective technique which improves performance substantially: CFL operators are executed lazily in order to expose opportunities for run-time, context-dependent, optimisation such as message aggregation and operator fusion. We evaluate the idea using both toy benchmarks and a ‘production’ code for simulating plankton population dynamics in the upper ocean. The results demonstrate the library’s software engineering benefits, and show that performance close to that of manually optimised code can be achieved automatically in many cases.
Keywords
- Shared Variable
- Reduction Operation
- Abstract Data Type
- Lazy Evaluation
- Future Generation Computer System
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Field, A.J., Kelly, P.H.J., Hansen, T.L. (2002). Optimising Shared Reduction Variables in MPI Programs. In: Monien, B., Feldmann, R. (eds) Euro-Par 2002 Parallel Processing. Euro-Par 2002. Lecture Notes in Computer Science, vol 2400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45706-2_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44049-9
Online ISBN: 978-3-540-45706-0