Scalability and Locality of Extrapolation Methods for Distributed-Memory Architectures
The numerical simulation of systems of ordinary differential equations (ODEs), which arise from the mathematical modeling of time-dependent processes, can be computationally intensive, so efficient parallel solution methods are desirable. This paper considers the parallel solution of systems of ODEs by explicit extrapolation methods. We analyze and compare the scalability of several implementation variants for distributed-memory architectures that use different load balancing strategies and different loop structures. Exploiting the special structure of a large class of ODE systems considerably reduces the communication costs. Further, processing the micro-steps in a pipeline-like loop structure increases the locality of memory references and improves the utilization of the cache hierarchy. Runtime experiments on modern parallel computer systems show that the optimized implementations achieve high scalability.
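For readers unfamiliar with the method class, the following is a minimal sequential sketch of one macro-step of an explicit extrapolation method. It is not taken from the paper: the choice of explicit Euler as the base method, the harmonic step sequence n_j = j, and the names `euler_micro_steps` and `extrapolation_step` are all assumptions made for illustration. Each of the r base approximations is computed independently with a different number of micro-steps, which is presumably where the parallelism and the load balancing question mentioned above come from.

```python
import math

def euler_micro_steps(f, t, y, H, n):
    """Advance y from t to t + H using n explicit Euler micro-steps."""
    h = H / n
    for i in range(n):
        y = [yi + h * fi for yi, fi in zip(y, f(t + i * h, y))]
    return y

def extrapolation_step(f, t, y, H, r):
    """One macro-step of size H built from r Euler approximations.

    Assumes the harmonic step sequence n_j = j (illustrative choice).
    """
    n = list(range(1, r + 1))
    # First column of the tableau: the r base approximations are
    # independent of each other, but row j costs n[j] micro-steps.
    T = [[euler_micro_steps(f, t, y, H, n[j])] for j in range(r)]
    # Aitken-Neville recursion fills the remaining columns.
    for j in range(1, r):
        for k in range(1, j + 1):
            factor = n[j] / n[j - k] - 1.0
            T[j].append([a + (a - b) / factor
                         for a, b in zip(T[j][k - 1], T[j - 1][k - 1])])
    return T[r - 1][r - 1]

if __name__ == "__main__":
    # Test problem y' = y, y(0) = 1, with exact solution exp(t).
    approx = extrapolation_step(lambda t, y: [y[0]], 0.0, [1.0], 0.1, 4)
    print(abs(approx[0] - math.exp(0.1)))
```

Because row j of the first column needs j micro-steps, the rows have unequal cost, so assigning one row per processor would leave the work unevenly distributed. This sketch does not attempt the pipeline-like loop structure or any distribution across processes.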