Parallelization of an Edge- and Coherence-Enhancing Anisotropic Diffusion Filter with a Distributed Memory Approach Based on GPI
Numerical algorithms in the seismic industry are among the most challenging applications of High Performance Computing and require an ever-growing amount of compute power and main memory. The Global Address Space Programming Interface (GPI) provides a model for programming distributed memory clusters based on RDMA transfers in a Partitioned Global Address Space (PGAS). Based on GPI, a generic, straightforward parallelization of an Anisotropic Diffusion Filter (ADF) is implemented as an example of an explicit finite difference scheme. Key features of the implementation are a complete overlap of computation with network data transfers, a dynamic load distribution scheme, and the use of one-sided communication patterns throughout the algorithm to orchestrate read and write accesses to the image data. Synchronization points between the compute nodes, or barriers, are completely avoided. Benchmarks on a cluster with 260 nodes and 1040 cores reveal a constant communication overhead of less than 6% of the total computation time. This figure still holds if the compute nodes in the cluster differ significantly in performance.
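To illustrate why an explicit finite difference scheme lends itself to this kind of partitioning, the following minimal sketch may help. It is not the filter or the GPI code described above: it assumes a 1D linear diffusion step as a stand-in for the anisotropic filter, and the function names, the parameter `tau`, and the one-cell halo are hypothetical choices made for illustration. Each chunk needs only its own cells plus one neighboring cell on each side from the previous time step; that small read is what a one-sided remote read would supply on a cluster.

```python
def diffuse_step(u, tau=0.25):
    """One explicit step of 1D linear diffusion on the whole signal.

    Reflecting boundaries; grid spacing 1. For this stand-in scheme
    tau <= 0.5 keeps the explicit step stable.
    """
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        out.append(u[i] + tau * (left - 2.0 * u[i] + right))
    return out


def diffuse_step_partitioned(u, parts, tau=0.25):
    """The same step, computed chunk by chunk with one-cell halos.

    Every chunk reads only values of the previous time step, so the
    chunks are independent and could be processed in any order.
    """
    n = len(u)
    bounds = [(k * n // parts, (k + 1) * n // parts) for k in range(parts)]
    out = [0.0] * n
    for lo, hi in bounds:
        if lo == hi:
            continue  # empty chunk when parts > n
        # Halo cells: in a distributed setting these two reads would be
        # remote accesses to the neighboring partitions (assumption).
        left_halo = u[lo - 1] if lo > 0 else u[lo]
        right_halo = u[hi] if hi < n else u[hi - 1]
        local = [left_halo] + u[lo:hi] + [right_halo]
        for j in range(1, len(local) - 1):
            out[lo + j - 1] = local[j] + tau * (
                local[j - 1] - 2.0 * local[j] + local[j + 1]
            )
    return out


signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
whole = diffuse_step(signal)
split = diffuse_step_partitioned(signal, parts=3)
```

In this sketch `whole` and `split` agree, because every output cell depends only on a fixed, small neighborhood of the previous time step. The real filter has a wider dependency range and works on multidimensional image data, but the same locality is what allows halo transfers to proceed while interior cells are being computed.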
Keywords: Message Passing Interface, Current Time Step, Dependency Range, Synchronization Point, Remote Direct Memory Access
We would like to thank Joachim Weickert, University of Saarbrücken, for providing a single-threaded reference implementation of the PSPro Edge- and Coherence-Enhancing Anisotropic Diffusion Filter.
- 4. Weickert, J.: Anisotropic diffusion in image processing. Teubner (1998)