High performance parallel FFT on distributed memory parallel computers
In this paper, a high-performance method for parallelizing the FFT is presented. The well-known four-step and six-step parallel algorithms with the standard index map are not suitable for highly parallel computers, because they require all-to-all communication between the two phases of sub-FFTs, and this communication cannot be overlapped with the computation of the sub-FFTs. We introduce another index map and an algorithm intended to overcome this problem. Our results show that our method outperforms the four-step method in 26 of 32 experiments. The results were obtained on an NEC Cenju-3 with up to 128 processors, using the mini-MPI library.
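For readers unfamiliar with the baseline, the following is a minimal single-process sketch of the conventional four-step decomposition that the abstract compares against. It is not the authors' new index map or algorithm. The function names `dft` and `four_step_fft`, the specific index map (input index j = j1*n2 + j2, output index k = k1 + n1*k2), and the use of a naive DFT in place of an optimized sub-FFT kernel are all assumptions made for illustration. The transpose in step 3 marks the point where a distributed-memory implementation would typically need the all-to-all exchange, since every process needs data from every other process before the second phase of sub-FFTs can begin.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT, standing in for an optimized sub-FFT kernel."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """DFT of x (length n1*n2) via the four-step decomposition.

    Assumed index map: input j = j1*n2 + j2, output k = k1 + n1*k2.
    """
    n = n1 * n2
    assert len(x) == n

    # Step 1: n2 independent sub-transforms of length n1 (one per j2).
    # Result layout: cols[j2][k1].
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i*j2*k1/n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Step 3: transpose to rows[k1][j2]. On a distributed-memory machine
    # this data rearrangement is where the all-to-all exchange would occur.
    rows = [[cols[j2][k1] for j2 in range(n2)] for k1 in range(n1)]

    # Step 4: n1 independent sub-transforms of length n2 (one per k1).
    out = [0j] * n
    for k1 in range(n1):
        y = dft(rows[k1])
        for k2 in range(n2):
            out[k1 + n1 * k2] = y[k2]
    return out
```

In this sketch, steps 1 and 4 are each embarrassingly parallel, but step 4 cannot start on any row until step 3 has delivered that row's data from all columns, which illustrates why the exchange is hard to hide behind computation.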