A self-sorting in-place fast Fourier transform algorithm suitable for vector and parallel processing
We propose a new algorithm for fast Fourier transforms. The algorithm features uniformly long vector lengths and stride-one data access. It is therefore well adapted to modern vector computers such as the Fujitsu VP2200, which have several floating-point pipelines per CPU and very fast stride-one data access. It also has favorable properties for distributed-memory computers, since all communication is gathered together in one step. The algorithm has been implemented on the Fujitsu VP2200 using the basic subroutines for fast Fourier transforms discussed elsewhere. We develop the theory of index digit permutations to some extent. With this theory we can derive the splitting formulas for almost all mixed-radix FFT algorithms known so far. The framework enables us not only to prove the correctness of these algorithms but also to derive our new algorithm. The development and systematic use of this framework is new, and it simplifies the proofs, which are now reduced to the application of matrix recursions.
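As an illustration of what an index digit permutation is, the following minimal Python sketch writes each index in a mixed-radix number system and reverses the order of its digits. This is the familiar digit-reversal reordering associated with in-place mixed-radix FFTs, not the algorithm proposed here. The function names and the least-significant-digit-first convention are assumptions made for the example.

```python
def to_digits(index, radices):
    """Split index into mixed-radix digits, least significant digit first.

    radices[0] is the radix of the least significant digit (assumed convention).
    """
    digits = []
    for r in radices:
        digits.append(index % r)
        index //= r
    return digits


def from_digits(digits, radices):
    """Rebuild an index from mixed-radix digits, least significant digit first."""
    index = 0
    weight = 1
    for d, r in zip(digits, radices):
        index += d * weight
        weight *= r
    return index


def digit_reversal(index, radices):
    """Reverse the digit order of index; the radices are reversed with the digits."""
    digits = to_digits(index, radices)
    return from_digits(digits[::-1], radices[::-1])


def digit_reversal_permutation(radices):
    """Return the full permutation of 0..N-1, where N is the product of the radices."""
    n = 1
    for r in radices:
        n *= r
    return [digit_reversal(i, radices) for i in range(n)]
```

For radices `[2, 2, 2]` this reduces to ordinary bit reversal on 8 points. For `[2, 3]` it permutes 6 points by swapping the roles of the radix-2 and radix-3 digits.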