Abstract
We review 1-dimensional FFT algorithms for distributed-memory machines with vector processing nodes. To attain high performance on this type of machine, one must achieve high single-processor performance and high parallel efficiency at the same time. We explain a general framework for designing 1-D FFT algorithms, based on a 3-dimensional representation of the data, that can satisfy both of these requirements. Among the many algorithms derived from this framework, two variants are shown to be optimal from the viewpoint of both parallel performance and usability. We also introduce several ideas that further improve performance and the flexibility of the user interface. Numerical experiments on the Hitachi SR2201, a distributed-memory parallel machine with pseudo-vector processing nodes, show that our program attains 48% of the peak performance when computing the FFT of 2^26 points using 64 nodes.
This work was done while the author was at the Central Research Laboratory, Hitachi Ltd.
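The abstract's central idea, computing a 1-D FFT through a 3-dimensional representation of the data, can be illustrated with a small serial sketch. This is not the authors' implementation: it is a textbook three-factor Cooley-Tukey decomposition, and the function names (`dft`, `fft_3d_view`), the index mapping, and the use of a naive DFT along each axis are assumptions chosen for readability. Writing N = n1*n2*n3 and viewing the input as a[i1][i2][i3] = x[i1 + n1*i2 + n1*n2*i3], the transform reduces to short transforms along each axis separated by twiddle-factor multiplications.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT, used only for the short per-axis transforms."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_3d_view(x, n1, n2, n3):
    """1-D DFT of x (length n1*n2*n3) via a 3-D view of the data.

    Illustrative sketch only; names and index mapping are assumptions.
    """
    n = n1 * n2 * n3
    assert len(x) == n

    def w(m, e):
        # e-th power of the primitive m-th root of unity exp(-2*pi*i/m)
        return cmath.exp(-2j * cmath.pi * e / m)

    # 3-D view of the input: a[i1][i2][i3] = x[i1 + n1*i2 + n1*n2*i3]
    a = [[[x[i1 + n1 * (i2 + n2 * i3)] for i3 in range(n3)]
          for i2 in range(n2)] for i1 in range(n1)]

    # Step 1: length-n3 transforms along the third axis, then twiddle
    # by w_{n2*n3}^(i2*k3).
    for i1 in range(n1):
        for i2 in range(n2):
            row = dft(a[i1][i2])
            a[i1][i2] = [row[k3] * w(n2 * n3, i2 * k3) for k3 in range(n3)]

    # Step 2: length-n2 transforms along the second axis, then twiddle
    # by w_n^(i1*(k3 + n3*k2)).
    for i1 in range(n1):
        for k3 in range(n3):
            col = dft([a[i1][i2][k3] for i2 in range(n2)])
            for k2 in range(n2):
                a[i1][k2][k3] = col[k2] * w(n, i1 * (k3 + n3 * k2))

    # Step 3: length-n1 transforms along the first axis; store the result
    # in natural order, X[k3 + n3*k2 + n2*n3*k1].
    out = [0j] * n
    for k2 in range(n2):
        for k3 in range(n3):
            col = dft([a[i1][k2][k3] for i1 in range(n1)])
            for k1 in range(n1):
                out[k3 + n3 * (k2 + n2 * k1)] = col[k1]
    return out
```

In each step the transforms along one axis are independent across the other two axes. In a vector-parallel setting one of those free axes could plausibly serve as the vectorized loop and another as the axis distributed across nodes, though how the chapter actually assigns them is not stated in the abstract.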
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Yamamoto, Y., Kawamura, H., Igai, M. (2005). Vector-Parallel Algorithms for 1-Dimensional Fast Fourier Transform. In: Guo, M., Yang, L.T. (eds) New Horizons of Parallel and Distributed Computing. Springer, Boston, MA. https://doi.org/10.1007/0-387-28967-4_4
DOI: https://doi.org/10.1007/0-387-28967-4_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24434-1
Online ISBN: 978-0-387-28967-0
eBook Packages: Computer Science (R0)