An Implementation of Parallel 3-D FFT with 2-D Decomposition on a Massively Parallel Cluster of Multi-core Processors

  • Daisuke Takahashi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6067)


In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) with two-dimensional decomposition on a massively parallel cluster of multi-core processors. The proposed parallel three-dimensional FFT algorithm is based on the multicolumn FFT algorithm. We show that two-dimensional decomposition improves performance by reducing communication time when the number of MPI processes is large. We achieved a performance of over 401 GFlops for a 256³-point FFT on 256 nodes of an Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance).
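The abstract does not give the algorithm's details, so the following is only a generic sketch of a 3-D FFT with two-dimensional (pencil) decomposition. It is not the author's multicolumn-FFT implementation. It simulates a hypothetical P × Q process grid serially in Python: each "process" is a dictionary entry rather than an MPI rank, `all_to_all` stands in for the MPI all-to-all exchanges, and a naive O(n²) DFT stands in for an optimized 1-D FFT kernel. All function names and parameters here are illustrative assumptions. What the sketch shows is that each transpose exchanges data only within one row (Q ranks) or one column (P ranks) of the grid, rather than among all P·Q ranks as the single transpose of a one-dimensional (slab) decomposition would.

```python
import cmath
from itertools import product


def dft(v):
    """Naive O(n^2) 1-D DFT; a stand-in for an optimized 1-D FFT kernel."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transform_axis(block, axis, n):
    """Apply the 1-D DFT to every line along `axis` held in one local block.

    A block maps global (x, y, z) indices to complex values; the pencil layout
    guarantees that each line along `axis` is complete on a single process.
    """
    out = {}
    lines = {idx[:axis] + idx[axis + 1:] for idx in block}
    for key in lines:
        coords = [key[:axis] + (i,) + key[axis:] for i in range(n)]
        for c, val in zip(coords, dft([block[c] for c in coords])):
            out[c] = val
    return out


def all_to_all(procs, owner, fixed_dim):
    """Simulated all-to-all: move every element to the rank given by `owner`.

    The assert checks the property that motivates 2-D decomposition: each
    exchange stays inside one row (fixed_dim=0) or one column (fixed_dim=1)
    of the P x Q process grid instead of involving all P*Q ranks.
    """
    new = {rank: {} for rank in procs}
    for rank, block in procs.items():
        for idx, val in block.items():
            dest = owner(idx)
            assert dest[fixed_dim] == rank[fixed_dim]
            new[dest][idx] = val
    return new


def pencil_fft3d(data, n1, n2, n3, P, Q):
    """3-D DFT of `data` ({(x, y, z): complex}) on a simulated P x Q grid.

    Assumes P divides n1 and n2, and Q divides n2 and n3.
    """
    own_z = lambda i: (i[0] // (n1 // P), i[1] // (n2 // Q))  # z-pencils
    own_y = lambda i: (i[0] // (n1 // P), i[2] // (n3 // Q))  # y-pencils
    own_x = lambda i: (i[1] // (n2 // P), i[2] // (n3 // Q))  # x-pencils

    procs = {rank: {} for rank in product(range(P), range(Q))}
    for idx, val in data.items():
        procs[own_z(idx)][idx] = val

    # Step 1: 1-D DFTs along z, entirely local.
    procs = {r: transform_axis(b, 2, n3) for r, b in procs.items()}
    # Step 2: all-to-all within each process row (Q ranks), then DFTs along y.
    procs = all_to_all(procs, own_y, fixed_dim=0)
    procs = {r: transform_axis(b, 1, n2) for r, b in procs.items()}
    # Step 3: all-to-all within each process column (P ranks), then DFTs along x.
    procs = all_to_all(procs, own_x, fixed_dim=1)
    procs = {r: transform_axis(b, 0, n1) for r, b in procs.items()}

    # Gather the distributed x-pencils into one dict for inspection.
    result = {}
    for block in procs.values():
        result.update(block)
    return result
```

For example, `pencil_fft3d(data, 4, 4, 2, P=2, Q=2)` on a small dict of complex values can be checked element by element against a direct triple-sum DFT; changing `P` and `Q` changes which simulated ranks exchange data, but not the result.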





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Daisuke Takahashi
    Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan
