Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer

  • Maria Eleftheriou
  • Blake Fitch
  • Aleksandr Rayshubskiy
  • T. J. Christopher Ward
  • Robert Germain
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3648)


This paper presents performance characteristics of a communications-intensive kernel, the 3D FFT of complex data, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized: one built on the MPI library using an optimized collective all-to-all operation [2], and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE) [17]. We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 of the FFTW library [14]. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well, and the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128 × 128 × 128. Moreover, the volumetric FFT outperforms the FFTW port by a factor of 8 for a 128 × 128 × 128 complex FFT on 2048 nodes.
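
The abstract centres on a transpose-based parallel 3D FFT, in which local 1D FFTs alternate with an all-to-all data exchange between nodes. The following is a minimal sketch of that pattern, simulated in a single Python process under our own assumptions: it uses a simple slab decomposition rather than the volumetric decomposition the paper describes, a textbook radix-2 FFT instead of FFTW or any BG/L-tuned kernel, and hypothetical helper names (`local_fft_yz`, `all_to_all`, `local_fft_x`, `parallel_fft3d`) that do not come from the paper. Each "node" is just a Python list, and the block reshuffle in `all_to_all` stands in for an MPI all-to-all or the SPI layer.

```python
import cmath
import random

# Hypothetical sketch: a transpose-based parallel 3D FFT simulated in one process.
# Uses a slab decomposition (not the paper's volumetric one) and a textbook FFT.


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def local_fft_yz(slab, n):
    """1D FFTs along z, then y, on each x-plane a node owns (no communication)."""
    for plane in slab:                      # plane[y][z]
        for y in range(n):
            plane[y] = fft(plane[y])        # transform along z
        for z in range(n):
            col = fft([plane[y][z] for y in range(n)])  # transform along y
            for y in range(n):
                plane[y][z] = col[y]


def all_to_all(x_slabs, n, p):
    """Simulated all-to-all: node s sends node q the (x in s, y in q) block.

    Input slabs are indexed [x_local][y][z]; output slabs [y_local][x][z].
    """
    b = n // p
    y_slabs = []
    for q in range(p):
        recv = [[None] * n for _ in range(b)]
        for s in range(p):
            for xl in range(b):
                for yl in range(b):
                    # Copy the z-line, as a message would.
                    recv[yl][s * b + xl] = list(x_slabs[s][xl][q * b + yl])
        y_slabs.append(recv)
    return y_slabs


def local_fft_x(slab, n):
    """1D FFTs along x on each y-plane a node owns after the exchange."""
    for plane in slab:                      # plane[x][z]
        for z in range(n):
            col = fft([plane[x][z] for x in range(n)])
            for x in range(n):
                plane[x][z] = col[x]


def parallel_fft3d(data, p):
    """3D FFT of data[x][y][z] on p simulated nodes; n must be divisible by p."""
    n = len(data)
    assert n % p == 0, "slab decomposition needs n divisible by p"
    b = n // p
    # Initial layout: node s owns x-planes s*b .. s*b + b - 1 (deep copies).
    x_slabs = [[[list(line) for line in data[s * b + xl]] for xl in range(b)]
               for s in range(p)]
    for slab in x_slabs:
        local_fft_yz(slab, n)
    y_slabs = all_to_all(x_slabs, n, p)
    for slab in y_slabs:
        local_fft_x(slab, n)
    # Gather back to result[kx][ky][kz] so it can be checked.
    out = [[None] * n for _ in range(n)]
    for q in range(p):
        for yl in range(b):
            for x in range(n):
                out[x][q * b + yl] = y_slabs[q][yl][x]
    return out


def dft3d(data):
    """Direct O(n^6) 3D DFT, used only as a reference."""
    n = len(data)
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    return [[[sum(data[x][y][z] * w[(kx * x + ky * y + kz * z) % n]
                  for x in range(n) for y in range(n) for z in range(n))
              for kz in range(n)] for ky in range(n)] for kx in range(n)]


if __name__ == "__main__":
    random.seed(0)
    n, p = 4, 2
    data = [[[complex(random.random(), random.random()) for _ in range(n)]
             for _ in range(n)] for _ in range(n)]
    got, ref = parallel_fft3d(data, p), dft3d(data)
    err = max(abs(got[i][j][k] - ref[i][j][k])
              for i in range(n) for j in range(n) for k in range(n))
    print(f"max abs error vs direct DFT: {err:.2e}")
```

Running the script compares the simulated distributed result against a direct DFT and should report an error at floating-point round-off level. The slab layout needs N divisible by P and can use at most N nodes (at most 128 for a 128 × 128 × 128 transform), which is one reason a finer-grained decomposition such as the volumetric one is needed to use 2048 nodes. For scale, if the reported speedup of 730 is taken relative to a single node, it corresponds to a parallel efficiency of about 730 / 2048 ≈ 0.36.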


References

  1. Almási, G., Archer, C., Castanos, J., Gupta, M., Martorell, X., Moreira, J.E.: MPI on Blue Gene/L: Designing an Efficient General Purpose Messaging Solution for a Large Cellular System. In: Proceedings of the 10th Euro PVM/MPI conference, Klagenfurt, Austria. LNCS (2003)
  2. Almási, G., Archer, C., Erway, C.C., Heidelberger, P., Martorell, X., Moreira, J.E., Steinmacher-Burow, B.D., Zheng, Y.: Optimization of MPI collective operations on Blue Gene/L systems (2005) (to appear at ICS 2005)
  3. Almási, G., et al.: Design and implementation of message-passing services for the Blue Gene/L supercomputer. IBM Journal of Research and Development 49(2/3), 393–406 (2005)
  4. Cramer, C.E., Board, J.A.: The Development and Integration of a Distributed 3D FFT for a cluster of workstations. In: 4th Annual Linux Showcase and Conference, Atlanta, GA, October 2000, pp. 121–128 (2000)
  5. Deserno, M., Holm, C.: How to mesh up Ewald sums. I. A theoretical and numerical comparison of various particle mesh routines. J. Chem. Phys. 109(18), 7678–7693 (1998)
  6. Ding, H.Q., Ferraro, R.D., Gennery, D.B.: A portable 3D FFT Package for Distributed-Memory Parallel Architecture. In: SIAM Conference on Parallel Processing for Scientific Computing (1995)
  7. Edelman, A., McCorquodale, P., Toledo, S.: The future fast Fourier transform? SIAM J. Sci. Comput. 20, 1094–1114 (1999)
  8. Eleftheriou, M., Fitch, B.G., Rayshubskiy, A., Ward, T.J.C., Germain, R.S.: Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements. IBM Journal of Research and Development 49(2/3), 457–464 (2005)
  9. Eleftheriou, M., Moreira, J.E., Fitch, B.G., Germain, R.S.: A Volumetric FFT for BlueGene/L. In: Pinkston, T.M., Prasanna, V.K. (eds.) HiPC 2003. LNCS, vol. 2913, pp. 194–203. Springer, Heidelberg (2003)
  10. Allen, F., et al.: Blue Gene: a vision for protein science using a petaflop supercomputer. IBM Systems Journal 40(2), 310–327 (2001)
  11. Adiga, N.R., et al.: An overview of the Blue Gene/L supercomputer. In: Supercomputing 2002 Proceedings (November 2002)
  12. Fitch, B.G., Germain, R.S., Mendell, M., Pitera, J., Pitman, M., Rayshubskiy, A., Sham, Y., Suits, F., Swope, W., Ward, T.J.C., Zhestkov, Y., Zhou, R.: Blue Matter, an application framework for molecular simulation on Blue Gene. Journal of Parallel and Distributed Computing 63, 759–773 (2003)
  13. Frigo, M., Johnson, S.G.: The Fastest Fourier Transform in the West. Technical Report MIT-LCS-TR-728, Laboratory for Computer Science, MIT, Cambridge, MA (1997)
  14. Frigo, M., Johnson, S.G.: FFTW: An Adaptive Software Architecture for the FFT. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3, pp. 1381–1384 (1998)
  15. Gara, A., et al.: Overview of the Blue Gene/L system architecture. IBM Journal of Research and Development 49(2/3), 195–212 (2005)
  16. Gara, A., Heidelberger, P., Steinmacher-Burow, B.: Private communication
  17. Giampapa, M.E., et al.: Blue Gene/L advanced diagnostics environment. IBM Journal of Research and Development 49(2/3), 319–332 (2005)
  18. Haynes, P.D., Cote, M.: Parallel Fast Fourier Transforms for electronic structure calculations. Comp. Phys. Comm. 130, 121 (2000)
  19. Karplus, M., McCammon, J.A.: Molecular dynamics simulations of biomolecules. Nature Structural Biology 9(9), 646–652 (2002)
  20. Kral, S., Franchetti, F., Lorenz, J., Ueberhuber, C.W., Wurzinger, P.: FFT Compiler Techniques. In: Proceedings of Compiler Construction: 13th International Conference, CC 2004, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2004, Barcelona, Spain, March 29-April 2, pp. 217–231 (2004)
  21. Lorenz, J., Kral, S., Franchetti, F., Ueberhuber, C.W.: Vectorization techniques for the Blue Gene/L double FPU. IBM Journal of Research and Development 49(2/3) (2005)
  22. The MPICH and MPICH2 homepage (January 2004)
  23. Zubair, M., Agarwal, R.C., Gustavson, F.G.: A high performance parallel algorithm for 1D-FFT (1994)
  24. Zapata, E.L., Rivera, F.F., Benavides, J., Garazo, J.M., Peskin, R.: Multidimensional Fast Fourier Transform into fixed size hypercubes. IEE Proceedings 137(4), 253–260 (1990)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • All authors: IBM Thomas J. Watson Research Center, Yorktown Heights, USA
