Automatic Tuning for Parallel FFTs

  • Daisuke Takahashi


In this paper, we propose an implementation of parallel fast Fourier transforms (FFTs) with automatic performance tuning on distributed-memory parallel computers. A blocking algorithm for parallel FFTs makes effective use of cache memory. Because the optimal block size may depend on the problem size, we propose a method for determining the block size that minimizes the number of cache misses. Parallel FFTs also require intensive all-to-all communication, which affects their performance, so automatic tuning of the all-to-all communication is implemented as well. Performance results show that automatic tuning improves the performance of the proposed parallel FFT implementation.
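
The idea of selecting a block size automatically can be illustrated with a minimal sketch. It is not the paper's method: the abstract describes choosing the block size that minimizes cache misses, whereas this example simply times a kernel for each candidate and keeps the fastest, using wall-clock time as a stand-in. The kernel is a blocked matrix transpose, chosen here only as a typical cache-sensitive step; the names `blocked_transpose` and `tune_block_size` and the candidate list are hypothetical. In pure Python the timings mostly reflect interpreter overhead, so the selected value is illustrative only.

```python
import time


def blocked_transpose(a, n, block):
    """Transpose an n-by-n matrix stored row-major in a flat list, tile by tile."""
    out = [0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            # Work on one block-by-block tile so both arrays are touched locally.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


def _time_once(kernel, block):
    """Wall-clock time of a single kernel call for one block size."""
    start = time.perf_counter()
    kernel(block)
    return time.perf_counter() - start


def tune_block_size(kernel, candidates, repeats=3):
    """Return the candidate with the smallest best-of-`repeats` time."""
    best_block, best_time = None, float("inf")
    for block in candidates:
        elapsed = min(_time_once(kernel, block) for _ in range(repeats))
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block


# Assumed problem size and candidate block sizes, for illustration only.
n = 64
matrix = list(range(n * n))
candidates = [4, 8, 16, 32, 64]
best = tune_block_size(lambda b: blocked_transpose(matrix, n, b), candidates)
print("selected block size:", best)
```

Because the best block size may change with the problem size, a tuner of this kind would be rerun, or its results cached, for each problem size of interest.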


Keywords (added by machine, not by the authors): Fast Fourier Transform; Discrete Fourier Transform; Problem Size; Fast Fourier Transform Algorithm; Automatic Tuning



Copyright information

© Springer New York 2011

Authors and Affiliations

  1. Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan
