Adaptive Computation of Self Sorting In-Place FFTs on Hierarchical Memory Architectures

  • Ayaz Ali
  • Lennart Johnsson
  • Jaspal Subhlok
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4782)


Computing an "in-place and in-order" FFT is a difficult problem on hierarchical memory architectures, where data movement can seriously degrade performance. In this paper we present a recursive formulation of a self-sorting in-place FFT algorithm that adapts to the target architecture. For transform sizes where in-place, in-order execution is not possible, we show how to construct schedules that use minimum workspace to perform the computation efficiently. To express and construct FFT schedules, we present a context-free grammar that generates the FFT Schedule Specification Language. We conclude by comparing the performance of our in-place, in-order FFT implementation with that of other well-known FFT libraries. We also compare the performance of out-of-place and in-place execution for various FFT sizes.
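To make the "in-place and in-order" property concrete, the following is a minimal sketch, not the algorithm or schedule language of the paper. It assumes the textbook four-step decomposition for square sizes N = r × r, with the reordering done as a separate in-place transpose rather than fused into the butterflies as a true self-sorting algorithm would do. The names `fft_inplace_inorder` and `naive_dft` are hypothetical. Sizes that are not perfect squares fall back to a direct DFT with an O(n) scratch buffer, which loosely mirrors the abstract's point that some sizes need extra workspace.

```python
import cmath
import math


def naive_dft(values):
    """Direct O(n^2) forward DFT; used as the fallback and as a reference."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_inplace_inorder(x, n, start=0, stride=1):
    """Overwrite x[start], x[start+stride], ... (n items) with their DFT, in order.

    Illustrative assumption: only perfect-square sizes are handled without
    scratch space; any other size uses a temporary buffer of length n.
    """
    if n <= 1:
        return
    r = math.isqrt(n)
    if r * r != n:
        # Not a perfect square: direct DFT through an O(n) scratch buffer.
        out = naive_dft([x[start + i * stride] for i in range(n)])
        for k in range(n):
            x[start + k * stride] = out[k]
        return

    # View the n items as an r x r row-major matrix: (i, j) -> (i*r + j).
    # Step 1: length-r DFT down every column.
    for j in range(r):
        fft_inplace_inorder(x, r, start + j * stride, r * stride)

    # Step 2: multiply entry (k1, j2) by the twiddle factor w_n^(k1*j2).
    for k1 in range(1, r):
        for j2 in range(1, r):
            x[start + (k1 * r + j2) * stride] *= cmath.exp(
                -2j * cmath.pi * k1 * j2 / n
            )

    # Step 3: length-r DFT along every row.
    for k1 in range(r):
        fft_inplace_inorder(x, r, start + k1 * r * stride, stride)

    # Step 4: in-place transpose by pairwise swaps puts the output in order.
    for i in range(r):
        for j in range(i + 1, r):
            p = start + (i * r + j) * stride
            q = start + (j * r + i) * stride
            x[p], x[q] = x[q], x[p]


if __name__ == "__main__":
    data = [complex(i, 0) for i in range(16)]
    expected = naive_dft(data)
    fft_inplace_inorder(data, len(data))
    print(max(abs(a - b) for a, b in zip(data, expected)))  # small rounding error
```

Because a square matrix can be transposed by swapping element pairs, the square case needs no buffer beyond a few scalars. For a non-square factorization the final reordering is no longer a set of pairwise swaps, which is one way to see why such sizes call for some workspace.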


Keywords: Fast Fourier Transform, Fast Fourier Transform Algorithm, Adaptive Computation, Installation Time, Middle Rank





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ayaz Ali (1)
  • Lennart Johnsson (1)
  • Jaspal Subhlok (1)
  1. Department of Computer Science, University of Houston, Houston, TX 77204
