
Scalable FFT Processors and Pipelined Butterfly Units


This paper considers partial-column radix-2 FFT processors and realizations of butterfly operations. The area and power efficiency of butterfly units based on bit-parallel multipliers, distributed arithmetic, and CORDIC, intended for use in the proposed processor organization, are analyzed and compared. All the selected butterfly units are synthesized onto the same 0.11 μm ASIC technology, allowing the results to be compared directly. The proposed processor organization permits the area of the FFT implementation to be traded against the computation time; thus, the final structure can easily be tailored to the requirements of the given application. The power consumption comparison shows that butterflies based on bit-parallel multipliers are power-efficient but have limitations on clock frequency. Butterflies based on distributed arithmetic could be used when higher clock frequencies are needed. If extremely long FFTs are needed, CORDIC-based butterflies are applicable.
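The butterfly variants compared above differ mainly in how the twiddle-factor product is formed. As a rough software illustration only (not the authors' hardware design), the sketch below shows an iterative radix-2 decimation-in-time FFT whose butterfly delegates the twiddle rotation to a pluggable function: either a direct complex multiplication, loosely analogous to a bit-parallel multiplier unit, or a CORDIC rotation built from shift-and-add style micro-rotations. All function names (`butterfly`, `multiply_rotate`, `cordic_rotate`, `bit_reverse`, `fft_radix2`, `dft`) and the 24-iteration CORDIC depth are hypothetical choices. Distributed arithmetic and the partial-column processor organization are not modeled, and floating point stands in for the fixed-point arithmetic a real implementation would use.

```python
import cmath
import math


def butterfly(a, b, w, rotate):
    """Radix-2 decimation-in-time butterfly: returns (a + w*b, a - w*b).

    The twiddle product w*b is delegated to `rotate`, which stands in for
    the arithmetic unit (multiplier-based or CORDIC-based)."""
    t = rotate(b, w)
    return a + t, a - t


def multiply_rotate(b, w):
    # Direct complex multiplication (four real products), loosely analogous
    # to a butterfly built around bit-parallel multipliers.
    return b * w


def cordic_rotate(b, w, iterations=24):
    """Rotate b by the phase of the unit-magnitude twiddle w using CORDIC.

    Each micro-rotation uses only scaling by 2**-i (a shift in hardware) and
    additions; floats are used here purely for illustration."""
    angle = cmath.phase(w)
    x, y = b.real, b.imag
    # Pre-rotate by +/- pi/2 so the residual angle lies inside the CORDIC
    # convergence range (about +/- 1.74 rad).
    if angle > math.pi / 2:
        x, y = -y, x
        angle -= math.pi / 2
    elif angle < -math.pi / 2:
        x, y = y, -x
        angle += math.pi / 2
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if angle >= 0 else -1.0
        step = 2.0 ** -i
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)
    # Remove the constant CORDIC gain (a precomputed constant in hardware).
    return complex(x, y) / gain


def bit_reverse(i, bits):
    # Reverse the lowest `bits` bits of index i.
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x, rotate=multiply_rotate):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    span = 1
    while span < n:  # log2(n) stages, each with n/2 butterflies
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * math.pi * k / (2 * span))
                lo, hi = start + k, start + k + span
                data[lo], data[hi] = butterfly(data[lo], data[hi], w, rotate)
        span *= 2
    return data


def dft(x):
    # Direct O(n^2) DFT used only as a reference.
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


if __name__ == "__main__":
    signal = [complex(math.cos(0.3 * t), math.sin(0.1 * t)) for t in range(16)]
    reference = dft(signal)
    for name, rot in (("multiplier", multiply_rotate), ("CORDIC", cordic_rotate)):
        out = fft_radix2(signal, rot)
        err = max(abs(p - q) for p, q in zip(out, reference))
        print(f"{name:10s} max |error| = {err:.2e}")
```

Running the script compares both butterfly variants against a direct DFT. In this sketch the CORDIC path replaces the multiplier with a sequence of shift-and-add micro-rotations, so its accuracy is set by the number of iterations rather than by multiplier word length; how that maps onto silicon area, clock frequency, and power is what the paper's synthesis results address.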




Author information

Correspondence to Jarmo Takala.

Additional information

Jarmo Takala received his M.Sc. (hons) degree in Electronics and Dr.Tech. degree in Information Technology from Tampere University of Technology (TUT), Tampere, Finland, in 1987 and 1999, respectively. From 1992 to 1996, he was a Research Scientist at VTT-Automation, Tampere, Finland. Between 1995 and 1996, he was a Senior Research Engineer at Nokia Research Center, Tampere, Finland. From 1996 to 1999, he was a Researcher at TUT. Currently, he is a Professor in Computer Engineering at TUT and head of the Institute of Digital and Computer Systems of TUT. His research interests include circuit techniques, parallel architectures, and design methodologies for digital signal processing systems.

Konsta Punkka received his M.Sc. (hons) degree in Electrical Engineering from Tampere University of Technology (TUT) in 2002. He is currently working towards his Dr.Tech. degree as a Research Scientist in the Institute of Digital and Computer Systems at TUT. His research interests include optimization and implementation of DSP architectures.


About this article

Cite this article

Takala, J., Punkka, K. Scalable FFT Processors and Pipelined Butterfly Units. J VLSI Sign Process Syst Sign Image Video Technol 43, 113–123 (2006). https://doi.org/10.1007/s11265-006-7265-3



  • application-specific integrated circuit
  • parallel processing
  • radix-2
  • distributed arithmetic