
Journal of Signal Processing Systems, Volume 90, Issue 11, pp 1609–1621

Design Space Exploration of 1-D FFT Processor

  • Shaohan Liu
  • Dake Liu (corresponding author)

Abstract

A design space exploration methodology for 1-D FFT processors is proposed to find the best hardware architecture quantitatively during early design. The methodology comprises architecture candidate collection, coarse-grained architecture selection, and circuit-level design optimization. We show how to select a better architecture from candidates that span different architectures (SDF, SDC, MDF, MDC, and memory-based), different degrees of parallelism, and different radices. Sub-level designs, including the rotator and the data scaling module, are introduced for further optimization. As a proof of concept, an FFT processor for 4G, WLAN, and future 5G is designed that supports 16- to 4096-point and 12- to 2400-point FFTs. A memory-based architecture with a 16-datapath mixed-radix butterfly unit is selected to meet the 1 GS/s throughput requirement (at 4096 points). Synthesis results in 65 nm technology show a silicon area of 1.46 mm² and a power consumption of 68.64 mW. The proposed processor achieves better normalized throughput per unit area and more normalized FFTs per unit energy than available state-of-the-art designs.
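
To make the mixed-radix idea concrete, the minimal sketch below factors an FFT length into small radix stages, which is the first step in deciding whether a given point size can be mapped onto a mixed-radix butterfly unit. The radix set {4, 2, 3, 5}, the greedy ordering, and the function name `radix_stages` are assumptions chosen for illustration; they are not taken from the processor described in the abstract.

```python
def radix_stages(n, radices=(4, 2, 3, 5)):
    """Greedily factor FFT length n into stages drawn from `radices`.

    Returns the list of stage radices, or None if n has a prime factor
    that the assumed radix set cannot cover.
    """
    if n < 1:
        raise ValueError("FFT length must be a positive integer")
    stages = []
    for r in radices:
        # Peel off as many stages of radix r as possible before moving on.
        while n % r == 0:
            stages.append(r)
            n //= r
    return stages if n == 1 else None


if __name__ == "__main__":
    # Sizes at the ends of the two ranges named in the abstract.
    for size in (16, 4096, 12, 2400):
        print(size, radix_stages(size))
```

Under these assumptions, power-of-two sizes such as 4096 decompose into radix-4 stages only, while sizes such as 12 and 2400 also need radix-3 and radix-5 stages, which is why a fixed radix-2 or radix-4 datapath alone would not cover the 12- to 2400-point range.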

Keywords

FFT (Fast Fourier Transform); Design space exploration; Twiddle factor; BFP (Block Floating Point); Non-power-of-two-point FFT

Notes

Acknowledgments

The authors would like to thank Synopsys for their support in the use of ASIP Designer, which is used as a high-level synthesizer.

Financial support from the National High Technology Research and Development Program of China (863 Program), grant 2014AA01A705, is gratefully acknowledged.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Information and Electronics, Beijing Institute of Technology, Beijing, China
