The Implementation and Optimization of FFT Calculation Based on the MT-3000 Chip

  • Conference paper
Intelligent Computers, Algorithms, and Applications (IC 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2036)

Abstract

The FFT algorithm used in SAR imaging has been implemented and optimized on the MT-3000 accelerator chip. The optimizations include vectorization, MPI parallelization, DMA transfers with double buffering, and linear assembly. The experimental results show that, on this platform, performance increased by more than 99.2% after the FFT function was optimized.
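
The double-buffering idea named in the abstract can be illustrated with a minimal sketch. It is not the paper's MT-3000 implementation: a Python worker thread stands in for the DMA engine, and the names `load_block`, `fft_radix2`, and `fft_blocks_double_buffered` are hypothetical. The point is only the overlap pattern, in which the next block is fetched while the current one is transformed.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages of size 2, 4, ..., n.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a


def load_block(source, index, block_len):
    """Assumed stand-in for a DMA transfer: copy one block out of 'slow' memory."""
    return source[index * block_len:(index + 1) * block_len]


def fft_blocks_double_buffered(source, block_len):
    """Transform consecutive blocks, prefetching block i+1 while block i is computed."""
    n_blocks = len(source) // block_len
    results = []
    if n_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(load_block, source, 0, block_len)
        for i in range(n_blocks):
            current = pending.result()  # wait until buffer i has arrived
            if i + 1 < n_blocks:
                # Start filling the other buffer before computing on this one.
                pending = dma.submit(load_block, source, i + 1, block_len)
            results.append(fft_radix2(current))
    return results
```

For example, `fft_blocks_double_buffered(samples, 1024)` would transform `samples` in 1024-point blocks. On real hardware the gain depends on how well transfer time and compute time overlap, which this sketch does not measure.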

Acknowledgements

The authors thank the National University of Defense Technology for its research support.

Author information

Corresponding author

Correspondence to Bangjian Xu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cheng, D., Li, G., Song, A., Xu, B. (2024). The Implementation and Optimization of FFT Calculation Based on the MT-3000 Chip. In: Cruz, C., Zhang, Y., Gao, W. (eds) Intelligent Computers, Algorithms, and Applications. IC 2023. Communications in Computer and Information Science, vol 2036. Springer, Singapore. https://doi.org/10.1007/978-981-97-0065-3_3

  • DOI: https://doi.org/10.1007/978-981-97-0065-3_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0064-6

  • Online ISBN: 978-981-97-0065-3

  • eBook Packages: Computer Science, Computer Science (R0)