An Implementation of Parallel 2-D FFT Using Intel AVX Instructions on Multi-core Processors

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7440)

Abstract

In this paper, we propose an implementation of a parallel two-dimensional fast Fourier transform (FFT) using Intel Advanced Vector Extensions (AVX) instructions on multi-core processors. The combination of vectorization and a block two-dimensional FFT algorithm is shown to effectively improve performance. We vectorized FFT kernels using the AVX instructions. Performance results of two-dimensional FFTs on multi-core processors are reported. We successfully achieved a performance of over 61 GFlops on an Intel Xeon E5-2670 (2.6 GHz, two CPUs, 16 cores) and over 24 GFlops on an Intel Core i7-3930K (3.2 GHz, one CPU, six cores) for a 2^12 × 2^12-point FFT.
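The abstract does not spell out the block two-dimensional FFT algorithm, so the following is only a minimal sketch of one common blocked row-column formulation, not the paper's implementation. It assumes power-of-two dimensions and uses plain Python with no AVX; the names `fft1d`, `fft2d_blocked`, `dft2d_naive`, and the `block` parameter are hypothetical. The idea shown is that rows are transformed in place, while columns are gathered a few at a time into a contiguous work buffer before being transformed, which is the kind of access pattern that tends to help cache reuse and vectorization in a compiled implementation.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2d_blocked(a, block=4):
    """2-D FFT of a list of rows, transforming columns `block` at a time."""
    n1, n2 = len(a), len(a[0])
    # Step 1: transform every row (contiguous data in a row-major layout).
    rows = [fft1d(r) for r in a]
    # Step 2: transform the columns, one block of columns at a time.
    for j0 in range(0, n2, block):
        j1 = min(j0 + block, n2)
        # Gather the block of columns into a contiguous work buffer.
        work = [[rows[i][j] for i in range(n1)] for j in range(j0, j1)]
        work = [fft1d(col) for col in work]
        # Scatter the transformed columns back into the result.
        for j, col in zip(range(j0, j1), work):
            for i in range(n1):
                rows[i][j] = col[i]
    return rows


def dft2d_naive(a):
    """Direct evaluation of the 2-D DFT definition, used only as a reference."""
    n1, n2 = len(a), len(a[0])
    return [[sum(a[i][j] * cmath.exp(-2j * cmath.pi * (k1 * i / n1 + k2 * j / n2))
                 for i in range(n1) for j in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]
```

The block size changes only the order in which columns are visited, so any positive `block` should give the same result as the direct definition up to rounding error.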




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takahashi, D. (2012). An Implementation of Parallel 2-D FFT Using Intel AVX Instructions on Multi-core Processors. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_21

  • DOI: https://doi.org/10.1007/978-3-642-33065-0_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33064-3

  • Online ISBN: 978-3-642-33065-0

  • eBook Packages: Computer Science, Computer Science (R0)
