Abstract
In this paper, we propose an implementation of a parallel one-dimensional real fast Fourier transform (FFT) on Intel Xeon Phi processors. The implementation is based on the conjugate symmetry property of the discrete Fourier transform (DFT) and on the six-step FFT algorithm. We vectorized the FFT kernels using Intel Advanced Vector Extensions 512 (AVX-512) instructions and parallelized the six-step FFT using OpenMP. Performance results of one-dimensional FFTs on Intel Xeon Phi processors are reported. We achieved a performance of over 91 GFlops on an Intel Xeon Phi 7250 (1.4 GHz, 68 cores) for a \(2^{29}\)-point real FFT.
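The two ingredients named in the abstract can be illustrated with a small sketch. This is not the author's code. It has no AVX-512 vectorization and no OpenMP, and a naive O(n^2) DFT stands in for the optimized inner FFT kernels. The names `dft`, `six_step_fft` and `real_fft` are hypothetical. The sketch computes an N-point complex DFT with N = n1*n2 using the textbook six-step decomposition (transpose, n1 row FFTs of length n2, twiddle multiplication, transpose, n2 row FFTs of length n1, transpose). It then uses conjugate symmetry in one standard way: the even and odd samples of a real sequence are packed into a half-length complex sequence, and the two real-input spectra are separated afterwards. The paper's exact formulation may differ.

```python
import cmath


def dft(x):
    """Naive O(n^2) complex DFT; stands in for an optimized small FFT kernel."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def six_step_fft(x, n1, n2):
    """Complex DFT of length N = n1 * n2 via the six-step decomposition."""
    N = n1 * n2
    assert len(x) == N
    # Step 1: transpose, t[j1][j2] = x[j1 + j2*n1]
    t = [[x[j1 + j2 * n1] for j2 in range(n2)] for j1 in range(n1)]
    # Step 2: n1 independent n2-point DFTs (one per row)
    u = [dft(row) for row in t]
    # Step 3: multiply by the twiddle factors w_N^(j1*k2)
    for j1 in range(n1):
        for k2 in range(n2):
            u[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / N)
    # Step 4: transpose, v[k2][j1] = u[j1][k2]
    v = [[u[j1][k2] for j1 in range(n1)] for k2 in range(n2)]
    # Step 5: n2 independent n1-point DFTs (one per row)
    z = [dft(row) for row in v]
    # Step 6: transpose into natural order, y[k2 + k1*n2] = z[k2][k1]
    y = [0j] * N
    for k2 in range(n2):
        for k1 in range(n1):
            y[k2 + k1 * n2] = z[k2][k1]
    return y


def real_fft(x, n1, n2):
    """Outputs X[0..N/2] of the DFT of a real sequence x, len(x) == 2*n1*n2.

    The remaining outputs follow from conjugate symmetry: X[N-k] = conj(X[k]).
    """
    M = n1 * n2
    N = 2 * M
    assert len(x) == N
    # Pack even samples into the real part and odd samples into the imaginary part.
    z = [complex(x[2 * m], x[2 * m + 1]) for m in range(M)]
    Z = six_step_fft(z, n1, n2)  # one half-length complex FFT
    X = []
    for k in range(M + 1):
        a = Z[k % M]
        b = Z[(M - k) % M].conjugate()
        fe = (a + b) / 2   # DFT of the even-indexed samples
        fo = (a - b) / 2j  # DFT of the odd-indexed samples
        X.append(fe + cmath.exp(-2j * cmath.pi * k / N) * fo)
    return X
```

For example, `real_fft(x, 2, 4)` returns the first 9 DFT outputs of a 16-point real sequence `x`. Those outputs are computed from a single 8-point complex FFT, which is where the roughly twofold saving of a real FFT comes from. In the six-step structure, the row transforms in steps 2 and 5 are independent of one another. That independence is what makes the algorithm amenable to thread-level parallelism. The explicit transposes also keep each small FFT operating on contiguous data.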
Acknowledgments
This research was partially supported by Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Takahashi, D. (2017). An Implementation of Parallel 1-D Real FFT on Intel Xeon Phi Processors. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404. Springer, Cham. https://doi.org/10.1007/978-3-319-62392-4_29
DOI: https://doi.org/10.1007/978-3-319-62392-4_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62391-7
Online ISBN: 978-3-319-62392-4
eBook Packages: Computer Science, Computer Science (R0)