Abstract
This paper presents automatic library generation for modular FFT algorithms with arbitrary input sizes. We show how to represent the transform and its algorithms at a high level of abstraction. Symbolic manipulations and code optimizations based on rewriting systems can then be applied systematically to generate a library with recursive function closure. The generated library is automatically optimized for the target computing platform and is intended to support modular algorithms for multivariate polynomial computations in the modpn library used by Maple. The resulting scalar and vector codes achieve speedups comparable to those of the fixed-size code presented in [LJF10], which is an order of magnitude faster than the hand-tuned modpn library. The generated library also exploits thread-level parallelism, which delivers additional speedup.
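The recursive structure behind a general-size modular FFT can be illustrated with a small sketch. The code below is not the Spiral-generated library and uses none of its optimizations (no Montgomery multiplication, vectorization, or threading). It is a plain-Python, mixed-radix Cooley–Tukey DFT over Z_p, assuming p is prime and the size n divides p − 1 so that a primitive n-th root of unity exists. It splits n by its smallest prime factor and falls back to a naive DFT when n is 1 or prime. All function names are hypothetical.

```python
def dft_naive(x, w, p):
    """Direct O(n^2) DFT over Z_p: X[i] = sum_j x[j] * w^(i*j) mod p."""
    n = len(x)
    return [sum(x[j] * pow(w, i * j, p) for j in range(n)) % p for i in range(n)]


def smallest_factor(n):
    """Smallest prime factor of n (returns n itself if n is 1 or prime)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def prime_factors(n):
    """Set of distinct prime factors of n."""
    fs, f = set(), 2
    while f * f <= n:
        while n % f == 0:
            fs.add(f)
            n //= f
        f += 1
    if n > 1:
        fs.add(n)
    return fs


def primitive_root_of_unity(n, p):
    """Find an element of exact order n in Z_p^* (assumes p prime, n | p - 1)."""
    if (p - 1) % n != 0:
        raise ValueError("n must divide p - 1")
    for a in range(2, p):
        w = pow(a, (p - 1) // n, p)
        # w has order dividing n; it has order exactly n iff w^(n/q) != 1 for every prime q | n
        if all(pow(w, n // q, p) != 1 for q in prime_factors(n)):
            return w
    raise ValueError("no primitive root of unity found")


def modular_fft(x, w, p):
    """Recursive mixed-radix Cooley-Tukey DFT over Z_p; w is a primitive len(x)-th root of unity."""
    n = len(x)
    r = smallest_factor(n)
    if r == n:  # base case: n is 1 or prime
        return dft_naive(x, w, p)
    m = n // r
    # Decimation in time: r interleaved subsequences x[k::r] of length m,
    # each transformed with the primitive m-th root w^r.
    subs = [modular_fft(x[k::r], pow(w, r, p), p) for k in range(r)]
    y = [0] * n
    for idx in range(n):
        i = idx % m  # (w^r)^(idx*t) depends only on idx mod m
        # Combine the sub-DFTs with twiddle factors w^(k*idx)
        y[idx] = sum(subs[k][i] * pow(w, k * idx, p) for k in range(r)) % p
    return y


if __name__ == "__main__":
    p, n = 97, 12  # 12 divides 96 = p - 1
    w = primitive_root_of_unity(n, p)
    x = list(range(n))
    X = modular_fft(x, w, p)
    # Inverse transform: use w^-1 and scale by n^-1 mod p
    n_inv = pow(n, -1, p)
    back = [(v * n_inv) % p for v in modular_fft(X, pow(w, -1, p), p)]
    assert back == x
```

Splitting by the smallest prime factor is just one simple breakdown choice made for this sketch. A generator can instead search over the many possible factorizations of n and select the fastest for the target platform.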
References
Meng, L., Johnson, J., Franchetti, F., Voronenko, Y., Moreno Maza, M., Xie, Y.: Spiral-Generated Modular FFT Algorithms. In: Proc. International Workshop on Parallel and Symbolic Computation (PASCO), pp. 169–170 (2010)
Voronenko, Y.: Library Generation for Linear Transforms. PhD thesis, Electrical and Computer Engineering, Carnegie Mellon University (2008)
Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. of Computation 19, 297–301 (1965)
Filatei, A., Li, X., Moreno Maza, M., Schost, É.: Implementation techniques for fast polynomial arithmetic in a high-level programming environment. In: Proc. ISSAC 2006, pp. 93–100. ACM Press, New York (2006)
Franchetti, F., Voronenko, Y., Püschel, M.: Formal Loop Merging for Signal Transforms. In: Proc. Programming Languages Design and Implementation (PLDI), pp. 315–326 (2005)
Franchetti, F., Voronenko, Y., Püschel, M.: FFT Program Generation for Shared Memory: SMP and Multicore. In: Proc. Supercomputing, SC (2006)
Franchetti, F., Voronenko, Y., Püschel, M.: A Rewriting System for the Vectorization of Signal Transforms. In: Daydé, M., Palma, J.M.L.M., Coutinho, Á.L.G.A., Pacitti, E., Lopes, J.C. (eds.) VECPAR 2006. LNCS, vol. 4395, pp. 363–377. Springer, Heidelberg (2007)
Johnson, J., Johnson, R.W., Rodriguez, D., Tolimieri, R.: A Methodology for Designing, Modifying, and Implementing Fourier Transform Algorithms on Various Architectures. IEEE Trans. Circuits Sys. 9 (1990)
Li, X., Moreno Maza, M.: Efficient implementation of polynomial arithmetic in a multiple-level programming environment. In: Iglesias, A., Takayama, N. (eds.) ICMS 2006. LNCS, vol. 4151, pp. 12–23. Springer, Heidelberg (2006)
Li, X., Moreno Maza, M., Pan, W.: Computations modulo regular chains. In: Proc. ISSAC 2009, pp. 239–246. ACM, New York (2009)
Li, X., Moreno Maza, M., Rasheed, R., Schost, É.: High-Performance Symbolic Computation in a Hybrid Compiled-Interpreted Programming Environment. In: Proc. CASA 2008. LNCS. Springer (2008)
Li, X., Moreno Maza, M., Schost, É.: Fast arithmetic for triangular sets: From theory to practice. In: Proc. ISSAC 2007, pp. 269–276. ACM Press (2007)
Montgomery, P.L.: Modular Multiplication Without Trial Division. Mathematics of Computation 44(170), 519–521 (1985)
Püschel, M., Moura, J., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R., Rizzolo, N.: SPIRAL: Code Generation for DSP Transforms. Proc. IEEE Special Issue on “Program Generation, Optimization, and Adaptation” 93(2), 232–275 (2005)
Spiral project website, http://www.spiral.net
Xiong, J., Johnson, J., Johnson, R., Padua, D.: SPL: A Language and Compiler for DSP Algorithms. In: Proc. PLDI, pp. 298–308 (2001)
Frigo, M., Johnson, S.G.: The Design and Implementation of FFTW3. Proc. IEEE Special Issue on “Program Generation, Optimization, and Adaptation” 93(2), 216–231 (2005)
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Meng, L., Johnson, J. (2013). Automatic Parallel Library Generation for General-Size Modular FFT Algorithms. In: Gerdt, V.P., Koepf, W., Mayr, E.W., Vorozhtsov, E.V. (eds) Computer Algebra in Scientific Computing. CASC 2013. Lecture Notes in Computer Science, vol 8136. Springer, Cham. https://doi.org/10.1007/978-3-319-02297-0_21
DOI: https://doi.org/10.1007/978-3-319-02297-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02296-3
Online ISBN: 978-3-319-02297-0
eBook Packages: Computer Science, Computer Science (R0)