Semantics, Applications, and Implementation of Program Generation

Volume 1924 of the series Lecture Notes in Computer Science, pp. 190-211


Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW (Position Paper)

  • Richard Vuduc, Computer Science Division, University of California at Berkeley
  • James W. Demmel, Computer Science Division and Dept. of Mathematics, University of California at Berkeley


Achieving peak performance in important numerical kernels such as dense matrix multiply or sparse matrix-vector multiplication usually requires extensive, machine-dependent tuning by hand. In response, a number of automatic tuning systems have been developed, which typically operate by (1) generating multiple implementations of a kernel, and (2) empirically selecting an optimal implementation. One such system is FFTW (Fastest Fourier Transform in the West) for the discrete Fourier transform. In this paper, we review FFTW’s inner workings with an emphasis on its code generator, and report on our empirical evaluation of the system on two different hardware and compiler platforms. We then describe a number of our own extensions to the FFTW code generator that compute efficient discrete cosine transforms and show promising speed-ups over a vendor-tuned library. We also comment on current opportunities to develop tuning systems in the spirit of FFTW for other widely used kernels.
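
The two-step scheme named in the abstract, generating several implementations and then selecting one empirically, can be illustrated with a minimal sketch. This is not FFTW's code generator or planner. The toy kernel (a blocked summation), the names `make_blocked_sum` and `autotune`, and the candidate block sizes are all assumptions made for illustration only.

```python
import timeit


def make_blocked_sum(block):
    """Step 1 (generate): build one candidate kernel that sums `xs`
    in chunks of `block` elements. `block` is a hypothetical tuning parameter."""
    def kernel(xs):
        total = 0.0
        n = len(xs)
        i = 0
        # Main loop: consume `block` elements per iteration.
        while i + block <= n:
            total += sum(xs[i:i + block])
            i += block
        # Remainder loop: handle elements left over after the last full chunk.
        while i < n:
            total += xs[i]
            i += 1
        return total
    return kernel


def autotune(candidates, sample, repeats=3):
    """Step 2 (select): time every candidate on `sample` and return
    the name of the fastest one together with all measured timings."""
    timings = {}
    for name, fn in candidates.items():
        # Take the minimum over several runs to reduce timing noise.
        timings[name] = min(
            timeit.repeat(lambda: fn(sample), number=10, repeat=repeats)
        )
    best = min(timings, key=timings.get)
    return best, timings


# Generate a small family of variants, then pick the fastest on this machine.
candidates = {f"block{b}": make_blocked_sum(b) for b in (1, 4, 16, 64)}
sample = [float(i) for i in range(4096)]
best_name, timings = autotune(candidates, sample)
best_kernel = candidates[best_name]
```

Which variant wins depends on the machine and interpreter, which is the reason the selection is made by measurement rather than by a fixed rule.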