On the Use of Small 2D Convolutions on GPUs
Computing many small 2D convolutions using FFTs underlies a large number of applications in science and engineering, among them electromagnetic diffraction modeling in physics. The GPU seems a suitable architecture for accelerating these convolutions, but reaching high application performance requires substantial development time and non-portable optimizations. In this work, we present the techniques, performance results, and considerations involved in accelerating small 2D convolutions using CUDA, and compare performance to a multi-threaded CPU implementation. To improve the programmability and performance of applications that make heavy use of small convolutions, we argue that two improvements to software and hardware are needed: FFT libraries must be extended with a single convolution function, and communication bandwidth between CPU and GPU needs to be drastically improved.
Keywords: Fast Fourier Transform; Graphic Processing Unit; Graphic Processing Unit Memory; Graphic Processing Unit Architecture; Fast Fourier Transform Size