Efficient Mapping of Streaming Applications for Image Processing on Graphics Cards
In the last decade, research and development of massively parallel commodity graphics hardware has grown dramatically in both academia and industry. Graphics card architectures provide an optimal platform for the parallel execution of many number-crunching loop programs from fields such as image processing and linear algebra. However, it is hard to map such algorithms efficiently to graphics hardware, even with detailed insight into the architecture. This paper presents a multiresolution image processing algorithm and shows how this type of algorithm can be mapped efficiently to graphics hardware, together with double buffering concepts that hide memory transfers. Furthermore, the impact of the execution configuration is illustrated, and a method is proposed to determine the best configuration offline. Using CUDA as the programming model, it is demonstrated that the image processing algorithm is significantly accelerated and that a speedup of more than \(145\times\) can be achieved on NVIDIA’s Tesla C1060 compared to a parallelized implementation on a Xeon Quad Core. For deployment in a streaming application with continuously arriving data, it is shown that double buffering reduces the memory transfer overhead to the graphics card by a factor of six.
Keywords: CUDA, OpenCL, Image processing, Mapping methodology, Streaming application
We are indebted to our colleagues Philipp Kutzer and Michael Glaß for providing the sample pictures.