A Study on a Method of Effective Memory Utilization on GPU Applied for Neighboring Filter on Image Processing

  • Yoshio Yanagihara
  • Yuki Minamiura
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 145)


This paper describes methods for implementing neighboring filters, which are widely used in image processing, on recently released Graphics Processing Units (GPUs). Focusing mainly on memory access, four implementations of neighboring filtering are proposed and compared. The experimental results show that one of the proposed methods, called “4X-block”, is the fastest of the four at a block size of 16 when the data are loaded into and processed in the GPU's shared memory. This method is also shown to be about 1.45 times faster than the basic GPU implementation.
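
The abstract does not give the filter kernel or describe the four methods in detail. The sketch below therefore only illustrates the general idea of staging data in shared memory for a neighboring filter, under stated assumptions. Each 16×16 block first copies its tile plus a one-pixel halo into a fast local buffer. Every output pixel then reads its 3×3 neighbourhood from that buffer instead of from global memory. The code is plain C++ that emulates one CUDA thread block per loop iteration, so it runs without a GPU. The 3×3 mean filter, the clamp-to-edge border handling, and the names `naiveFilter`, `tiledFilter`, `readClamped` and `kBlock` are hypothetical choices made for illustration. This is not the authors' “4X-block” method.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical parameters: 16x16 blocks (the block size named in the abstract)
// and an assumed 3x3 neighbourhood, i.e. a one-pixel halo around each tile.
constexpr int kBlock = 16;
constexpr int kRadius = 1;
constexpr int kTile = kBlock + 2 * kRadius;

// Read from the "global memory" image with clamp-to-edge borders (assumed policy).
inline float readClamped(const std::vector<float>& img, int w, int h, int x, int y) {
    x = std::min(std::max(x, 0), w - 1);
    y = std::min(std::max(y, 0), h - 1);
    return img[static_cast<std::size_t>(y) * w + x];
}

// Baseline: every output pixel reads all 9 neighbours directly from global memory.
std::vector<float> naiveFilter(const std::vector<float>& img, int w, int h, long& globalReads) {
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = -kRadius; dy <= kRadius; ++dy)
                for (int dx = -kRadius; dx <= kRadius; ++dx) {
                    sum += readClamped(img, w, h, x + dx, y + dy);
                    ++globalReads;
                }
            out[static_cast<std::size_t>(y) * w + x] = sum / 9.0f;
        }
    return out;
}

// Tiled: each block loads its tile plus halo once into a local array that stands
// in for __shared__ memory, then computes all of the block's outputs from it.
std::vector<float> tiledFilter(const std::vector<float>& img, int w, int h, long& globalReads) {
    std::vector<float> out(img.size());
    float tile[kTile][kTile];  // emulated shared memory for one thread block
    for (int by = 0; by < h; by += kBlock)
        for (int bx = 0; bx < w; bx += kBlock) {
            // Phase 1: cooperative load of tile + halo. On a GPU the threads
            // would call __syncthreads() after this step.
            for (int ty = 0; ty < kTile; ++ty)
                for (int tx = 0; tx < kTile; ++tx) {
                    tile[ty][tx] = readClamped(img, w, h, bx + tx - kRadius, by + ty - kRadius);
                    ++globalReads;
                }
            // Phase 2: each "thread" (tx, ty) filters using only the local tile.
            // tile[ty + dy][tx + dx] holds image pixel (bx + tx + dx - 1, by + ty + dy - 1).
            for (int ty = 0; ty < kBlock && by + ty < h; ++ty)
                for (int tx = 0; tx < kBlock && bx + tx < w; ++tx) {
                    float sum = 0.0f;
                    for (int dy = 0; dy <= 2 * kRadius; ++dy)
                        for (int dx = 0; dx <= 2 * kRadius; ++dx)
                            sum += tile[ty + dy][tx + dx];
                    out[static_cast<std::size_t>(by + ty) * w + bx + tx] = sum / 9.0f;
                }
        }
    return out;
}
```

Both functions produce identical output. For a 64×64 image, however, the naive version performs 36,864 global reads (64 × 64 × 9). The tiled version performs 5,184 (16 blocks × 18 × 18). The read count is only a proxy. Actual GPU run time also depends on memory coalescing, caching and occupancy, so it should not be taken as a prediction of the speedup reported in the abstract.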


Keywords: Graphics Processing Unit · Block Size · Memory Access · Shared Memory · Central Processing Unit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. Osaka City University, Osaka, Japan