
A CUDA-enabled parallel algorithm for accelerating retinex


Retinex is an image restoration approach that recovers the original appearance of an image. Among the various retinex methods, the center/surround retinex algorithm is well suited to parallelization because it relies on large-scale convolution operations to achieve dynamic range compression and color/lightness rendition. This paper presents GPURetinex, a data-parallel algorithm that accelerates a modified center/surround retinex using GPGPU/CUDA. GPURetinex exploits the massively parallel threading and heterogeneous memory hierarchy of a GPGPU to improve efficiency. Two challenging problems, irregular memory access and the choice of block size for data partitioning, are analyzed mathematically. The proposed mathematical models help choose the memory spaces and block sizes that maximize parallel performance. The analyses are applied to the three kinds of parallelization issues that arise in the retinex problem: block-wise, pixel-wise, and serial operations. Experiments on a GT200 GPU with CUDA 3.2 showed that GPURetinex achieves a 74-fold speedup over an SSE-optimized single-threaded implementation on a Core 2 Duo for images with 4,096 × 4,096 resolution. The proposed method also outperforms a parallel retinex implemented with the NVIDIA Performance Primitives (NPP) library. These results indicate that careful design of memory access and multithreading patterns for CUDA devices can yield substantial speedups for real-time image restoration.
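
As a point of reference for the center/surround formulation the abstract builds on, the following is a minimal serial C++17 sketch of the classical single-scale retinex of Jobson, Rahman and Woodell (1997), R(x,y) = log I(x,y) - log[F(x,y) * I(x,y)], where F is a normalized Gaussian surround and * denotes convolution. It is not the paper's modified retinex and not its GPURetinex kernels. The surround scale sigma, the +1 offset that avoids log(0), the clamp-to-edge border handling, and the final min/max stretch to [0, 255] are assumptions chosen for illustration, and the function names are hypothetical. Within each of the two separable convolution passes every output pixel is computed independently, which is the kind of pixel-wise work a CUDA kernel could map to one thread per pixel; the global min/max, by contrast, is a serial-looking step that a GPU port would typically replace with a parallel reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: normalized 1-D Gaussian kernel of radius ceil(3*sigma).
// Precondition: sigma > 0.
std::vector<float> gaussianKernel(float sigma) {
    const int radius = static_cast<int>(std::ceil(3.0f * sigma));
    std::vector<float> k(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        k[i + radius] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += k[i + radius];
    }
    for (float& v : k) v /= sum;  // weights sum to 1, so flat regions are preserved
    return k;
}

// Hypothetical helper: separable Gaussian blur of a row-major w x h image,
// horizontal pass then vertical pass, with clamp-to-edge borders (an assumption).
std::vector<float> gaussianBlur(const std::vector<float>& img, int w, int h, float sigma) {
    const std::vector<float> k = gaussianKernel(sigma);
    const int r = static_cast<int>(k.size() / 2);
    std::vector<float> tmp(img.size()), out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int i = -r; i <= r; ++i)
                acc += k[i + r] * img[y * w + std::clamp(x + i, 0, w - 1)];
            tmp[y * w + x] = acc;
        }
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int i = -r; i <= r; ++i)
                acc += k[i + r] * tmp[std::clamp(y + i, 0, h - 1) * w + x];
            out[y * w + x] = acc;
        }
    return out;
}

// Single-scale center/surround retinex on one channel:
// R = log(I + 1) - log(F * I + 1), then a linear stretch of R to [0, 255].
std::vector<float> singleScaleRetinex(const std::vector<float>& img, int w, int h, float sigma) {
    assert(w > 0 && h > 0 &&
           img.size() == static_cast<std::size_t>(w) * static_cast<std::size_t>(h));
    const std::vector<float> surround = gaussianBlur(img, w, h, sigma);
    std::vector<float> r(img.size());
    for (std::size_t i = 0; i < img.size(); ++i)
        r[i] = std::log(img[i] + 1.0f) - std::log(surround[i] + 1.0f);  // +1 avoids log(0)
    // Global min/max over all pixels: the step a GPU version would do as a reduction.
    const auto [mn, mx] = std::minmax_element(r.begin(), r.end());
    const float lo = *mn;
    const float range = *mx - *mn;
    for (float& v : r) v = range > 0.0f ? 255.0f * (v - lo) / range : 0.0f;
    return r;
}
```

For example, singleScaleRetinex(gray, w, h, 80.0f) would apply a wide surround to a row-major grayscale buffer. Even in this separable form the per-pixel cost grows linearly with the kernel radius, so large surrounds make memory traffic a dominant cost, one reason memory access patterns matter so much when porting retinex to a GPU.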


[Figs. 1–11 not reproduced]



Author information


Corresponding author

Correspondence to Yuan-Kai Wang.


About this article

Cite this article

Wang, YK., Huang, WB. A CUDA-enabled parallel algorithm for accelerating retinex. J Real-Time Image Proc 9, 407–425 (2014).



Keywords

  • Center/surround retinex
  • CUDA
  • Parallelization
  • Image restoration