Abstract
Retinex is an image restoration approach used to restore the original appearance of an image. Among the various retinex methods, the center/surround retinex algorithm is well suited to parallelization because it relies on convolutions with large kernels to achieve dynamic range compression and color/lightness rendition. This paper presents GPURetinex, a data-parallel algorithm that accelerates a modified center/surround retinex with GPGPU/CUDA. GPURetinex exploits the massively parallel threading and heterogeneous memory hierarchy of a GPGPU to improve efficiency. Two challenging problems, irregular memory access and the choice of block size for data partitioning, are analyzed mathematically. The resulting mathematical models help choose memory spaces and block sizes that maximize parallel performance. The analyses are applied to three parallelization issues in the retinex problem: block-wise, pixel-wise, and serial operations. Experiments on a GT200 GPU with CUDA 3.2 show that GPURetinex achieves a 74-fold speedup over an SSE-optimized single-threaded implementation on a Core 2 Duo for images with 4,096 × 4,096 resolution. The proposed method also outperforms a parallel retinex implemented with the NVIDIA Performance Primitives library. These results indicate that careful design of memory access and multithreading patterns for CUDA devices can yield substantial acceleration for real-time image restoration.
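To make the center/surround idea concrete, the following is a minimal serial sketch of the commonly cited single-scale form, in which each output pixel is the log of the input minus the log of a Gaussian-blurred surround. It is not the modified algorithm or the GPU code described in this paper. The function names, the `sigma` and `radius` defaults, the border clamping, and the `+ 1.0` offset that avoids `log(0)` are all assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]


def single_scale_retinex(image, sigma=2.0, radius=6):
    """Compute log(image) - log(surround) for a single-channel image.

    The surround is a separable Gaussian blur: rows first, then columns.
    The + 1.0 offset is an illustrative assumption that keeps log() defined
    for zero-valued pixels.
    """
    kernel = gaussian_kernel(sigma, radius)
    surround = transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))
    return [
        [math.log(p + 1.0) - math.log(s + 1.0) for p, s in zip(img_row, sur_row)]
        for img_row, sur_row in zip(image, surround)
    ]
```

Each output pixel depends only on a fixed neighborhood of the input, which is what makes this family of methods a natural fit for data-parallel execution. The cost grows with the kernel size, so large surrounds are where an accelerated implementation would be expected to matter most.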
Cite this article
Wang, YK., Huang, WB. A CUDA-enabled parallel algorithm for accelerating retinex. J Real-Time Image Proc 9, 407–425 (2014). https://doi.org/10.1007/s11554-012-0301-6
DOI: https://doi.org/10.1007/s11554-012-0301-6