
GPU-accelerated denoising of 3D magnetic resonance images


The raw computational power of GPU accelerators enables fast denoising of 3D MR images using bilateral filtering, anisotropic diffusion, and non-local means. In practice, applying these filtering operations requires setting multiple parameters. This study was designed to provide better guidance to practitioners for choosing the most appropriate parameters by answering two questions: what parameters yield the best denoising results in practice? And what tuning is necessary to achieve optimal performance on a modern GPU? To answer the first question, we use two different metrics, mean squared error (MSE) and mean structural similarity (MSSIM), to compare denoising quality against a reference image. Surprisingly, the best improvement in structural similarity with the bilateral filter is achieved with a small stencil size that lies within the range of real-time execution on an NVIDIA Tesla M2050 GPU. Moreover, inappropriate choices for parameters, especially scaling parameters, can yield very poor denoising performance. To answer the second question, we perform an autotuning study to empirically determine optimal memory tiling on the GPU. The variation in these results suggests that such tuning is an essential step in achieving real-time performance. These results have important implications for the real-time application of denoising to MR images in clinical settings that require fast turn-around times.
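To make the parameters discussed above concrete, the following is a minimal CPU reference sketch of a brute-force 3D bilateral filter together with the MSE metric. It is not the authors' CUDA implementation. The function names (`mse`, `bilateral3d`, `make_noisy`), the parameter names (`radius` for the stencil size, `sigma_spatial` and `sigma_range` for the scaling parameters), the Gaussian weight form, and the use of additive Gaussian noise on a constant test volume are all assumptions made for illustration.

```python
import math
import random


def mse(a, b):
    """Mean squared error between two equally shaped 3D volumes (nested lists)."""
    total, count = 0.0, 0
    for plane_a, plane_b in zip(a, b):
        for row_a, row_b in zip(plane_a, plane_b):
            for va, vb in zip(row_a, row_b):
                total += (va - vb) ** 2
                count += 1
    return total / count


def bilateral3d(vol, radius, sigma_spatial, sigma_range):
    """Brute-force 3D bilateral filter (hypothetical reference version).

    The stencil covers (2*radius+1)^3 voxels and is truncated at the
    volume borders. Each neighbor is weighted by a Gaussian in spatial
    distance and a Gaussian in intensity difference.
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                center = vol[z][y][x]
                acc, norm = 0.0, 0.0
                for k in range(max(0, z - radius), min(nz, z + radius + 1)):
                    for j in range(max(0, y - radius), min(ny, y + radius + 1)):
                        for i in range(max(0, x - radius), min(nx, x + radius + 1)):
                            v = vol[k][j][i]
                            d2 = (k - z) ** 2 + (j - y) ** 2 + (i - x) ** 2
                            r2 = (v - center) ** 2
                            w = math.exp(-d2 / (2.0 * sigma_spatial ** 2)
                                         - r2 / (2.0 * sigma_range ** 2))
                            acc += w * v
                            norm += w
                # norm >= 1 because the center voxel always has weight 1
                out[z][y][x] = acc / norm
    return out


def make_noisy(clean, sigma_noise, seed=0):
    """Add Gaussian noise (an assumption; real MR noise is not Gaussian)."""
    rng = random.Random(seed)
    return [[[v + rng.gauss(0.0, sigma_noise) for v in row]
             for row in plane] for plane in clean]


# Small synthetic example: constant reference volume plus noise.
clean = [[[100.0] * 6 for _ in range(6)] for _ in range(6)]
noisy = make_noisy(clean, sigma_noise=10.0)
denoised = bilateral3d(noisy, radius=1, sigma_spatial=1.0, sigma_range=30.0)
print("MSE noisy:   ", mse(noisy, clean))
print("MSE denoised:", mse(denoised, clean))
```

In this sketch, a very small `sigma_range` relative to the noise level leaves the volume almost unchanged, while a very large one reduces the filter to plain Gaussian smoothing that blurs edges. This is one way to see why a poorly chosen scaling parameter can give poor denoising results.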






Thanks to Dani Ushizima, Alex Cunha, and Owen Carmichael for their comments and input on earlier versions of this study, and to D. Louis Collins for following up with us on problems accessing the BrainWeb database. This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This work was conducted using computational resources and services at the Center for Computation and Visualization, Brown University. Data used for this study were downloaded from the Biomedical Informatics Research Network (BIRN) Data Repository, supported by grants to the BIRN Coordinating Center (U24-RR019701), Function BIRN (U24-RR021992), Morphometry BIRN (U24-RR021382), and Mouse BIRN (U24-RR021760) Testbeds funded by the National Center for Research Resources at the National Institutes of Health, USA.

Author information

Correspondence to Mark Howison.


About this article


Cite this article

Howison, M., Bethel, E.W. GPU-accelerated denoising of 3D magnetic resonance images. J Real-Time Image Proc 13, 713–724 (2017).



  • MRI denoising
  • Non-local means
  • Bilateral filter
  • Anisotropic diffusion
  • CUDA