GPU-accelerated denoising of 3D magnetic resonance images

Original Research Paper

Abstract

The raw computational power of GPU accelerators enables fast denoising of 3D MR images using bilateral filtering, anisotropic diffusion, and non-local means. In practice, applying these filtering operations requires setting multiple parameters. This study was designed to provide better guidance to practitioners for choosing the most appropriate parameters by answering two questions: what parameters yield the best denoising results in practice? And what tuning is necessary to achieve optimal performance on a modern GPU? To answer the first question, we use two different metrics, mean squared error (MSE) and mean structural similarity (MSSIM), to compare denoising quality against a reference image. Surprisingly, the best improvement in structural similarity with the bilateral filter is achieved with a small stencil size that lies within the range of real-time execution on an NVIDIA Tesla M2050 GPU. Moreover, inappropriate choices for parameters, especially scaling parameters, can yield very poor denoising performance. To answer the second question, we perform an autotuning study to empirically determine optimal memory tiling on the GPU. The variation in these results suggests that such tuning is an essential step in achieving real-time performance. These results have important implications for the real-time application of denoising to MR images in clinical settings that require fast turn-around times.
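
As a rough illustration of the quantities discussed above, the following pure-Python sketch shows a brute-force 3D bilateral filter and the MSE metric. It is not the GPU implementation evaluated in the study: the function names, the nested-list volume layout, the Gaussian form of both weights, and the parameter names `radius`, `sigma_spatial`, and `sigma_range` are assumptions chosen for readability.

```python
import math


def mse(image, reference):
    """Mean squared error between two equally shaped 3D volumes (nested lists)."""
    total, count = 0.0, 0
    for plane_a, plane_b in zip(image, reference):
        for row_a, row_b in zip(plane_a, plane_b):
            for a, b in zip(row_a, row_b):
                total += (a - b) ** 2
                count += 1
    return total / count


def bilateral_filter_3d(volume, radius, sigma_spatial, sigma_range):
    """Brute-force 3D bilateral filter (hypothetical helper, Gaussian weights assumed).

    The stencil covers (2 * radius + 1) ** 3 voxels around each output voxel.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    offsets = range(-radius, radius + 1)
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                center = volume[z][y][x]
                weighted_sum, weight_total = 0.0, 0.0
                for dz in offsets:
                    for dy in offsets:
                        for dx in offsets:
                            zz, yy, xx = z + dz, y + dy, x + dx
                            # Skip stencil points that fall outside the volume.
                            if not (0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx):
                                continue
                            value = volume[zz][yy][xx]
                            dist2 = dz * dz + dy * dy + dx * dx
                            diff2 = (value - center) ** 2
                            # Spatial closeness times intensity similarity.
                            w = math.exp(-dist2 / (2 * sigma_spatial ** 2)
                                         - diff2 / (2 * sigma_range ** 2))
                            weighted_sum += w * value
                            weight_total += w
                # The center voxel always contributes weight 1, so no division by zero.
                out[z][y][x] = weighted_sum / weight_total
    return out
```

In this sketch the two sigma values play the role of scaling parameters, and the work per voxel grows with the cube of the stencil width, which is one reason a small stencil is attractive for real-time use. Comparing `mse(filtered, reference)` with `mse(noisy, reference)` gives a simple check of whether a given parameter choice helped.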

Keywords

MRI denoising, Non-local means, Bilateral filter, Anisotropic diffusion, CUDA

Copyright information

© Springer-Verlag Berlin Heidelberg (outside the USA) 2014

Authors and Affiliations

  1. Center for Computation and Visualization, Brown University, Providence, USA
  2. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, USA
