Abstract
The two-dimensional orthogonal matching pursuit (2D-OMP) algorithm is an extension of one-dimensional OMP (1D-OMP) whose complexity and memory usage are lower than those of 1D-OMP when both are applied to 2D sparse signal recovery. However, the major shortcoming of 2D-OMP remains its long computing time. To overcome this disadvantage, we develop a novel parallel design strategy for the 2D-OMP algorithm on a graphics processing unit (GPU). We first analyze the complexity of 2D-OMP and show that the bottlenecks lie in matrix inversion and projection. After adopting a matrix inverse update strategy, which outperforms traditional methods, to reduce the complexity of the original matrix inversion, projection becomes the most time-consuming module. Hence, a parallel matrix–matrix multiplication based on a tiling algorithm is used to accelerate the projection computation on the GPU. Moreover, a fast matrix–vector multiplication, a parallel reduction algorithm, and several other parallel techniques are exploited to further boost the performance of 2D-OMP on the GPU. For a sensing matrix of size 128 \(\times\) 256 (respectively 176 \(\times\) 256) and a 256 \(\times\) 256 image, experimental results show that the parallel 2D-OMP achieves a 17\(\times\) to 41\(\times\) (respectively 24\(\times\) to 62\(\times\)) speedup over the original C code compiled with the -O2 optimization option. Higher speedups can be obtained when recovering larger images.
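The tiling idea behind the accelerated projection step can be illustrated with a minimal CPU-side sketch in Python. This is not the authors' GPU kernel: the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, chosen only for illustration. Each output tile stands in for what one GPU thread block would compute, and the loop over `k0` mimics streaming tiles of the two operands through fast on-chip memory.

```python
def matmul_naive(a, b):
    """Reference product: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Blocked product that visits the operands tile by tile.

    Assumption for illustration: each (i0, j0) pair plays the role of one
    GPU thread block, and each k0 step corresponds to loading one tile of
    A and one tile of B before accumulating partial sums.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the output
        for j0 in range(0, p, tile):      # tile column of the output
            for k0 in range(0, m, tile):  # tiles along the shared dimension
                # Accumulate the contribution of this pair of tiles.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, p)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(a, b, tile=2))
```

The tiled version produces the same result as the naive one; the benefit on a GPU comes from reusing each loaded tile for many multiply-accumulate operations, which this sequential sketch does not measure.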
Acknowledgments
This work was supported by the National Science Foundation of China (Grant Nos. 61271280, 61001100) and the National Key Technology R&D Program (Grant No. 2012BAH29B04).
Cite this article
Dai, Y., He, D., Fang, Y. et al. Accelerating 2D orthogonal matching pursuit algorithm on GPU. J Supercomput 69, 1363–1381 (2014). https://doi.org/10.1007/s11227-014-1188-8
DOI: https://doi.org/10.1007/s11227-014-1188-8