
Accelerating 2D orthogonal matching pursuit algorithm on GPU

The Journal of Supercomputing

Abstract

The two-dimensional orthogonal matching pursuit (2D-OMP) algorithm is an extension of one-dimensional OMP (1D-OMP) that has lower complexity and memory usage than 1D-OMP when both are applied to 2D sparse signal recovery. However, the major shortcoming of 2D-OMP remains its long computing time. To overcome this disadvantage, we develop a novel parallel design strategy for the 2D-OMP algorithm on a graphics processing unit (GPU). We first analyze the complexity of 2D-OMP and show that the bottlenecks lie in matrix inversion and projection. Once a matrix inverse update strategy, which outperforms traditional methods, is adopted to reduce the complexity of the original matrix inversion, projection becomes the most time-consuming module. Hence, a parallel matrix–matrix multiplication based on a tiling algorithm is used to accelerate the projection computation on the GPU. Moreover, a fast matrix–vector multiplication, a parallel reduction algorithm, and several other parallel techniques are exploited to further boost the performance of 2D-OMP on the GPU. For a \(256 \times 256\) image and a sensing matrix of size \(128 \times 256\) (respectively \(176 \times 256\)), experimental results show that the parallel 2D-OMP achieves a \(17\times\) to \(41\times\) (respectively \(24\times\) to \(62\times\)) speedup over the original C code compiled with the O2 optimization option. Higher speedups can be obtained when recovering larger images.
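The abstract names a matrix inverse update but does not give its formula. As an illustration only, the sketch below uses the standard bordering (Schur complement) identity for a symmetric matrix that grows by one row and one column per iteration, as happens in the least-squares step of OMP-type algorithms. It is an assumption that the paper's update has this form; the function names `mat_vec` and `update_inverse` are hypothetical, and the code is plain CPU Python rather than the authors' GPU implementation.

```python
# Sketch: extend the inverse of a symmetric k x k matrix G to the inverse of
# the bordered matrix [[G, b], [b^T, c]] in O(k^2) operations, instead of
# re-inverting from scratch in O(k^3). Names are illustrative, not the paper's.

def mat_vec(A, x):
    """Return the product A x for a list-of-rows matrix A and a list x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def update_inverse(G_inv, b, c):
    """Return the inverse of [[G, b], [b^T, c]] given G_inv = G^{-1}.

    G is assumed symmetric, so b^T G^{-1} equals (G^{-1} b)^T.
    """
    k = len(G_inv)
    u = mat_vec(G_inv, b)                          # u = G^{-1} b
    s = c - sum(bi * ui for bi, ui in zip(b, u))   # scalar Schur complement
    if abs(s) < 1e-12:
        raise ValueError("new column is numerically dependent on the old ones")
    # Top-left block is G^{-1} + u u^T / s; the new column and row are -u / s.
    new = [[G_inv[i][j] + u[i] * u[j] / s for j in range(k)] + [-u[i] / s]
           for i in range(k)]
    new.append([-u[j] / s for j in range(k)] + [1.0 / s])
    return new


# Usage: build the inverse of [[4, 2], [2, 3]] one row/column at a time.
inv1 = update_inverse([], [], 4.0)        # inverse of the 1 x 1 matrix [[4]]
inv2 = update_inverse(inv1, [2.0], 3.0)   # inverse of [[4, 2], [2, 3]]
```

Each update costs one matrix–vector product and one rank-one correction, which is consistent with the abstract's remark that, once the inversion is handled incrementally, projection becomes the dominant cost.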



Acknowledgments

This work was supported by the National Science Foundation of China (Grant Nos. 61271280 and 61001100) and the National Key Technology R&D Program (Grant No. 2012BAH29B04).

Author information

Corresponding author

Correspondence to Dongjian He.


About this article


Cite this article

Dai, Y., He, D., Fang, Y. et al. Accelerating 2D orthogonal matching pursuit algorithm on GPU. J Supercomput 69, 1363–1381 (2014). https://doi.org/10.1007/s11227-014-1188-8


  • DOI: https://doi.org/10.1007/s11227-014-1188-8
