Abstract
Sparse representation is a building block for many image processing applications, such as compression, denoising, and fusion. In the era of big data, current sparse representation methods generally cannot process large image datasets in a time-efficient manner. To address this problem, this paper employs general-purpose computing on graphics processing units (GPGPU) to extend IK-SVD, a sparse representation method for big image datasets; the resulting method is named G-IK-SVD. G-IK-SVD parallelizes IK-SVD with three GPU optimization methods: (1) a batch-OMP algorithm based on a GPU-aided Cholesky decomposition, (2) a GPU sparse matrix operation optimization method, and (3) a hybrid parallel scheme. The experimental results indicate that (1) the GPU-aided batch-OMP algorithm achieves speedups of up to 30 times over the sparse coding part of IK-SVD, (2) the optimized sparse matrix operations speed up the whole IK-SVD procedure by up to 15 times, (3) the proposed parallel scheme further accelerates the sparse representation of one large image dataset by up to 24 times, and (4) G-IK-SVD attains the same dictionary learning quality as IK-SVD.
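To make the first optimization concrete, the following is a minimal CPU sketch of batch orthogonal matching pursuit with a progressively updated Cholesky factor, the general technique that the abstract's batch-OMP step builds on. It is not the paper's GPU implementation. The function names (`dot`, `forward_sub`, `backward_sub`, `batch_omp`), the pure-Python data layout, and the numerical tolerances are all assumptions made for illustration.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def forward_sub(L, b):
    """Solve L y = b, where L is lower triangular (stored as ragged rows)."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - s) / row[i])
    return y

def backward_sub(L, y):
    """Solve L^T c = y for the same lower-triangular L."""
    n = len(L)
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[j][i] * c[j] for j in range(i + 1, n))
        c[i] = (y[i] - s) / L[i][i]
    return c

def batch_omp(G, alpha0, sparsity):
    """Sparse-code one signal (illustrative sketch, hypothetical interface).

    G       -- Gram matrix D^T D of the dictionary, precomputed once
    alpha0  -- projections D^T x of the signal onto every atom
    Returns {atom index: coefficient} with at most `sparsity` entries.
    """
    n_atoms = len(G)
    alpha = list(alpha0)   # current D^T residual
    support, L, coeffs = [], [], []
    for _ in range(sparsity):
        # Pick the unused atom most correlated with the residual.
        k = max((i for i in range(n_atoms) if i not in support),
                key=lambda i: abs(alpha[i]))
        if abs(alpha[k]) < 1e-12:      # residual already (numerically) zero
            break
        # Grow the Cholesky factor of G[support, support] by one row.
        if support:
            w = forward_sub(L, [G[i][k] for i in support])
            d2 = G[k][k] - dot(w, w)
            if d2 <= 1e-12:            # atom depends linearly on the support
                break
            L.append(w + [math.sqrt(d2)])
        else:
            L.append([math.sqrt(G[k][k])])
        support.append(k)
        # Solve (L L^T) c = alpha0[support] with two triangular solves.
        rhs = [alpha0[i] for i in support]
        coeffs = backward_sub(L, forward_sub(L, rhs))
        # Update the residual projections without touching the signal itself.
        alpha = [alpha0[j] - sum(G[j][i] * c for i, c in zip(support, coeffs))
                 for j in range(n_atoms)]
    return dict(zip(support, coeffs))
```

Because the Gram matrix is computed once and shared by every signal, each signal can be coded independently, which is presumably what makes the step attractive for GPU parallelization.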
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 41471368 and 41571413).
Cite this article
Song, W., Deng, Z., Wang, L. et al. G-IK-SVD: parallel IK-SVD on GPUs for sparse representation of spatial big data. J Supercomput 73, 3433–3450 (2017). https://doi.org/10.1007/s11227-016-1652-8
DOI: https://doi.org/10.1007/s11227-016-1652-8