
An efficient parallel entropy coding method for JPEG compression based on GPU

The Journal of Supercomputing

Abstract

Fast JPEG image compression is required in many applications, such as high-speed video measurement systems and digital cinema. Many existing methods parallelize JPEG compression on the GPU, with the exception of entropy coding, which, as a variable-length coding method, appears better suited to a sequential implementation. However, entropy coding is an essential part of the JPEG compression system and typically accounts for a large proportion of the run time when implemented on the CPU. To tackle this problem, we propose an efficient parallel entropy coding (EPEnt) method for parallel JPEG compression. The proposed method performs entropy coding in three parallel steps: coding, shifting, and stuffing. Specifically, to match the different characteristics of the image components, we devise thread-based and warp-based functions for the coding stage, which further improve efficiency while guaranteeing image quality. We apply the proposed method to a parallel JPEG compression system and evaluate its performance using the Compute Unified Device Architecture (CUDA). The experimental results demonstrate that, compared with a sequential implementation, entropy coding reaches a maximum speedup of 39 times without affecting the quality of the compressed images. Meanwhile, the speedup ratio of the whole JPEG compression process is at least 28% higher than that of state-of-the-art parallel methods.
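The three steps named in the abstract (coding, shifting, stuffing) can be illustrated with a small sequential Python sketch. It is not the authors' EPEnt implementation: each loop iteration merely stands in for work that a GPU thread or warp might do, the code table `TOY_TABLE` and the helper names `code_block`, `shift_and_merge`, and `stuff_bytes` are hypothetical, and using an exclusive prefix sum over block bit lengths to find each block's output offset is an assumption about how the shifting step can be organized. Only two details come from the JPEG standard itself: the final partial byte is padded with 1-bits, and every 0xFF byte in the entropy-coded data is followed by an inserted 0x00 byte.

```python
from itertools import accumulate

# Hypothetical toy prefix-code table: symbol -> (codeword bits, bit length).
# A real JPEG encoder would use its DC/AC Huffman tables instead.
TOY_TABLE = {"a": (0b0, 1), "b": (0b10, 2), "c": (0b110, 3), "d": (0b111, 3)}


def code_block(symbols, table=TOY_TABLE):
    """Step 1 (coding): encode one block independently of all others.

    Returns the block's bits packed into an int plus its bit length, so
    every block can be coded in parallel without knowing its final position.
    """
    value, length = 0, 0
    for s in symbols:
        code, n = table[s]
        value = (value << n) | code
        length += n
    return value, length


def shift_and_merge(coded_blocks):
    """Step 2 (shifting): place each block at its bit offset in the stream.

    Assumption: offsets come from an exclusive prefix sum over the block
    lengths (computed sequentially here; a GPU would use a parallel scan).
    """
    lengths = [n for _, n in coded_blocks]
    offsets = list(accumulate(lengths, initial=0))[:-1]
    total = sum(lengths)
    padded = (total + 7) // 8 * 8  # round the stream up to whole bytes
    stream = 0
    for (value, n), off in zip(coded_blocks, offsets):
        # Shift so the block's first bit lands at bit position `off`,
        # counting from the most significant end of the stream.
        stream |= value << (padded - off - n)
    # JPEG pads the final partial byte with 1-bits.
    stream |= (1 << (padded - total)) - 1
    return stream.to_bytes(padded // 8, "big"), offsets


def stuff_bytes(data):
    """Step 3 (stuffing): insert 0x00 after every 0xFF byte, as JPEG requires."""
    out = bytearray()
    for b in data:
        out.append(b)
        if b == 0xFF:
            out.append(0x00)
    return bytes(out)


if __name__ == "__main__":
    blocks = [["d", "d"], ["d", "a"]]
    coded = [code_block(b) for b in blocks]   # [(63, 6), (14, 4)]
    packed, offsets = shift_and_merge(coded)  # b'\xff\xbf', [0, 6]
    print(stuff_bytes(packed).hex())          # ff00bf
```

Because each block is first reduced to a (bits, length) pair, no block has to wait for its predecessor during coding; in this sketch the blocks are coupled only through their offsets, which depend on the lengths alone.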





Acknowledgements

The authors would sincerely like to thank the editor and anonymous reviewers for their detailed review.

Author information

Corresponding author

Correspondence to Hua Yan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhu, F., Yan, H. An efficient parallel entropy coding method for JPEG compression based on GPU. J Supercomput 78, 2681–2708 (2022). https://doi.org/10.1007/s11227-021-03971-6



  • DOI: https://doi.org/10.1007/s11227-021-03971-6
