An Optimized Parallel IDCT on Graphics Processing Units

Wang, Biao; Alvarez-Mesa, Mauricio; Chi, Chi Ching; Juurlink, Ben

doi:10.1007/978-3-642-36949-0_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7640)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. Because most IDCT input coefficients in real videos are zero-valued, we build a new compacted data representation that enables several optimizations. Experimental evaluations on different GPUs show average speedups from 1.7× to 7.4× over an optimized single-threaded SIMD CPU version.
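
To make the idea of skipping zero-valued input concrete, the following is a minimal sketch, not the authors' OpenCL implementation. It assumes the standard H.264/AVC 4x4 inverse integer transform with coefficients in row-major order, and it uses a deliberately simple compaction (a list of indices of non-zero blocks). The abstract does not describe the actual compacted format, so the names `idct4x4`, `compact`, and `idct_frame` and the index-list layout are hypothetical.

```python
def idct4x4(block):
    """4x4 inverse integer transform (H.264/AVC style) on 16 row-major coefficients."""

    def butterfly(d0, d1, d2, d3):
        # One-dimensional inverse transform using only adds and shifts.
        e0 = d0 + d2
        e1 = d0 - d2
        e2 = (d1 >> 1) - d3
        e3 = d1 + (d3 >> 1)
        return e0 + e3, e1 + e2, e1 - e2, e0 - e3

    # Horizontal pass: transform each of the four rows.
    rows = [butterfly(*block[4 * i:4 * i + 4]) for i in range(4)]
    # Vertical pass: transform each of the four columns of the row result.
    cols = [butterfly(rows[0][j], rows[1][j], rows[2][j], rows[3][j])
            for j in range(4)]
    # cols[j][i] holds row i, column j; round and scale by 1/64.
    return [(cols[j][i] + 32) >> 6 for i in range(4) for j in range(4)]


def compact(blocks):
    """Assumed compaction: keep only blocks with at least one non-zero coefficient."""
    indices = [k for k, b in enumerate(blocks) if any(b)]
    return indices, [blocks[k] for k in indices]


def idct_frame(blocks):
    """Transform only the non-zero blocks; all-zero blocks yield all-zero output."""
    out = [[0] * 16 for _ in blocks]
    indices, packed = compact(blocks)
    for k, b in zip(indices, packed):
        out[k] = idct4x4(b)
    return out
```

Skipping is safe here because an all-zero block transforms to all zeros, so `idct_frame(blocks)` matches applying `idct4x4` to every block. On a GPU, the packed list would also shrink the data transferred to the device, though how the paper exploits this is not stated in the abstract.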

Author information

Authors and Affiliations

Embedded Systems Architecture, Technische Universität Berlin, Berlin, Germany
Biao Wang, Mauricio Alvarez-Mesa, Chi Ching Chi & Ben Juurlink
Multimedia Communications, Fraunhofer HHI, Berlin, Germany
Mauricio Alvarez-Mesa

Editor information

Editors and Affiliations

Computer Technology Institute and Press “Diophantus” & Department of Computer Engineering and Informatics, University of Patras, 26504, Rio, Greece
Ioannis Caragiannis
Technische Universität Wien, Austria
Michael Alexander
Artificial Intelligence Research Institute (IIIA), Spanish National Research Council (CSIC), Spain
Rosa Maria Badia
Department of Medical and Surgical Sciences, Bioinformatics Laboratory, University Magna Græcia of Catanzaro, 88100, Catanzaro, Italy
Mario Cannataro
Inria Rennes, France
Alexandru Costan
Dept. Computer Science, Univ. Pisa, Largo Pontecorvo 3, 56127, Pisa, Italy
Marco Danelutto
Inria, 46 Allée d’Italie, 69364, Lyon Cedex 7, France
Frédéric Desprez
Université de Versailles, France
Bettina Krammer
Department of Computer Engineering (DISCA), Universitat Politècnica de València, Spain
Julio Sahuquillo
Oak Ridge National Laboratory, USA
Stephen L. Scott
Technische Universität München, Germany
Josef Weidendorfer

About this paper

Cite this paper

Wang, B., Alvarez-Mesa, M., Chi, C.C., Juurlink, B. (2013). An Optimized Parallel IDCT on Graphics Processing Units. In: Caragiannis, I., et al. Euro-Par 2012: Parallel Processing Workshops. Euro-Par 2012. Lecture Notes in Computer Science, vol 7640. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36949-0_18

DOI: https://doi.org/10.1007/978-3-642-36949-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36948-3
Online ISBN: 978-3-642-36949-0
eBook Packages: Computer Science, Computer Science (R0)
