Abstract
In this paper, we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. Because most of the IDCT input coefficients for real videos are zero-valued, we create a new compacted data representation that enables several optimizations. Experimental evaluations conducted on different GPUs show average speedups from 1.7× to 7.4× compared to an optimized single-threaded SIMD CPU version.
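The zero-skipping idea can be illustrated with a minimal CPU-side sketch in Python. It is not the authors' OpenCL implementation, and the abstract does not describe the layout of their compacted representation. Here it is assumed, purely for illustration, to be a list of (block index, coefficients) pairs for the non-zero 4×4 blocks. The function names `idct4_1d`, `idct4x4`, `compact`, and `idct_frame` are hypothetical. The butterfly follows the standard H.264/AVC 4×4 inverse core transform applied to already dequantized coefficients.

```python
def idct4_1d(d0, d1, d2, d3):
    """One 1-D pass of the H.264/AVC 4x4 inverse core transform."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3   # >> is an arithmetic shift on Python ints
    e3 = d1 + (d3 >> 1)
    return (e0 + e3, e1 + e2, e1 - e2, e0 - e3)


def idct4x4(block):
    """Inverse-transform one 4x4 block: rows, then columns, then rounding."""
    rows = [idct4_1d(*r) for r in block]
    # cols[j][i] holds the result at row i, column j
    cols = [idct4_1d(*(rows[i][j] for i in range(4))) for j in range(4)]
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def compact(blocks):
    """Assumed compaction: keep only blocks with a non-zero coefficient."""
    return [(i, b) for i, b in enumerate(blocks)
            if any(c != 0 for row in b for c in row)]


def idct_frame(blocks):
    """Transform only the compacted blocks; all-zero blocks stay zero."""
    out = [[[0] * 4 for _ in range(4)] for _ in blocks]
    for i, b in compact(blocks):
        out[i] = idct4x4(b)
    return out


if __name__ == "__main__":
    zero = [[0] * 4 for _ in range(4)]
    dc_only = [[64, 0, 0, 0], [0] * 4, [0] * 4, [0] * 4]
    print(idct_frame([zero, dc_only, zero])[1])
```

In this sketch the saving comes from `compact`: an all-zero block transforms to an all-zero residual, so it needs neither computation nor transfer. On a GPU the same filtering would also reduce the data copied to the device, although how the paper exploits this is not stated in the abstract.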
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, B., Alvarez-Mesa, M., Chi, C.C., Juurlink, B. (2013). An Optimized Parallel IDCT on Graphics Processing Units. In: Caragiannis, I., et al. Euro-Par 2012: Parallel Processing Workshops. Euro-Par 2012. Lecture Notes in Computer Science, vol 7640. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36949-0_18
DOI: https://doi.org/10.1007/978-3-642-36949-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36948-3
Online ISBN: 978-3-642-36949-0
eBook Packages: Computer Science, Computer Science (R0)