An Optimized Parallel IDCT on Graphics Processing Units

  • Biao Wang
  • Mauricio Alvarez-Mesa
  • Chi Ching Chi
  • Ben Juurlink
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7640)


In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. Most of the IDCT input data for real videos consists of zero-valued coefficients; exploiting this property, we create a new compacted data representation that allows for several optimizations. Experimental evaluations conducted on different GPUs show average speedups from 1.7× to 7.4× compared to an optimized single-threaded SIMD CPU version.


IDCT, GPU, H.264, OpenCL, parallel programming





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Biao Wang (1)
  • Mauricio Alvarez-Mesa (1, 2)
  • Chi Ching Chi (1)
  • Ben Juurlink (1)
  1. Embedded Systems Architecture, Technische Universität Berlin, Berlin, Germany
  2. Multimedia Communications, Fraunhofer HHI, Berlin, Germany
