International Journal of Parallel Programming, Volume 45, Issue 6, pp. 1515–1535

GPU Parallelization of HEVC In-Loop Filters

  • Biao Wang
  • Diego F. de Souza (corresponding author)
  • Mauricio Alvarez-Mesa
  • Chi Ching Chi
  • Ben Juurlink
  • Aleksandar Ilic
  • Nuno Roma
  • Leonel Sousa


In the High Efficiency Video Coding (HEVC) standard, multiple decoding modules have been designed to take advantage of parallel processing. In particular, the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset) were conceived to be exploited by parallel architectures. However, the type of parallelism offered mostly suits the capabilities of multi-core CPUs, which makes it a real challenge to efficiently exploit massively parallel architectures such as Graphics Processing Units (GPUs), mainly due to the data dependencies between the HEVC decoding procedures. Accordingly, this paper presents a novel strategy to increase the amount of parallelism, and hence the performance, of the HEVC in-loop filters on GPU devices. For this purpose, the proposed algorithm performs the HEVC filtering at frame level and employs intrinsic GPU vector instructions. Compared with state-of-the-art HEVC in-loop filter implementations, the proposed approach also reduces the amount of required memory transfers, further boosting performance. Experimental results show that the proposed GPU in-loop filters deliver a significant improvement in decoding performance: for example, average frame rates of 76 frames per second (FPS) and 125 FPS are achieved for Ultra HD 4K on an embedded NVIDIA GPU for the All Intra and Random Access configurations, respectively.
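To illustrate why filtering at frame level can expose more parallelism, the sketch below applies the HEVC sample adaptive offset (SAO) edge-offset rule to a whole frame. It is a minimal C++ stand-in, not the authors' GPU kernel: the hypothetical function `saoEdgeOffsetSample` plays the role of one GPU thread handling one sample, and `saoEdgeOffsetFrame` replaces a kernel launch with a plain double loop. Because each output sample depends only on the (already deblocked) input frame, every call is independent. Simplifying assumptions, all labelled hypothetical: 8-bit samples of a single component, one edge-offset class and one offset set for the entire frame (HEVC actually signals these per CTB), and no handling of slice/tile boundaries or per-CTB SAO disabling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Offsets (dx, dy) of the two neighbours a and b compared against sample p,
// indexed by SAO edge-offset class: 0 = horizontal, 1 = vertical,
// 2 = 135-degree diagonal, 3 = 45-degree diagonal.
constexpr int kHPos[4][2] = {{-1, 1}, {0, 0}, {-1, 1}, {1, -1}};
constexpr int kVPos[4][2] = {{0, 0}, {-1, 1}, {-1, 1}, {-1, 1}};

inline int sign(int v) { return (v > 0) - (v < 0); }

// Hypothetical per-sample "kernel": on a GPU, one thread would run this for
// one (x, y). It reads only from `in` and writes only out[y * width + x],
// so no two calls ever touch the same output sample.
void saoEdgeOffsetSample(const std::vector<std::uint8_t>& in,
                         std::vector<std::uint8_t>& out, int width, int height,
                         int x, int y, int eoClass, const int offsets[5]) {
  const int idx = y * width + x;
  const int xa = x + kHPos[eoClass][0], ya = y + kVPos[eoClass][0];
  const int xb = x + kHPos[eoClass][1], yb = y + kVPos[eoClass][1];
  // Samples whose neighbour lies outside the picture are left unmodified.
  if (xa < 0 || xa >= width || ya < 0 || ya >= height ||
      xb < 0 || xb >= width || yb < 0 || yb >= height) {
    out[idx] = in[idx];
    return;
  }
  const int p = in[idx];
  int edgeIdx = 2 + sign(p - in[ya * width + xa]) + sign(p - in[yb * width + xb]);
  // Remap raw values 0, 1, 2 to categories 1, 2, 0 (0 means "no edge").
  if (edgeIdx <= 2) edgeIdx = (edgeIdx == 2) ? 0 : edgeIdx + 1;
  // Add the category offset and clip to the 8-bit sample range.
  out[idx] = static_cast<std::uint8_t>(std::min(std::max(p + offsets[edgeIdx], 0), 255));
}

// CPU stand-in for a single frame-level kernel launch: one 2-D "grid" that
// covers the whole frame instead of a CTU-by-CTU loop.
std::vector<std::uint8_t> saoEdgeOffsetFrame(const std::vector<std::uint8_t>& in,
                                             int width, int height, int eoClass,
                                             const int offsets[5]) {
  assert(eoClass >= 0 && eoClass < 4);
  assert(static_cast<int>(in.size()) == width * height);
  std::vector<std::uint8_t> out(in.size());
  for (int y = 0; y < height; ++y)
    for (int x = 0; x < width; ++x)
      saoEdgeOffsetSample(in, out, width, height, x, y, eoClass, offsets);
  return out;
}
```

For example, with offsets `{0, 2, 1, -1, -2}` and the horizontal class, the row `{10, 5, 10}` becomes `{10, 7, 10}`: the middle sample is a local minimum (category 1), while the two border samples are left unmodified. Writing to a separate output buffer is the design choice that removes inter-thread dependencies; on a GPU, the double loop would map to one thread per sample.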


Keywords: High Efficiency Video Coding (HEVC) · Graphics Processing Unit (GPU) · In-loop filters · Parallelization · Decoder



This work was supported by national funds through FCT, under projects PTDC/EEI-ELC/3152/2012 and UID/CEC/50021/2013. Diego F. de Souza also acknowledges FCT for the Ph.D. scholarship SFRH/BD/76285/2011.

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Biao Wang (1)
  • Diego F. de Souza (2), corresponding author
  • Mauricio Alvarez-Mesa (1)
  • Chi Ching Chi (1)
  • Ben Juurlink (1)
  • Aleksandar Ilic (2)
  • Nuno Roma (2)
  • Leonel Sousa (2)

  1. AES, Technische Universität Berlin, Berlin, Germany
  2. INESC-ID, IST, Universidade de Lisboa, Lisbon, Portugal