
Parallel HEVC Decoding on Multi- and Many-core Architectures

A Power and Performance Analysis


The Joint Collaborative Team on Video Coding (JCT-VC) is developing a new standard named High Efficiency Video Coding (HEVC) that aims at reducing the bitrate of H.264/AVC by another 50 %. To meet the computational demands of the new standard, in particular for high resolutions and at low power budgets, exploiting parallelism is no longer an option but a requirement. Therefore, HEVC includes several coding tools that allow each picture to be divided into several partitions that can be processed in parallel without degrading either the quality or the bitrate. In this paper we adapt one of these approaches, Wavefront Parallel Processing (WPP) coding, and show how it can be implemented on multi- and many-core processors. Our approach, named Overlapped Wavefront (OWF), processes several partitions as well as several pictures in parallel. This has the advantage that the amount of (thread-level) parallelism stays constant during execution. In addition, performance and power results are provided for three platforms: a server Intel CPU with 8 cores, a laptop Intel CPU with 4 cores, and a TILE-Gx36 with 36 cores from Tilera. The results show that our parallel HEVC decoder is capable of achieving an average frame rate of 116 fps for 4k resolution on a standard multicore CPU. The results also demonstrate that exploiting more parallelism by increasing the number of cores can substantially improve energy efficiency, measured in Joules per frame.
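To illustrate why overlapping pictures helps, the following minimal sketch models the parallelism available in a single picture under plain wavefront processing. It is not the authors' decoder. It assumes every coding tree unit (CTU) takes one time step, that the number of threads is unlimited, and that each CTU row may start once the row above is two CTUs ahead (the commonly described WPP dependency). The function names `wpp_parallelism` and `joules_per_frame` are hypothetical, and the power and frame-rate values in the usage lines are made-up illustrative numbers, not results from the paper.

```python
def wpp_parallelism(rows, cols, lag=2):
    """Count how many CTUs can be decoded at each time step.

    Simplified model (an assumption, not the paper's implementation):
    CTU (r, c) becomes ready at step c + lag * r, so each row trails
    the row above it by `lag` CTUs.
    """
    steps = cols + lag * (rows - 1)
    profile = [0] * steps
    for r in range(rows):
        for c in range(cols):
            profile[c + lag * r] += 1
    return profile


def joules_per_frame(avg_power_watts, frames_per_second):
    """Energy per decoded frame: average power divided by frame rate."""
    return avg_power_watts / frames_per_second


# A toy picture of 4 CTU rows by 6 CTU columns.
profile = wpp_parallelism(rows=4, cols=6)
print(profile)  # [1, 1, 2, 2, 3, 3, 3, 3, 2, 2, 1, 1]

# Hypothetical numbers for illustration only: 50 W average at 100 fps.
print(joules_per_frame(50.0, 100.0))  # 0.5
```

In this model the parallelism ramps up at the start of a picture and ramps down at the end. Starting the next picture while the current one is finishing, as the overlapped approach described in the abstract does, is one way to keep threads busy during those phases.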





Author information



Corresponding author

Correspondence to Mauricio Alvarez-Mesa.


About this article

Cite this article

Chi, C.C., Alvarez-Mesa, M., Lucas, J. et al. Parallel HEVC Decoding on Multi- and Many-core Architectures. J Sign Process Syst 71, 247–260 (2013).



Keywords

  • HEVC
  • Video coding
  • Parallel processing
  • Power analysis
  • Real-time 4k
  • UHD