Journal of Signal Processing Systems, Volume 71, Issue 3, pp 247–260

Parallel HEVC Decoding on Multi- and Many-core Architectures

A Power and Performance Analysis
  • Chi Ching Chi
  • Mauricio Alvarez-Mesa
  • Jan Lucas
  • Ben Juurlink
  • Thomas Schierl


The Joint Collaborative Team on Video Coding is developing a new standard named High Efficiency Video Coding (HEVC) that aims at reducing the bitrate of H.264/AVC by another 50 %. To meet the computational demands of the new standard, in particular for high resolutions and at low power budgets, exploiting parallelism is no longer an option but a requirement. Therefore, HEVC includes several coding tools that allow each picture to be divided into several partitions that can be processed in parallel, without degrading either the quality or the bitrate. In this paper we adapt one of these approaches, Wavefront Parallel Processing (WPP) coding, and show how it can be implemented on multi- and many-core processors. Our approach, named Overlapped Wavefront (OWF), processes several partitions as well as several pictures in parallel. This has the advantage that the amount of (thread-level) parallelism stays constant during execution. In addition, performance and power results are provided for three platforms: a server Intel CPU with 8 cores, a laptop Intel CPU with 4 cores, and a TILE-Gx36 with 36 cores from Tilera. The results show that our parallel HEVC decoder is capable of achieving an average frame rate of 116 fps for 4k resolution on a standard multicore CPU. The results also demonstrate that exploiting more parallelism by increasing the number of cores can substantially improve the energy efficiency, measured in Joules per frame.
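To illustrate why keeping the amount of parallelism constant matters, the following minimal sketch models a plain single-picture wavefront. It is not the authors' decoder. It assumes that the partitions are rows of blocks, that every block takes one time step, that a block waits for its left neighbour and for the block above and to the right of it, and that unlimited threads are available. The function names `wavefront_finish_times` and `parallelism_profile` are hypothetical and chosen only for this example.

```python
from collections import Counter


def wavefront_finish_times(rows, cols):
    """Return the time step at which each block finishes.

    Assumptions (not taken from the abstract): unit cost per block,
    unlimited threads, and a left plus above-right dependency.
    """
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Dependency on the previous block in the same row.
            left = finish[r][c - 1] if c > 0 else 0
            # Dependency on the block above and to the right; the last
            # column falls back to the block directly above.
            above_right = finish[r - 1][min(c + 1, cols - 1)] if r > 0 else 0
            finish[r][c] = max(left, above_right) + 1
    return finish


def parallelism_profile(rows, cols):
    """Count how many blocks are processed in each time step."""
    finish = wavefront_finish_times(rows, cols)
    counts = Counter(t for row in finish for t in row)
    return [counts[step] for step in range(1, max(counts) + 1)]


if __name__ == "__main__":
    # A small 4-row by 6-column picture, chosen only for illustration.
    print(parallelism_profile(4, 6))
```

Under these assumptions the number of blocks processed per step ramps up at the start of the picture and ramps down at the end, so some cores would sit idle. Starting the next picture while the current one is still finishing, as the overlapped approach described above does, is one way to fill those gaps.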


Keywords: HEVC, Video coding, Parallel processing, Power analysis, Real-time, 4k, UHD



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Chi Ching Chi (1)
  • Mauricio Alvarez-Mesa (1, 2)
  • Jan Lucas (1)
  • Ben Juurlink (1)
  • Thomas Schierl (2)

  1. Technische Universität Berlin, Berlin, Germany
  2. Image Processing Department, Fraunhofer Heinrich Hertz Institute (HHI), Berlin, Germany
