Journal of Real-Time Image Processing, Volume 13, Issue 1, pp 25–38

Architecture-aware optimization of an HEVC decoder on asymmetric multicore processors

  • Rafael Rodríguez-Sánchez
  • Enrique S. Quintana-Ortí
Special Issue Paper


Low-power asymmetric multicore processors (AMPs) have attracted considerable attention due to their appealing performance/power ratio for energy-constrained environments. However, these processors pose a significant programming challenge because they integrate cores with different performance capabilities, calling for an asymmetry-aware scheduling solution that carefully distributes the workload. The recent HEVC standard, which offers several high-level parallelization strategies, is an important application that can benefit from an implementation tailored to the low-power AMPs present in many current mobile or handheld devices. In this context, we present an architecture-aware implementation of an HEVC decoder that embeds a criticality-aware scheduling strategy tuned for a Samsung Exynos 5422 System-on-Chip furnished with an ARM big.LITTLE AMP. The performance and energy efficiency of our solution are further enhanced by exploiting the NEON vector engine available in the ARM big.LITTLE architecture. Our experimental results show real-time 1080p HEVC decoding at 24 frames/s and a reduction in energy consumption of over 20%.


HEVC, Asymmetric multicore processors, Scheduling, Vector intrinsics, Real-time decoding, Energy efficiency



This work was supported by Project CICYT TIN2014-53495-R of MINECO and FEDER.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Rafael Rodríguez-Sánchez (1)
  • Enrique S. Quintana-Ortí (1)
  1. Depto. Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón de la Plana, Spain
