Architecture-aware optimization of an HEVC decoder on asymmetric multicore processors

  • Special Issue Paper
Journal of Real-Time Image Processing

Abstract

Low-power asymmetric multicore processors (AMPs) have attracted considerable attention due to their appealing performance/power ratio for energy-constrained environments. However, these processors pose a significant programming challenge because they integrate cores with different performance capabilities, which calls for an asymmetry-aware scheduling solution that carefully distributes the workload. The recent HEVC standard, which offers several high-level parallelization strategies, is an important application that can benefit from an implementation tailored to the low-power AMPs present in many current mobile or handheld devices. In this scenario, we present an architecture-aware implementation of an HEVC decoder that embeds a criticality-aware scheduling strategy tuned for a Samsung Exynos 5422 System-on-Chip equipped with an ARM big.LITTLE AMP. The performance and energy efficiency of our solution are further enhanced by exploiting the NEON vector engine available in the ARM big.LITTLE architecture. Our experimental results show real-time 1080p HEVC decoding at 24 frames/s and a reduction in energy consumption of more than 20 %.
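
The abstract does not spell out the scheduling strategy, so the following is only a minimal sketch of the general idea behind criticality-aware scheduling on an asymmetric processor, not the authors' implementation. It assumes that each task carries a known cost and a flag marking it as critical, and that a big core runs a task `big_speedup` times faster than a LITTLE core. The names `Task`, `schedule`, and `big_speedup`, as well as the default core counts and speed ratio, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cost: float     # run time on a LITTLE core, arbitrary units (assumed known)
    critical: bool  # True if the task is on the critical path (assumed known)


def schedule(tasks, n_big=4, n_little=4, big_speedup=2.0):
    """Greedy sketch: critical tasks go to big cores, the rest to whichever
    core would finish them earliest. Returns (placement, makespan)."""
    # Illustrative core model: each core tracks when it becomes free.
    big = [{"label": f"big{i}", "speed": big_speedup, "free": 0.0}
           for i in range(n_big)]
    little = [{"label": f"little{i}", "speed": 1.0, "free": 0.0}
              for i in range(n_little)]
    cores = big + little
    placement = {}
    # Visit critical tasks first, then longer tasks before shorter ones.
    for t in sorted(tasks, key=lambda t: (not t.critical, -t.cost)):
        # Critical tasks are restricted to big cores when any exist.
        candidates = (big or cores) if t.critical else cores
        best = min(candidates, key=lambda c: c["free"] + t.cost / c["speed"])
        start = best["free"]
        best["free"] = start + t.cost / best["speed"]
        placement[t.name] = (best["label"], start, best["free"])
    makespan = max(c["free"] for c in cores)
    return placement, makespan
```

In this toy model, a long critical task is kept off the slow cores, while short non-critical tasks fill whichever core becomes available first. A real decoder would additionally have to respect data dependencies between tasks, which this sketch ignores.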

Acknowledgments

This work was supported by Project CICYT TIN2014-53495-R of MINECO and FEDER.

Author information

Corresponding author

Correspondence to Rafael Rodríguez-Sánchez.

About this article

Cite this article

Rodríguez-Sánchez, R., Quintana-Ortí, E.S. Architecture-aware optimization of an HEVC decoder on asymmetric multicore processors. J Real-Time Image Proc 13, 25–38 (2017). https://doi.org/10.1007/s11554-016-0606-y

  • DOI: https://doi.org/10.1007/s11554-016-0606-y
