Abstract
Low-power asymmetric multicore processors (AMPs) have attracted considerable attention due to their appealing performance/power ratio in energy-constrained environments. However, these processors pose a significant programming challenge because they integrate cores with different performance capabilities, calling for an asymmetry-aware scheduling solution that carefully distributes the workload. The recent HEVC standard, which offers several high-level parallelization strategies, is an important application that can benefit from an implementation tailored to the low-power AMPs present in many current mobile and handheld devices. In this scenario, we present an architecture-aware implementation of an HEVC decoder that embeds a criticality-aware scheduling strategy tuned for a Samsung Exynos 5422 System-on-Chip equipped with an ARM big.LITTLE AMP. The performance and energy efficiency of our solution are further enhanced by exploiting the NEON vector engine available in the ARM big.LITTLE architecture. Our experimental results show real-time 1080p HEVC decoding at 24 frames/s and a reduction in energy consumption of more than 20%.
Acknowledgments
This work was supported by Project CICYT TIN2014-53495-R of MINECO and FEDER.
Cite this article
Rodríguez-Sánchez, R., Quintana-Ortí, E.S. Architecture-aware optimization of an HEVC decoder on asymmetric multicore processors. J Real-Time Image Proc 13, 25–38 (2017). https://doi.org/10.1007/s11554-016-0606-y
DOI: https://doi.org/10.1007/s11554-016-0606-y