Abstract
The sum of absolute differences (SAD) calculation is one of the most computing-intensive operations in video encoders compatible with recent standards such as High Efficiency Video Coding (HEVC). SAD hardware architectures employ an adder tree to accumulate the absolute differences between corresponding samples of two video blocks. This paper incorporates high-order adder compressor (HOAC) structures into SAD hardware architectures to achieve real-time ultra-high definition (UHD) encoding, using block sizes compatible with HEVC. The proposed HOAC architectures are power-efficient and enable low-power SAD hardware accelerators. Our throughput analysis shows that the HOAC-based SAD hardware architecture is capable of encoding UHD 4K (\(3840\times 2160\)) video in real time at 60 frames per second. The architectures were designed entirely as dedicated ASIC blocks and synthesized to ST 65 nm CMOS standard cells. Synthesis results show that SAD architectures using 64-2, 32-2, 16-2 and 8-2 compressors built from 4-2 compressors are significantly more efficient in terms of circuit area and total power dissipation than SAD architectures using conventional adders selected by a commercial logic synthesis tool.
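To make the compressor-tree idea concrete, the Python sketch below models the datapath at bit level. It reduces the absolute differences of a block to two rows through levels of 4-2 compressors (for example 16 → 8 → 4 → 2 for a 16-2 compressor), then performs a single carry-propagate addition to produce the SAD. Several parts are illustrative assumptions and not the authors' RTL: the two-full-adder construction of each 4-2 slice, the zero-padding of incomplete groups, and all function names (`full_adder`, `compress_4_2`, `compressor_tree`, `sad_compressor`). The model checks arithmetic behaviour only. It says nothing about area, timing or power.

```python
"""Functional, bit-level sketch of a SAD computed with 4-2 compressor trees.
All names are hypothetical; this models arithmetic only, not hardware cost."""
from typing import List, Sequence, Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """Return (sum, carry) for three single-bit inputs."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compress_4_2(w: Sequence[int], width: int) -> Tuple[int, int]:
    """Reduce four unsigned words to two using a row of 4-2 compressors.

    Assumed slice structure: two full adders per bit. The lateral carry
    (cout) of slice i feeds cin of slice i+1. Because cout depends only on
    the first three inputs, it never ripples further along the row.
    Invariant: sum(w) == s_word + c_word.
    """
    s_word, c_word, cin = 0, 0, 0
    for i in range(width):
        x1, x2, x3, x4 = ((v >> i) & 1 for v in w)
        s1, cout = full_adder(x1, x2, x3)    # first full adder
        s, carry = full_adder(s1, x4, cin)   # second full adder
        s_word |= s << i
        c_word |= carry << (i + 1)           # carry has weight 2^(i+1)
        cin = cout                           # lateral carry to next slice
    assert cin == 0, "width too small for the inputs"
    return s_word, c_word


def compressor_tree(operands: Sequence[int]) -> Tuple[int, int]:
    """Reduce N operands to two rows with levels of 4-2 compressors."""
    rows: List[int] = list(operands)
    while len(rows) > 2:
        rows += [0] * (-len(rows) % 4)       # pad to a multiple of four
        width = max(r.bit_length() for r in rows) + 2
        next_rows: List[int] = []
        for k in range(0, len(rows), 4):
            next_rows.extend(compress_4_2(rows[k:k + 4], width))
        rows = next_rows
    s, c = (rows + [0, 0])[:2]
    return s, c


def sad_compressor(block_a: Sequence[int], block_b: Sequence[int]) -> int:
    """SAD of two equally sized blocks (flattened sample lists)."""
    diffs = [abs(a - b) for a, b in zip(block_a, block_b)]
    s, c = compressor_tree(diffs)
    return s + c                             # final carry-propagate adder


if __name__ == "__main__":
    # 4x4 block: |x - (15 - x)| summed over x = 0..15
    a = list(range(16))
    b = [15 - x for x in range(16)]
    print(sad_compressor(a, b))  # 128
```

In a 4-2 compressor row, each slice's outgoing carry moves at most one bit position per tree level. Only the final `s + c` acts as a conventional carry-propagate adder. This is why a compressor tree keeps long carry chains out of the accumulation path, leaving a single full-width addition at the root.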
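As a rough check of the real-time target, the derivation below estimates the luma sample rate of UHD 4K at 60 frames per second and the clock frequency a single SAD datapath would need. This is a simplified model and not the paper's throughput analysis. It makes three assumptions: only luma samples are counted; the datapath consumes \(P = 64\) absolute differences per clock cycle (one \(8\times 8\) block per cycle, as a 64-2 compressor tree would allow); and \(C\) candidate blocks are evaluated per block position. The symbols \(W\), \(H\), \(f\), \(P\) and \(C\) are introduced here for illustration only.

```latex
\begin{align*}
R_{\text{luma}} &= W \cdot H \cdot f = 3840 \cdot 2160 \cdot 60 = 497{,}664{,}000~\text{samples/s} \\
f_{\text{clk}} &\geq \frac{C \cdot R_{\text{luma}}}{P}
  = C \cdot \frac{497{,}664{,}000}{64}~\text{Hz}
  = C \cdot 7.776~\text{MHz}
\end{align*}
```

Under these assumptions, the required clock grows linearly with the number of candidates \(C\). Doubling \(P\) by processing more absolute differences in parallel per cycle halves the clock needed for the same \(C\).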
About this article
Cite this article
Paim, G., Santana, G.M., Abreu, B.A. et al. Exploring high-order adder compressors for power reduction in sum of absolute differences architectures for real-time UHD video encoding. J Real-Time Image Proc 17, 1735–1754 (2020). https://doi.org/10.1007/s11554-019-00939-x
DOI: https://doi.org/10.1007/s11554-019-00939-x