Exploring high-order adder compressors for power reduction in sum of absolute differences architectures for real-time UHD video encoding

Paim, Guilherme; Santana, Gustavo M.; Abreu, Brunno A.; Rocha, Leandro M. G.; Grellert, Mateus; da Costa, Eduardo A. C.; Bampi, Sergio

doi:10.1007/s11554-019-00939-x


Original Research Paper
Published: 04 January 2020

Journal of Real-Time Image Processing
Volume 17, pages 1735–1754 (2020)



Abstract

The sum of absolute differences (SAD) calculation is one of the most computationally intensive operations in video encoders compatible with recent standards such as High Efficiency Video Coding (HEVC). SAD hardware architectures employ an adder tree to accumulate the absolute differences between corresponding samples of two video blocks. This paper incorporates high-order adder compressor (HOAC) structures into SAD hardware architectures to achieve real-time ultra-high-definition (UHD) encoding, using block sizes compatible with HEVC. The proposed HOAC architectures are power-efficient and enable low-power SAD hardware accelerators. Our throughput analysis shows that the HOAC-based SAD hardware architecture can encode UHD 4K ($3840\times 2160$) video in real time at 60 frames per second. The architectures were designed entirely as dedicated ASIC blocks and synthesized with ST 65 nm CMOS standard cells. Synthesis results show that SAD architectures using 64-2, 32-2, 16-2 and 8-2 compressors built from 4-2 compressors are significantly more efficient in circuit area and total power dissipation than SAD architectures using conventional adders selected by a commercial logic synthesis tool.
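The core idea in the abstract, accumulating absolute differences with a tree of 4-2 compressors instead of carry-propagate adders, can be illustrated with a small bit-accurate Python model. This is a minimal sketch under stated assumptions, not the authors' hardware: it assumes unsigned 8-bit samples, a 4-2 compressor built from two chained full adders per bit column (a common textbook construction), a tree that halves the operand count at each level (so a 64-2 compressor is five levels of 4-2 compressors), and a single final carry-propagate addition. The names `full_adder`, `compress_4_2`, `compress_tree` and `sad_hoac` are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compress_4_2(x1: int, x2: int, x3: int, x4: int) -> tuple[int, int]:
    """Word-level 4-2 compressor: two chained full adders per bit column.

    Returns (s, c) with s + c == x1 + x2 + x3 + x4. The cout of column i
    depends only on that column's inputs, so no carry ripples across the word.
    """
    # Two all-zero guard columns guarantee the final cout is 0.
    width = max(x.bit_length() for x in (x1, x2, x3, x4)) + 2
    s_word = c_word = 0
    cin = 0  # cout forwarded from the previous column's first full adder
    for i in range(width):
        b1, b2, b3, b4 = ((x >> i) & 1 for x in (x1, x2, x3, x4))
        s1, cout = full_adder(b1, b2, b3)   # first full adder
        s, carry = full_adder(s1, b4, cin)  # second full adder
        s_word |= s << i
        c_word |= carry << (i + 1)          # carry has weight 2**(i+1)
        cin = cout
    assert cin == 0
    return s_word, c_word


def compress_tree(operands: list[int]) -> tuple[int, int]:
    """Reduce 2**k operands (k >= 2) to two with levels of 4-2 compressors,
    e.g. 64 -> 32 -> 16 -> 8 -> 4 -> 2 for a 64-2 compressor."""
    n = len(operands)
    assert n >= 4 and n & (n - 1) == 0, "expects a power-of-two operand count"
    ops = list(operands)
    while len(ops) > 2:
        nxt: list[int] = []
        for k in range(0, len(ops), 4):
            nxt.extend(compress_4_2(*ops[k:k + 4]))
        ops = nxt
    return ops[0], ops[1]


def sad_hoac(block_a: list[int], block_b: list[int]) -> int:
    """SAD of two equally sized blocks: |a - b| per sample, a compressor
    tree, then a single final carry-propagate addition."""
    assert len(block_a) == len(block_b)
    diffs = [abs(a - b) for a, b in zip(block_a, block_b)]
    s, c = compress_tree(diffs)
    return s + c  # final carry-propagate adder


if __name__ == "__main__":
    a = list(range(64))          # hypothetical 8x8 block, row-major
    b = [255 - x for x in a]
    print(sad_hoac(a, b))        # 12288
```

Because each compressor level forwards a carry only one column to the left, the carry-propagate delay is paid once, in the final addition, rather than at every node of a conventional adder tree. The model only checks arithmetic equivalence; the area and power figures reported above come from the authors' 65 nm synthesis. For scale, real-time 4K at 60 fps corresponds to 3840 × 2160 × 60 = 497,664,000 pixels per second, before accounting for the many candidate blocks a motion search compares against each one.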

Author information

Authors and Affiliations

Graduate Program in Microelectronics (PGMicro) 67-215, UFRGS, Av. Bento Gonçalves 9500, Porto Alegre, RS, 91501-970, Brazil
Guilherme Paim, Gustavo M. Santana, Brunno A. Abreu, Leandro M. G. Rocha & Sergio Bampi
Graduate Program on Electronic Engineering and Computer Science, Catholic University of Pelotas (UCPel), Av. Gonçalves Chaves 373, Pelotas, RS, 96015-560, Brazil
Mateus Grellert & Eduardo A. C. da Costa

Corresponding author

Correspondence to Guilherme Paim.

About this article

Cite this article

Paim, G., Santana, G.M., Abreu, B.A. et al. Exploring high-order adder compressors for power reduction in sum of absolute differences architectures for real-time UHD video encoding. J Real-Time Image Proc 17, 1735–1754 (2020). https://doi.org/10.1007/s11554-019-00939-x

Received: 29 March 2019
Accepted: 17 December 2019
Published: 04 January 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11554-019-00939-x