Exploring high-order adder compressors for power reduction in sum of absolute differences architectures for real-time UHD video encoding

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

The sum of absolute differences (SAD) calculation is one of the most compute-intensive operations in video encoders compatible with recent standards such as High Efficiency Video Coding (HEVC). SAD hardware architectures employ an adder tree to accumulate the absolute differences between the samples of two video blocks. This paper integrates high-order adder compressor (HOAC) structures into SAD hardware architectures to achieve real-time ultra-high definition (UHD) encoding, using block sizes compatible with HEVC. The proposed HOAC architectures are power-efficient and enable low-power SAD hardware accelerators. Our throughput analysis shows that the HOAC-based SAD hardware architecture can encode UHD 4K (\(3840\times 2160\)) video in real time at 60 frames per second. The architectures were designed entirely as dedicated ASIC blocks and synthesized to ST 65 nm CMOS standard cells. Synthesis results show that SAD architectures using 64-2, 32-2, 16-2 and 8-2 compressors built from 4-2 compressors are significantly more efficient in circuit area and total power dissipation than SAD architectures using conventional adders selected by a commercial logic synthesis tool.
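
The abstract names the building blocks but this page does not show their structure, so the following is only a behavioural sketch of the idea, not the authors' circuit. It assumes the textbook 4-2 compressor made of two cascaded carry-save stages, and it assumes an N-2 compressor can be modelled as layers of 4-2 compressors that halve the operand count until two words remain. All function names (`compress_4_2`, `compress_n_2`, `sad_hoac`) are hypothetical.

```python
def compress_4_2(a, b, c, d):
    """Word-level 4-2 compressor model (assumed structure: two carry-save stages).

    Returns (s, carry) such that a + b + c + d == s + carry.
    Inputs must be non-negative integers.
    """
    # Stage 1: a row of full adders over a, b, c (no carry propagation).
    s1 = a ^ b ^ c
    c1 = ((a & b) | (a & c) | (b & c)) << 1
    # Stage 2: absorbs d and the shifted carries produced by stage 1.
    s2 = s1 ^ d ^ c1
    c2 = ((s1 & d) | (s1 & c1) | (d & c1)) << 1
    return s2, c2


def compress_n_2(operands):
    """Reduce N operands to two words using layers of 4-2 compressors.

    Assumes N is a power of two with N >= 4 (e.g. 8, 16, 32, 64).
    """
    n = len(operands)
    if n < 4 or n & (n - 1):
        raise ValueError("operand count must be a power of two >= 4")
    words = list(operands)
    while len(words) > 2:
        reduced = []
        for i in range(0, len(words), 4):
            reduced.extend(compress_4_2(*words[i:i + 4]))
        words = reduced  # each layer halves the number of words
    return words[0], words[1]


def sad_hoac(cur_block, ref_block):
    """SAD of two equally sized sample lists, accumulated through an N-2 model."""
    diffs = [abs(x - y) for x, y in zip(cur_block, ref_block)]
    s, carry = compress_n_2(diffs)
    # One final carry-propagate addition recombines the two partial words.
    return s + carry


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    cur = [rng.randrange(256) for _ in range(64)]  # e.g. an 8x8 block of 8-bit samples
    ref = [rng.randrange(256) for _ in range(64)]
    print(sad_hoac(cur, ref) == sum(abs(x - y) for x, y in zip(cur, ref)))
```

In this model only the last step performs a carry-propagate addition; every compressor layer keeps the running total as two separate words. For scale, UHD 4K at 60 frames per second corresponds to 3840 × 2160 × 60 = 497,664,000 pixels per second; how many SAD evaluations each pixel requires depends on the motion estimation search, which the abstract does not specify.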





Author information

Corresponding author

Correspondence to Guilherme Paim.



Cite this article

Paim, G., Santana, G.M., Abreu, B.A. et al. Exploring high-order adder compressors for power reduction in sum of absolute differences architectures for real-time UHD video encoding. J Real-Time Image Proc 17, 1735–1754 (2020). https://doi.org/10.1007/s11554-019-00939-x


  • DOI: https://doi.org/10.1007/s11554-019-00939-x
