AxRSU- $$2^{m}$$ : Higher-Order m-Bit Approximate Encoders for Radix- $$2^{m}$$ Squarer Units

da Rosa, Morgana M. A.; Paim, Guilherme; da Costa, Eduardo A. C.; Soares, Rafael; Bampi, Sergio

doi:10.1007/s00034-024-02616-2

AxRSU-$2^{m}$: Higher-Order m-Bit Approximate Encoders for Radix-$2^{m}$ Squarer Units

Published: 26 February 2024

(2024)
Cite this article

Circuits, Systems, and Signal Processing Aims and scope Submit manuscript

Morgana M. A. da Rosa ORCID: orcid.org/0000-0001-9582-9011¹,
Guilherme Paim³,
Eduardo A. C. da Costa²,
Rafael Soares¹ &
…
Sergio Bampi³

55 Accesses
Explore all metrics

Abstract

Approximate computing has emerged as a design alternative to enhance design efficiency by capitalizing on the inherent error resilience observed in numerous applications. Various error-resilient and compute-intensive applications, such as signal, image and video processing, computer vision, and supervised machine learning, necessitate dedicated hardware accelerators for mean squared error estimation during runtime. In these application domains, using efficient arithmetic operators, particularly a squarer unit, represents one of the most effective strategies for low-power design. This work introduces an approximate Radix-$2^{m}$ squarer unit, denoted as AxRSU-$2^{m}$. The proposed squarer unit employs m-bit approximate encoders to execute operations on m-bit data concurrently. The AxRSU-$2^{m}$ under consideration explores encoders with m equal to 2 (AxRSU-4), 3 (AxRSU-8), and 4 (AxRSU-16). These approximate encoders exhibit low complexity and diminish the necessary partial products operating on m bits simultaneously, thereby substantially enhancing energy efficiency and reducing circuit area in the AxRSU-$2^{m}$. To illustrate the trade-off between error and quality in the AxRSU-$2^{m}$, we apply it to an SSD (sum squared difference) hardware accelerator designed for video processing, with a square-accumulate serving as a case study. Our findings reveal a novel Pareto front, presenting eight optimal AxRSU-$2^{m}$ solutions that achieve accuracy levels ranging from 3.76 to 75.53%. These solutions yield energy savings ranging from 46.20 to 95.57% and circuit area reductions ranging from 37.68 to 66.73%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Approximate Computing at the Algorithmic Level

Fast and energy-efficient approximate motion estimation architecture for real-time 4 K UHD processing

Article 10 September 2020

Efficient Design of Modified Wallace Tree Approximate Multipliers Based on Imprecise Compressors for Error-Tolerance Applications

Article 17 October 2023

References

A. Banerjee, D. Kumar, A New Squarer design with reduced area and delay, in 19th International Symposium on VLSI Design and Test (VDAT), Avignon (2016), pp. 205–214
S. Bui, J. Stine, Additional optimizations for parallel squarer units, in IEEE International Symposium on Circuits and Systems (ISCAS), Melbourne VIC, Australia (2014)
F. Bossen, Common test conditions and software configurations, in JCT-VC Document No. L1100 (2013)
Y.-H. Chen, Area-efficient fixed-width squarer with dynamic error-compensation circuit. IEEE Trans. Circuits Syst. II Express Briefs 62(9), 851–855 (2015)
Google Scholar
L. Dadda, Some schemes for parallel multipliers. Alta Frequenza (1965)
G. Gillani, M. Hanif, M. Krone, S. Gerez, M. Shafique, A. Kokkeler, SquASH: approximate square-accumulate with self-healing. IEEE Access 6, 49112–49128 (2018)
Article Google Scholar
V. Guidotti, G. Paim, L.M. Rocha, E. Costa, S. Almeida, S. Bampi, Power-efficient approximate Newton–Raphson integer divider applied to NLMS adaptive filter for high-quality interference cancelling. Circuits Syst. Signal Process. 39, 5729–5757 (2020)
Article Google Scholar
M.B. Hisham, S.N. Yaakob, R.A.A. Raof, A.A. Nazren, N.M.W. Embedded, Template matching using sum of squared difference and normalized cross correlation, in IEEE Student Conference on Research and Development (SCOReD) (2015)
HM. HEVC Test Model (HM) v. 16.7 (2023). http://hevc.hhi.fraunhofer.de
R. Jaikumar, M. Karpagam, L. Raju, A novel approach to implement high speed squaring circuit using ancient Vedic mathematics techniques. Int. J. Appl. Eng. Res. 10, 1–6 (2015)
Google Scholar
H. Jiang, C. Liu, F. Lombardi, J. Han, Low-power approximate unsigned multipliers with configurable error recovery. IEEE Trans. Circuits Syst. I Regul. Pap. 66(1), 189–202 (2019)
Article Google Scholar
C. Kommu, D. Rani, High performance 3-2 compressors architectures for high speed multipliers, in International Conference on Smart Systems and Inventive Technology (ICSSIT) (2018)
A. Liddicoat, M. Flynn, Parallel square and cube computations. Computer Systems Laboratory, Department of Electrical Engineering Stanford University (2014)
G. Paim, G. Zervakis, G. Pahwa, Y.S. Chauhan, E.A.C. da Costa, S. Bampi, J. Henkel, H. Amrouch, On the resiliency of NCFET circuits against voltage over-scaling. IEEE Trans. Circuits Syst. I Regul. Pap. 68(4), 1481–1492 (2021)
Article Google Scholar
G. Paim, L.M.G. Rocha, H. Amrouch, E.A.C. da Costa, S. Bampi, J. Henkel, A cross-layer gate-level-to-application co- simulation for design space exploration of approximate circuits in HEVC video encoders. IEEE Trans. Circuits Syst. Video Technol. 30(10), 3814–3828 (2020)
Article Google Scholar
K. Reddy, M. Vasantha, Y. Kumar, D. Dwivedi, Design of approximate booth squarer for error-tolerant computing. IEEE Trans. Very Large Scale Integr. VLSI Syst. 28(5), 1230–1241 (2020)
Article Google Scholar
L. Rocha, G. Paim, R. Ferreira, E. Costa, S. Bampi, Framework-based arithmetic core generation to explore ASIC-based parallel binary multipliers, in 2017 24th IEEE International Conference on Electronics, Circuits and Systems (ICECS) (2017), pp. 478–481
L. Rocha, G. Paim, G.M. Santana, E.A.C. da Costa, S. Bampi, Framework-based arithmetic datapath generation to explore parallel binary multipliers. J. Integr. Circuits Syst. 15(3), 1–10 (2020)
Article Google Scholar
M. da Rosa, G. Paim, L.M.G. Rocha, E.A.C. da Costa, S. Bampi, The Radix-2m Squared Multiplier, in 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS) (2020), pp. 1–4
M. da Rosa, E.A. da Costa, L.G. Rocha, G. Paim, S. Bampi, Energy-efficient VLSI squarer unit with optimized Radix-2m multiplication logic. Circuits Syst. Signal Process. 1–25 (2022)
M. da Rosa, G. Paim, J. Castro-Godínez, E.A.C. Da Costa, R.I. Soares, S. Bampi, AxRSU: approximate Radix-4 squarer unit, in 2022 IEEE International Symposium on Circuits and Systems (ISCAS) (2022), pp. 1655–1659
M. da Rosa, G. Paim, P.L.D. Costa, E.A.C.D. Costa, R.I. Soares, S. Bampi, AxPPA: approximate parallel prefix adders. IEEE Trans. Very Large Scale Integr. VLSI Syst. 31(1), 17–28 (2023)
Article Google Scholar
M. da Rosa, G. Paim, L.M. Rocha, E.A. da Costa, S. Bampi, Exploring efficient adder compressors for power-efficient sum of squared differences design, in 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS) (2020), pp. 1–4
S. Salvini, S.J. Wijnholds, Fast gain calibration in radio astronomy using alternating direction implicit methods: analysis and applications. Astron. Astrophys. 571, A97 (2014)
Article ADS Google Scholar
H. Seidel, M.M.A. da Rosa, G. Paim, E.A.C. da Costa, S.J.M. Almeida, S. Bampi, Approximate pruned and truncated Haar discrete wavelet transform VLSI hardware for energy-efficient ECG signal processing. IEEE Trans. Circuits Syst. I Regul. Pap. 68(5), 1814–1826 (2021)
Article MathSciNet Google Scholar
B. Shao, P. Li, Array-based approximate arithmetic computing: a general model and applications to multiplier and squarer design. IEEE Trans. Circuits Syst. I Regul. Pap. 62(4), 1090 (2015)
Article MathSciNet Google Scholar
D. Sharath, G. Devaraju, C. Kavitha, Optimization and implementation of parallel squarer. IJRET Int. J. Res. Eng. Technol.
R. Sharma, M. Kaur, G. Singh, Design and FPGA implementation of optimized 32-bit Vedic multiplier and square architectures, in International Conference on Industrial Instrumentation and Control (ICIC) (2015)
B. Silveira, G. Paim, B. Abreu, M. Grellert, C.M. Diniz, E.A.C. da Costa, S. Bampi, Power-efficient sum of absolute differences hardware architecture using adder compressors for integer motion estimation design. IEEE Trans. Circuits Syst. I Regul. Pap. 64(12), 3126–3137 (2017)
Article Google Scholar
P. Stanley-Marbell, A. Alaghi, M. Carbin, E. Darulova, L. Dolecek, A. Gerstlauer, G. Gillani, D. Jevdjic, T. Moreau, M. Cacciotti, A. Daglis, N.E. Jerger, B. Falsafi, S. Misailovic, A. Sampson, D. Zufferey, Exploiting errors for efficiency: a survey from circuits to applications. ACM Comput. Surv. 53(3), 1–39 (2020)
Article Google Scholar
A. Strollo, E. Napoli, D. De Caro, N. Petra, G.D. Meo, Comparison and extension of approximate 4–2 compressors for low-power approximate multipliers. IEEE Trans. Circuits Syst. I Regul. Pap. 67(9), 3021–3034 (2020)
Article MathSciNet Google Scholar
Z. Tasoulas, G. Zervakis, I. Anagnostopoulos, H. Amrouch, J. Henkel, Weight-oriented approximation for energy-efficient neural network inference accelerators. IEEE Trans. Circuits Syst. I Regul. Pap. 67(12), 4670–4683 (2020)
Article Google Scholar
K. Tsai, Y.-J. Chang, C.-H. Wang, C.-T. Chiang, Accuracy-configurable radix-4 adder with a dynamic output modification scheme. IEEE Trans. Circuits Syst. I Regul. Pap. 68(8), 3328–3336 (2021)
Article Google Scholar
C. Wallace, A suggestion for fast multiplier. IEEE Trans. Electron. Comput. (1964)
G. Zervakis, H. Saadat, H. Amrouch, A. Gerstlauer, S. Parameswaran, J. Henkel, Approximate computing for ML: state-of-the-art, challenges and visions, in 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC) (2021), pp. 189–196

Download references

Acknowledgements

This work was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001. It was also supported by the Brazilian Microelectronics Society (SBMicro) through the Programa de Apoio a Projeto de Circuitos Integrados em Universidades (APCI).

Author information

Authors and Affiliations

Graduate Program on Computing, Federal University of Pelotas (UFPel), Pelotas, Rio Grande do Sul, Brazil
Morgana M. A. da Rosa & Rafael Soares
Graduate Program in Electronics and Computer Engineering, Catholic University of Pelotas, Pelotas, Rio Grande do Sul, Brazil
Eduardo A. C. da Costa
Graduate Program in Microelectronics, Federal University of RGS, Porto Alegre, Rio Grande do Sul, Brazil
Guilherme Paim & Sergio Bampi

Authors

Morgana M. A. da Rosa
View author publications
You can also search for this author in PubMed Google Scholar
Guilherme Paim
View author publications
You can also search for this author in PubMed Google Scholar
Eduardo A. C. da Costa
View author publications
You can also search for this author in PubMed Google Scholar
Rafael Soares
View author publications
You can also search for this author in PubMed Google Scholar
Sergio Bampi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Morgana M. A. da Rosa.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

da Rosa, M.M.A., Paim, G., da Costa, E.A.C. et al. AxRSU-$2^{m}$: Higher-Order m-Bit Approximate Encoders for Radix-$2^{m}$ Squarer Units. Circuits Syst Signal Process (2024). https://doi.org/10.1007/s00034-024-02616-2

Download citation

Received: 31 March 2023
Revised: 13 January 2024
Accepted: 19 January 2024
Published: 26 February 2024
DOI: https://doi.org/10.1007/s00034-024-02616-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

AxRSU-\(2^{m}\): Higher-Order m-Bit Approximate Encoders for Radix-\(2^{m}\) Squarer Units

Abstract

Access this article

Similar content being viewed by others

Approximate Computing at the Algorithmic Level

Fast and energy-efficient approximate motion estimation architecture for real-time 4 K UHD processing

Efficient Design of Modified Wallace Tree Approximate Multipliers Based on Imprecise Compressors for Error-Tolerance Applications

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

AxRSU-\(2^{m}\): Higher-Order m-Bit Approximate Encoders for Radix-\(2^{m}\) Squarer Units

Abstract

Access this article

Similar content being viewed by others

Approximate Computing at the Algorithmic Level

Fast and energy-efficient approximate motion estimation architecture for real-time 4 K UHD processing

Efficient Design of Modified Wallace Tree Approximate Multipliers Based on Imprecise Compressors for Error-Tolerance Applications

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation