A Survey of Approximate Computing: From Arithmetic Units Design to High-Level Applications

  • Survey
Journal of Computer Science and Technology

Abstract

Realizing high-performance, energy-efficient circuit systems is one of the critical tasks for circuit designers. Researchers have traditionally concentrated on the tradeoff between energy and performance in circuit and system design based on accurate computing. However, as video/image processing and machine learning algorithms have become widespread, approximate computing in these applications has become a hot topic. These applications can tolerate the errors caused by approximate computing through specific processing or algorithms, so large performance improvements or power savings can be achieved with an acceptable loss in final output quality. This paper presents a survey of approximate computing from arithmetic unit design to high-level applications, aiming to give researchers a comprehensive and insightful understanding of the field. We believe that approximate computing will play an important role in future circuit and system design, especially with the rapid development of artificial intelligence algorithms and their related applications.

Author information

Corresponding author

Correspondence to Xing-Hua Yang.

About this article

Cite this article

Que, HH., Jin, Y., Wang, T. et al. A Survey of Approximate Computing: From Arithmetic Units Design to High-Level Applications. J. Comput. Sci. Technol. 38, 251–272 (2023). https://doi.org/10.1007/s11390-023-2537-y

  • DOI: https://doi.org/10.1007/s11390-023-2537-y
