
Designing Resource-Efficient Hardware Arithmetic for FPGA-Based Accelerators Leveraging Approximations and Mixed Quantizations

Chapter in: Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing

Abstract

While ASIC-based hardware platforms provide better application-specific cost–accuracy trade-offs, the diversity of embedded systems deploying machine learning algorithms has risen steadily. Consequently, FPGA-based hardware platforms, with their reconfigurability and high performance, are increasingly used for embedded machine learning. However, low-power designs devised for ASICs, using methods such as precision scaling, approximate computing, and mixed/custom quantization, do not yield proportionate gains when implemented on FPGAs, primarily because ASIC-optimized designs are not tailored to the FPGA's LUT-based architecture. As a result, there has been active research on improving the efficacy of low-power methods in FPGA-based systems.

In this chapter, we provide an overview of such FPGA-oriented low-power design methods and delve into the details of selected works that report considerable improvements in this regard. Specifically, we cover custom optimizations for both accurate and approximate multiplier designs, as well as MAC units that employ mixed quantization of Posit and fixed-point/integer number representations.
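The chapter's designs are RTL-level, but the accuracy–resource trade-off underlying approximate multipliers can be illustrated in software. The following minimal Python sketch models one common approximation technique, truncating low-order columns of the partial-product matrix of an unsigned multiplier; the function name and parameters are illustrative assumptions, not taken from the chapter.

```python
def truncated_mult(a: int, b: int, width: int = 8, trunc: int = 4) -> int:
    """Approximate unsigned multiply that discards partial-product bits
    in columns below `trunc`, mimicking how approximate multipliers drop
    low-order logic (LUTs) at the cost of a bounded error."""
    mask = ~((1 << trunc) - 1)      # zeroes out the truncated columns
    result = 0
    for i in range(width):          # one partial product per set bit of b
        if (b >> i) & 1:
            result += (a << i) & mask
    return result

# Example: the error stays confined to the dropped low-order columns.
exact = 203 * 177                   # 35931
approx = truncated_mult(203, 177)   # 35920; relative error about 0.03%
```

Raising `trunc` removes more partial-product logic and thus saves more resources, at the price of a larger (but statically bounded) error, which is the knob that approximate-multiplier designs tune.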


Notes

  1. The total number of recorded fractional bits depends on the deployed bit width of the quantization scheme (a minimal sketch of this relationship follows these notes).

  2. The data in Fig. 17 refer to the design with the better metrics among the ToolOpt and non-ToolOpt versions.

  3. The best-case latency refers to the latency corresponding to the critical path delay (CPD) of the design.
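Note 1 ties the number of recorded fractional bits to the deployed bit width. As a minimal sketch, assuming a generic signed fixed-point (Q-format) scheme rather than any specific scheme from the chapter:

```python
def quantize_fixed(x: float, total_bits: int = 8, frac_bits: int = 4) -> float:
    """Round x onto a signed fixed-point grid with `frac_bits` fractional
    bits; the fractional bits (plus sign and integer bits) must fit within
    the deployed total bit width, which bounds the recorded fraction."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))        # most negative representable code
    hi = (1 << (total_bits - 1)) - 1     # most positive representable code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale

# Example: quantize_fixed(0.7181) == 0.6875 (integer code 11, i.e. 11/16)
```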


Author information

Correspondence to Akash Kumar.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ullah, S., Sahoo, S.S., Kumar, A. (2024). Designing Resource-Efficient Hardware Arithmetic for FPGA-Based Accelerators Leveraging Approximations and Mixed Quantizations. In: Pasricha, S., Shafique, M. (eds) Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-19568-6_4

  • DOI: https://doi.org/10.1007/978-3-031-19568-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19567-9

  • Online ISBN: 978-3-031-19568-6

  • eBook Packages: Engineering, Engineering (R0)
