
Implementation of high precision/low latency FP divider using Urdhva–Tiryakbhyam multiplier for SoC applications

  • C. R. S. Hanuman
  • J. Kamala
  • A. R. Aruna
Article

Abstract

Industrial and scientific data-intensive applications increasingly demand higher-precision arithmetic with reduced computation time. In this paper, we design a high-precision, fully pipelined 32-bit floating-point (FP) divider for System-on-Chip applications, based on the Newton–Raphson (NR) algorithm and realized with an Urdhva–Tiryakbhyam (UT) multiplier. The divider follows the multiplicative Newton–Raphson method and supports all IEEE rounding modes with a latency of 15 cycles. The iterative NR computations are performed by an FP multiplier and an FP adder; the key module of the FP multiplier, which computes the mantissa product, is the UT multiplier, an ancient Vedic multiplication technique that has been used for centuries for fast multiplication. We implemented two UT multipliers: one using carry-lookahead adders and another using carry-save adders. The results show that the proposed architectures achieve 12% better precision and 24% higher throughput than existing algorithms, at the cost of higher on-chip power. The inputs to the divider are represented in the IEEE-754 standard. The design was developed with Xilinx Vivado and implemented on a Virtex-7 FPGA.
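The multiplicative division scheme and the Vedic multiplication technique described in the abstract can be sketched in software. The following is a minimal illustration, not the paper's RTL design: `nr_divide` applies the Newton–Raphson reciprocal iteration x_{k+1} = x_k(2 − d·x_k), assuming the divisor mantissa has been scaled into [0.5, 1) so that the common linear seed 2.9142 − 2d applies; `ut_multiply` forms the Urdhva–Tiryakbhyam "vertically and crosswise" column products that the hardware would reduce with carry-lookahead or carry-save adder trees. The function names, the digit-list convention, and the iteration count are illustrative assumptions.

```python
def nr_divide(n, d, iterations=4):
    """Approximate n / d by Newton-Raphson reciprocal refinement.

    Assumes d has been pre-scaled into [0.5, 1), as a normalized
    mantissa can be; the seed below is a standard linear initial
    estimate of 1/d on that interval (illustrative choice).
    """
    x = 2.9142 - 2.0 * d          # linear seed for 1/d on [0.5, 1)
    for _ in range(iterations):
        x = x * (2.0 - d * x)     # each step roughly doubles the correct bits
    return n * x                  # quotient = dividend * reciprocal


def ut_multiply(a, b):
    """Urdhva-Tiryakbhyam (vertically and crosswise) digit multiplication.

    a, b: lists of decimal digits, least-significant digit first.
    Returns the product in the same representation.
    """
    cols = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            cols[i + j] += ai * bj    # crosswise partial products per column
    # Carry propagation; the hardware version resolves these sums
    # with carry-lookahead or carry-save adders instead.
    out, carry = [], 0
    for c in cols:
        c += carry
        out.append(c % 10)
        carry = c // 10
    while carry:
        out.append(carry % 10)
        carry //= 10
    while len(out) > 1 and out[-1] == 0:  # strip leading zeros
        out.pop()
    return out
```

For example, `ut_multiply([3, 2, 1], [6, 5, 4])` computes 123 × 456 digit-by-digit, and four NR iterations from the linear seed already exceed single-precision accuracy thanks to the iteration's quadratic convergence.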

Keywords

Newton–Raphson · Floating-point · Urdhva–Tiryakbhyam · FPGA



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electronics and Communications Engineering, CEG, Anna University, Chennai, India
