Circuits, Systems, and Signal Processing, Volume 37, Issue 1, pp 383–407

An Unified Architecture for Single, Double, Double-Extended, and Quadruple Precision Division

  • Manish Kumar Jaiswal
  • Hayden K.-H. So


A hardware architecture for quadruple precision floating point division with multi-precision support is presented. Division is an important yet far more complex arithmetic operation than addition and multiplication, and a complete implementation demands a significant amount of hardware resources. The proposed architecture also supports single-, double-, and double-extended precision computations, with latency that varies by precision. An iterative, multiplicative architecture for multi-precision quadruple precision division is proposed, with small size and promising performance. The mantissa division unit, the most complex sub-unit, employs a series expansion method of division. The architecture follows the standard state-of-the-art flow for floating point division, processing both normal and subnormal operands. The proposed architecture is synthesized using a UMC 90nm ASIC standard cell library. It is also demonstrated on a Xilinx FPGA, where the wide integer multiplier used for mantissa division is further optimized to make efficient use of the built-in DSP blocks. Compared with an existing quadruple precision divider available in the literature, the proposed architecture achieves a 25% equivalent area saving and a 2× improvement in latency with improved speed on the FPGA platform, and more than a 50% area saving and a 3× improvement in latency-throughput with better speed on the ASIC platform.
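The series expansion approach to mantissa division mentioned in the abstract can be illustrated with a small software sketch. It is not the authors' hardware design: the function names, the split width `k`, and the number of series terms `terms` are assumptions chosen for illustration. The divisor mantissa y in [1, 2) is split as y = a1 + a2, where a1 holds the leading k fractional bits (standing in for a lookup-table index) and a2 is the remainder, so that 1/y = (1/a1)(1 - r + r^2 - ...) with r = a2/a1.

```python
from fractions import Fraction


def series_reciprocal(y: Fraction, k: int = 8, terms: int = 4) -> Fraction:
    """Approximate 1/y for y in [1, 2) with a truncated series expansion.

    k and terms are illustrative parameters, not values from the paper.
    """
    if not (1 <= y < 2):
        raise ValueError("expected a normalized mantissa in [1, 2)")
    scale = 1 << k
    a1 = Fraction(int(y * scale), scale)  # leading k fractional bits of y
    a2 = y - a1                           # remainder, 0 <= a2 < 2**-k
    inv_a1 = 1 / a1   # assumed to come from a small lookup table in hardware
    r = a2 * inv_a1   # r < 2**-k, so the series converges quickly
    acc = Fraction(0)
    power = Fraction(1)
    for i in range(terms):
        # Alternating series: 1 - r + r**2 - r**3 + ...
        acc += power if i % 2 == 0 else -power
        power *= r
    return inv_a1 * acc


def divide_mantissa(x: Fraction, y: Fraction, k: int = 8, terms: int = 4) -> Fraction:
    """Quotient x / y computed as x times the approximate reciprocal of y."""
    return x * series_reciprocal(y, k, terms)


if __name__ == "__main__":
    x, y = Fraction(3, 2), Fraction(7, 5)
    q = divide_mantissa(x, y)
    print(float(q), float(x / y))
```

Truncating the series after `terms` terms leaves a relative error of r^terms, which is below 2^(-k*terms). A wider mantissa therefore needs more terms or a larger k, which is one plausible reason an iterative design would show precision-dependent latency.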


Quadruple precision arithmetic, Division, ASIC, FPGA, Iterative architecture, Multi-precision division, Digital arithmetic



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of EEE, The University of Hong Kong, Pok Fu Lam, Hong Kong
