Basic Floating-Point Operators

de Dinechin, Florent; Kumm, Martin

doi:10.1007/978-3-031-42808-1_11


Abstract

This chapter shows how to build the operators for the basic operations (addition and subtraction, multiplication, division, and square root) in floating point. Specialized floating-point operators (such as squarers and constant multipliers) and fused floating-point operators (such as fused multiply-add, combined sum and difference, or sum of squares) will be reviewed in Chap. 15. For each operation, we start with the construction of simple but non-standard operators suitable for hidden application-specific datapaths. Then, refinements for improved standard compliance or improved performance are presented.

"It makes me nervous to fly on airplanes since I know they are designed using floating-point arithmetic." (Alston Householder)

"Relax. Today’s planes are piloted using floating-point arithmetic." (The authors)


Notes

1. The IEEE 754 standard [754-19] defines five exceptions (Invalid, Overflow, Underflow, DivideByZero, and Inexact) that can be trapped by software to manage the respective situations. Software may also ignore these exceptions, because the hardware returns a value in each of these situations (a NaN for Invalid, an infinity for Overflow and DivideByZero, a subnormal result for Underflow). In this book we assume that application-specific hardware will do without raising these exceptions. Our reader having a need for any of them should be aware that they have been well thought out in the IEEE 754 standard.
2. It may be the subtraction of two numbers with the same sign or the addition of numbers with different signs.
3. Mathematically, the equality of two infinities could be endlessly debated. At least this choice is consistent with a comparison of the concatenation of fraction and exponent field (when using IEEE 754 encoding).
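
The three notes above can be made concrete with a short software sketch. It is only an illustration in Python on binary64 values, not the hardware operators built in the chapter. The helper names `float_bits` and `is_effective_subtraction` are hypothetical, and the sketch assumes that note 2 describes what is commonly called an effective subtraction.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 binary64 encoding of x as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def is_effective_subtraction(sign_x: int, sign_y: int, is_sub: bool) -> bool:
    """Note 2 (assumed reading): the magnitudes are subtracted when a
    subtraction has operands of the same sign, or an addition has operands
    of different signs. Signs are 0 (positive) or 1 (negative)."""
    return bool(sign_x ^ sign_y ^ int(is_sub))


# Note 1: values returned when the exceptions are not trapped.
overflow = 1e308 * 10.0        # Overflow  -> infinity
invalid = math.inf - math.inf  # Invalid   -> NaN
underflow = 1e-308 / 1e10      # Underflow -> subnormal (exponent field is 0)
# DivideByZero is not shown: Python raises ZeroDivisionError for 1.0 / 0.0
# instead of returning the IEEE 754 default infinity.

# Note 3: for non-negative values, comparing the exponent and fraction
# fields as one unsigned integer gives the same order as comparing the
# values themselves, and the two encodings of +infinity are equal.
values = [math.inf, 1.5, 0.0, 1e308, 5e-324, 1.0]
sorted_by_value = sorted(values)
sorted_by_bits = sorted(values, key=float_bits)
```

Sorting by value and sorting by bit pattern yield the same list here, which is the consistency that note 3 refers to. It holds for non-negative, non-NaN inputs only; negative numbers need the sign bit handled separately.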

References

[754-19] IEEE Standard for Floating-Point Arithmetic. Also IEEE/ISO/IEC 60559-2020. 2019
Javier D. Bruguera. “Radix-64 Floating-Point Divider”. In: Symposium on Computer Arithmetic (ARITH). IEEE, 2018, pp. 87–94
Javier D. Bruguera. “Low-Latency and High-Bandwidth Pipelined Radix-64 Division and Square Root Unit”. In: Symposium on Computer Arithmetic (ARITH). IEEE, 2022
Marius Cornea, John Harrison, and Ping Tak Peter Tang. Scientific Computing on Itanium®-Based Systems. Intel Press, 2002
Florent de Dinechin, Mioara Joldeş, Bogdan Pasca, and Guillaume Revy. “Multiplicative square root algorithms for FPGAs”. In: International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2010, pp. 574–577
Pedro Echeverría and Marisa López-Vallejo. “Customizing floating-point units for FPGAs: Area-performance-standard trade-offs”. In: Microprocessors and Microsystems 35.6 (2011), pp. 535–546
David R. Lutz. “Optimized Leading Zero Anticipators for Faster Fused Multiply-Adds”. In: Asilomar Conference on Signals, Circuits and Systems. IEEE, 2017, pp. 741–744
David R. Lutz. “ARM Floating-Point 2019: Latency, Area, Power”. In: Symposium on Computer Arithmetic (ARITH). IEEE, 2019, pp. 69–76
Peter Markstein. IA-64 and Elementary Functions: Speed and Precision. Hewlett-Packard Professional Books. Prentice Hall, 2000
Jean-Michel Muller, Nicolas Brunie, Florent de Dinechin, Claude-Pierre Jeannerod, Mioara Joldeş, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, and Serge Torres. Handbook of Floating-Point Arithmetic. 2nd ed. Birkhäuser Boston, 2018
Martin M. Schmookler and Kevin J. Nowka. “Leading Zero Anticipation and Detection - A comparison of methods”. In: Symposium on Computer Arithmetic (ARITH). IEEE, 2001, pp. 7–12
Jongwook Sohn, David K. Dean, Eric Quintana, and Wing Shek Wong. “Enhanced Floating-Point Adder with Full Denormal Support”. In: Symposium on Computer Arithmetic (ARITH). IEEE, 2022


Author information

Authors and Affiliations

CITI laboratory, INSA-Lyon, Villeurbanne, France
Florent de Dinechin
Fulda University of Applied Sciences, Fulda, Germany
Martin Kumm


About this chapter

Cite this chapter

de Dinechin, F., Kumm, M. (2024). Basic Floating-Point Operators. In: Application-Specific Arithmetic. Springer, Cham. https://doi.org/10.1007/978-3-031-42808-1_11


DOI: https://doi.org/10.1007/978-3-031-42808-1_11
Published: 23 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42807-4
Online ISBN: 978-3-031-42808-1
eBook Packages: Engineering, Engineering (R0)
