1 Introduction

Optimization modulo theories (\(\text {OMT}\)) [6, 7, 8, 9, 20, 21, 22, 23, 28, 29, 32, 35, 36, 38, 39, 43] is an important extension of satisfiability modulo theories (SMT) which allows for finding models that optimize one or more objectives, typically expressed as linear-arithmetic or Pseudo-Boolean functions.

Nevertheless, many SMT and OMT applications, in particular from software and hardware verification, require bit-precise representations of numbers. In SMT these are handled by the theory of bit-vectors (\({{\mathcal {B}}}{{\mathcal {V}}}\)) for the integers, by the theory of Floating-Point Numbers (\(\mathcal {FP}\)) for the reals, and by their combination (\(\mathcal {B}{{\mathcal {V}}}\,\cup \,\mathcal {F}\mathcal {P} \)). For instance, when verifying a piece of software, one may look for the minimum/maximum value of some int or double parameter that causes an \(\text {SMT}(\mathcal {B}{{\mathcal {V}}}\,\cup \,\mathcal {F}\mathcal {P})\) call to return sat (which typically corresponds to the presence of some bug), so as to guarantee a safe range for that parameter; also, one may want to find the maximum relative difference between the double values returned by two implementations of the same function.

Example 1

Consider some C/C++ library implementation of some mathematical function f: Double\(^N\ \longmapsto \ \) Double. Suppose one wants to replace it with a new implementation \(f'(...)\) of the same function. Given the ranges \([{\underline{l}},{\underline{u}}]\) for the values of \({\underline{x}}\), one may want to find the maximum relative difference between the values returned by the two functions. This can be done, e.g., by finding the maximum value of \(\epsilon \) s.t. the \(\text {SMT}(\mathcal {B}{{\mathcal {V}}}\,\cup \,\mathcal {F}\mathcal {P})\) formula

$$\begin{aligned}&(|f({\underline{x}})-f'({\underline{x}})| > \epsilon * \max \{|f({\underline{x}})|,|f'({\underline{x}})|\}) \wedge \\&\textstyle (f({\underline{x}})=...) \wedge (f'({\underline{x}})=...) \wedge \bigwedge _{i=1}^N ((l_i\le x_i) \wedge (x_i\le u_i)) \end{aligned} \tag{1}$$

is satisfiable, where \((f({\underline{x}})=...)\) and \((f'({\underline{x}})=...)\) are the \(\text {SMT}(\mathcal {B}{{\mathcal {V}}}\,\cup \,\mathcal {F}\mathcal {P})\) encodings of the implementations of the functions f and \(f'\) respectively. Notice that here it is strictly necessary to use the bit-precise representation of numbers provided by \({\mathcal {B}{{\mathcal {V}}}\,\cup \,\mathcal {F}\mathcal {P}}\), rather than standard non-linear arithmetic, in order to reproduce the truncation and rounding errors and their propagation. (E.g., two C functions computing iteratively \(a_0+a_1*x+...+a_n*x^n\) and \(a_0+x*(a_1+x*(... +x*(a_n))..))\) by floating-point arithmetic may return different values on the same input value x, although they are mathematically equivalent.)
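The parenthetical remark above can be reproduced with a few lines of Python, whose `float` type is IEEE binary64. This is only a sketch and is not taken from the paper: the function names `eval_naive` and `eval_horner`, the coefficients, and the input value are all chosen for illustration.

```python
def eval_naive(coeffs, x):
    """Evaluate a_0 + a_1*x + ... + a_n*x^n term by term, left to right."""
    total = 0.0
    for i, a in enumerate(coeffs):
        total += a * x ** i
    return total


def eval_horner(coeffs, x):
    """Evaluate the same polynomial as a_0 + x*(a_1 + x*(... + x*a_n))."""
    acc = 0.0
    for a in reversed(coeffs):
        acc = a + x * acc
    return acc


# Illustrative coefficients a_0, a_1, a_2 and input x (assumptions of this sketch).
COEFFS = [0.3, 0.2, 0.1]
X = 1.0

naive = eval_naive(COEFFS, X)
horner = eval_horner(COEFFS, X)
# The two results differ in the last bit, although the two formulas are
# mathematically equivalent: the additions are rounded in a different order.
print(naive, horner, naive == horner)
```

With these inputs the two evaluation orders round differently, which is exactly the kind of discrepancy that a real-arithmetic encoding could not capture.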

OMT for the theory of (unsigned) bit-vectors was proposed by Nadel and Ryvchin [32], although a reduction of the problem to MaxSAT was already implemented in the SMT/OMT solver Z3 [10]. The work in [32] was based on the observation that \(\text {OMT}\) on unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\) can be seen as lexicographic optimization over the bits in the bitwise representation of the objective, ordered from the most-significant bit (MSB) to the least-significant bit (LSB). Notice that, in this domain, this corresponds to a binary search over the space of the values of the objective.

In this paper (as in [44]) we address—for the first time to the best of our knowledge—\(\text {OMT}\) for objectives in the theory of signed Bit-Vectors and, most importantly, in the theory of Floating-Point Arithmetic, by exploiting some properties of the two’s complement encoding for signed \({{\mathcal {B}}}{{\mathcal {V}}}\) and of the IEEE 754-2008 encoding for \(\mathcal {FP}\) respectively.  (We consider the former as a straightforward extension of [32], and the latter as our main contribution.)

We start by introducing the notion of attractor, which represents (the bitwise encoding of) the target value that the optimization process aims at for the objective. This allows us to easily adapt the procedure of [32] to work with both signed and unsigned bit-vectors, by lexicographically minimizing the bitwise distance between the objective and the attractor, that is, the bitwise-xor of the objective and the attractor.

Unfortunately there is no such notion of (fixed) attractor for \(\mathcal {FP}\) numbers, because the target value changes as the bits of the objective are updated from the MSB to the LSB, and the optimization process may have to change its aim dynamically, even in the opposite direction. (For instance, as soon as the minimization process realizes there is no solution with a negative value for the objective and thus sets its MSB to 0, the target value is switched from \(-\infty \) to \(0^+\), and the search switches direction, from the maximization of the exponent and the significand to their minimization.)

To cope with this fact, we introduce the notions of dynamic attractor and attractor trajectory, representing the dynamics of the moving target value, which are progressively updated as the bits of the objective are assigned from the MSB to the LSB. Based on these ideas, we present novel OMT procedures for \(\mathcal {FP}\) objectives, which require at most \(n+2\) incremental calls to a \( {\mathcal {B}}{\mathcal {V}} \cup {\mathcal {F}}{\mathcal {P}}\) solver, n being the number of bits in the representation of the objective. Notice that these procedures do not depend on the underlying \( {\mathcal {B}}{\mathcal {V}} \cup {\mathcal {F}}{\mathcal {P}}\) procedure used, provided the latter allows for accessing and setting the single bits of the objective.

Notice that, unlike with the \({{\mathcal {B}}}{{\mathcal {V}}}\) domain, this does not simply perform binary search over the space of the values of the objective. Rather, it first performs (a lexicographic bitwise search corresponding to) binary search on the exponent values, which very rapidly converges to the right order of magnitude, followed by binary search on the significand values, which fine-tunes the final result.

We have implemented these OMT procedures on top of the OptiMathSAT \(\text {OMT}\) solver [43]. We have run an experimental evaluation of the procedures on modified SMT problems from the SMT-LIB library. The empirical results support the validity and feasibility of the novel approach.

The rest of the paper is organized as follows. In Sect. 2 we provide the necessary background on \({{\mathcal {B}}}{{\mathcal {V}}}\) and \(\mathcal {FP}\) theories and reasoning. In Sect. 3 we provide the novel theoretical definitions and results. In Sect. 4 we describe our novel OMT procedures. In Sect. 5 we present the empirical evaluation. In Sect. 6 we conclude, hinting at some future directions.

2 Background

We assume some basic knowledge on SAT and SMT and briefly introduce the reader to the Bit-Vector and Floating-Point theories.

Bit-Vectors. A bit is a Boolean variable that can be interpreted as 0 or 1. A Bit-Vector (\({{\mathcal {B}}}{{\mathcal {V}}}\)) variable \({\mathbf {v}}^{[n]}\) is a vector of n bits, where v[0] is the Most Significant Bit (MSB) and \(v[n-1]\) is the Least Significant Bit (LSB). A \({{\mathcal {B}}}{{\mathcal {V}}}\) constant of width n is an interpreted vector of n values in \(\{{0, 1}\} \). We \({\overline{overline}}\) a bit value or a \({{\mathcal {B}}}{{\mathcal {V}}}\) value to denote its complement (e.g., \(\overline{[11010010]}\) is [00101101]). A \({{\mathcal {B}}}{{\mathcal {V}}}\) variable/constant of width n can be unsigned, in which case its domain is \([0, 2^n - 1]\), or signed, which we assume to comply with the two’s complement representation, so that its domain is \([-2^{(n - 1)},2^{(n - 1)}-1]\). Therefore, the vector [11111111] can be interpreted either as the unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\) constant \(\mathbf {255}^{[8]}\) or as the signed \({{\mathcal {B}}}{{\mathcal {V}}}\) constant \(\mathbf {-1}^{[8]}\). Following the SMT-LIBv2 standard [4], we may also represent a \({{\mathcal {B}}}{{\mathcal {V}}}\) constant in binary form (e.g. \({{\mathbf {2}}}{{\mathbf {8}}}^{[8]}\) is written \(\#b00011100\)). A \({{\mathcal {B}}}{{\mathcal {V}}}\) term is built from \({{\mathcal {B}}}{{\mathcal {V}}}\) constants, variables and interpreted \({{\mathcal {B}}}{{\mathcal {V}}}\) functions which represent standard Register-Transfer Level (RTL) operators: word concatenation (e.g. \({\mathbf {3}}^{[8]} \circ {\mathbf {x}}^{[8]}\)), sub-word selection (e.g. \(({\mathbf {3}}^{[8]}[6:3])^{[4]}\)), modulo-n sum and multiplication (e.g. \({\mathbf {x}}^{[8]} +_{8} {\mathbf {y}}^{[8]}\) and \({\mathbf {x}}^{[8]} \cdot _{8} {\mathbf {y}}^{[8]}\)), bit-wise operators (e.g. \(\mathbf{and} _n\), \(\mathbf{or} _n\), \(\mathbf{xor} _n\), \(\mathbf{nxor} _n\), \(\mathbf{not} _n\)), and left and right shift \({<<}_n\), \({>>}_n\). A \({{\mathcal {B}}}{{\mathcal {V}}}\) atom can be built by combining \({{\mathcal {B}}}{{\mathcal {V}}}\) terms with interpreted predicates (either signed or unsigned ones) like \(\ge _n\), \(<_n\) (e.g. \({\mathbf {0}}^{[8]} \ge _8 {\mathbf {x}}^{[8]}\)) and equality. We refer the reader to [4, 25] for further details on the syntax and semantics of Bit-Vector theory.
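The two interpretations of the same bit pattern can be made concrete with a small Python sketch. The helper names `bv_unsigned`, `bv_signed`, `bv_complement` and `from_smtlib_binary` are introduced here only for illustration; bits are stored MSB first, following the indexing convention used in this section.

```python
def bv_unsigned(bits):
    """Value of an MSB-first bit list read as an unsigned bit-vector."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value


def bv_signed(bits):
    """Value of an MSB-first bit list read in two's complement."""
    n = len(bits)
    # A set MSB contributes -2^(n-1), i.e. the unsigned value minus 2^n.
    return bv_unsigned(bits) - (bits[0] << n)


def bv_complement(bits):
    """Bitwise complement (the overline notation of the text)."""
    return [1 - b for b in bits]


def from_smtlib_binary(literal):
    """Parse an SMT-LIB binary literal such as '#b00011100' into a bit list."""
    if not literal.startswith("#b"):
        raise ValueError("expected a literal of the form #b...")
    return [int(c) for c in literal[2:]]


ones = from_smtlib_binary("#b11111111")
print(bv_unsigned(ones), bv_signed(ones))               # 255 -1
print(bv_unsigned(from_smtlib_binary("#b00011100")))    # 28
```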

There are two main approaches to \({{\mathcal {B}}}{{\mathcal {V}}}\) satisfiability, the “eager” and the “lazy” approach, which are substantially complementary to one another [26]. In the eager approach, \({{\mathcal {B}}}{{\mathcal {V}}}\) terms and constraints are encoded into SAT via bit-blasting [17, 18, 24, 25, 33, 34]. In the lazy approach, \({{\mathcal {B}}}{{\mathcal {V}}}\) terms are not immediately expanded, so as to avoid scalability issues, and the \({{\mathcal {B}}}{{\mathcal {V}}}\) solver consists of a layered set of techniques, each of which deals with a sub-portion of the \({{\mathcal {B}}}{{\mathcal {V}}}\) theory [11, 16, 19, 25].

Floating-Point. The theory of Floating-Point Numbers (\(\mathcal {FP}\)) [4, 14, 37] is based on the IEEE standard 754-2008 [5] for floating-point arithmetic, restricted to the binary case. A \(\mathcal {FP}\) sort is an indexed nullary sort identifier of the form (_ FP \(<ebits>\) \(<sbits>\)) s.t. both ebits and sbits are positive integers greater than one, where ebits defines the number of bits in the exponent and sbits defines the number of bits in the significand, including the hidden bit. A \(\mathcal {FP}\) variable \({\mathbf {v}}^{[n]}\) with sort (_ FP \(<ebits>\) \(<sbits>\)) can be viewed interchangeably as a vector of \(n {\mathop {=}\limits ^{\text {\tiny def}}} ebits + sbits\) bits, where v[0] is the Most Significant Bit (MSB) and \(v[n-1]\) is the Least Significant Bit (LSB), or as a triplet of bit-vectors \(\langle \mathbf {sign}, \mathbf {exp}, \mathbf {sig} \rangle \) s.t. \(\mathbf {sign}\) is a \({{\mathcal {B}}}{{\mathcal {V}}}\) of size 1, \(\mathbf {exp}\) is a \({{\mathcal {B}}}{{\mathcal {V}}}\) of size ebits and \(\mathbf {sig}\) is a \({{\mathcal {B}}}{{\mathcal {V}}}\) of size \(sbits - 1\). A \(\mathcal {FP}\) constant is a triplet of \({{\mathcal {B}}}{{\mathcal {V}}}\) constants. Given a fixed floating-point sort, i.e. a pair \(\langle {ebits},{sbits}\rangle \), the following \(\mathcal {FP}\) constants are implicitly defined:

| Value | Symbol | \({{\mathcal {B}}}{{\mathcal {V}}}\) Repr. |
|---|---|---|
| plus infinity | (_ +oo \(<ebits>\) \(<sbits>\)) | (fp #b0 #b1...1 #b0...0) |
| minus infinity | (_ -oo \(<ebits>\) \(<sbits>\)) | (fp #b1 #b1...1 #b0...0) |
| plus zero | (_ +zero \(<ebits>\) \(<sbits>\)) | (fp #b0 #b0...0 #b0...0) |
| minus zero | (_ -zero \(<ebits>\) \(<sbits>\)) | (fp #b1 #b0...0 #b0...0) |
| not-a-number | (_ NaN \(<ebits>\) \(<sbits>\)) | (fp t #b1...1 s) |

where t is either 0 or 1 and s is a \({{\mathcal {B}}}{{\mathcal {V}}}\) which contains at least a 1.

Setting aside special \(\mathcal {FP}\) constants, the remaining \(\mathcal {FP}\) values can be classified as either normal or subnormal (a.k.a. denormal) [5]. A \(\mathcal {FP}\) number is said to be subnormal when every bit in its exponent is equal to zero, and normal otherwise. The significand of a normal \(\mathcal {FP}\) number is always interpreted as if the leading binary digit is equal to 1, whereas for subnormal \(\mathcal {FP}\) values the leading binary digit is always 0. This allows for the representation of numbers that are closer to zero, although with reduced precision. Notice that the absolute value of any subnormal \(\mathcal {FP}\) number is smaller than the absolute value of any non-zero normal \(\mathcal {FP}\) number, and that the value contribution of the significand bits is always less significant than that of the exponent bits.

Example 2

Let x be the normal \(\mathcal {FP}\) constant (fp #b0 #b1100 #b0101000), and y be the subnormal \(\mathcal {FP}\) constant (fp #b0 #b0000 #b0101000), so that their corresponding sort is (_ FP \(<4> <8>\)). Then, according to the semantics defined in the IEEE standard 754-2008 [5], the floating-point values of x and y in decimal notation are:

$$\begin{aligned} x =&\,\, (-1)^{0} \cdot 2^{(12 - 7)} \cdot \bigg ( 1 + \sum _{i=1}^{7} \Big (x[4+i] \cdot 2^{-i}\Big ) \bigg ) = 1 \cdot 2^5 \cdot \bigg ( 1 + \frac{1}{2^2} + \frac{1}{2^4} \bigg ) = 42 \\ y =&\,\, (-1)^{0} \cdot 2^{(0 - 7 {+ 1})} \cdot \bigg ( {0} + \sum _{i=1}^{7} \Big (y[4+i] \cdot 2^{-i}\Big ) \bigg ) = 1 \cdot 2^{-6} \cdot \bigg ( \frac{1}{2^2} + \frac{1}{2^4} \bigg ) = \frac{5}{2^{10}}. \end{aligned}$$

Notice that with (_ FP \(<4> <8>\)) the smallest strictly-positive normal value is \(2^{-6}\), whereas the greatest subnormal value is \(2^{-6}\cdot \sum _{i=1}^{7}2^{-i}\), which is smaller than \(2^{-6}\). \(\diamond \)
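The computation in Example 2 can be checked mechanically. The following Python sketch is an illustration rather than part of the paper: the function name `decode_fp` and its string-based interface are assumptions. It decodes a triplet of bit strings with exact rational arithmetic, using the bias \(2^{ebits-1}-1\) of the IEEE 754-2008 binary formats.

```python
from fractions import Fraction


def decode_fp(sign, exp, sig):
    """Decode an FP constant given as MSB-first bit strings.

    Returns an exact Fraction for finite values, or one of the strings
    '+oo', '-oo', 'NaN' for the special values. The sign of a zero is
    lost, since Fraction has a single zero.
    """
    ebits = len(exp)
    bias = 2 ** (ebits - 1) - 1
    e = int(exp, 2)
    frac = Fraction(int(sig, 2), 2 ** len(sig))
    s = -1 if sign == "1" else 1
    if e == 2 ** ebits - 1:            # all exponent bits set: infinity or NaN
        if frac != 0:
            return "NaN"
        return "-oo" if s < 0 else "+oo"
    if e == 0:                         # zero or subnormal: hidden bit is 0
        return s * frac * Fraction(2) ** (1 - bias)
    return s * (1 + frac) * Fraction(2) ** (e - bias)   # normal: hidden bit is 1


x = decode_fp("0", "1100", "0101000")
y = decode_fp("0", "0000", "0101000")
print(x, y)   # 42 5/1024
```

The same function also confirms the closing remark of the example: the smallest positive normal value, `decode_fp("0", "0001", "0000000")`, is \(2^{-6}\), and the greatest subnormal value, `decode_fp("0", "0000", "1111111")`, is strictly smaller.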

The theory of \(\mathcal {FP}\) provides a variety of built-in floating-point operations as defined in the IEEE standard 754-2008. This includes binary arithmetic operations (e.g. \(+, -, \star , \div \)), basic unary operations (e.g. \(abs, -\)), binary comparison operations (e.g. \(\le , <, \ne , =, >, \ge \)), the remainder operation, the square root operation and more. Importantly, arithmetic operations are performed as if with infinite precision, but the result is then rounded to the “nearest” representable \(\mathcal {FP}\) number according to the specified rounding mode. Five rounding modes are made available, as in [5].
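As a small illustration of "compute exactly, then round", the Python sketch below compares the infinitely precise sum of two binary64 numbers with the result of the floating-point addition. It assumes the platform default rounding mode (round to nearest, ties to even); the variable names are chosen only for this example.

```python
from fractions import Fraction

a, b = 0.1, 0.2

# Fraction(float) is exact, so this is the infinitely precise sum of the
# two binary64 operands.
exact_sum = Fraction(a) + Fraction(b)

# This is the value actually returned by the binary64 addition.
rounded_sum = Fraction(a + b)

# The exact sum is not representable in binary64, so the addition rounds it
# to a neighbouring representable value.
rounding_error = rounded_sum - exact_sum
print(rounding_error != 0)   # True
```

The error is bounded by half a unit in the last place of the result, which for values in \([2^{-2}, 2^{-1})\) is \(2^{-55}\).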

The most common approach for \(\mathcal {FP}\)-satisfiability is to encode \(\mathcal {FP}\) expressions into \({{\mathcal {B}}}{{\mathcal {V}}}\) formulas based on the circuits used to implement floating-point operations, using appropriate under- and over-approximation schemes—or a mixture of both—to improve performance [15, 45,46,47]. Then, the \({{\mathcal {B}}}{{\mathcal {V}}}\) -Solver is used to deal with the \(\mathcal {FP}\) formula, using either the eager or the lazy \({{\mathcal {B}}}{{\mathcal {V}}}\) approach. An alternative approach, based on abstract interpretation, is presented in [12, 13, 27]. With this technique, called Abstract CDCL (ACDCL), the set of feasible solutions is over-approximated with floating-point intervals, so that intervals-based conflict analysis is performed to decide \(\mathcal {FP}\)-satisfiability.

3 Theoretical Framework

We first present our generalization of [32] to the case of signed Bit-Vector Optimization (Sect. 3.1), and then move on to deal with Floating-Point Optimization (Sect. 3.2).

3.1 Bit-Vector Optimization

Without any loss of generality, we assume that every objective function f(...) is replaced by a variable \({\hbox {obj}} \) of the same type by conjoining “\({\hbox {obj}} = f(...)\)” to the input formula. We use the symbol n to denote the bit-width of \({\hbox {obj}} \), and \({\hbox {obj}} [i] \) to denote the i-th bit of \({\hbox {obj}} \), where \({\hbox {obj}} [0] \) and \({\hbox {obj}} [n-1] \) are the Most Significant Bit (MSB) and the Least Significant Bit (LSB) of \({\hbox {obj}} \) respectively. We define the Bit-Vector Optimization problem as follows.

Definition 1

(OMT\(_{[{\mathcal {B}}{\mathcal {V}}]}({\mathcal {B}}{\mathcal {V}}\cup {\mathcal {T}})\)) Let \(\varphi \) be a \(\text {SMT}({{\mathcal {B}}}{{\mathcal {V}}} \,\cup \,{\mathcal {T}} )\) formula for some (possibly empty) theory \({\mathcal {T}}\) and \({\hbox {obj}}\) be a (signed or unsigned) \({{\mathcal {B}}}{{\mathcal {V}}}\) variable occurring in \(\varphi \). We call an Optimization Modulo \({{\mathcal {B}}}{{\mathcal {V}}}\) problem for \({{\mathcal {B}}}{{\mathcal {V}}} \,\cup \,{\mathcal {T}} \), OMT\(_{[{\mathcal {B}}{\mathcal {V}}]}({\mathcal {B}}{\mathcal {V}}\cup {\mathcal {T}})\), the problem of finding a \({{\mathcal {B}}}{{\mathcal {V}}} \,\cup \,{\mathcal {T}} \)-model \({\mathcal {M}}\) for \(\varphi \) (if any) whose value of \({\hbox {obj}}\) is a minimum wrt. the total order relation \(\le _n\) for signed \({{\mathcal {B}}}{{\mathcal {V}}}\)s if \({\hbox {obj}}\) is signed, and the one for unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\)s otherwise. (The dual definition where we look for the maximum follows straightforwardly.)

Notice that the definition is independent of the extra theory \({\mathcal {T}}\), provided that \({\hbox {obj}}\) is a \({{\mathcal {B}}}{{\mathcal {V}}}\) term. (In practice \({\mathcal {T}}\) may be empty, or contain \(\mathcal {FP}\) and/or other theories, e.g. that of arrays.) Hereafter, unless otherwise specified and when it is not necessary to make \({\mathcal {T}}\) explicit, we will abbreviate “OMT\(_{[{\mathcal {B}}{\mathcal {V}}]}({\mathcal {B}}{\mathcal {V}}\cup {\mathcal {T}})\)” into “OMT\(_{[{\mathcal {B}}{\mathcal {V}}]}\)”.

We generalize the unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\) maximization procedures in [32] to the case of signed and unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\) optimization. To this end, we introduce the novel notion of \({{\mathcal {B}}}{{\mathcal {V}}}\) attractor.

Definition 2

(Attractor, attractor equalities). When minimizing [resp. maximizing], we call attractor for \({\hbox {obj}}\) the smallest [resp. greatest] \({{\mathcal {B}}}{{\mathcal {V}}}\)-value \(attr\) of the sort of \({\hbox {obj}}\). We call vector of attractor equalities the vector \(A \) s.t. \(A[k] {\mathop {=}\limits ^{\text {\tiny def}}} ({\hbox {obj}} [k] = attr [k])\), \(k\in [0..n-1]\).

Example 3

If \({\hbox {obj}} ^{[8]}\) is an unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\) objective of width 8, then its corresponding attractor \(attr\) is \({\mathbf {0}}^{[8]}\), i.e. [00000000], when \({\hbox {obj}} ^{[8]}\) is minimized and it is \(\mathbf {255}^{[8]}\), i.e. [11111111], when \({\hbox {obj}} ^{[8]}\) is maximized. When \({\hbox {obj}} ^{[8]}\) is instead a signed \({{\mathcal {B}}}{{\mathcal {V}}}\) objective, following the two’s complement encoding, the corresponding \(attr\) is \(\mathbf {-128}^{[8]}\), i.e. [10000000], for minimization and \(\mathbf {127}^{[8]}\), i.e. [01111111], for maximization. \(\diamond \)
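Definition 2 and Example 3 translate directly into code. In the Python sketch below, the function name `attractor` and its parameters are hypothetical and serve only to illustrate the definition; bits are listed MSB first.

```python
def attractor(n, signed, minimize):
    """Bits (MSB first) of the attractor of an n-bit objective.

    The attractor is the smallest value of the sort when minimizing and the
    greatest one when maximizing.
    """
    if signed:
        # Two's complement: the smallest value is 10...0, the greatest 01...1.
        if minimize:
            return [1] + [0] * (n - 1)
        return [0] + [1] * (n - 1)
    # Unsigned: the smallest value is 00...0, the greatest 11...1.
    return [0] * n if minimize else [1] * n


for signed in (False, True):
    for minimize in (True, False):
        print(signed, minimize, attractor(8, signed, minimize))
```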

In essence, the attractor can be seen as the target value of the optimization search and therefore it can be used to determine the desired improvement direction and to guide the decisions taken by the optimization search. By construction, if a model \({\mathcal {M}}\) satisfies all equalities \(A[i] \), then the evaluation of \({\hbox {obj}} \) in \({\mathcal {M}}\) is \(attr \).

We use the symbol \(\mu _k\) to denote a generic (possibly partial) assignment which assigns at least the k most-significant bits of \({\hbox {obj}} \). We use the symbol \(\tau _k\) to denote an assignment to the k most-significant bits of \({\hbox {obj}} \). Given \(i<k\), we denote by \(\mu _k[i]\) [resp. \(\tau _k[i]\)] the value in \(\{{0,1}\}\) assigned to \({\hbox {obj}} [i]\) by \(\mu _k\) [resp. \(\tau _k\)]. Moreover, we use the expression \([\![\mu _k]\!]_{i} \) where \(i \le k\) to denote the restriction of \(\mu _k\) to the i most-significant bits of \({\hbox {obj}} \), \({\hbox {obj}} [0],...,{\hbox {obj}} [i-1] \). Given a model \({\mathcal {M}}\) of \(\varphi \) and a variable v, we denote by \({\mathcal {M}}(v)\) the evaluation of v in \({\mathcal {M}}\). With a slight abuse of notation, and when this does not cause ambiguities, we sometimes use an attractor equality \(A[i] {\mathop {=}\limits ^{\text {\tiny def}}} ({\hbox {obj}} [i] = attr [i])\) to denote the single-bit assignment \({\hbox {obj}} [i] := attr [i] \) and we use its negation \(\lnot A[i] \) to denote the assignment to the complement value \({\hbox {obj}} [i] := \overline{attr [i]}\).

Definition 3

(lexicographic maximization) Consider an OMT\(_{[{\mathcal {B}}{\mathcal {V}}]}\) instance \(\langle {\varphi },{{\hbox {obj}}}\rangle \) and the vector of attractor equalities \(A \). We say that an assignment \(\tau _n\) to \({\hbox {obj}}\) lexicographically maximizes \(A \) wrt. \(\varphi \) iff, for every \(k\in [0..{n-1}]\),

  • \(\tau _n[k] = \overline{attr {}[k]}\) if \(\varphi \wedge [\![\tau _n]\!]_{k} \wedge A[k] \) is unsatisfiable,

  • \(\tau _n[k] = attr {}[k]\) otherwise.

where \(A[k] \) is the attractor equality \(({\hbox {obj}} [k] = attr {}[k])\). Given a model \({\mathcal {M}}\) for \(\varphi \), we say that \({\mathcal {M}}\) lexicographically maximizes \(A \) wrt. \(\varphi \) iff its restriction to \({\hbox {obj}}\) lexicographically maximizes \(A \) wrt. \(\varphi \).

Starting from the MSB to the LSB, \(\tau _n\) [resp. \({\mathcal {M}}\)] in Definition 3 assigns to each \({\hbox {obj}} [k]\) the value \(attr [k] \) unless it is inconsistent wrt. \(\varphi \) and the assignments to the previous \({\hbox {obj}} [i]\)s, \(i \in [0..k-1]\).
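The MSB-to-LSB scheme of Definition 3 can be sketched in Python as follows. This is a toy illustration under a strong assumption: the satisfiability check of \(\varphi \wedge [\![\tau _n]\!]_{k} \wedge A[k]\) is replaced by a lookup in an explicit, non-empty set `feasible` of bit tuples, standing in for the values that the objective takes in the models of \(\varphi \). The function name `lex_optimize` is hypothetical, and a real implementation would query an incremental SMT solver instead.

```python
def lex_optimize(feasible, attr):
    """Greedy MSB-to-LSB search that lexicographically maximizes A.

    feasible: non-empty collection of MSB-first bit tuples (toy stand-in
              for the models of the input formula).
    attr:     the attractor, as an MSB-first bit tuple of the same width.
    """
    prefix = ()
    for k in range(len(attr)):
        wanted = prefix + (attr[k],)
        # Toy replacement for "phi /\ [[tau]]_k /\ A[k] is satisfiable".
        if any(v[:k + 1] == wanted for v in feasible):
            prefix = wanted                       # keep obj[k] = attr[k]
        else:
            prefix = prefix + (1 - attr[k],)      # forced to the complement
    return prefix


# Signed 3-bit minimization (attractor [100]) over three candidate values.
models = {(1, 1, 1), (0, 0, 0), (1, 1, 0)}
print(lex_optimize(models, (1, 0, 0)))   # (1, 1, 0), i.e. -2
```

Each iteration fixes one bit, so the loop performs exactly n checks for an n-bit objective.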

Notice that this corresponds to the minimization of \(\sum _{k=0}^{n-1}2^{n-1-k}\cdot ({\hbox {obj}} [k] \,\mathbf {xor}_1\,attr [k])\) [resp. the maximization of \(\sum _{k=0}^{n-1}2^{n-1-k} \cdot ({\hbox {obj}} [k] \,\mathbf {nxor}_1\,attr [k])\)], where \(\mathbf {xor}_n\) is the bitwise-xor operator and \(\mathbf {nxor}_n\) is its complement, because \(2^{n-1-i}>\sum _{k=i+1}^{n-1}2^{n-1-k}\) for every \(n>i\ge 0\).

The following fact derives from the above definitions and from the properties of the two’s complement representation adopted by the SMT-LIBv2 standard for signed \({{\mathcal {B}}}{{\mathcal {V}}}\).

Theorem 1

An optimal solution of an OMT\(_{[{\mathcal {B}}{\mathcal {V}}]}\) problem \(\langle {\varphi },{{\hbox {obj}}}\rangle \) is any model \({\mathcal {M}}\) of \(\varphi \) which lexicographically maximizes the vector of attractor equalities \(A \).

Proof

(We investigate the minimization case, since the maximization case is dual.)

In the case of minimization with unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\), \(attr\) is [00...00], so that the lexicographic maximization of \(A \) corresponds to minimizing \(\sum _{k=0}^{n-1}2^{n-1-k}\cdot {\hbox {obj}} [k] \), which is the standard minimization for unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\).

In the case of minimization with signed \({{\mathcal {B}}}{{\mathcal {V}}}\), \(attr\) is [10...00], so that the lexicographic maximization of \(A \) corresponds to minimizing \(2^{n-1}\cdot \overline{{\hbox {obj}} [0]} +\sum _{k=1}^{n-1}2^{n-1-k}\cdot {\hbox {obj}} [k] \). Since \(\overline{{\hbox {obj}} [0]} = 1 - {\hbox {obj}} [0]\), subtracting the constant value \(2^{n-1}\) shows that this is equivalent to minimizing \(-2^{n-1}\cdot {\hbox {obj}} [0] +\sum _{k=1}^{n-1}2^{n-1-k}\cdot {\hbox {obj}} [k] \), which is the standard minimization for two’s complement \({{\mathcal {B}}}{{\mathcal {V}}}\). \(\square \)
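The argument of the proof can be cross-checked by brute force on small bit-widths. The Python sketch below is a sanity check under illustrative names (`value`, `xor_distance`, `check_theorem`), not the authors' procedure. It verifies that ordering all n-bit vectors by the weighted xor with the attractor gives the same order as ordering them by their numeric value (ascending for minimization, descending for maximization), in both the signed and the unsigned case.

```python
from itertools import product


def value(bits, signed):
    """Numeric value of an MSB-first bit tuple (two's complement if signed)."""
    unsigned = int("".join(map(str, bits)), 2)
    return unsigned - (bits[0] << len(bits)) if signed else unsigned


def xor_distance(bits, attr):
    """Sum over k of 2^(n-1-k) * (bits[k] xor attr[k])."""
    n = len(bits)
    return sum((b ^ a) << (n - 1 - k) for k, (b, a) in enumerate(zip(bits, attr)))


def check_theorem(n):
    """True iff xor-distance order and numeric order agree on all n-bit values."""
    all_bits = list(product((0, 1), repeat=n))
    for signed in (False, True):
        for minimize in (True, False):
            direction = 1 if minimize else -1
            by_value = sorted(all_bits, key=lambda v: direction * value(v, signed))
            attr = by_value[0]   # smallest (or greatest) value of the sort
            by_distance = sorted(all_bits, key=lambda v: xor_distance(v, attr))
            if by_value != by_distance:
                return False
    return True


print(all(check_theorem(n) for n in range(1, 7)))   # True
```

Since the two orders coincide on all values, they also coincide on any subset of feasible values, which is what Theorem 1 relies on.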

Definitions 2 and 3, together with Theorem 1, thus suggest a direct extension to the minimization/maximization of signed \({{\mathcal {B}}}{{\mathcal {V}}}\) of the algorithm for unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\) in [32]: apply the unsigned-\({{\mathcal {B}}}{{\mathcal {V}}}\) maximization [resp. minimization] algorithm of [32] to the objective \({\hbox {obj}} ' {\mathop {=}\limits ^{\text {\tiny def}}} ({\hbox {obj}} \,\mathbf {nxor}_n\,attr)\) [resp. \({\hbox {obj}} ' {\mathop {=}\limits ^{\text {\tiny def}}} ({\hbox {obj}} \,\mathbf {xor}_n\,attr)\)] instead of simply to \({\hbox {obj}}\) [resp. \({\overline{{\hbox {obj}}}}\)].

Example 4

Let \({\hbox {obj}} ^{[3]}\) be a signed 3-bit \({{\mathcal {B}}}{{\mathcal {V}}}\) goal to be minimized and \(attr {\mathop {=}\limits ^{\text {\tiny def}}} [100]\) (i.e. \(\mathbf {-4}^{[3]}\)) be its attractor, and \(A {\mathop {=}\limits ^{\text {\tiny def}}} [{\hbox {obj}} [0] = 1, {\hbox {obj}} [1] = 0, {\hbox {obj}} [2] = 0]\) be the corresponding vector of attractor equalities. Consider the three assignments

$$\begin{aligned} \tau _3&{\mathop {=}\limits ^{\text {\tiny def}}}&\{{ A[0], \lnot A[1], \lnot A[2]}\} \ \ \ (\text {for which } {\hbox {obj}} ^{[3]} = [111], \text { i.e. } \mathbf {-1}^{[3]})\\ \tau _3'&{\mathop {=}\limits ^{\text {\tiny def}}}&\{{\lnot A[0], A[1], A[2]}\} \ \ \ (\text {for which } {\hbox {obj}} ^{[3]} = [000], \text { i.e. } {\mathbf {0}}^{[3]})\\ \tau _3''&{\mathop {=}\limits ^{\text {\tiny def}}}&\{{ A[0], \lnot A[1], A[2]}\} \ \ \ (\text {for which } {\hbox {obj}} ^{[3]} = [110], \text { i.e. } \mathbf {-2}^{[3]}) \end{aligned}$$

Then \(\tau _3\) is lexicographically better than \(\tau _3'\), because \(\tau _3\) satisfies the attractor equality corresponding to the MSB whereas \(\tau _3'\) does not; \(\tau _3\) is lexicographically worse than \(\tau _3''\) because, all the rest being equal, \(\tau _3''\) makes the attractor equality \(({\hbox {obj}} [2] = 0)\) true. Indeed, \(\tau _3\) is nearer in value to the attractor than \(\tau _3'\) and farther than \(\tau _3''\). \(\diamond \)

3.2 Floating-Point Optimization

We define the Floating-Point Optimization problem as follows.

Definition 4

(OMT\(_{[{\mathcal {F}}{\mathcal {P}}]}({\mathcal {F}}{\mathcal {P}}\cup {\mathcal {T}})\)) Let \(\varphi \) be a \(\text {SMT}(\mathcal {FP} \,\cup \,{\mathcal {T}} )\) formula for some (possibly empty) theory \({\mathcal {T}}\) and \({\hbox {obj}}\) be a \(\mathcal {FP}\) variable occurring in \(\varphi \). We call an Optimization Modulo \(\mathcal {FP}\) problem for \({\mathcal {F}}{\mathcal {P}}\cup {\mathcal {T}}\), OMT\(_{[{\mathcal {F}} {\mathcal {P}}]}({\mathcal {F}}{\mathcal {P}}\cup {\mathcal {T}})\), the problem of finding a \(\mathcal {FP} \,\cup \,{\mathcal {T}} \)-model \({\mathcal {M}}\) for \(\varphi \) (if any) whose value of \({\hbox {obj}}\) is either

  • minimum wrt. the usual total order relation \(\le \) for \(\mathcal {FP}\) numbers, if \(\varphi \) is satisfied by at least one model \({\mathcal {M}}'\) s.t. \({\mathcal {M}}'({\hbox {obj}})\) is not \(\textsc {NaN} \),

  • some binary representation of \(\textsc {NaN} \), otherwise.

(The dual definition where we look for the maximum follows straightforwardly.)

As with \({{\mathcal {B}}}{{\mathcal {V}}}\), the definition is independent of the extra theory \({\mathcal {T}}\), provided that \({\hbox {obj}}\) is a \(\mathcal {FP}\) term. In practice \({\mathcal {T}}\) may be empty, or contain \({{\mathcal {B}}}{{\mathcal {V}}}\) and/or other theories, e.g. that of arrays. Hereafter, unless otherwise specified and when it is not necessary to make \({\mathcal {T}}\) explicit, we will abbreviate “OMT\(_{[{\mathcal {F}} {\mathcal {P}}]}({\mathcal {F}}{\mathcal {P}}\cup {\mathcal {T}})\)” into “\(\text {OMT}_{[\mathcal {FP} ]}\)”.

Definition 4 is necessarily convoluted because \({\hbox {obj}}\) can be \(\textsc {NaN} \). In fact, in the SMT-LIBv2 standard the comparisons \(\{{\le ,<,\ge ,>}\}\) between \(\textsc {NaN} \) and any other \(\mathcal {FP}\) value are always evaluated false because \(\textsc {NaN} \) has multiple representations at the binary level (see Table 1). Also, requiring the optimal solution to be always different from \(\textsc {NaN} \) makes the resulting OMT\(_{[{\mathcal {F}}{\mathcal {P}}]}\) problem \(\langle {\varphi \wedge \lnot \mathsf {IsNaN({{\hbox {obj}}})}},{{\hbox {obj}}}\rangle \) unsatisfiable when \(\varphi \) is satisfied only by models \({\mathcal {M}}\) s.t. \({\mathcal {M}}({\hbox {obj}})\) is NaN. For these reasons, we admit \(\textsc {NaN} \) as the optimal solution value for \({\hbox {obj}}\) if and only if \(\varphi \) is satisfied only by models \({\mathcal {M}}\) s.t. \({\mathcal {M}}({\hbox {obj}})\) is NaN.

In the rest of this section we assume that we have already checked, in sequence, that

(i) the input formula \(\varphi \) is satisfiable, by invoking an \(\text {SMT}(\mathcal {FP})\) solver on \(\varphi \). If the solver returns unsat, then there is no need to proceed;

(ii) \(\varphi \) is satisfied by at least one model \({\mathcal {M}}'\) s.t. \({\mathcal {M}}'({\hbox {obj}})\) is not \(\textsc {NaN} \), by invoking an \(\text {SMT}(\mathcal {FP})\) solver on \(\varphi \wedge \lnot \mathsf {IsNaN({{\hbox {obj}}})} \) if the model \({\mathcal {M}}\) returned by the previous SMT call is s.t. \({\mathcal {M}}({\hbox {obj}})\) is \(\textsc {NaN} \). If the solver returns unsat, then we conclude that the minimum is \(\textsc {NaN} \).

Thus, we can safely focus our investigation on the restricted OMT\(_{[{\mathcal {F}}{\mathcal {P}}]}\) problem \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \), where \({\varphi _{\mathsf {noNaN}}}{\mathop {=}\limits ^{\text {\tiny def}}} \varphi \wedge \lnot \mathsf {IsNaN({{\hbox {obj}}})} \), knowing it is satisfiable.
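The two preliminary checks can be pictured with a toy Python sketch. It makes a strong simplifying assumption: instead of calling an \(\text {SMT}(\mathcal {FP})\) solver, it receives an explicit list `model_values` of Python floats standing in for the values that \({\hbox {obj}}\) takes in the models of \(\varphi \). The function name `preliminary_checks` and its return convention are hypothetical.

```python
import math


def preliminary_checks(model_values):
    """Apply checks (i) and (ii) to the obj values of the models of phi.

    Returns 'unsat' if phi has no model, 'NaN' if obj is NaN in every model,
    and otherwise the list of non-NaN values, i.e. the obj values of the
    models of phi_noNaN.
    """
    if not model_values:                 # (i): phi is unsatisfiable
        return "unsat"
    non_nan = [v for v in model_values if not math.isnan(v)]
    if not non_nan:                      # (ii): the optimum is NaN
        return "NaN"
    return non_nan                       # the restricted problem is satisfiable


print(preliminary_checks([]))                          # unsat
print(preliminary_checks([float("nan")]))              # NaN
print(preliminary_checks([float("nan"), 1.5, -2.0]))   # [1.5, -2.0]
```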

In Sect. 3.1, we have introduced the concept of a \({{\mathcal {B}}}{{\mathcal {V}}}\) attractor, showing how this value can be used to drive the optimization search towards the optimum value, when minimizing or maximizing a signed or unsigned \({{\mathcal {B}}}{{\mathcal {V}}}\) goal. However, in the case of floating-point optimization, it is not possible to statically determine the attractor value in advance, before the search is even started. This is due to the more complex representation of \(\mathcal {FP}\) variables, which uses three separate Bit-Vectors (i.e. sign, exponent and significand), and the presence of various classes of special values (i.e. zeros, infinity, NaN), which make Definition 2 ambiguous for \(\mathcal {FP}\) optimization. We illustrate this problem with the following example.

Example 5

Let \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) be an OMT\(_{[{\mathcal {F}}{\mathcal {P}}]}\) problem where \({\hbox {obj}}\) is a \(\mathcal {FP}\) objective, of sort (_ FP 3 5), to be minimized. To make our explanation easier to follow, we show in Table 1 a short list of sample values for an \(\mathcal {FP}\) variable of the same sort as \({\hbox {obj}}\). Each \(\mathcal {FP}\) value is represented as a triplet of bit-vectors \(\langle \mathbf {sign}, \mathbf {exp}, \mathbf {sig} \rangle \)—following the SMT-LIBv2 conventions described in Sect. 2—and also in decimal notation.

Table 1 Sample values for a \(\mathcal {FP}\) variable with sort (_ FP 3 5)

From Table 1, we immediately notice that the binary representations of both the exponent and the significand of a Floating-Point number grow in opposite directions in the positive and in the negative domains. In addition, by sorting the values according to their binary representation, we observe that \(\mathtt {-\infty }\) [resp. \(\mathtt {+\infty }\)] is not the smallest [resp. greatest] representable \(\mathcal {FP}\) value in the negative [resp. positive] domain. In fact, both extreme ends of the table are occupied by NaN, which has multiple binary representations.

In what follows, we temporarily disregard the effects of unit-propagation, which might assign some (or all) bits of \({\hbox {obj}} \) as a result of some constraints in \({\varphi _{\mathsf {noNaN}}}\), and pick some values as candidate attractors for an \(\mathcal {FP}\) goal to be minimized.

Assume that the optimal value of the \(\mathcal {FP}\) goal is the sub-normal \(\mathcal {FP}\) value (fp #b1 #b000 #b1111) (i.e. \(\frac{-15}{64}\)). Suppose that the attractor is chosen to be equal to the value \(\mathtt {-\infty }\) listed at row 9 in Table 1, which is the smallest \(\mathcal {FP}\) value wrt. the total order relation \(\le \) for \(\mathcal {FP}\) numbers. Then, it can be seen that after the sign and the exponent bits have been decided to be equal to #b1 and #b000 respectively, the remaining bits of the attractor pull the search in the wrong direction, that is, towards \(0^-\).\(\diamond \)

Selecting a different \(\mathcal {FP}\) value as candidate attractor would not solve the problem; rather, it would result in a different set of issues. For instance, an attractor equal to the NaN value listed at row 10 in Table 1, which is the smallest representable \(\mathcal {FP}\) value according to the binary ordering, would solve the problem for the previous case in which the optimum \(\mathcal {FP}\) value is (fp #b1 #b000 #b1111). However, this attractor would remain an unsuitable choice for an OMT\(_{[{\mathcal {F}}{\mathcal {P}}]}\) instance where \({\hbox {obj}}\) is forced to be positive, because after the sign bit of the objective function has been decided to be equal to #b0, the remaining bits of the attractor drive the search in the wrong direction, that is, towards \(\mathtt {+\infty }\). \(\diamond \)

Since there is no statically-determined \(\mathcal {FP}\) value that can be used as an attractor when dealing with floating-point optimization, we introduce the new concept of dynamic attractor.

Definition 5

(Dynamic attractor) Let \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) be a restricted \(\text {OMT}_{[\mathcal {FP} ]}\) problem, where \({\varphi _{\mathsf {noNaN}}}{\mathop {=}\limits ^{\text {\tiny def}}} \varphi \wedge \lnot \mathsf {IsNaN({{\hbox {obj}}})} \) is a satisfiable \(\text {SMT}(\mathcal {FP})\) formula and \({\hbox {obj}}\) is a \(\mathcal {FP}\) objective to be minimized [resp. maximized]. Let \(k\in [0..n]\) and \(\tau _k\) be an assignment to the k most-significant bits of \({\hbox {obj}}\).

Then, we say that an \(\mathcal {FP}\)-value \(attr _{\tau _k} \) for \({\hbox {obj}}\) is a dynamic attractor for \({\hbox {obj}}\) wrt. \(\tau _k\) iff it is the smallest [resp. largest] \(\mathcal {FP}\) value different from NaN s.t. the k most-significant bits of \(attr _{\tau _k}\) have the same value as the k most-significant bits of \({\hbox {obj}}\) in \(\tau _k\). We call vector of attractor equalities the vector \(A_{\tau _k} \) s.t. \(A_{\tau _k}[i] {\mathop {=}\limits ^{\text {\tiny def}}} ({\hbox {obj}} [i] = attr _{\tau _k} [i])\), \(i\in [0..n-1]\).
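Definition 5 can be read operationally by brute force on a small sort. The sketch below is an illustration under stated assumptions, not the paper's implementation: it fixes the sort (_ FP 3 5), represents \(\tau _k\) as a tuple of bits starting from the sign bit, and uses the hypothetical helper names `value` and `dynamic_attractor`.

```python
import math
from itertools import product

EBITS, SBITS = 3, 5                 # sort (_ FP 3 5)
N = EBITS + SBITS                   # 1 sign + 3 exponent + 4 significand bits
BIAS = 2 ** (EBITS - 1) - 1

def value(bits):
    """Decode a tuple of N bits (bit 0 = sign, MSB first) into a Python float."""
    sign = bits[0]
    exp = int("".join(map(str, bits[1:1 + EBITS])), 2)
    sig = int("".join(map(str, bits[1 + EBITS:])), 2)
    if exp == 2 ** EBITS - 1:
        if sig:
            return math.nan
        return -math.inf if sign else math.inf
    frac = sig / 2 ** (SBITS - 1)
    mag = frac * 2.0 ** (1 - BIAS) if exp == 0 else (1 + frac) * 2.0 ** (exp - BIAS)
    return -mag if sign else mag

def dynamic_attractor(prefix, minimize=True):
    """Smallest [resp. largest] non-NaN value whose leading bits equal `prefix`."""
    best = None
    for tail in product((0, 1), repeat=N - len(prefix)):
        bits = tuple(prefix) + tail
        v = value(bits)
        if math.isnan(v):
            continue                # NaN is never an attractor
        if best is None or (v < best[0] if minimize else v > best[0]):
            best = (v, bits)
    return None if best is None else best[1]

print(dynamic_attractor(()))          # (1, 1, 1, 1, 0, 0, 0, 0), i.e. -oo
print(dynamic_attractor((0,)))        # (0, 0, 0, 0, 0, 0, 0, 0), i.e. 0+
print(dynamic_attractor((1, 1, 0)))   # (1, 1, 0, 1, 1, 1, 1, 1), i.e. -31/4
```

The exhaustive enumeration is only viable for tiny sorts; it is used here because it mirrors the wording of the definition directly.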

The following fact derives from the above definitions and the properties of IEEE 754-2008 standard representation adopted by SMT-LIBv2 standard for \(\mathcal {FP}\).

Lemma 1

Let \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) be a restricted minimization \({[resp. maximization]}\) \(\text {OMT}_{[\mathcal {FP} ]}\) problem, let \(\tau _k\) be an assignment to \({\hbox {obj}} [0]...{\hbox {obj}} [k-1] \) and \(attr _{\tau _k} \) be its corresponding dynamic attractor, for some \(k\in [0..n-1]\). Let \(\tau _{k+1}{\mathop {=}\limits ^{\text {\tiny def}}} \tau _k\cup \{{{\hbox {obj}} [k]:=attr _{\tau _k} [k]}\} \) and \(\tau '_{k+1}{\mathop {=}\limits ^{\text {\tiny def}}} \tau _k\cup \{{{\hbox {obj}} [k]:=\overline{attr _{\tau _k} [k]}}\} \), and let \({\mathcal {M}}\), \({\mathcal {M}}'\) be two models for \({\varphi _{\mathsf {noNaN}}}\) which extend \(\tau _{k+1}\) and \(\tau '_{k+1}\) respectively.

Then \({\mathcal {M}}({\hbox {obj}})\le {\mathcal {M}}'({\hbox {obj}})\) \({[resp. {\mathcal {M}}({\hbox {obj}})\ge {\mathcal {M}}'({\hbox {obj}})]}.\)

Proof

(We prove the case of minimization, since the case of maximization is dual wrt. the value of the sign bit.) We distinguish three cases based on the value of k.

Case \(k = 0\) (sign bit). Then \(attr _{\tau _0} [0] = 1\), \(\tau _{1} = \{{{\hbox {obj}} [0] =1}\} \) and \(\tau '_{1} = \{{{\hbox {obj}} [0] =0}\} \), where \({\hbox {obj}} [0] \) is the MSB of \({\hbox {obj}} \) and represents the sign of the floating-point value. Then \({\hbox {obj}} \) is smaller than or equal to zero in every model \({\mathcal {M}}\) and larger than or equal to zero in every model \({\mathcal {M}}'\) of \({\varphi _{\mathsf {noNaN}}}\), so that \({\mathcal {M}}({\hbox {obj}})\le {\mathcal {M}}'({\hbox {obj}})\) is verified.

Case \(k \in [1..ebits]\) (exponent bits), where ebits is the number of bits in the exponent of \({\hbox {obj}} \). Then, \(attr _{\tau _k} [k] \) is 1 if \(\tau _k[0]=1\) and 0 otherwise.

In the first case, \({\hbox {obj}} \) can only be negative-valued in both \({\mathcal {M}}\) and \({\mathcal {M}}'\). More precisely, \({\mathcal {M}}({\hbox {obj}})\) can be either \(\mathtt {-\infty } \) or a normal negative value, whereas \({\mathcal {M}}'({\hbox {obj}})\) can be either a normal or a sub-normal negative value. Hereafter, we consider only the case in which both have a normal negative value, because the cases in which \({\mathcal {M}}({\hbox {obj}}) = \mathtt {-\infty } \) or \({\mathcal {M}}'({\hbox {obj}})\) is sub-normal are both trivial, given that the absolute value of any sub-normal \(\mathcal {FP}\) number is smaller than the absolute value of any non-zero normal \(\mathcal {FP}\) number. Furthermore, we disregard the significand bits in \({\mathcal {M}}\) and \({\mathcal {M}}'\) because their contribution to the value of \({\hbox {obj}} \) is always less significant than that of the bits in the exponent. Given these premises, the exponent value of \({\hbox {obj}} \) in every possible \({\mathcal {M}}\) is larger than the exponent of \({\hbox {obj}} \) in every possible \({\mathcal {M}}'\) by a value equal to \(2^{ebits-k}\) and therefore, given that both \({\mathcal {M}}({\hbox {obj}})\) and \({\mathcal {M}}'({\hbox {obj}})\) are negative-valued, \({\mathcal {M}}({\hbox {obj}})\le {\mathcal {M}}'({\hbox {obj}})\).

The case in which \(\tau _k[0]=0\), that is when \({\hbox {obj}} \) can only be positive-valued in both \({\mathcal {M}}\) and \({\mathcal {M}}'\), is dual.

Case \(k > ebits\) (significand bits). Then there are three sub-cases.

If for every \(i \in [1..ebits]\) the value of \(\tau _k[i]\) is equal to 1, then the only possible value of \({\mathcal {M}}({\hbox {obj}})\) for every possible \({\mathcal {M}}\) is an infinity (\(\mathtt {-\infty } \) if \(\tau _k[0]=1\), \(\mathtt {+\infty } \) otherwise), and therefore \(attr _{\tau _k} [k] = 0\). On the other hand, there exists no possible model \({\mathcal {M}}'\) of \({\varphi _{\mathsf {noNaN}}}\), because the assignment \({\hbox {obj}} [k] = 1\) would imply \({\hbox {obj}} \) being equal to \(\textsc {NaN} \), so that the statement \({\mathcal {M}}({\hbox {obj}})\le {\mathcal {M}}'({\hbox {obj}})\) is vacuously true.

If instead there is some \(i \in [1..ebits]\) s.t. \(\tau _k[i] = 0\), then \(attr _{\tau _k} [k] \) is 1 if \(\tau _k[0] = 1\) (i.e. \({\hbox {obj}}\) is negative-valued) and 0 otherwise (i.e. \({\hbox {obj}}\) is positive-valued). In both cases, we can disregard the exponent bits in \({\mathcal {M}}\) and \({\mathcal {M}}'\) because their contribution to the value of \({\hbox {obj}} \) is the same in either model. For the same reasons, since \({\mathcal {M}}({\hbox {obj}})\) and \({\mathcal {M}}'({\hbox {obj}})\) can only be either both normal or both sub-normal, we can ignore the contribution of the leading hidden bit and focus on the bits of the significand.

When \(\tau _k[0] = 1\) and \({\hbox {obj}}\) must be negative-valued, the decimal value of the significand in \({\mathcal {M}}\) is larger than the decimal value of every possible significand in \({\mathcal {M}}'\) by exactly \(2^{{}-(k-ebits)}\). Given that both \({\mathcal {M}}({\hbox {obj}})\) and \({\mathcal {M}}'({\hbox {obj}})\) are negative-valued, we have that \({\mathcal {M}}({\hbox {obj}})\le {\mathcal {M}}'({\hbox {obj}})\).

The case in which \(\tau _k[0]=0\), that is when \({\hbox {obj}}\) can only be positive-valued in both \({\mathcal {M}}\) and \({\mathcal {M}}'\), is dual. \(\square \)
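For a sort as small as (_ FP 3 5), the statement of Lemma 1 can also be checked exhaustively. The following sketch is only a sanity check under that assumption, with hypothetical helper names (`value`, `extensions`, `lemma1_holds`); every non-NaN extension of a prefix is treated as a possible model, which covers any choice of \({\varphi _{\mathsf {noNaN}}}\).

```python
import math
from itertools import product

EBITS, SBITS = 3, 5
N = EBITS + SBITS
BIAS = 2 ** (EBITS - 1) - 1

def value(bits):
    """Decode a tuple of N bits (bit 0 = sign, MSB first) into a Python float."""
    sign = bits[0]
    exp = int("".join(map(str, bits[1:1 + EBITS])), 2)
    sig = int("".join(map(str, bits[1 + EBITS:])), 2)
    if exp == 2 ** EBITS - 1:
        if sig:
            return math.nan
        return -math.inf if sign else math.inf
    frac = sig / 2 ** (SBITS - 1)
    mag = frac * 2.0 ** (1 - BIAS) if exp == 0 else (1 + frac) * 2.0 ** (exp - BIAS)
    return -mag if sign else mag

def extensions(prefix):
    """All (value, bits) pairs extending `prefix`, NaN excluded."""
    out = []
    for tail in product((0, 1), repeat=N - len(prefix)):
        bits = tuple(prefix) + tail
        v = value(bits)
        if not math.isnan(v):
            out.append((v, bits))
    return out

def lemma1_holds(minimize=True):
    pick = min if minimize else max
    for k in range(N):
        for tau in product((0, 1), repeat=k):
            ext = extensions(tau)
            if not ext:
                continue                      # tau admits only NaN: no attractor
            attr = pick(ext, key=lambda e: e[0])[1]
            good = [v for v, _ in extensions(tau + (attr[k],))]
            bad = [v for v, _ in extensions(tau + (1 - attr[k],))]
            for m in good:
                for m2 in bad:
                    if (m > m2) if minimize else (m < m2):
                        return False          # a counterexample to the lemma
    return True

print(lemma1_holds(True), lemma1_holds(False))
```

The comparison is non-strict on purpose: Python floats, like \(\mathcal {FP}\), treat `-0.0` and `0.0` as equal.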

Notice that Lemma 1 states “\({\mathcal {M}}({\hbox {obj}})\le {\mathcal {M}}'({\hbox {obj}})\)” and not “\({\mathcal {M}}({\hbox {obj}})<{\mathcal {M}}'({\hbox {obj}})\)” because, e.g., we may have \({\mathcal {M}}({\hbox {obj}})=0^-\) and \({\mathcal {M}}'({\hbox {obj}})=0^+\), and \((0^-<0^+)\) is false in \(\mathcal {FP}\).

Lemma 1 states that, given the current assignment \(\tau _k\) to the k most-significant bits of \({\hbox {obj}}\), \({\hbox {obj}} [k] =attr _{\tau _k} [k] \) is always the best extension of \(\tau _k\) to the next bit (when consistent). A dynamic attractor \(attr _{\tau _k}\) can thus be used by the optimization search to guide the assignment of the \(k+1\)-th bit of \({\hbox {obj}} \) towards the direction of maximum gain which is allowed by \(\tau _k\), so as to obtain the “best” extension \(\tau _{k+1}\) of \(\tau _k\). Once the (new) assignment \(\tau _{k+1}\) is found, the \(\text {OMT}\) solver can compute the dynamic attractor \(attr _{\tau _{k+1}} \) for \({\hbox {obj}} \) wrt. \(\tau _{k+1}\) and then use it to assign the \(k+2\)-th bit of \({\hbox {obj}} \), and so on.

Let \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) be an \(\text {OMT}_{[\mathcal {FP} ]}\) instance, s.t. \({\hbox {obj}} \) is a \(\mathcal {FP}\) variable of n bits, and \(\tau _0\) be an initially empty assignment. If at each step of the optimization search the assignment of the k-th bit of \({\hbox {obj}} \) is guided by the dynamic attractor for \({\hbox {obj}} \) wrt. \(\tau _k\), then the corresponding sequence of n dynamic attractors (of increasing order k) is unique and depends exclusively on \({\varphi _{\mathsf {noNaN}}}\). Intuitively, this is the case because the (current) dynamic attractor always points in the direction of maximum gain. We illustrate this in the following example.

Example 6

Let \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) be an \(\text {OMT}_{[\mathcal {FP} ]}\) problem where \({\hbox {obj}}\) is a \(\mathcal {FP}\) objective, of sort (_ FP 3 5), to be minimized, as in Example 5. At the beginning of the search, nothing is known about the structure of the solution. Therefore, \(\tau _0 = \emptyset \) and, since \({\hbox {obj}}\) is being minimized, the dynamic attractor \(attr _{\tau _0} \) for \({\hbox {obj}} \) wrt. \(\tau _0\) is (fp #b1 #b111 #b0000) (i.e. \(\mathtt {-\infty }\)), which gives a preference to any feasible value of \({\hbox {obj}}\) in the negative domain.

If we discover that the domain of the objective function can only be positive, so that the first bit of \({\hbox {obj}} \) is permanently set to 0 in \(\tau _1\), then the new dynamic attractor for \({\hbox {obj}}\) wrt. \(\tau _1\) (i.e. \(attr _{\tau _1} \)) is equal to (fp #b0 #b000 #b0000) (i.e. \(0^+\)). Otherwise, \(attr _{\tau _i} \) remains \(\mathtt {-\infty }\) until, e.g., we discover there is no solution \(\le -8\) so that the second bit in the exponent is forced to 0. Then \(attr _{\tau _3} \) becomes (fp #b1 #b101 #b1111) (i.e., \(\frac{-31}{4}\)). Notice that all significand bits in the attractor pass from 0 to 1 because now we have a finite solution. \(\diamond \)

Definition 6

(Attractor trajectory \({\mathcal {A}}_\varphi \)) We consider the restricted \(\text {OMT}_{[\mathcal {FP} ]}\) problem \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) s.t. \({\varphi _{\mathsf {noNaN}}}{\mathop {=}\limits ^{\text {\tiny def}}} \varphi \wedge \lnot \mathsf {IsNaN({{\hbox {obj}}})} \) as in Definition 5, and a triplet of inductively-defined sequences \(\langle { \{{\tau _{0}, \tau _{1}, ..., \tau _{n}}\}, \{{attr _{\tau _0}, attr _{\tau _1}, ..., attr _{\tau _n}}\}, \{{A_{\tau _0}, A_{\tau _1}, ..., A_{\tau _n}}\} }\rangle \), where each \(\tau _k\) is an assignment to the first k most-significant bits of \({\hbox {obj}} \) s.t. \(\tau _k \subset \tau _{k+1}\), \(attr _{\tau _k}\) is its corresponding dynamic attractor and \(A_{\tau _k}\) is its corresponding vector of attractor equalities, so that, for every \(k\in [0..n-1]\):

(i):

\(\tau _{k+1}[k]= \overline{attr _{\tau _k} [k]} \) if \({\varphi _{\mathsf {noNaN}}}\wedge \tau _k \wedge A_{\tau _k}[k] \) is unsatisfiable,

(ii):

\(\tau _{k+1}[k] = attr _{\tau _k} [k] \) otherwise.

Then we define the attractor trajectory \({\mathcal {A}}_\varphi \) as the vector \([A_{\tau _0}[0], ..., A_{\tau _{n-1}}[n-1] ]\).

The attractor trajectory \({\mathcal {A}}_\varphi \) contains those attractor equalities \(({\hbox {obj}} [k] = attr _{\tau _k} [k])\) which are of critical importance for the decisions taken by the optimization search. Intuitively, this is the case because the value of the k-th bit of \({\hbox {obj}}\) (i.e. \({\hbox {obj}} [k]\)) is still undecided in \(\tau _k\).

Example 7

Let \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) be a restricted \(\text {OMT}_{[\mathcal {FP} ]}\) problem where \({\hbox {obj}}\) is a \(\mathcal {FP}\) objective, of sort (_ FP 3 5), to be minimized, as in Example 5. We consider the case in which the input formula \({\varphi _{\mathsf {noNaN}}}\) requires \({\hbox {obj}} \) to be larger than or equal to \(\frac{29}{2}\) and does not impose any other constraint on the value of \({\hbox {obj}} \). Given the sequence of (partial) assignments \(\tau _{0}, ..., \tau _{8}\) in Fig. 1, the corresponding list of dynamic attractors and the corresponding vectors of attractor equalities, the attractor trajectory \({\mathcal {A}}_\varphi \) is equal to the vector \([{\hbox {obj}} [0] = 1, {\hbox {obj}} [1] = 0, {\hbox {obj}} [2] = 0, {\hbox {obj}} [3] = 0, {\hbox {obj}} [4] = 0, {\hbox {obj}} [5] = 0, {\hbox {obj}} [6] = 0, {\hbox {obj}} [7] = 0]\). \(\diamond \)
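The trajectory of Example 7 can be replayed step by step. The sketch below follows Definition 6 literally and replaces the SMT call with exhaustive enumeration, which is an assumption made for illustration; the formula is modelled as a Python predicate on the value of \({\hbox {obj}}\), and `value`, `extensions` and `trajectory` are hypothetical names.

```python
import math
from itertools import product

EBITS, SBITS = 3, 5
N = EBITS + SBITS
BIAS = 2 ** (EBITS - 1) - 1

def value(bits):
    """Decode a tuple of N bits (bit 0 = sign, MSB first) into a Python float."""
    sign = bits[0]
    exp = int("".join(map(str, bits[1:1 + EBITS])), 2)
    sig = int("".join(map(str, bits[1 + EBITS:])), 2)
    if exp == 2 ** EBITS - 1:
        if sig:
            return math.nan
        return -math.inf if sign else math.inf
    frac = sig / 2 ** (SBITS - 1)
    mag = frac * 2.0 ** (1 - BIAS) if exp == 0 else (1 + frac) * 2.0 ** (exp - BIAS)
    return -mag if sign else mag

def extensions(prefix):
    """All (value, bits) pairs extending `prefix`, NaN excluded."""
    out = []
    for tail in product((0, 1), repeat=N - len(prefix)):
        bits = tuple(prefix) + tail
        v = value(bits)
        if not math.isnan(v):
            out.append((v, bits))
    return out

def trajectory(phi, minimize=True):
    """Return tau_n and the right-hand sides of the equalities A_tau_k[k]."""
    tau, critical = (), []
    for k in range(N):
        attr = (min if minimize else max)(extensions(tau), key=lambda e: e[0])[1]
        critical.append(attr[k])
        # Stand-in for checking phi_noNaN AND tau_k AND A_tau_k[k] with an SMT solver.
        sat = any(phi(v) for v, _ in extensions(tau + (attr[k],)))
        tau += (attr[k],) if sat else (1 - attr[k],)
    return tau, critical

tau, critical = trajectory(lambda v: v >= 29 / 2)
print(critical)            # [1, 0, 0, 0, 0, 0, 0, 0]
print(tau, value(tau))     # (0, 1, 1, 0, 1, 1, 0, 1) 14.5
```

The sketch assumes that the predicate is satisfied by at least one non-NaN value, as required of \({\varphi _{\mathsf {noNaN}}}\) in Definition 5.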

Lemma 2

Consider \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \), \(\tau _0, ..., \tau _n\), \(attr _{\tau _0}, ..., attr _{\tau _n} \), \(A_{\tau _0}, ..., A_{\tau _n} \), and \({\mathcal {A}}_\varphi \) as in Definition 6. Then \(\tau _n\) lexicographically maximizes \({\mathcal {A}}_\varphi \) wrt. \({\varphi _{\mathsf {noNaN}}}\).

Proof

By Definition 6, we have that, for each \(k\in [0..n-1]\),

(i):

\(\tau _{k+1}[k]= \overline{attr _{\tau _k} [k]}\) if \({\varphi _{\mathsf {noNaN}}}\wedge \tau _k \wedge A_{\tau _k}[k] \) is unsatisfiable,

(ii):

\(\tau _{k+1}[k] = attr _{\tau _k} [k] \) otherwise.

By construction, \(\tau _k = [\![\tau _n]\!]_{k} \). Therefore, we can replace \(\tau _k\) with \([\![\tau _n]\!]_{k} \) so that

(i):

\([\![\tau _n]\!]_{k+1} [k]= \overline{attr _{[\![\tau _n]\!]_{k}} [k]}\) if \({\varphi _{\mathsf {noNaN}}}\wedge [\![\tau _n]\!]_{k} \wedge A_{[\![\tau _n]\!]_{k}}[k] \) is unsatisfiable,

(ii):

\([\![\tau _n]\!]_{k+1} [k] = attr _{[\![\tau _n]\!]_{k}} [k] \) otherwise.

We notice the following facts. For each \(k\in [0..n-1]\), \([\![\tau _n]\!]_{k} \subset \tau _n\). Furthermore, for each \(k\in [0..n-1]\), \({\mathcal {A}}_\varphi [k] = A_{[\![\tau _n]\!]_{k}}[k] \) because \({\mathcal {A}}_\varphi [k] = A_{\tau _k}[k] \) by the definition of attractor trajectory, and \(A_{\tau _k}[k] = A_{[\![\tau _n]\!]_{k}}[k] \) by the equality \(\tau _k = [\![\tau _n]\!]_{k} \). Thus, we can replace \([\![\tau _n]\!]_{k+1} \) with \(\tau _n\) and \(A_{[\![\tau _n]\!]_{k}}[k] \) with \({\mathcal {A}}_\varphi [k]\), as follows. For each \(k\in [0..n-1]\),

(i):

\(\tau _{n}[k]= \overline{attr _{\tau _n} [k]}\) if \({\varphi _{\mathsf {noNaN}}}\wedge [\![\tau _n]\!]_{k} \wedge {\mathcal {A}}_\varphi [k]\) is unsatisfiable,

(ii):

\(\tau _{n}[k] = attr _{\tau _n} [k] \) otherwise.

Hence, \(\tau _n\) lexicographically maximizes \({\mathcal {A}}_\varphi \) wrt. \({\varphi _{\mathsf {noNaN}}}\). \(\square \)

Fig. 1 An example of \(\mathcal {FP}\) optimization using the dynamic attractor. (“[...]” denotes the value of the attractor \(attr _{\tau _i}\). “\(\Longrightarrow \textsc {sat}/\textsc {unsat} \)” denotes the satisfiability of \({\varphi _{\mathsf {noNaN}}}\wedge \tau _k\wedge A_{\tau _k}[k] \). For ease of illustration, we have underlined the critical bit \(attr _{\tau _k} [k]\) in the attractors and each attractor equality of the attractor trajectory \({\mathcal {A}}_\varphi \) inside the vectors of attractor equalities.)

Finally, we make the following two observations. The first is that the sequence \(\tau _0, ..., \tau _{n}\) in Definition 6 can be iteratively constructed using its list of requirements, for instance, by means of a sequence of incremental calls to an SMT solver. The second, more important, observation is that \(\tau _n\) corresponds to the assignment of values which makes \({\hbox {obj}} \) optimal in \({\varphi _{\mathsf {noNaN}}}\). Using the above definitions, we show that the following fact holds.

Theorem 2

Let \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \), \(\tau _0, ..., \tau _n\), \(attr _{\tau _0}, ..., attr _{\tau _n} \), \(A_{\tau _0}, ..., A_{\tau _n} \), and \({\mathcal {A}}_\varphi \) be as in Definition 6. Then, any model \({\mathcal {M}}\) of \({\varphi _{\mathsf {noNaN}}}\) which lexicographically maximizes the attractor trajectory \({\mathcal {A}}_\varphi \) is an optimal solution for the \(\text {OMT}_{[\mathcal {FP} ]}\) problem \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \).

Proof

(We prove the case of minimization, since the case of maximization is dual.)

By Lemma 2 we have that \(\tau _n\) lexicographically maximizes \({\mathcal {A}}_\varphi \). Let \({\mathcal {M}}\) be a model of \({\varphi _{\mathsf {noNaN}}}\) which lexicographically maximizes \({\mathcal {A}}_\varphi \), and let \(\mu \) be its restriction to \({\hbox {obj}}\). Since both \(\tau _n\) and \({\mathcal {M}}\) lexicographically maximize \({\mathcal {A}}_\varphi \), from the uniqueness of \(\tau _n\), we immediately notice that \(\mu = \tau _n\), so that \(\tau _k = [\![\mu ]\!]_{k} \) for each \(k\in [0..n]\) and \(\mu \) lexicographically maximizes \({\mathcal {A}}_\varphi \).

By definition, \({\mathcal {M}}\) is an optimal solution for \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) iff there exists no other model \({\mathcal {M}}'\) for it s.t. \({\mathcal {M}}'({\hbox {obj}}) < {\mathcal {M}}({\hbox {obj}})\). Hence, we show by contradiction that no such \({\mathcal {M}}'\) can exist.

Assume (for the sake of contradiction), that there exists a model \({\mathcal {M}}'\) for \({\varphi _{\mathsf {noNaN}}}\), s.t. \({\mathcal {M}}'({\hbox {obj}}) < {\mathcal {M}}({\hbox {obj}})\), and let \(\mu '\) be the restriction of \({\mathcal {M}}'\) to \({\hbox {obj}}\). Then there must be at least one index i for which \(\mu [i] \ne \mu '[i]\). Let \(m\) be the smallest such index. Recalling that \(\tau _{m} = [\![\mu ]\!]_{m} \) and \(\tau _{m +1} = [\![\mu ]\!]_{m +1} \), we set \(\tau _{m +1}' {\mathop {=}\limits ^{\text {\tiny def}}} [\![\mu ']\!]_{m +1} \). Then, \(\tau _{m} \subset \tau _{m +1}\), \(\tau _{m} \subset \tau _{m +1}'\), \(\tau _{m +1} \ne \tau _{m +1}'\). In particular, \(\tau _{m +1}[m ] = \overline{\tau _{m +1}'[m ]}\) and therefore \(\tau _{m +1}[m ] = attr _{\tau _{m}} [m ] \) if \(\tau _{m +1}'[m ] = \overline{attr _{\tau _{m}} [m ]}\), and vice versa.

Then, we distinguish two cases.

In the first case, \(\tau _{m +1}[m ] = \overline{attr _{\tau _{m}} [m ]}\) and \(\tau _{m +1}'[m ] = attr _{\tau _{m}} [m ] \). From \(\tau _{m +1}[m ] = \overline{attr _{\tau _{m}} [m ]}\) and the fact that \(\mu \) lexicographically maximizes \({\mathcal {A}}_\varphi \), we derive that \({\varphi _{\mathsf {noNaN}}}\wedge \tau _{m} \wedge {\mathcal {A}}_\varphi [m ] \) is unsatisfiable, where \({\mathcal {A}}_\varphi [m ] {\mathop {=}\limits ^{\text {\tiny def}}} ({\hbox {obj}} [m ] = attr _{\tau _{m}} [m ])\). Since \(\tau _{m} \subset \tau _{m +1}' \subseteq \mu '\) and \(\tau _{m +1}'[m ] = attr _{\tau _{m}} [m ] \), we conclude that \({\varphi _{\mathsf {noNaN}}}\wedge \mu ' \models \bot \), so that \({\mathcal {M}}'\) cannot be a model of \({\varphi _{\mathsf {noNaN}}}\), contradicting the initial assumption.

In the second case, \(\tau _{m +1}[m ] = attr _{\tau _{m}} [m ] \) and \(\tau _{m +1}'[m ] = \overline{attr _{\tau _{m}} [m ]}\). Therefore, by Lemma 1, for every pair of models \({\mathcal {M}}_1\), \({\mathcal {M}}_2\) for \({\varphi _{\mathsf {noNaN}}}\) which extend respectively \(\tau _{m +1}\) and \(\tau _{m +1}'\) we have that \({\mathcal {M}}_1({\hbox {obj}}) \le {\mathcal {M}}_2({\hbox {obj}})\). Since \(\tau _{m +1} = [\![\mu ]\!]_{m +1} \) and \(\tau _{m +1}' = [\![\mu ']\!]_{m +1} \), it follows that \({\mathcal {M}}'({\hbox {obj}}) \not < {\mathcal {M}}({\hbox {obj}})\), contradicting the initial assumption. \(\square \)
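Theorem 2 can be cross-checked empirically on the sort (_ FP 3 5). The sketch below is a randomized sanity check rather than a proof: it draws random non-empty sets of non-NaN bit patterns as stand-ins for the models of \({\varphi _{\mathsf {noNaN}}}\), builds \(\tau _n\) as in Definition 6, and compares its value with the true optimum. The names `ALL`, `tau_n` and `check_theorem2`, the number of trials and the seed are all arbitrary choices made for the illustration.

```python
import math
import random
from itertools import product

EBITS, SBITS = 3, 5
N = EBITS + SBITS
BIAS = 2 ** (EBITS - 1) - 1

def value(bits):
    """Decode a tuple of N bits (bit 0 = sign, MSB first) into a Python float."""
    sign = bits[0]
    exp = int("".join(map(str, bits[1:1 + EBITS])), 2)
    sig = int("".join(map(str, bits[1 + EBITS:])), 2)
    if exp == 2 ** EBITS - 1:
        if sig:
            return math.nan
        return -math.inf if sign else math.inf
    frac = sig / 2 ** (SBITS - 1)
    mag = frac * 2.0 ** (1 - BIAS) if exp == 0 else (1 + frac) * 2.0 ** (exp - BIAS)
    return -mag if sign else mag

# Every bit pattern of the sort that does not decode to NaN.
ALL = [b for b in product((0, 1), repeat=N) if not math.isnan(value(b))]

def tau_n(feasible, minimize=True):
    """Final assignment of Definition 6 for an explicit, non-empty model set."""
    tau = ()
    pick = min if minimize else max
    for k in range(N):
        attr = pick((b for b in ALL if b[:k] == tau), key=value)
        wanted = tau + (attr[k],)
        sat = any(b[:k + 1] == wanted for b in feasible)
        tau = wanted if sat else tau + (1 - attr[k],)
    return tau

def check_theorem2(trials=200, seed=0, minimize=True):
    rng = random.Random(seed)
    best = min if minimize else max
    for _ in range(trials):
        feasible = rng.sample(ALL, rng.randint(1, 20))
        t = tau_n(feasible, minimize)
        if t not in feasible or value(t) != best(value(b) for b in feasible):
            return False
    return True

print(check_theorem2(minimize=True), check_theorem2(minimize=False))
```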

4 \(\text {OMT}_{[\mathcal {FP} ]}\) Procedures

In this paper, we consider two approaches for dealing with \(\text {OMT}_{[\mathcal {FP} ]}\): a baseline linear/binary search, based on the inline \(\text {OMT}\) schema for \({{\mathcal {L}}}{{\mathcal {A}}}{\mathcal {A}}\) objectives presented in [39], and Floating-Point Optimization with Binary Search (ofp-bs), a brand-new engine inspired by the obv-bs algorithm for unsigned bit-vectors in [32] and by Theorem 2 and the related definitions in Sect. 3.2.

4.1 \(\text {OMT}\)-Based Approach

The \(\text {OMT}\)-based approach for \(\text {OMT}_{[\mathcal {FP} ]}\) adapts the linear- and binary-search schemata for \(\text {OMT}\) with \({{\mathcal {L}}}{{\mathcal {A}}}{\mathcal {A}}\) objectives presented in [39] to deal with \(\mathcal {FP}\) objectives.

In the basic linear-search schema, the optimization search is advanced by means of a sequence of linear cuts, each of which forces the \(\text {OMT}\) solver to look for a new model \({\mathcal {M}}'\) which improves the value of \({\hbox {obj}} \) wrt. the most recent model \({\mathcal {M}}\). In the binary-search schema, instead, the \(\text {OMT}\) solver learns an incremental sequence of cuts which bisect the current domain of the objective function. For clarity, we recap here the essential elements of the binary-search schema presented in [38, 39]. At the beginning of the optimization search and following each update of the lower (lb) and upper (ub) bounds of \({\hbox {obj}} \), the \(\text {OMT}\) solver computes a pivoting value \(\mathsf {pivot} {\mathop {=}\limits ^{\text {\tiny def}}} \mathtt{floor}(\rho \cdot ub + (1 - \rho ) \cdot lb)\), for some value of \(\rho \) (e.g. \(\frac{1}{2}\)). If \(\mathsf {pivot} \) lies inside the range \(]lb, ub]\), a cut of the form \(({\hbox {obj}} < \mathsf {pivot})\) is learned. Otherwise, if \(\mathsf {pivot} \) lies outside the range \(]lb, ub]\) due to rounding side-effects of \(\mathcal {FP}\) operations, a cut of the form \(({\hbox {obj}} < \mathsf{ub})\) is learned instead. If the cut is satisfiable, the upper bound of \({\hbox {obj}} \) is updated with the new model value of \({\hbox {obj}} \). Otherwise, the lower bound is made equal to \(\mathsf {pivot} \) [resp. \(\mathsf{ub}\)]. The algorithm terminates when the search interval \([lb, ub[\) becomes empty. In general, it is reasonable to expect the binary-search schema to converge towards the optimal solution faster than the linear-search schema, because the feasible domain of a \(\mathcal {FP}\) goal may comprise an exponentially large number of values (wrt. the bit-width of the cost function).
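The binary-search schema recapped above can be sketched on a toy domain. The code below is a simplified illustration for minimization, not the schema of [38, 39] itself: the representable values are replaced by a uniform grid of multiples of \(\frac{1}{2}\) (an assumption), the models of the formula are given as an explicit finite set, and `solve`, `floor_to_grid` and `binary_search_min` are hypothetical names.

```python
import math
from bisect import bisect_right

GRID = [k / 2 for k in range(-32, 33)]     # toy stand-in for the representable values

def floor_to_grid(x):
    """Largest grid value not exceeding x (plays the role of `floor`)."""
    return GRID[bisect_right(GRID, x) - 1]

def binary_search_min(feasible, rho=0.5):
    """Minimize over `feasible`, a finite subset of GRID standing in for phi."""
    calls = 0

    def solve(cut):
        # Stand-in for an incremental SMT call on phi AND (obj < cut).
        nonlocal calls
        calls += 1
        below = [v for v in feasible if v < cut]
        return max(below) if below else None   # deliberately the worst admissible model

    ub = solve(math.inf)                       # first model: any feasible value
    if ub is None:
        return None, calls                     # phi has no model
    lb = GRID[0]
    while lb < ub:                             # stop when [lb, ub[ is empty
        pivot = floor_to_grid(rho * ub + (1 - rho) * lb)
        cut = pivot if lb < pivot <= ub else ub   # fall back to the cut (obj < ub)
        model = solve(cut)
        if model is not None:
            ub = model                         # the cut is satisfiable: improve ub
        else:
            lb = cut                           # no model below the cut: raise lb
    return ub, calls

print(binary_search_min({14.5, 15.0, 15.5})[0])   # 14.5
```

Returning the worst admissible model from `solve` is a pessimistic choice meant to show that convergence does not rely on the solver being lucky.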

In either schema, whenever the optimization engine encounters for the first time a solution s.t. \({\hbox {obj}} = \textsc {NaN} \), the \(\text {OMT}\) solver learns a unit-clause of the form \(\lnot (\textsc {isNaN}({\hbox {obj}}))\) so as to look for an optimal solution different from \(\textsc {NaN} \) (if any).

When dealing with \(\mathcal {FP}\) objectives, differently from the case of \(\mathcal {LRA}\) in [39], it is not necessary to implement a specialized optimization procedure within the \(\mathcal {FP}\)-Solver in order to guarantee the termination of the optimization search. Indeed, such a procedure is not available when Floating-Point terms are bit-blasted into bit-vectors eagerly, or when the acdcl \(\mathcal {FP}\)-Solver is used, because by the time the optimization procedure is called the domain interval of any \(\mathcal {FP}\) term contains a singleton value. Conversely, such a minimization procedure could be envisaged when the \(\text {OMT}\) solver uses a lazy \(\mathcal {FP}\)-Solver as back-end, so as to speed up the convergence towards the optimal solution.

4.2 Floating-Point Optimization with Binary Search

The Floating-Point Optimization with Binary Search algorithm, ofp-bs, is a new engine for OMT\(_{[{\mathcal {F}}{\mathcal {P}}]}({\mathcal {F}}{\mathcal {P}} \cup {\mathcal {T}})\) (hereafter simply \(\text {OMT}_{[\mathcal {FP} ]}\)) which is inspired by the obv-bs algorithm for \(\text {OMT}_{[{{\mathcal {B}}}{{\mathcal {V}}} ]}\) [32] and implements Definition 6 and Theorem 2. Here \({\mathcal {T}}\) may be empty, or contain \({{\mathcal {B}}}{{\mathcal {V}}}\) and other theories (e.g. that of arrays). We assume that an \({\text {SMT}({{{\mathcal {B}}}{{\mathcal {V}}} \cup \mathcal {FP} \cup {\mathcal {T}}})}\)-solving procedure (hereafter simply “SMT”) is available even when \({{\mathcal {B}}}{{\mathcal {V}}}\) is not part of \({\mathcal {T}}\), because we need explicit access to each bit in \({\hbox {obj}}\), which is not possible with plain \(\mathcal {FP}\).

The optimization search tries to lexicographically maximize the (implicit) attractor trajectory vector \({\mathcal {A}}_\varphi \), which is incrementally derived from the current value of the dynamic attractor. The raw values of the dynamic attractor’s bits drive the optimization search towards the direction of maximum gain at any given point in time, without disrupting any decision that has already been made. The dynamic attractor is incrementally updated along the search, based on the outcome of the previous rounds of the optimization search. At each round, one bit of the objective function is assigned its final value. The first round decides the sign, the next batch of rounds decides the exponent, and the remaining rounds decide the fine-grained details of the significand.

Fig. 2 The ofp-bs algorithm for floating-point optimization

Fig. 3 The function update_dynamic_attractor()

The pseudo-code of ofp-bs is shown in Fig. 2. The arguments of the algorithm are the input formula \(\varphi \) and the \(\mathcal {FP}\) objective \({\hbox {obj}} \), where \({\hbox {obj}} \) is a \(\mathcal {FP}\) variable with ebits bits in the exponent, \(sbits - 1\) in the significand and \(n {\mathop {=}\limits ^{\text {\tiny def}}} ebits + sbits\) bits overall.

The procedure starts by checking whether the input formula \(\varphi \) is satisfiable and immediately terminates if this is not the case (rows 1–3). If \({\mathcal {M}}({\hbox {obj}}) = \textsc {NaN} \), then the procedure checks whether there exists a model \({\mathcal {M}}'\) for \(\varphi \wedge \lnot \mathsf {IsNaN({{\hbox {obj}}})} \) (rows 4–5). If this is not the case, the procedure terminates immediately and returns the pair \(\langle {\textsc {sat}},{{\mathcal {M}}}\rangle \) (row 7). Otherwise, the model \({\mathcal {M}}\) is updated with the new model \({\mathcal {M}}'\) (row 9). In every case, \(\varphi \) is permanently extended with the constraint \(\lnot \mathsf {IsNaN({{\hbox {obj}}})} \) (row 10).

At this point, the procedure initializes the value of the dynamic attractor by invoking an external function update_dynamic_attractor() with the empty assignment \(\tau \) as parameter, so that the returned value is equal to \(\mathtt {-\infty }\) when minimizing and \(\mathtt {+\infty }\) when maximizing (rows 11–12). Then, the execution moves to the section of code implementing the core part of the ofp-bs algorithm (rows 13–24), which consists of a loop over the bits of \({\hbox {obj}}\), starting from the MSB \({\hbox {obj}} [0] \) down to the LSB \({\hbox {obj}} [n-1] \) (Fig. 3).

Inside this loop, ofp-bs first checks whether the value of \({\hbox {obj}} [i] \) in \({\mathcal {M}}\) matches the i-th bit of the (current) dynamic attractor \(attr _{\tau } \). If this is the case, then the i-th bit is already set to its “best” value in \({\mathcal {M}}\). Thus, the assignment \(\tau \) is extended so as to permanently set \({\hbox {obj}} [i] = attr _{\tau } [i] \) (row 16), and the optimization search moves to the next iteration of the loop. If instead \({\hbox {obj}} [i] \ne attr _{\tau } [i] \) in \({\mathcal {M}}\), we need to verify whether the value of the objective function in \({\mathcal {M}}\) can be improved by forcing the i-th bit of \({\hbox {obj}} \) equal to the i-th bit of the dynamic attractor. To do so, we incrementally invoke the underlying SMT solver, this time checking the satisfiability of \(\varphi \) under the list of assumptions \(\tau \cup \{ {\hbox {obj}} [i] = attr _{\tau } [i] \}\) (row 18). If the SMT solver returns \(\textsc {sat} \), then the value of the objective function has been successfully improved. Hence, \(\tau \) is extended with an assignment setting \({\hbox {obj}} [i] \) equal to \(attr _{\tau } [i] \), and \({\mathcal {M}}\) is replaced with the new model \({\mathcal {M}}'\) (rows 20–21). Otherwise, it is not possible to improve the objective function by toggling the value of \({\hbox {obj}} [i] \), and \(\tau \) is extended so as to permanently set \({\hbox {obj}} [i] \ne attr _{\tau } [i] \) (row 23). At this point, there is a mismatch between the value of the first \(i+1\) bits of \({\hbox {obj}}\) in \({\mathcal {M}}\), corresponding to the assignment \(\tau \), and those of the current dynamic attractor. This mismatch is resolved by calling the function update_dynamic_attractor() with the updated assignment \(\tau \) and the current loop iteration index i as parameters (row 24). In either case, the execution moves to the next iteration of the loop.

After exactly n iterations of the loop, the optimization search terminates with the pair \(\langle {\textsc {sat}},{{\mathcal {M}}}\rangle \), where \({\mathcal {M}}\) is the optimum model of the given \(\text {OMT}_{[\mathcal {FP} ]}\) instance. The ofp-bs algorithm requires at most \(n+2\) incremental calls to an underlying \(\text {SMT}(\mathcal {FP})\) solver. The test in rows 15–16 saves many such SMT calls when the current model already assigns \({\hbox {obj}} [i]\) to its corresponding value in the attractor.
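The control flow described above can be summarized by the following sketch for minimization. It is reconstructed from the prose only and makes several simplifying assumptions: the models of \(\varphi \) are given as an explicit list of bit patterns for the sort (_ FP 3 5), the SMT solver is replaced by a linear scan over that list, and the dynamic attractor is recomputed by brute force from Definition 5 instead of being updated incrementally. All names (`value`, `dynamic_attractor`, `ofp_bs_min`, `solve`) are illustrative.

```python
import math
from itertools import product

EBITS, SBITS = 3, 5
N = EBITS + SBITS
BIAS = 2 ** (EBITS - 1) - 1

def value(bits):
    """Decode a tuple of N bits (bit 0 = sign, MSB first) into a Python float."""
    sign = bits[0]
    exp = int("".join(map(str, bits[1:1 + EBITS])), 2)
    sig = int("".join(map(str, bits[1 + EBITS:])), 2)
    if exp == 2 ** EBITS - 1:
        if sig:
            return math.nan
        return -math.inf if sign else math.inf
    frac = sig / 2 ** (SBITS - 1)
    mag = frac * 2.0 ** (1 - BIAS) if exp == 0 else (1 + frac) * 2.0 ** (exp - BIAS)
    return -mag if sign else mag

def dynamic_attractor(prefix):
    """Brute-force Definition 5 (minimization): smallest non-NaN extension of prefix."""
    ext = [prefix + t for t in product((0, 1), repeat=N - len(prefix))]
    return min((b for b in ext if not math.isnan(value(b))), key=value)

def ofp_bs_min(feasible):
    calls = 0

    def solve(prefix, allow_nan):
        # Stand-in for an incremental SMT call under the assumptions in `prefix`.
        nonlocal calls
        calls += 1
        for bits in feasible:
            if bits[:len(prefix)] == prefix and (allow_nan or not math.isnan(value(bits))):
                return bits
        return None

    model = solve((), allow_nan=True)
    if model is None:
        return "unsat", None, calls
    if math.isnan(value(model)):               # look for a model different from NaN
        other = solve((), allow_nan=False)
        if other is None:
            return "sat", model, calls
        model = other
    tau = ()
    attr = dynamic_attractor(tau)              # -oo when minimizing
    for i in range(N):
        if model[i] == attr[i]:                # bit already at its best value
            tau += (attr[i],)
            continue
        better = solve(tau + (attr[i],), allow_nan=False)
        if better is not None:
            tau += (attr[i],)
            model = better
        else:
            tau += (1 - attr[i],)
            attr = dynamic_attractor(tau)      # re-align the attractor with tau
    return "sat", model, calls

feasible = [b for b in product((0, 1), repeat=N) if value(b) >= -21 / 4]
status, model, calls = ofp_bs_min(feasible)
print(status, value(model), calls <= N + 2)    # sat -5.25 True
```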

The function update_dynamic_attractor() takes as input \(\tau \), a (partial) assignment over the k most-significant bits of \({\hbox {obj}} \), and i, the index of the current loop iteration in ofp-bs. When \({\hbox {obj}} \) is minimized (the implementation is dual when \({\hbox {obj}} \) is maximized), the procedure essentially works as follows. If \(\tau = \emptyset \), then nothing is known about the solution of the problem, so \(\mathtt {-\infty }\) is returned. Otherwise, the procedure must compute the smallest \(\mathcal {FP}\) value different from NaN (if any) which extends \(\tau \). In this case, the procedure starts by flipping the value of \(attr _{\tau } [i] \), forcing \({\hbox {obj}} [i] = attr _{\tau } [i] \) (row 3). This ensures that the value of the first \(i+1\) bits of \({\hbox {obj}} \) in \({\mathcal {M}}\), corresponding to the assignment \(\tau \), is the same as the first \(i+1\) bits of the current dynamic attractor. The remaining \(n - i - 1\) bits of \(attr _{\tau } \) may also need to be updated to reflect this change. Since \(\tau \ne \emptyset \), we know that the sign of the objective function has been permanently decided in \(\tau \). If \({\hbox {obj}} [0] = 0\) in \(\tau \), i.e. \({\hbox {obj}} \) must be positive, the procedure must return the smallest positive \(\mathcal {FP}\) value admitted by \(\tau \). Hence, we update \(attr _{\tau } \) with \(\bigcup _{j=i+1}^{j=n-1} attr _{\tau } [j] = 0\) and return the corresponding \(\mathcal {FP}\) value (rows 4–6). If \({\hbox {obj}} [0] = 1\) in \(\tau \), i.e. \({\hbox {obj}} \) must be negative, the procedure must return the largest negative \(\mathcal {FP}\) value admitted by \(\tau \). When \(i \le ebits\), at least one bit in the exponent of \({\hbox {obj}} \) is assigned to 0 in \(\tau \) (i.e. \({\hbox {obj}} [i] \)). If that is the case, then we update \(attr _{\tau } \) with \(\bigcup _{j=i+1}^{j=n-1} attr _{\tau } [j] = 1\) and return the corresponding \(\mathcal {FP}\) value (rows 7–10). In practice, we notice that the block of code at rows 4–10 needs to be executed at most once because the decision of tracking the smallest positive value or the largest negative value (different from \(\mathtt {-\infty }\)) is permanent.
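The update rule described above can be sketched in a few lines of Python. This is only an illustrative reading of the prose for the minimization case, not the code used in OptiMathSAT: the representation of the attractor as a list of bits (index 0 is the sign, indices 1 to ebits the exponent), the representation of \(\tau \) as a dictionary, and the extra parameters attr and ebits are all assumptions made for this sketch. The function is assumed to be called only when \({\hbox {obj}} [i]\) has been forced to differ from the current attractor bit.

```python
def update_dynamic_attractor(attr, tau, i, ebits):
    """Return the updated attractor bits for a minimization objective.

    attr  : list of n bits; attr[0] is the sign, attr[1..ebits] the exponent
    tau   : dict {bit index: value} holding the bits of obj decided so far
    i     : index of the bit on which obj and the attractor just disagreed
    ebits : number of exponent bits
    """
    n = len(attr)
    if not tau:
        # Nothing is known yet: aim at -infinity
        # (sign 1, exponent all ones, significand all zeros).
        return [1] + [1] * ebits + [0] * (n - 1 - ebits)

    new_attr = list(attr)
    new_attr[i] = 1 - new_attr[i]  # flip the critical bit

    if tau[0] == 0:
        # Positive objective: track the smallest positive value extending tau.
        new_attr[i + 1:] = [0] * (n - i - 1)
    elif i <= ebits:
        # Negative objective and an exponent bit has just become 0:
        # track the negative value of largest magnitude extending tau.
        new_attr[i + 1:] = [1] * (n - i - 1)
    return new_attr
```

Note that, in this sketch, repeating either assignment of the trailing bits in later iterations does not change them again, which is consistent with the observation that the corresponding block needs to be executed at most once.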

Fig. 4 An example of \(\mathcal {FP}\) optimization using the dynamic attractor. (“\(\Longrightarrow \textsc {sat}/\textsc {unsat} \)” denotes the satisfiability of \({\varphi _{\mathsf {noNaN}}}\wedge \tau _k\wedge A_{\tau _k}[k] \). For ease of illustration, we have underlined the critical bit \(attr _{\tau _k} [k] \) in the attractors and each attractor equality of the attractor trajectory \({\mathcal {a}}_{\varphi }\) inside the vectors of attractor equalities.)

Example 8

Let \(\langle {{\varphi _{\mathsf {noNaN}}}},{{\hbox {obj}}}\rangle \) be a restricted \(\text {OMT}_{[\mathcal {FP} ]}\) problem where \({\hbox {obj}}\) is a \(\mathcal {FP}\) objective, of sort (_ FP 3 5), to be minimized. We consider the case in which the input formula \({\varphi _{\mathsf {noNaN}}}\) requires \({\hbox {obj}} \) to be larger than or equal to \(\frac{-21}{4}\) and does not impose any other constraint on the value of \({\hbox {obj}} \). Given the sequence of (partial) assignments \(\tau _{0}, ..., \tau _{8}\) in Fig. 4, it can be seen that after determining the unsatisfiability of \({\hbox {obj}} [2] = attr _{\tau _2} [2] \), the dynamic attractor must start tracking the largest negative value different from \(\mathtt {-\infty }\). Hence, the last \(n - i - 1\) bits of the dynamic attractor are set to 1. Any subsequent call to update_dynamic_attractor() needs only to flip the value of \(attr _{\tau } [i] \), because the last \(n - i - 1\) bits of the dynamic attractor are already set to 1. \(\diamond \)
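The scenario of Example 8 is small enough to be replayed by brute force. The following Python sketch is a toy simulation under explicit assumptions, not the OptiMathSAT implementation: the SMT call is replaced by an exhaustive enumeration of the 256 bit patterns of sort (_ FP 3 5), the names decode, feasible, sat and minimize are hypothetical, and the attractor update is a direct transcription of the prose description for the minimization case. The bit patterns it produces are outputs of this sketch and have not been read off Fig. 4.

```python
from fractions import Fraction

EBITS, SBITS = 3, 5           # sort (_ FP 3 5): sign + 3 exponent + 4 stored significand bits
N = 1 + EBITS + (SBITS - 1)   # 8 bits in total
BIAS = 2 ** (EBITS - 1) - 1   # exponent bias, here 3


def decode(bits):
    """Decode an IEEE-style bit list into a Fraction, or 'nan', '+inf', '-inf'."""
    sign = -1 if bits[0] else 1
    exp = int("".join(map(str, bits[1:1 + EBITS])), 2)
    frac = Fraction(int("".join(map(str, bits[1 + EBITS:])), 2), 2 ** (SBITS - 1))
    if exp == 2 ** EBITS - 1:
        if frac != 0:
            return "nan"
        return "-inf" if sign < 0 else "+inf"
    if exp == 0:  # zero or subnormal
        return sign * frac * Fraction(2) ** (1 - BIAS)
    return sign * (1 + frac) * Fraction(2) ** (exp - BIAS)


def feasible(bits):
    """The input formula of Example 8: obj is not NaN and obj >= -21/4."""
    v = decode(bits)
    return v == "+inf" or (isinstance(v, Fraction) and v >= Fraction(-21, 4))


def sat(tau):
    """Brute-force stand-in for the SMT call: can tau be extended to a feasible value?"""
    for code in range(2 ** N):
        bits = [(code >> (N - 1 - j)) & 1 for j in range(N)]
        if all(bits[j] == b for j, b in tau.items()) and feasible(bits):
            return True
    return False


def minimize():
    attr = [1] + [1] * EBITS + [0] * (N - 1 - EBITS)  # initial attractor: -inf
    tau = {}
    for i in range(N):
        if sat({**tau, i: attr[i]}):
            tau[i] = attr[i]              # obj[i] can follow the attractor
            continue
        tau[i] = 1 - attr[i]              # obj[i] is forced to the other value
        attr[i] = tau[i]                  # flip the critical bit of the attractor
        if tau[0] == 0:
            attr[i + 1:] = [0] * (N - i - 1)   # smallest positive value
        elif i <= EBITS:
            attr[i + 1:] = [1] * (N - i - 1)   # negative value of largest magnitude
    return [tau[j] for j in range(N)]


best = minimize()
print(best, decode(best))  # [1, 1, 0, 1, 0, 1, 0, 1] -21/4
```

In this run the first disagreement with the attractor occurs at bit 2, after which the trailing bits of the attractor are all set to 1, and every later disagreement only flips a single bit, as stated in the example.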

We stress the fact that, unlike with the \({{\mathcal {L}}}{{\mathcal {A}}}\) [38, 41] and \({{\mathcal {B}}}{{\mathcal {V}}}\) [32] objective domains, ofp-bs does not simply perform binary search over the space of the values of the objective. Rather, after deciding the sign, it first performs binary search over the exponent values, which very rapidly converges to the right order of magnitude, followed by binary search over the significand values, which fine-tunes the final result.

Example 9

To understand the range-pruning power of binary search over the exponent, consider the case of a 32-bit \(\mathcal {FP}\) objective \({\hbox {obj}}\) with 8-bit exponent and 23-bit significand. After assigning, e.g., the sign bit to 0 (positive value), the range of possible values is \([0^+,+\infty ]\) (\([0^+,+3.4\cdot 10^{38}]\) if we exclude \(+\infty \)); assigning then the first exponent bit to 0, the range reduces to \([0^+,2.0]\), reducing the range by more than a \(10^{38}\) factor; by further setting the second exponent bit to 0, the range becomes \([0^+,1.1\cdot 10^{-19}]\), a further reduction by more than a \(10^{19}\) factor, and so on. \(\diamond \)
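The bounds quoted in Example 9 can be recomputed directly from the binary32 bit layout. The short Python sketch below is an illustration only; the helper name largest_positive_float32 is hypothetical. It builds the largest finite positive value whose sign bit and first k exponent bits are 0 by setting all remaining bits to 1, and reinterprets the resulting 32-bit pattern as a float.

```python
import struct


def largest_positive_float32(k):
    """Largest finite positive binary32 value whose first k exponent bits are 0."""
    if k == 0:
        # An all-ones exponent is reserved for infinity and NaN,
        # so the largest finite value has exponent 11111110.
        bits = 0x7F7FFFFF
    else:
        # Sign bit and first k exponent bits are 0, all remaining bits are 1.
        bits = (1 << (31 - k)) - 1
    return struct.unpack(">f", struct.pack(">I", bits))[0]


for k in range(4):
    print(k, largest_positive_float32(k))
```

For k = 0, 1, 2 the computed upper bounds are about \(3.4\cdot 10^{38}\), just below 2.0, and about \(1.1\cdot 10^{-19}\) respectively, in agreement with the example.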

4.3 Search Enhancements

Given a \(\mathcal {FP}\) value \(attr \) and a \(\mathcal {FP}\) goal \({\hbox {obj}} \), (a combination of) the following techniques can be used to adjust the behavior of the optimization search, similarly to what has been proposed for the case of \(\text {OMT}_{[{{\mathcal {B}}}{{\mathcal {V}}} ]}\) by Nadel et al. in [32].

  • branching preference: the bits of the \(\mathcal {FP}\) objective \({\hbox {obj}}\) are marked, inside the \(\text {OMT}\) solver, as preferred variables for branching starting from the MSB down to the LSB. This ensures that conflicts involving the value of the objective function are handled as early as possible, possibly reducing the amount of work that needs to be redone after each back-jump.

  • polarity initialization: the phase-saving value of each \({\hbox {obj}} [i] \) is initialized with the value of \(attr [i] \). This encourages the \(\text {OMT}\) solver to assign the bits of \({\hbox {obj}} \) so as to resemble the bits of \(attr \), thus possibly speeding up the convergence towards the optimal value.

In the case of the basic \(\text {OMT}\) schema described in Sect. 4.1, the effectiveness of either technique depends on the initial choice for \(attr \). In the lucky case, the value of \(attr \) pulls the optimization search in the right direction and speeds up the search. In the unlucky case, when \(attr \) pulls in the wrong direction, there is no visible effect or an overall slowdown. For instance, in the case of the linear-search optimization schema, enabling both options with an unlucky choice of \(attr \) can cause the \(\text {OMT}\) solver to start the search from the furthest possible point from the optimal solution, and thus enumerate an exponential number of intermediate solutions. Naturally, the \(\text {OMT}\)-based optimization search algorithm is still guaranteed to terminate even in the worst-case scenario, but the unpredictable performance makes using either technique a generally unsuitable option in practice.

In the case of the ofp-bs algorithm described in Sect. 4.2, we use the latest value of the dynamic attractor \(attr _{\tau } \) for both the branching preference (rows 11 and 18 of Fig. 2) and the polarity initialization (rows 12 and 19 of Fig. 2) techniques. We observe that the value of every bit in the dynamic attractor can change after the sign of the objective function has been decided. Furthermore, the value of all the significand’s bits in the dynamic attractor can also change during the process of determining the optimal exponent value of the objective function (see, e.g., Example 5). As a consequence, if the \(\text {OMT}\) solver applies either enhancement before the correct improving direction is known, this may cause the underlying \(\text {OMT}\) engine to advance the search starting from a sub-optimal set of initial decisions. Enabling both enhancements at the same time could make things even worse. In order to mitigate this issue, we have designed a variant of our optimization-search approach which does not apply either enhancement on those bits of the objective function for which the best improving direction is not yet known. We have called this variant safe bits restriction.

5 Experimental Evaluation

We have implemented the procedures described in the previous sections on top of the OptiMathSAT \(\text {OMT}\) solver (v. 1.6.2), and assessed their performance on a set of \(\text {OMT}_{[\mathcal {FP} ]}\) formulas that have been automatically generated using the \(\text {SMT}(\mathcal {FP})\) benchmark-set of [4]. The formulas, the results and the scripts necessary to reproduce these results are made publicly available and can be downloaded from [1, 2]. The experiments have been performed on an i7-6500U 2.50 GHz Intel Quad-Core machine with 16 GB of RAM, running Ubuntu Linux 17.10. For each job pair we used a timeout of 600 seconds.

Experiment Setup. The \(\text {OMT}_{[\mathcal {FP} ]}\) instances used in this experiment have been automatically generated starting from the satisfiable formulas included in the \(\text {SMT}(\mathcal {FP})\) benchmark-set of [4]. We did not consider any of the unsatisfiable instances that are present in the remote repository. For each of the original \(\text {SMT}(\mathcal {FP})\) formulas we applied the following transformations. First, we either relaxed or removed some of the constraints in the original problem, so as to broaden the set of feasible solutions. This step is necessary because the majority of the original \(\text {SMT}(\mathcal {FP})\) formulas admits only one solution. Second, for each \(\mathcal {FP}\) variable v appearing inside a \(\text {SMT}(\mathcal {FP})\) problem we generated a pair of \(\text {OMT}_{[\mathcal {FP} ]}\) instances, one for the minimization and another for the maximization of v. At the end of this step, we obtained 39536 \(\text {OMT}_{[\mathcal {FP} ]}\) formulas. Third, we randomly selected up to 300 \(\text {OMT}_{[\mathcal {FP} ]}\) instances from each of the five groups of problems in the \(\text {OMT}_{[\mathcal {FP} ]}\) benchmark-set. This filtering step yielded a total of 1120 SMT-LIBv2 formulas.

The first two \(\text {OMT}\)-based baseline implementations we have considered are OptiMathSAT (omt+lin) and OptiMathSAT (omt+bin), which run the linear- and the binary-search respectively. These configurations have been tested using both the eager and the lazy \(\mathcal {FP}\) approaches. The third baseline implementation we have considered, named OptiMathSAT (eager+obv-bs), is based on a reduction of the \(\text {OMT}_{[\mathcal {FP} ]}\) problem to \(\text {OMT}_{[{{\mathcal {B}}}{{\mathcal {V}}} ]}\) and it uses OptiMathSAT’s implementation of the obv-bs engine presented by Nadel et al. [32]. For this test, we have generated an \(\text {OMT}_{[{{\mathcal {B}}}{{\mathcal {V}}} ]}\) benchmark-set using a \({{\mathcal {B}}}{{\mathcal {V}}}\) encoding that mimics the essential aspects of the ofp-bs algorithm described in Sect. 4.2. We compared these baseline approaches with a configuration using the ofp-bs algorithm and the eager \(\mathcal {FP}\) approach, namely OptiMathSAT (eager+ofp-bs). We have separately tested the effect of enabling the branching preference (bp), the polarity initialization (pi) and the safe bits restriction (so) enhancements described in Sect. 4.3, whenever these options were supported by the given configuration. We have not included other tools in our experiment because we are not aware of any other \(\text {OMT}_{[\mathcal {FP} ]}\) solver.

Last, in order to assess the significance of the optimization problems used in this experiment, we have collected the run-time statistics of OptiMathSAT on the SMT formulas obtained by stripping the objective function from each \(\text {OMT}\) instance, so that no optimization is to be performed. We named this configuration OptiMathSAT (eager+smt).

For all problem instances, we verified the correctness of the optimal solution found by each configuration with an SMT solver (MathSAT5). When terminating, all tools returned the same optimum value.

Table 2 (Top) Comparison among various OptiMathSAT (here simply “OM”) configurations on the \(\text {OMT}_{[\mathcal {FP} ]}\) benchmark-set

Fig. 5 Cactus plots of the data displayed in Table 2

Fig. 6 Pairwise comparisons on \(\text {OMT}_{[\mathcal {FP} ]}\) formulas using \(\text {OMT}\)-based linear-search and other configurations. (Blue points denote satisfiable benchmarks, green denotes a timeout.) (Color figure online)

Fig. 7 Pairwise comparisons on \(\text {OMT}_{[\mathcal {FP} ]}\) formulas using \(\text {OMT}\)-based binary-search and other configurations. (Blue points denote satisfiable benchmarks, green denotes a timeout.)

Fig. 8 Pairwise comparisons on \(\text {OMT}_{[\mathcal {FP} ]}\) formulas using the ofp-bs engine and other configurations. (Blue points denote satisfiable benchmarks, green denotes a timeout.) (Color figure online)

Experiment Results. The results of this experiment are listed in Table 2; Fig. 5 depicts the log-scale cactus plot of the same data, for a visual comparison among the different configurations. In addition, Figs. 6, 7 and 8 show a selection of relevant pairwise comparisons among various OptiMathSAT configurations, focusing on variants of the \(\text {OMT}\)-based linear-search approach, of the \(\text {OMT}\)-based binary-search approach, and of the ofp-bs approach respectively.

Concerning \(\text {OMT}\)-based linear-search optimization, we observe that OptiMathSAT performs best when no enhancement is enabled. In particular, the empirical evidence suggests that enabling branching preference significantly increases the number of timeouts, generally deteriorating the performance (plot 1A in Fig. 6). Enabling only polarity initialization does not result in an appreciable change in the running time of the solver (plot 1B in Fig. 6). Enabling both enhancements at the same time occasionally yields a small improvement of the search time (plot 2A in Fig. 6), but it generally worsens the performance and results in a drastic increase in the number of timeouts (Table 2). We justify these results as follows. First, when only polarity initialization is used, the phase-saving value that is being set by OptiMathSAT does not really matter because the optimization search is dominated by the structure of the formula itself rather than by the bits of the \(\mathcal {FP}\) objective. Second, when polarity initialization is used on top of branching preference, there is an even more drastic decrease in performance due to the fact that the initial phase-saving value that is statically assigned by the \(\text {OMT}\) solver to the bits of the \(\mathcal {FP}\) objective cannot be expected to be “good enough” for any situation. In fact, as illustrated in Example 5, the initial phase-saving can be misleading and force the \(\text {OMT}\) solver, when running in linear-search, to explore an exponential number of intermediate satisfiable solutions.

In the case of the \(\text {OMT}\)-based binary-search optimization approach, we observe that it solves more formulas than linear-search and it generally appears to be faster (plot 3B in Fig. 6). Overall, polarity initialization does not seem to be beneficial, whereas enabling branching preference increases the number of formulas solved within the timeout. This behavior is different from the linear-search approach, and we conjecture that it is due to the fact that, with the \(\text {OMT}\)-based binary-search approach, branching over the bits of the objective function can reveal in advance any (partial) assignment to the bits of the objective function that is inconsistent with respect to the pivoting cuts learned by the optimization engine.

Using the lazy \(\mathcal {FP}\) engine results in fewer formulas being solved, although a significant number of these benchmarks is solved faster than with any other configuration (over 90 instances, for both configurations).

The OptiMathSAT (eager+obv-bs) configuration is able to solve 1013 formulas within the timeout, showing that \(\text {OMT}_{[\mathcal {FP} ]}\) can be reduced to \(\text {OMT}_{[{{\mathcal {B}}}{{\mathcal {V}}} ]}\) effectively, and that, on the given benchmark-set, the performance of this approach is comparable with that of the best \(\text {OMT}_{[\mathcal {FP} ]}\) configurations being tested.

Overall, the best performance is obtained by using the ofp-bs engine, with up to 1019 benchmark-set instances solved by the OptiMathSAT (eager+ofp-bs+pi) configuration. In plot 2B of Figs. 6 and 7, we show the pairwise comparison of the best ofp-bs configuration with the best \(\text {OMT}\)-based run. Similarly to the case of \(\text {OMT}\)-based optimization with linear-search, we observe that enabling branching preference generally makes the performance worse (plot 1A in Fig. 8). Instead, when polarity initialization is used we observe a general performance improvement that results not only in an increase in the number of formulas being solved within the timeout, but also in a noticeable reduction of the solving time as a whole. This is in contrast with the case of \(\text {OMT}\)-based optimization, and it can be explained by the fact that ofp-bs uses an internal heuristic function to dynamically determine and update the most appropriate phase-saving value for the bits of the objective function. An equally important role is played by the safe bits restriction, which limits the effects of branching preference and polarity initialization to only certain bits of the dynamic attractor. As illustrated by the plots in the second and third rows of Fig. 8 and by the data in Table 2, this feature is particularly effective when used in combination with branching preference.

The results of OptiMathSAT over the SMT-only version of the benchmark-set (no optimization) are reported in the last row of Table 2 and in the scatter-plot 3B in Fig. 7, and show that for a large number of instances the \(\text {OMT}\) problem is considerably harder than its SMT-only version. There are a few exceptions to this rule, which we ascribe to the fact that the removal of the objective function alters the internal stack of formulas, and this can have unpredictable consequences on the behavior of various internal heuristics that depend on it. A solution can be found in a shorter amount of time when the sequence of (heuristic) choices is compatible with its assignment and requires little back-tracking effort.

6 Conclusions and Future Work

We have presented for the first time \(\text {OMT}\) procedures for (signed bit-vectors and) floating-point objectives, based on the novel notions of attractor and dynamic attractor, which we have implemented in OptiMathSAT and tested on modified problems from SMT-LIB.

Ongoing research involves implementing our ofp-bs procedure on top of the ACDCL \(\text {SMT}(\mathcal {FP})\) procedure, which is not immediate to do efficiently because the latter approach does not allow directly accessing and setting the single bits of the objective (since \({{\mathcal {B}}}{{\mathcal {V}}}\) and \(\mathcal {FP}\) are not signature-disjoint). Future research involves experimenting with the new \(\text {OMT}\) procedure directly on problems coming from bit-precise SW and HW verification, produced, e.g., by the NuXmv model checker [3].