1 Introduction

Floating point, as a representation of real numbers, supports a wide dynamic range and high precision. It has thus been commonly used to represent signals in signal processing applications such as image processing, speech processing, and digital signal processing. When these applications are implemented in hardware for high speed and stability, the signals need to be represented in fixed point to optimize the area, power, and speed of the hardware. Hence, floating-point values need to be converted to fixed-point values. This process is called word-length optimization. Its goal is to achieve optimal system performance while satisfying the specification on the system output precision. Word-length optimization involves range analysis and precision analysis. The former finds the minimum word length of the integer part of a value, while the latter optimizes the word length of the fractional part.

Word-length optimization has been proven to be an NP-hard problem [1]. Existing methods are usually classified into dynamic analysis [2–7] and static analysis [8–20]. Because it analyzes a large set of stimulus signals, dynamic analysis is applicable to all types of systems. However, it requires long simulation times to provide sufficient confidence, and the precision cannot be guaranteed for signals that were not simulated. By comparison, static analysis is an automated and efficient word-length optimization method and is more applicable to large designs. Static analysis mainly uses the characteristics of the input signals to estimate the word length conservatively, which can result in some overestimation [12]. As a part of word-length optimization, range analysis can also be classified in the same way.

Affine arithmetic (AA) [21] is often used for range analysis in static analysis. In AA, every signal must be represented in an affine form, which is a first-degree polynomial. As AA tracks the correlations among the range intervals of signals, it can provide a more accurate range. This makes it suitable for range analysis of the results of linear operations. Besides linear operations, nonlinear operations such as multiplication are also involved in hardware operations, typically in linear time-invariant (LTI) systems. AA cannot provide an exact affine form for nonlinear operations. To solve this problem, Stolfi and de Figueiredo [22] proposed affine approximation methods for multiplication, which include trivial range estimation (AATRE) and Chebyshev approximation (AACHA). AATRE is computationally efficient, but the range it produces can be up to four times the true range. The accumulation of the uncertainty of all signals along the computational chain may result in an error explosion, which is unacceptable in applications. Such overestimation cannot satisfy the accuracy requirement of the system, which limits the application of AATRE in large systems. The uncertainty of AACHA is smaller than that of AATRE; however, AACHA is too complex to be used in large systems. Since LTI operations are accurately covered by AA, the proposed method is applied to the range analysis of word-length optimization in this paper.

In this paper, a novel affine approximation method, Approximation Affine based on Space Extreme Estimation (AASEE), is proposed to reduce the uncertainty of multiplication and achieve an accurate and efficient range analysis of multiplication. To analyze the uncertainty conveniently, we decompose each approximation method for multiplication (AATRE, AACHA, and AASEE) into two parts. The first part is called the approximate affine form, which approximates the nonlinear operation. The second part is called the equivalent affine form, which is the affine form equivalent to the estimated range of the difference between the result of multiplication and the approximate affine form. The more accurate the two parts are, the more accurate the approximation method is. Based on linear geometry [23], it is proven that the proposed approximate affine form is the closest to the result of multiplication. To derive the equivalent affine form, we use the extreme value theory of multivariable functions [24] to estimate, in space, the upper and lower bounds of the difference introduced by the approximation of the first part. The uncertainty of the proposed method is thereby minimized. The accuracy of the affine form produced by AASEE is higher than that of AATRE and, on average, higher than that of AACHA. Meanwhile, the computational complexity of AASEE is equivalent to that of AATRE and lower than that of AACHA.

The rest of this paper is organized as follows. Section 2 presents the background of range analysis for multiplication. Section 3 presents the derivation of the two parts for multiplication. The refined affine form of multiplication, AASEE, is presented in Section 4. In Section 5, we compare the computational complexity and the accuracy of AASEE with those of AATRE and AACHA. The case studies and experimental results are presented in Section 6. Section 7 concludes the paper.

2 Background

2.1 Related work

Interval arithmetic (IA) and affine arithmetic (AA) have been widely used in range analysis in word-length optimization.

IA [25] is a range arithmetic theory first presented by Moore in 1962. Cmar [2] employs it for range analysis of digital signal processing (DSP) systems. Carreras [20] presents a method based on IA. To reduce the oversized word length, the method provides the probability density functions that can be used when some truncation must be performed due to constraints in the specification. IA is not suitable for most real-world applications, since it can lead to drastic overestimation of the true range.

AA [21] was proposed by Stolfi in 1993 to overcome the weakness of IA. In [8, 9], Fang uses AA for word-length optimization. Both range and precision are represented by the same affine form, which limits the optimization. Pu and Ha [10] also use AA for word-length optimization, but they use two different affine forms for range analysis and precision analysis, respectively, and achieve a more refined result. Similarly, Lee et al. [11] develop an automatic optimization approach called MiniBit, which produces accuracy-guaranteed solutions and minimizes area while meeting an error constraint. Osborne [12] uses both IA and AA for range analysis in different situations. Computation using either of the two methods in the design is time-consuming. The problem of overestimation remains serious owing to the approximation of the nonlinear operations.

Since AA cannot be used in the systems with infinite number of loops, an improved approach, quantized AA (QAA), has been proposed in [13] for linear time-invariant systems with feedback loops. This method can provide fast and tight estimation of the evolution of large sets of numerical inputs, using only an affine-based simulation, but it does not provide the exact bounds.

AATRE [22] is adopted for multiplication in most works because of its low computational complexity, but the uncertainty of the range it produces is very large. To adjust the trade-off between the accuracy of approximation and computational complexity, Zhang [14] introduces a new parameter N in the N-level simplified affine approximation (N-SAA). This method is faster than AACHA and more accurate than AATRE, but it is more complex than AATRE. Furthermore, it is troublesome to choose a suitable N. Pang [26] proposes a range analysis method that combines IA, AATRE, and arithmetic transform (AT). Its result is more accurate than that of AATRE, while its CPU time is longer than that of AATRE. To deal with applications from the scientific computing domain, Kinsman [17, 18] uses computational methods based on Satisfiability Modulo Theory. The search efficiency of this method is improved, leading to tighter bounds and thus smaller word lengths.

For all the existing methods, the accuracy of approximation is improved at the expense of computational complexity. This paper presents an affine approximation method for multiplication that achieves a better trade-off between accuracy and computational complexity.

2.2 Range analysis

Range analysis involves studying the data range of every signal and minimizing the integer word lengths for signals on the premise that the signals in the design have enough bits to accommodate this range. The range of signal $x$ is represented by $x = [x_{\min}, x_{\max}]$, where the two real numbers, $x_{\min}$ and $x_{\max}$, denote the lower and upper bounds of $x$, respectively. The required integer part of the word length for signal $x$, which is represented as $\mathrm{IWL}_x$, can be derived by:

$$\mathrm{IWL}_x = \begin{cases} \lceil \log_2(|x|_{\max}) \rceil + \alpha, & |x|_{\max} \ge 1 \\ 1, & |x|_{\max} < 1 \end{cases} \quad \text{where } |x|_{\max} = \max(|x_{\min}|, |x_{\max}|) \text{ and } \alpha = \begin{cases} 1, & \operatorname{mod}(\log_2(x_{\max}), 1) \ne 0 \\ 2, & \operatorname{mod}(\log_2(x_{\max}), 1) = 0 \end{cases}$$
(1)

In (1), all the signals in the design are assumed to be expressed as signed numbers, and the sign bit is taken into account in $\mathrm{IWL}_x$. According to (1), once the range of a signal is decided, the integer part of the word length of the signal can be derived.
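As a minimal sketch of how (1) can be evaluated, the following Python function computes the integer word length from a signal range. It assumes that the logarithm in (1) is rounded up to the next integer and that the power-of-two test for α is applied to the magnitude bound; the function name `integer_word_length` is illustrative only.

```python
import math


def integer_word_length(x_min: float, x_max: float) -> int:
    """Integer word length (sign bit included) for a signal in [x_min, x_max]."""
    abs_max = max(abs(x_min), abs(x_max))
    if abs_max < 1:
        return 1  # magnitudes below 1 need only the sign bit in the integer part
    log_val = math.log2(abs_max)
    # alpha = 2 when the bound is an exact power of two, otherwise 1.
    # Assumption: the test is applied to |x|_max.
    alpha = 2 if log_val == math.floor(log_val) else 1
    return math.ceil(log_val) + alpha


print(integer_word_length(-3.0, 3.0))   # range needs values up to magnitude 3
print(integer_word_length(0.0, 4.0))    # bound is an exact power of two
print(integer_word_length(-0.5, 0.25))  # magnitude below 1
```

Under these assumptions, a range of [-3, 3] needs 3 integer bits, while [0, 4] needs 4 because the value 4 itself must be representable in a signed format.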

2.3 Affine arithmetic

AA is widely applied for range analysis. In AA, an uncertain signal $x$ is represented by an affine form, which is a first-degree polynomial [22]:

$$\hat{x} = x_0 + x_1 \varepsilon_1 + x_2 \varepsilon_2 + \cdots + x_n \varepsilon_n, \quad \text{where } \varepsilon_i \in [-1, 1].$$
(2)

For the signal $x$, $x_0$ is the central value, and $\varepsilon_i$ is the $i$th noise symbol. $\varepsilon_i$ denotes an independent uncertainty source that contributes to the total uncertainty of the signal $x$, and $x_i$ is its coefficient.

The upper and lower bounds for the range of $x$ can be represented as

$$x_{\max} = x_0 + \sum_{i=1}^{n} |x_i|, \quad x_{\min} = x_0 - \sum_{i=1}^{n} |x_i|.$$
(3)

With $x_{\min}$ and $x_{\max}$, the input interval $\bar{x} = [x_{\min}, x_{\max}]$ can be converted into an equivalent affine form as (4), using only one independent noise symbol.

$$\hat{x} = x_0 + x_1 \varepsilon_1, \quad \text{with } x_0 = \frac{x_{\max} + x_{\min}}{2}, \quad x_1 = \frac{x_{\max} - x_{\min}}{2}.$$
(4)

AA keeps the correlations among the signals of the computational chain because the same noise symbol $\varepsilon_i$ is shared by every signal that it affects [22].
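The definitions in (2) to (4) can be illustrated with a small Python sketch. The class `AffineForm` and its method names are hypothetical and only cover the operations needed here (bounds, interval conversion, addition, and subtraction).

```python
from dataclasses import dataclass, field


@dataclass
class AffineForm:
    center: float                              # x_0 in (2)
    terms: dict = field(default_factory=dict)  # noise-symbol index i -> coefficient x_i

    def bounds(self):
        """Lower and upper bounds according to (3)."""
        radius = sum(abs(c) for c in self.terms.values())
        return self.center - radius, self.center + radius

    @classmethod
    def from_interval(cls, lo, hi, symbol):
        """Equivalent affine form of [lo, hi] with one noise symbol, as in (4)."""
        return cls((hi + lo) / 2, {symbol: (hi - lo) / 2})

    def __add__(self, other):
        terms = dict(self.terms)
        for k, c in other.terms.items():
            terms[k] = terms.get(k, 0.0) + c
        return AffineForm(self.center + other.center, terms)

    def __sub__(self, other):
        terms = dict(self.terms)
        for k, c in other.terms.items():
            terms[k] = terms.get(k, 0.0) - c
        return AffineForm(self.center - other.center, terms)


x = AffineForm.from_interval(0.0, 1.0, symbol=1)
print(x.bounds())        # bounds of the input interval
print((x - x).bounds())  # shared noise symbol cancels exactly
```

Because both operands of `x - x` carry the same noise symbol, the coefficients cancel and the result has zero width, which is the correlation tracking that interval arithmetic lacks.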

For multiplication, AATRE and AACHA are typical approximation methods.

The affine form of AATRE is

$$\hat{x}\hat{y} = x_0 y_0 + \sum_{i=1}^{n} (x_0 y_i + y_0 x_i)\, \varepsilon_i + \left( \sum_{i=1}^{n} |x_i| \sum_{i=1}^{n} |y_i| \right) \varepsilon_{n+1}.$$
(5)

Suppose $M_1 = \max(n_1, n_2)$, in which $n_1$ and $n_2$ denote the numbers of noise symbols with nonzero coefficients in $\hat{x}$ and $\hat{y}$, respectively. The computational complexity of AATRE is $O(M_1)$.

AACHA provides a better approximation result, but it is more complex. The affine form of AACHA is

$$\hat{x}\hat{y} = x_0 y_0 + \sum_{i=1}^{n} (x_0 y_i + y_0 x_i)\, \varepsilon_i + \frac{a + b}{2} + \frac{b - a}{2}\, \varepsilon_{n+1},$$
(6)

where $a$ and $b$ denote the minimum and the maximum of the range of $\left( \sum_{i=1}^{n} x_i \varepsilon_i \right) \left( \sum_{i=1}^{n} y_i \varepsilon_i \right)$. Suppose $M_2 = n_1 + n_2$. The complexity of computing both extremal values, $a$ and $b$, is $O(M_2 \log M_2)$. As $M_1 \le M_2$, the computational complexity of AATRE is lower than that of AACHA [22].
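The AATRE rule in (5) can be sketched in a few lines of Python. The function names `aatre_multiply` and `affine_bounds` and the list-based representation (coefficient lists indexed by noise symbol) are assumptions made for illustration.

```python
def aatre_multiply(x0, xs, y0, ys):
    """Affine form of x*y according to (5).

    xs[i] and ys[i] are the coefficients of the shared noise symbol eps_(i+1).
    Returns the center and the coefficient list with one new symbol appended.
    """
    assert len(xs) == len(ys)
    center = x0 * y0
    linear = [x0 * yi + y0 * xi for xi, yi in zip(xs, ys)]
    # Coefficient of the new noise symbol eps_(n+1).
    new_coeff = sum(abs(c) for c in xs) * sum(abs(c) for c in ys)
    return center, linear + [new_coeff]


def affine_bounds(center, coeffs):
    """Range of an affine form according to (3)."""
    radius = sum(abs(c) for c in coeffs)
    return center - radius, center + radius


# x = 1 + eps1 + 5*eps2, y = 3 - 6*eps1 + eps2
center, coeffs = aatre_multiply(1, [1, 5], 3, [-6, 1])
print(center, coeffs)                 # 3 [-3, 16, 42]
print(affine_bounds(center, coeffs))  # (-58, 64)
```

For these two inputs the product is enclosed in [-58, 64]; the single coefficient 42 of the new noise symbol carries all of the uncertainty introduced by the quadratic term.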

2.4 Extreme value theory

The proposed approximation is based on the extreme value theory of multivariable functions [24].

According to the extreme value theory of multivariable functions, the Hessian matrix of the function, $H$, and the Jacobian matrix of the function, $J$, can be used to find the local maxima and the local minima. The Hessian matrix of a function $f(x_1, x_2, \ldots, x_n)$ is

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}.$$
(7)

Here we use $H_{f_\alpha}$ to represent $H$ at a point $f_\alpha = (x_1^{\alpha}, x_2^{\alpha}, \ldots, x_n^{\alpha})$ and $J_{f_\alpha}$ to represent $J$ at the point $f_\alpha$.

A stationary point of $f$, $f_\alpha$, is a point where $J_{f_\alpha} = 0$. $H_{f_\alpha}$ is indefinite when it is neither positive semidefinite nor negative semidefinite. If $H_{f_\alpha}$ is positive definite, then $f_\alpha$ is a local minimum point. If $H_{f_\alpha}$ is negative definite, then $f_\alpha$ is a local maximum point. If $H_{f_\alpha}$ is indefinite, then $f_\alpha$ is neither a local maximum nor a local minimum; it is a saddle point. The remaining cases are not used in this paper.

The principal minor determinants are used to determine whether a matrix is positive or negative definite or semidefinite.

A matrix is positive semidefinite if and only if all of its principal minor determinants are nonnegative real numbers.

A matrix is negative semidefinite if and only if all of its odd-order principal minor determinants are non-positive real numbers and all of its even-order principal minor determinants are nonnegative real numbers.

3 Derivation of the two parts for multiplication

A generic nonlinear operation $z \leftarrow f(\hat{x}, \hat{y})$ proposed in [22] can be described by (8):

$$z = f(x_0 + x_1 \varepsilon_1 + \cdots + x_n \varepsilon_n,\; y_0 + y_1 \varepsilon_1 + \cdots + y_n \varepsilon_n) = f(\varepsilon_1, \ldots, \varepsilon_n).$$
(8)

Since the operation $f$ is nonlinear, $f(\varepsilon_1, \ldots, \varepsilon_n)$ cannot be expressed exactly as an affine combination of the noise symbols $\varepsilon_i$. In this case, an approximate affine form of the operation, which is represented as $f_z$, must be used to approximate $f(\varepsilon_1, \ldots, \varepsilon_n)$. The difference introduced by this approximation, $d_f = f - f_z$, can be expressed by an equivalent affine form of the estimated range of the difference, which is represented as $\hat{d}$. Hence, the affine form of $z$ can be expressed as

$$\hat{z} = f_z + \hat{d}.$$
(9)

In (9), $f_z$ is a first-degree function of $\varepsilon_i$ and can be expressed as (10)

$$f_z(\varepsilon_1, \ldots, \varepsilon_n) = z_0 + \sum_{i=1}^{n} z_i \varepsilon_i.$$
(10)

The computational complexity of computing the true range of $d_f$ is very high in a practical application. The estimated range of $d_f$ is utilized instead of the true range. Suppose $d_{\max}$ and $d_{\min}$ denote the upper and lower bounds of the estimated range of $d_f$, respectively. According to (4), $\hat{d}$ can be expressed as (11)

$$\hat{d} = z + z_{n+1} \varepsilon_{n+1} = \frac{d_{\max} + d_{\min}}{2} + \frac{d_{\max} - d_{\min}}{2}\, \varepsilon_{n+1}.$$
(11)

With (10) and (11), the affine form of $z$ can be represented as

$$\hat{z} = f_z + \hat{d} = z_0 + \sum_{i=1}^{n} z_i \varepsilon_i + z + z_{n+1} \varepsilon_{n+1}.$$
(12)

For multiplication, $z$ can be expressed as

$$z = x_0 y_0 + x_0 \sum_{i=1}^{n} y_i \varepsilon_i + y_0 \sum_{i=1}^{n} x_i \varepsilon_i + \sum_{i=1}^{n} x_i \varepsilon_i \sum_{i=1}^{n} y_i \varepsilon_i.$$
(13)

The first three terms of (13) form an affine form, and the last term is quadratic. The affine form of $z$ can therefore also be represented as (12).

According to the definitions of $f_z$ in (10) and $\hat{d}$ in (11), AATRE and AACHA can also be represented by $f_z$ and $\hat{d}$. For AATRE in (5), $f_z$ and $\hat{d}$ are defined as

$$f_z = x_0 y_0 + \sum_{i=1}^{n} (x_0 y_i + y_0 x_i)\, \varepsilon_i,$$
(14)
$$\hat{d} = \left( \sum_{i=1}^{n} |x_i| \sum_{i=1}^{n} |y_i| \right) \varepsilon_{n+1}.$$
(15)

For AACHA in (6), $f_z$ and $\hat{d}$ are defined as

$$f_z = x_0 y_0 + \sum_{i=1}^{n} (x_0 y_i + y_0 x_i)\, \varepsilon_i,$$
(16)
$$\hat{d} = \frac{a + b}{2} + \frac{b - a}{2}\, \varepsilon_{n+1}.$$
(17)

In the existing affine approximation methods, AATRE and AACHA, $d_{\max}$ and $d_{\min}$ are estimated in the XY plane. In these methods, the same noise symbol appearing in different variables is treated as independent. Hence, the range of $\hat{d}$ is much larger than that of $d_f$. The difference between $\hat{d}$ and $d_f$ propagates to $\hat{z}$ and results in uncertainty.

To describe the multiplication accurately, we use the $\varepsilon_i$ as the input arguments and estimate the range of $z$ in the $(n+1)$-dimensional space $E^{n+1}$, whose axes are labeled $(\varepsilon_1, \ldots, \varepsilon_n, z)$. In $E^{n+1}$, a first-degree polynomial function is a hyperplane, and a nonlinear polynomial function is a curved surface. The approximate affine form in (10) is therefore a hyperplane in $E^{n+1}$. Each hyperplane in $E^{n+1}$ can be viewed as a parallel translation of the tangent hyperplane at a certain point of the curved surface. Hence, all possible approximate affine forms for $z$ can be regarded as the tangent hyperplanes at all points of the curved surface in $E^{n+1}$. The translation amount is taken into account in $d_f$, which is approximated by $\hat{d}$. In $E^{n+1}$, $d_f$ can be viewed as a function of the distance between the points of the curved surface and the tangent hyperplane.

Figure 1 shows an example with $\hat{x} = 1 + \varepsilon_1 + 5\varepsilon_2$ and $\hat{y} = 3 - 6\varepsilon_1 + \varepsilon_2$. The space is labeled as $(\varepsilon_1, \varepsilon_2, z)$. The red mesh surface represents the function $z = \hat{x}\hat{y} = (1 + \varepsilon_1 + 5\varepsilon_2)(3 - 6\varepsilon_1 + \varepsilon_2)$. The blue plane represents the tangent plane $f_z$, $z = 3 - 3\varepsilon_1 + 16\varepsilon_2$, at the point $z_\alpha = (0, 0, 3)$. All the possible approximate affine forms for $z$ are the tangent planes at all the points of the surface. $d_f$ is a function of the distance between $z$ and $f_z$.

Figure 1. Example of multiplication in $(n+1)$-dimensional space $E^{n+1}$.

Here we use $f_{z_\alpha}$ in (18) to represent the tangent hyperplane at the point $z_\alpha = (\varepsilon_1^{\alpha}, \varepsilon_2^{\alpha}, \ldots, \varepsilon_n^{\alpha})$. Every possible approximate affine form can then be represented as some $f_{z_\alpha}$.

$$f_{z_\alpha} = z_\alpha + z_{\varepsilon_1} (\varepsilon_1 - \varepsilon_1^{\alpha}) + z_{\varepsilon_2} (\varepsilon_2 - \varepsilon_2^{\alpha}) + \cdots + z_{\varepsilon_n} (\varepsilon_n - \varepsilon_n^{\alpha}).$$
(18)

In (18), $z_{\varepsilon_i}$ is the partial derivative of $z$ with respect to the variable $\varepsilon_i$ at the point $z_\alpha$.

With the estimated range of $d_f$, the maximum absolute error of $d_f$ can be expressed as

$$e_a = \max(|d_{\max}|, |d_{\min}|).$$
(19)

To reduce the uncertainty, $f_z$ must be as close as possible to the result of multiplication. Hence, $f_z$ is the tangent hyperplane whose maximum absolute error is the minimum among those of all the possible affine forms $f_{z_\alpha}$, that is,

$$e_a(f_z) = \min(e_a(f_{z_\alpha})).$$
(20)

Geometrically, $f_z$ is the tangent hyperplane whose maximum absolute error is minimized.

$f_z$ is derived from the range of $d_f$, while $\hat{d}$ is the equivalent affine form of $d_f$. It is very complex to compute the true range of $d_f$. With $\hat{d}$ in (11), the uncertainty in AA for nonlinear operations arises from the difference between the true range of $d_f$ and the estimated range of $d_f$.

The range of $d_f$ can be estimated much more tightly and easily in the space $E^{n+1}$ than in the XY plane. Based on the extreme value theory of multivariable functions, the estimated range of $d_f$ in AASEE is derived.

With more accurate $d_{\max}$ and $d_{\min}$, $f_z$ and $\hat{d}$ can be calculated more precisely, and AASEE can achieve a refined affine approximation result.

In the next section, the estimated range of $d_f$ is derived first, and the two parts are derived afterwards.

4 AASEE for multiplication

4.1 Estimated range of the difference

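For multiplication, which is expressed as (13), the value of $z$ at the point $z_\alpha$ is

$$z_\alpha = \left( x_0 + \sum_{i=1}^{n} x_i \varepsilon_i^{\alpha} \right) \left( y_0 + \sum_{i=1}^{n} y_i \varepsilon_i^{\alpha} \right).$$
(21)

The partial derivatives of $z$ with respect to the variable $\varepsilon_i$ at the point $z_\alpha$ are

$$z_{\varepsilon_i} = x_i \left( y_0 + \sum_{j=1}^{n} y_j \varepsilon_j^{\alpha} \right) + y_i \left( x_0 + \sum_{j=1}^{n} x_j \varepsilon_j^{\alpha} \right).$$
(22)

Upon substitution for $z_\alpha$ and $z_{\varepsilon_i}$, the tangent hyperplane $f_{z_\alpha}$ can be expressed as

$$f_{z_\alpha} = \left( x_0 + \sum_{i=1}^{n} x_i \varepsilon_i^{\alpha} \right) \left( y_0 + \sum_{i=1}^{n} y_i \varepsilon_i^{\alpha} \right) + \left[ x_1 \left( y_0 + \sum_{i=1}^{n} y_i \varepsilon_i^{\alpha} \right) + y_1 \left( x_0 + \sum_{i=1}^{n} x_i \varepsilon_i^{\alpha} \right) \right] (\varepsilon_1 - \varepsilon_1^{\alpha}) + \cdots + \left[ x_n \left( y_0 + \sum_{i=1}^{n} y_i \varepsilon_i^{\alpha} \right) + y_n \left( x_0 + \sum_{i=1}^{n} x_i \varepsilon_i^{\alpha} \right) \right] (\varepsilon_n - \varepsilon_n^{\alpha}).$$
(23)

The difference between the tangent hyperplane $f_{z_\alpha}$ and the $(n+1)$-dimensional quadratic surface $z$ is

$$d_f = z - f_{z_\alpha} = \sum_{i,j=1}^{n} x_i y_j (\varepsilon_i - \varepsilon_i^{\alpha})(\varepsilon_j - \varepsilon_j^{\alpha}), \quad \text{where } \varepsilon_i, \varepsilon_j, \varepsilon_i^{\alpha}, \varepsilon_j^{\alpha} \in [-1, 1].$$
(24)

Suppose $d_{e\max}$ and $d_{e\min}$ denote the estimated maximum and minimum of the function value at the domain boundary, respectively, and $d_{fi\max}$ and $d_{fi\min}$ denote the local maxima and the local minima, respectively. The estimated maximum and minimum of the multivariable function $d_f$, $d_{\max}$ and $d_{\min}$, can be expressed as

$$d_{\max} = \max(d_{e\max}, d_{fi\max}),$$
(25)
$$d_{\min} = \min(d_{e\min}, d_{fi\min}).$$
(26)

According to (24), the function value at the domain boundary, $d_{fe}$, is represented by

$$d_{fe} = \sum_{i,j=1}^{n} x_i y_j \left[ \varepsilon_i \varepsilon_j - \varepsilon_j \varepsilon_i^{\alpha} - \varepsilon_i \varepsilon_j^{\alpha} + \varepsilon_i^{\alpha} \varepsilon_j^{\alpha} \right], \quad \text{where } \varepsilon_i = \pm 1,\; i = 1, 2, \ldots, n.$$
(27)

To simplify, we consider the extreme case of $\forall \varepsilon_i = \pm 1$. In this case, the first term in the brackets, $\varepsilon_i \varepsilon_j$, always equals 1 when $i = j$. Hence, the estimated function value at the domain boundary, $d_e$, is expressed as

$$d_e = \sum_{i,j=1,\, i=j}^{n} x_i y_j + \sum_{i,j=1}^{n} x_i y_j \varepsilon_i^{\alpha} \varepsilon_j^{\alpha} + \sum_{i,j=1,\, i \ne j}^{n} x_i y_j \varepsilon_i \varepsilon_j - \sum_{i,j=1}^{n} x_i y_j \varepsilon_j \varepsilon_i^{\alpha} - \sum_{i,j=1}^{n} x_i y_j \varepsilon_i \varepsilon_j^{\alpha}, \quad \text{where } \varepsilon_i = \pm 1.$$
(28)

Hence, the maximum and minimum of $d_e$, $d_{e\max}$ and $d_{e\min}$, are derived as

$$d_{e\max} = \sum_{i=1}^{n} x_i y_i + \sum_{i,j=1}^{n} x_i y_j \varepsilon_i^{\alpha} \varepsilon_j^{\alpha} + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j| + \sum_{i,j=1}^{n} |x_i y_j \varepsilon_i^{\alpha}| + \sum_{i,j=1}^{n} |x_i y_j \varepsilon_j^{\alpha}|,$$
(29)
$$d_{e\min} = \sum_{i=1}^{n} x_i y_i + \sum_{i,j=1}^{n} x_i y_j \varepsilon_i^{\alpha} \varepsilon_j^{\alpha} - \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j| - \sum_{i,j=1}^{n} |x_i y_j \varepsilon_i^{\alpha}| - \sum_{i,j=1}^{n} |x_i y_j \varepsilon_j^{\alpha}|.$$
(30)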

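To simplify the comparison, $d_{fi\max}$ and $d_{fi\min}$ in (25) and (26) can be expressed as

$$d_{fi\max} = \sum_{i=1}^{n} x_i y_i \varepsilon_i^2 + \sum_{i,j=1,\, i \ne j}^{n} x_i y_j \varepsilon_i \varepsilon_j + \sum_{i,j=1}^{n} x_i y_j \varepsilon_i \varepsilon_j^{\alpha} + \sum_{i,j=1}^{n} x_i y_j \varepsilon_j \varepsilon_i^{\alpha} + \sum_{i,j=1}^{n} x_i y_j \varepsilon_i^{\alpha} \varepsilon_j^{\alpha},$$
(31)
$$d_{fi\min} = \sum_{i=1}^{n} x_i y_i \varepsilon_i^2 + \sum_{i,j=1,\, i \ne j}^{n} x_i y_j \varepsilon_i \varepsilon_j + \sum_{i=1}^{n} x_i y_i \varepsilon_i (\varepsilon_i^{\alpha} + \varepsilon_j^{\alpha}) + \sum_{i,j=1,\, i \ne j}^{n} x_i y_j \varepsilon_i \varepsilon_j^{\alpha} + \sum_{i,j=1,\, i \ne j}^{n} x_i y_j \varepsilon_j \varepsilon_i^{\alpha} + \sum_{i,j=1}^{n} x_i y_j \varepsilon_i^{\alpha} \varepsilon_j^{\alpha},$$
$$\text{where } \varepsilon_i, \varepsilon_j \in (-1, 1) \text{ and } \varepsilon_i^{\alpha}, \varepsilon_j^{\alpha} \in [-1, 1].$$
(32)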

Continuing the example in Section 3, Figure 2 shows the function $d_f = -6(\varepsilon_1 - 0.1)^2 - 29(\varepsilon_1 - 0.1)(\varepsilon_2 - 0.1) + 5(\varepsilon_2 - 0.1)^2$ when $\varepsilon_1^{\alpha} = 0.1$ and $\varepsilon_2^{\alpha} = 0.1$. The estimated maximum and minimum of $d_f$ at the domain boundary, $d_{e\max}$ and $d_{e\min}$, are also marked in the figure. Since the values of $\varepsilon_i$ in (27) are replaced by the extreme case $\forall \varepsilon_i = \pm 1$, $d_{e\max}$ is larger than the maximum of $d_f$ and $d_{e\min}$ is smaller than the minimum.

Figure 2. $d_f$, $d_{e\max}$, and $d_{e\min}$ of the example in Section 3.

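The extreme value theory of multivariable functions is used to compare $d_{e\max}$, $d_{fi\max}$, $d_{e\min}$, and $d_{fi\min}$.

The Hessian matrix of the function $d_f = \sum_{i,j=1}^{n} x_i y_j (\varepsilon_i - \varepsilon_i^{\alpha})(\varepsilon_j - \varepsilon_j^{\alpha})$ is

$$H = \begin{bmatrix} \dfrac{\partial^2 d_f}{\partial \varepsilon_1^2} & \dfrac{\partial^2 d_f}{\partial \varepsilon_1 \partial \varepsilon_2} & \cdots & \dfrac{\partial^2 d_f}{\partial \varepsilon_1 \partial \varepsilon_n} \\ \dfrac{\partial^2 d_f}{\partial \varepsilon_2 \partial \varepsilon_1} & \dfrac{\partial^2 d_f}{\partial \varepsilon_2^2} & \cdots & \dfrac{\partial^2 d_f}{\partial \varepsilon_2 \partial \varepsilon_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 d_f}{\partial \varepsilon_n \partial \varepsilon_1} & \dfrac{\partial^2 d_f}{\partial \varepsilon_n \partial \varepsilon_2} & \cdots & \dfrac{\partial^2 d_f}{\partial \varepsilon_n^2} \end{bmatrix} = \begin{bmatrix} 2 x_1 y_1 & x_1 y_2 + x_2 y_1 & x_1 y_3 + x_3 y_1 & \cdots & x_1 y_n + x_n y_1 \\ x_1 y_2 + x_2 y_1 & 2 x_2 y_2 & x_2 y_3 + x_3 y_2 & \cdots & x_2 y_n + x_n y_2 \\ x_1 y_3 + x_3 y_1 & x_2 y_3 + x_3 y_2 & 2 x_3 y_3 & \cdots & x_3 y_n + x_n y_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_1 y_n + x_n y_1 & x_2 y_n + x_n y_2 & x_3 y_n + x_n y_3 & \cdots & 2 x_n y_n \end{bmatrix}.$$
(33)

From (33), we can see that $H$ is independent of $\varepsilon_i$. It is an expression of $x_i$ and $y_i$ only. This means that $H$ is the same for all the points in the domain.

To determine whether $H$ is positive or negative definite or semidefinite, its principal minor determinants are derived as

$$D_0 = 2 x_i y_i,$$
(34)
$$D_1 = \begin{vmatrix} 2 x_i y_i & x_i y_j + x_j y_i \\ x_i y_j + x_j y_i & 2 x_j y_j \end{vmatrix} = -(x_i y_j - x_j y_i)^2,$$
(35)
$$D_2 = D_3 = \cdots = D_n = 0, \quad \text{where } 1 \le i < j \le n.$$
(36)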
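The minors in (34) to (36) can be checked numerically on small examples. The sketch below builds the Hessian of (33) from coefficient lists and evaluates a second-order and a third-order principal minor; the helper names are hypothetical, and the three-symbol coefficients are arbitrary values chosen for illustration.

```python
def hessian(xs, ys):
    """Hessian of d_f from (33): entry (i, j) is x_i*y_j + x_j*y_i."""
    n = len(xs)
    return [[xs[i] * ys[j] + xs[j] * ys[i] for j in range(n)] for i in range(n)]


def minor_2x2(h, i, j):
    """Second-order principal minor built from rows/columns i and j."""
    return h[i][i] * h[j][j] - h[i][j] * h[j][i]


def det_3x3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Example of Section 3: x = 1 + eps1 + 5*eps2, y = 3 - 6*eps1 + eps2
xs, ys = [1, 5], [-6, 1]
h = hessian(xs, ys)
print(h)                   # [[-12, -29], [-29, 10]]
print(minor_2x2(h, 0, 1))  # -961, equal to -(x1*y2 - x2*y1)**2

# Arbitrary three-symbol coefficients: the third-order minor vanishes.
h3 = hessian([1, 5, 2], [-6, 1, 3])
print(det_3x3(h3))         # 0
```

The third-order determinant vanishes because the Hessian is the sum of two rank-one matrices, which is why all minors of order three and higher are zero in (36).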

As introduced in Section 2.4, $H$ is a positive semidefinite matrix iff it satisfies

$$x_i y_i \ge 0, \quad x_i y_j = x_j y_i, \quad \text{for } 1 \le i < j \le n.$$
(37)

$H$ is a negative semidefinite matrix iff it satisfies

$$x_i y_i \ge 0, \quad x_i y_j \ne x_j y_i, \quad \text{for } 1 \le i < j \le n.$$
(38)

If it satisfies neither (37) nor (38), which means it satisfies (39), $H$ is an indefinite matrix:

$$x_i y_i < 0, \quad \text{for } 1 \le i \le n.$$
(39)

According to (37), (38), and (39), we can compare $d_{e\max}$, $d_{e\min}$, $d_{fi\max}$, and $d_{fi\min}$, which are expressed as (29), (30), (31), and (32), respectively. Based on (25) and (26), $d_{\max}$ and $d_{\min}$ can be identified.

Lemma 1.

The estimated maximum of the function $d_f$, $d_{\max}$, equals the estimated maximum of the function value at the domain boundary, and the estimated minimum of the function $d_f$, $d_{\min}$, equals the estimated minimum of the function value at the domain boundary. This can be expressed as

$$d_{\max} = d_{e\max}, \quad d_{\min} = d_{e\min}.$$
(40)

Proof.

There are two cases to consider: $\exists\, x_i y_i < 0$ and $\forall\, x_i y_i \ge 0$.

For $\exists\, x_i y_i < 0$, (39) is satisfied and $H$ is indefinite. The stationary point is a saddle point, such as the point P in Figure 2. Neither $d_{fi\max}$ nor $d_{fi\min}$ exists in $d_f$, that is,

$$d_{\max} = d_{e\max}, \quad d_{\min} = d_{e\min}.$$
(41)

According to (41), Lemma 1 is proven in this case.

For $\forall\, x_i y_i \ge 0$, $H$ may be positive semidefinite or negative semidefinite. $d_f$ may have local minima or local maxima under this condition.

As $\varepsilon_i \in [-1, 1]$, the following inequalities hold:

$$\sum_{i,j=1,\, i \ne j}^{n} |x_i y_j| \ge \pm \sum_{i,j=1,\, i \ne j}^{n} x_i y_j \varepsilon_i \varepsilon_j,$$
(42)
$$\sum_{i,j=1}^{n} |x_i y_j \varepsilon_i^{\alpha}| \ge \pm \sum_{i,j=1}^{n} x_i y_j \varepsilon_i \varepsilon_j^{\alpha},$$
(43)
$$\sum_{i,j=1}^{n} |x_i y_j \varepsilon_j^{\alpha}| \ge \pm \sum_{i,j=1}^{n} x_i y_j \varepsilon_j \varepsilon_i^{\alpha}.$$
(44)

If a local maximum lies at $z_\alpha$, the difference between $d_{e\max}$ and $d_{fi\max}$ is

$$d_{e\max} - d_{fi\max} \ge \sum_{i=1}^{n} x_i y_i (1 - \varepsilon_i^2).$$
(45)

Since $\forall\, x_i y_i \ge 0$, it follows that

$$d_{e\max} \ge d_{fi\max}.$$
(46)

According to (25) and (46), we can prove that

$$d_{\max} = d_{e\max}.$$
(47)

Similarly, if a local minimum lies at $z_\alpha$, the difference between $d_{e\min}$ and $d_{fi\min}$ is

$$d_{e\min} - d_{fi\min} \le -\sum_{i=1}^{n} x_i y_i \left( \varepsilon_i^2 + \varepsilon_i (\varepsilon_i^{\alpha} + \varepsilon_j^{\alpha}) + 1 \right) \le -\sum_{i=1}^{n} x_i y_i (\varepsilon_i + 1)^2.$$
(48)

As $\forall\, x_i y_i \ge 0$ in (48), the inequality (49) can be proven:

$$d_{e\min} \le d_{fi\min}.$$
(49)

According to (26) and (49), we can prove that

$$d_{\min} = d_{e\min}.$$
(50)

As (47) and (50) hold, Lemma 1 is proven in the case of $\forall\, x_i y_i \ge 0$.

Combining these two cases, Lemma 1 is proven.

According to Lemma 1, $d_{\max}$ and $d_{\min}$ at a point $z_\alpha$ can be computed as $d_{e\max}$ and $d_{e\min}$ in (29) and (30).
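To make (29) and (30) concrete, the following Python sketch evaluates the two boundary estimates for the example of Section 3 with the tangent point at (0.1, 0.1) and compares them with sampled values of $d_f$ from (24). The function names are hypothetical, and the grid sampling is only a spot check on this one example, not a proof of enclosure for arbitrary inputs.

```python
def boundary_estimates(xs, ys, alpha):
    """Return (d_emin, d_emax) according to (30) and (29)."""
    n = len(xs)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    diag = sum(xs[i] * ys[i] for i in range(n))
    shift = sum(xs[i] * ys[j] * alpha[i] * alpha[j] for i, j in pairs)
    spread = (sum(abs(xs[i] * ys[j]) for i, j in pairs if i != j)
              + sum(abs(xs[i] * ys[j] * alpha[i]) for i, j in pairs)
              + sum(abs(xs[i] * ys[j] * alpha[j]) for i, j in pairs))
    return diag + shift - spread, diag + shift + spread


def difference(xs, ys, alpha, eps):
    """d_f from (24): the double sum factors into a product of two sums."""
    dx = sum(x * (e - a) for x, e, a in zip(xs, eps, alpha))
    dy = sum(y * (e - a) for y, e, a in zip(ys, eps, alpha))
    return dx * dy


xs, ys, alpha = [1, 5], [-6, 1], [0.1, 0.1]
d_emin, d_emax = boundary_estimates(xs, ys, alpha)
print(d_emin, d_emax)  # approximately -40.7 and 38.1

grid = [k / 10 for k in range(-10, 11)]
samples = [difference(xs, ys, alpha, (e1, e2)) for e1 in grid for e2 in grid]
print(min(samples), max(samples))  # sampled extremes lie inside the estimates
```

On this example the sampled values of $d_f$ stay strictly inside the estimated interval, in line with the observation made for Figure 2.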

4.2 Expression of the approximate affine form in AASEE

Lemma 2.

When $f_z$ represents the tangent hyperplane at the point $z_0 = (0, 0, \ldots, 0)$, it satisfies (20).

Proof.

According to Lemma 1, (29), and (30), the maximum absolute error of $d_f$ is

$$e_a = \left| \sum_{i=1}^{n} x_i y_i \right| + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j| + \sum_{i,j=1}^{n} |x_i y_j \varepsilon_i^{\alpha}| + \sum_{i,j=1}^{n} |x_i y_j \varepsilon_j^{\alpha}| + \left| \sum_{i,j=1}^{n} x_i y_j \varepsilon_i^{\alpha} \varepsilon_j^{\alpha} \right|.$$
(51)

So the maximum absolute error between the tangent hyperplane $f_{z_0}$ at the point $z_0 = (0, 0, \ldots, 0)$ and the $(n+1)$-dimensional quadratic surface $z$ is

$$e_a(z_0) = \left| \sum_{i=1}^{n} x_i y_i \right| + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j|.$$
(52)

Suppose that there is another point $z_\alpha \ne z_0$, represented by $z_\alpha = (\varepsilon_1^{\alpha}, \varepsilon_2^{\alpha}, \ldots, \varepsilon_n^{\alpha})$, where $\varepsilon_i^{\alpha} \in [-1, 1]$ and the $\varepsilon_i^{\alpha}$, $i = 1, \ldots, n$, are not all equal to 0. The maximum absolute error between the tangent hyperplane $f_{z_\alpha}$ at the point $z_\alpha$ and the $(n+1)$-dimensional quadratic surface $\hat{x}\hat{y}$ is

$$e_a(z_\alpha) = \left| \sum_{i=1}^{n} x_i y_i \right| + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j| + \sum_{i,j=1}^{n} |x_i y_j \varepsilon_i^{\alpha}| + \sum_{i,j=1}^{n} |x_i y_j \varepsilon_j^{\alpha}| + \left| \sum_{i,j=1}^{n} x_i y_j \varepsilon_i^{\alpha} \varepsilon_j^{\alpha} \right|.$$
(53)

$e_a(z_\alpha)$ and $e_a(z_0)$ can be compared by

$$e_a(z_0) - e_a(z_\alpha) = -\sum_{i,j=1}^{n} |x_i y_j \varepsilon_i^{\alpha}| - \sum_{i,j=1}^{n} |x_i y_j \varepsilon_j^{\alpha}| - \left| \sum_{i,j=1}^{n} x_i y_j \varepsilon_i^{\alpha} \varepsilon_j^{\alpha} \right| \le 0.$$
(54)

Because $e_a(z_0) \le e_a(z_\alpha)$, the tangent hyperplane $f_{z_0}$ at the point $z_0 = (0, 0, \ldots, 0)$ is the tangent hyperplane whose maximum absolute error is minimized.

It is proven that the chosen $f_z$ is the tangent hyperplane at the point $z_0 = (0, 0, \ldots, 0)$.

According to Lemma 2, $f_z$ of AASEE denotes the tangent hyperplane at the point $z_0 = (0, 0, \ldots, 0)$ and can be expressed as

$$f_z = x_0 y_0 + x_0 \sum_{i=1}^{n} y_i \varepsilon_i + y_0 \sum_{i=1}^{n} x_i \varepsilon_i.$$
(55)

This $f_z$ is the same as the $f_z$ used in AATRE and AACHA.
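The comparison in (52) to (54) can be illustrated numerically. The sketch below evaluates the error expression (53) for the example of Section 3 at several tangent points; the function name `max_abs_error` and the chosen grid of tangent points are assumptions for illustration.

```python
def max_abs_error(xs, ys, alpha):
    """Maximum absolute error e_a(z_alpha) according to (53)."""
    n = len(xs)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    return (abs(sum(xs[i] * ys[i] for i in range(n)))
            + sum(abs(xs[i] * ys[j]) for i, j in pairs if i != j)
            + sum(abs(xs[i] * ys[j] * alpha[i]) for i, j in pairs)
            + sum(abs(xs[i] * ys[j] * alpha[j]) for i, j in pairs)
            + abs(sum(xs[i] * ys[j] * alpha[i] * alpha[j] for i, j in pairs)))


xs, ys = [1, 5], [-6, 1]
e0 = max_abs_error(xs, ys, [0, 0])
print(e0)                                 # 32, the value of (52)
print(max_abs_error(xs, ys, [0.1, 0.1]))  # approximately 40.7

# Every other tangent point on a coarse grid gives an error at least as large.
grid = [k / 4 for k in range(-4, 5)]
print(all(max_abs_error(xs, ys, [a1, a2]) >= e0 for a1 in grid for a2 in grid))
```

The three extra terms in (53) are all nonnegative and vanish only at the origin, so the tangent point at the origin gives the smallest value of this error expression.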

4.3 Expression of the equivalent affine form in AASEE

According to (55), the $d_f$ between the tangent hyperplane $f_{z_0}$ and the quadratic surface is

$$d_f = \sum_{i,j=1}^{n} x_i y_j \varepsilon_i \varepsilon_j.$$
(56)

According to Lemma 1, (29), and (30), the estimated maximum and estimated minimum of $d_f$, $d_{\max}$ and $d_{\min}$, can be expressed as

$$d_{\max} = d_{e\max} = \sum_{i=1}^{n} x_i y_i + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j|, \quad d_{\min} = d_{e\min} = \sum_{i=1}^{n} x_i y_i - \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j|.$$
(57)

The case $n = 1$ is special, and $d_{\max}$ and $d_{\min}$ can be tightened to

$$d_{\max} = \begin{cases} x_1 y_1, & \text{for } n = 1,\; x_1 y_1 \ge 0 \\ 0, & \text{for } n = 1,\; x_1 y_1 \le 0 \end{cases}$$
(58)
$$d_{\min} = \begin{cases} 0, & \text{for } n = 1,\; x_1 y_1 \ge 0 \\ x_1 y_1, & \text{for } n = 1,\; x_1 y_1 \le 0. \end{cases}$$
(59)

By combining the two cases, $d_{\max}$ and $d_{\min}$ are rewritten as

$$d_{\max} = \begin{cases} \sum_{i=1}^{n} x_i y_i + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j|, & \text{for } n > 1 \\ x_1 y_1, & \text{for } n = 1,\; x_1 y_1 \ge 0 \\ 0, & \text{for } n = 1,\; x_1 y_1 < 0 \end{cases}$$
(60)
$$d_{\min} = \begin{cases} \sum_{i=1}^{n} x_i y_i - \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j|, & \text{for } n > 1 \\ 0, & \text{for } n = 1,\; x_1 y_1 \ge 0 \\ x_1 y_1, & \text{for } n = 1,\; x_1 y_1 < 0. \end{cases}$$
(61)

When $n > 1$, the range of $\hat{d}$ can be expressed as

$$\left[ \sum_{i=1}^{n} x_i y_i - \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j|,\; \sum_{i=1}^{n} x_i y_i + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j| \right].$$
(62)

According to (11), the affine form of $\hat{d}$ can be expressed as

$$\hat{d} = \sum_{i=1}^{n} x_i y_i + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j|\; \varepsilon_{n+1}.$$
(63)

When $n = 1$, the range of $\hat{d}$ can be expressed as

$$[x_1 y_1, 0] \quad \text{or} \quad [0, x_1 y_1].$$
(64)

The affine form of $\hat{d}$ can be expressed as

$$\hat{d} = \frac{1}{2} x_1 y_1 + \frac{1}{2} |x_1 y_1|\, \varepsilon_2.$$
(65)

4.4 Formulary of AASEE

According to (12), the affine form of AASEE for multiplication is

$$\hat{z} = f_z + \hat{d} = x_0 y_0 + x_0 \sum_{i=1}^{n} y_i \varepsilon_i + y_0 \sum_{i=1}^{n} x_i \varepsilon_i + \sum_{i=1}^{n} x_i y_i + \sum_{i,j=1,\, i \ne j}^{n} |x_i y_j|\; \varepsilon_{n+1}, \quad \text{for } n > 1,$$
(66)
$$\hat{z} = f_z + \hat{d} = x_0 y_0 + (x_0 y_1 + y_0 x_1)\, \varepsilon_1 + \frac{1}{2} x_1 y_1 + \frac{1}{2} |x_1 y_1|\, \varepsilon_2, \quad \text{for } n = 1.$$
(67)

It is impossible to obtain an exact affine form for multiplication in AA. The result of multiplication must be approximated by an affine form. By using the $\varepsilon_i$ as the input arguments, the uncertainty of multiplication in AASEE is reduced. The proposed $f_z$ is the closest to the result of multiplication among all the possible approximate affine forms, and the upper and lower bounds of $\hat{d}$ in AASEE are much closer to the true bounds of $d_f$. Hence, the uncertainty in AASEE is smaller than that in AATRE and AACHA. Formed by such $f_z$ and $\hat{d}$, AASEE creates a refined affine form of multiplication.
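The two formulas (66) and (67) can be sketched directly in Python. The function names and the list-based representation are hypothetical, and `n` is taken here to be the length of the coefficient lists. The sampling at the end only spot-checks the example of Section 3 and does not establish enclosure for arbitrary inputs.

```python
def aasee_multiply(x0, xs, y0, ys):
    """Affine form of x*y following (66) for n > 1 and (67) for n = 1.

    Returns the center and the coefficient list with one new symbol appended.
    """
    n = len(xs)
    assert n == len(ys) and n >= 1
    center = x0 * y0
    linear = [x0 * yi + y0 * xi for xi, yi in zip(xs, ys)]
    if n == 1:
        p = xs[0] * ys[0]
        return center + p / 2, linear + [abs(p) / 2]
    diag = sum(xi * yi for xi, yi in zip(xs, ys))
    cross = sum(abs(xs[i] * ys[j]) for i in range(n) for j in range(n) if i != j)
    return center + diag, linear + [cross]


def affine_bounds(center, coeffs):
    """Range of an affine form according to (3)."""
    radius = sum(abs(c) for c in coeffs)
    return center - radius, center + radius


# x = 1 + eps1 + 5*eps2, y = 3 - 6*eps1 + eps2
center, coeffs = aasee_multiply(1, [1, 5], 3, [-6, 1])
print(center, coeffs)                 # 2 [-3, 16, 31]
print(affine_bounds(center, coeffs))  # (-48, 52)

grid = [k / 10 for k in range(-10, 11)]
products = [(1 + e1 + 5 * e2) * (3 - 6 * e1 + e2) for e1 in grid for e2 in grid]
print(min(products), max(products))   # sampled product values for comparison
```

For this example the AASEE range is [-48, 52], whereas (5) gives [-58, 64] for the same inputs, and the sampled products lie between about -40 and 50.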

5 Comparison of AASEE to AATRE and AACHA

5.1 Computational complexity

The computational complexity of an expression is determined by its most complex term. For $n > 1$, the most complex term is the coefficient of $\varepsilon_{n+1}$. To make the analysis convenient, we transform this coefficient:

$$\sum_{i,j=1,\, i \ne j}^{n} |x_i y_j| = \sum_{i,j=1}^{n} |x_i y_j| - \sum_{i=1}^{n} |x_i y_i| = \sum_{i=1}^{n} |x_i| \sum_{j=1}^{n} |y_j| - \sum_{i=1}^{n} |x_i y_i|.$$
(68)

The computational complexity of the minuend is $O(M_1)$, where $M_1$ is defined in Section 2.3, while the computational complexity of the subtrahend is less than $O(M_1)$.

Hence, the computational complexity of AASEE is $O(M_1)$. It is the same as that of AATRE and lower than that of AACHA.
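The effect of the transformation (68) can be seen by writing both sides as code. In this sketch, with hypothetical function names, the left-hand side is a double loop over all index pairs and the right-hand side needs only three single passes over the coefficients.

```python
def cross_term_double_loop(xs, ys):
    """Left-hand side of (68): sum over all pairs with i != j."""
    n = len(xs)
    return sum(abs(xs[i] * ys[j]) for i in range(n) for j in range(n) if i != j)


def cross_term_single_pass(xs, ys):
    """Right-hand side of (68): product of two sums minus the diagonal sum."""
    sum_x = sum(abs(x) for x in xs)
    sum_y = sum(abs(y) for y in ys)
    diag = sum(abs(x * y) for x, y in zip(xs, ys))
    return sum_x * sum_y - diag


xs, ys = [1, 5], [-6, 1]
print(cross_term_double_loop(xs, ys))  # 31
print(cross_term_single_pass(xs, ys))  # 31
```

Both functions return the same value; only the second avoids enumerating all index pairs.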

5.2 Accuracy

The accuracy of $\hat{d}$ determines the accuracy of the affine approximation methods of multiplication. A more accurate $\hat{d}$ leads to a more accurate affine approximation result.

For AATRE, $\hat{d} = \left( \sum_{i=1}^{n} |x_i| \sum_{i=1}^{n} |y_i| \right) \varepsilon_{n+1}$. In this method, the same noise symbol appearing in different variables is treated as independent. The range of this $\hat{d}$ is

$$\left[ -\sum_{i=1}^{n} |x_i| \sum_{i=1}^{n} |y_i|,\; \sum_{i=1}^{n} |x_i| \sum_{i=1}^{n} |y_i| \right].$$
(69)

It is much larger than the range of $\hat{d}$ by AASEE, which is expressed in (62) and (64).

In AACHA, $\hat{d} = \frac{a + b}{2} + \frac{b - a}{2}\, \varepsilon_{n+1}$, where $a$ and $b$ bound the estimated range of $\hat{d}$. In this method, a polygon in the XY plane is used to find $a$ and $b$. The domain of $\hat{x}\hat{y}$ is bounded by the polygon. However, the polygon is larger than the true domain, and identical noise symbols shared by different variables are not taken into account together.

In AASEE, identical noise symbols shared by different variables are considered together in $\hat{d}$. It is more accurate than the $\hat{d}$ of AATRE. In most cases, it is also more accurate than the $\hat{d}$ of AACHA.

6 Case studies

The following nonlinear system cases are used to demonstrate the efficiency of the proposed refined affine form of multiplication. These cases are commonly used in signal processing. The first two cases are univariate and come from [11]. The remaining cases are multivariate polynomial functions and come from [27–29].

6.1 Introduction of the cases

Case 1. Polynomial approximation. The first case study is a degree-four polynomial approximation of $y = \ln(1 + x)$, where $x = [0, 1]$. The polynomial is evaluated with Horner's rule:

$$y = (((-0.0550x + 0.2168)x - 0.4645)x + 0.9956)x + 0.0001,$$

where the coefficients are obtained by a polynomial curve-fitting technique.
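As a brief illustration of Case 1, the following Python sketch evaluates the polynomial with Horner's rule and compares it with the natural logarithm on a grid of points in [0, 1]. The names `COEFFS` and `horner` and the choice of grid are assumptions made for this example.

```python
import math

# Coefficients of the degree-four polynomial, highest degree first.
COEFFS = [-0.0550, 0.2168, -0.4645, 0.9956, 0.0001]


def horner(coeffs, x):
    """Evaluate a polynomial with Horner's rule (one multiply-add per coefficient)."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


grid = [k / 100 for k in range(101)]
worst = max(abs(horner(COEFFS, x) - math.log1p(x)) for x in grid)
print(worst)  # largest deviation from ln(1 + x) on the grid
```

Each step of the loop corresponds to one of the intermediate signals obtained by a multiplication followed by a constant addition, which is why this case exercises the multiplication rule several times in a chain.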

Case 2. B-splines. Uniform cubic B-splines are commonly used for image warping [30]. The basis functions $B_0$, $B_1$, $B_2$, and $B_3$ of the B-spline are defined as

$$B_0(u) = \frac{1}{6}(1 - u)^3, \quad B_1(u) = \frac{1}{6}(3u^3 - 6u^2 + 4), \quad B_2(u) = \frac{1}{6}(-3u^3 + 3u^2 + 3u + 1), \quad B_3(u) = -\frac{u^3}{6},$$

where $u = [0, 1]$.

Case 3. Multivariate polynomial functions. In the third case, eight multivariate polynomial functions are examined. They are as follows:

  1. Savitzky-Golay filter:

    $f_1(X) = 7x_1^3 - 984x_2^3 - 76x_1^2 x_2 + 92x_1 x_2^2 + 7x_1^2 - 39x_1 x_2 - 46x_2^2 + 7x_1 - 46x_2 - 75$, where the input range is $X = [-2, 2]^2$.
  2. Image rejection unit:

    $f_2(X) = 16384(x_1^4 + x_2^4) + 64767(x_1^2 - x_2^2) + x_1 - x_2 + 57344\,x_1 x_2 (x_1 - x_2)$, where the input range is $X = [0, 1]^2$.
  3. A random function:

    $f_3(X) = (x_1 - 1)(x_1 + 2)(x_2 + 1)(x_2 - 2)\,x_3^2$, where the input range is $X = [-2, 2]^3$.
  4. Mitchell function:

    $f_4(X) = 4\left[x_1^4 + (x_2^2 + x_3^2)^2\right] + 17x_1^2 (x_2^2 + x_3^2) - 20(x_1^2 + x_2^2 + x_3^2) + 17$, where the input range is $X = [-2, 2]^3$.
  5. Matyas function:

    $f_5(X) = 0.26(x_1^2 + x_2^2) - 0.48\,x_1 x_2$, where the input range is $X = [-100, 100]^2$.
  6. Three-hump function:

    $f_6(X) = 12x_1^2 - 6.3x_1^4 + x_1^6 + 6x_2 (x_2 - x_1)$, where the input range is $X = [-10, 10]^2$.
  7. Goldstein-Price function:

    $f_7(X) = \left[1 + (x_1 + x_2 + 1)^2 \left(19 - 14x_1 + 3x_1^2 - 14x_2 + 6x_1 x_2 + 3x_2^2\right)\right] \times \left[30 + (2x_1 - 3x_2)^2 \left(18 - 32x_1 + 12x_1^2 + 48x_2 - 36x_1 x_2 + 27x_2^2\right)\right]$, where the input range is $X = [-2, 2]^2$.
  8. Ratscheck function:

    $f_8(X) = 4x_1^2 - 2.1x_1^4 + \frac{1}{3}x_1^6 + x_1 x_2 - 4x_2^2 + 4x_2^4$, where the input range is $X = [-100, 100]^2$.
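For a low-dimensional function such as the random function of case 3, a reference output range can be estimated by dense sampling of the input box. The sketch below does this for that function. The helper `sampled_range` and the 17-point grid are illustrative assumptions, and sampling in general yields only an inner estimate of the range, not a guaranteed bound.

```python
import itertools

def f3(x1, x2, x3):
    """Random function of case 3, defined on [-2, 2]^3."""
    return (x1 - 1) * (x1 + 2) * (x2 + 1) * (x2 - 2) * x3 ** 2

def sampled_range(f, lo, hi, dims, steps):
    """Min and max of f over a uniform grid with `steps` points per dimension."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    values = [f(*point) for point in itertools.product(grid, repeat=dims)]
    return min(values), max(values)

# 17 points per axis gives a step of 0.25, so the grid contains the stationary
# points x = -0.5 and x = 0.5 of the two quadratic factors and the box corners.
f3_lo, f3_hi = sampled_range(f3, -2.0, 2.0, dims=3, steps=17)
print(f3_lo, f3_hi)
```

For this function the grid happens to contain the extremal points, so the sampled range [-36, 64] coincides with the range obtained by analysing the factors (x1 - 1)(x1 + 2), (x2 + 1)(x2 - 2), and x3² separately. The cost grows exponentially with the number of inputs, which is why such reference computations do not scale to large designs.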

6.2 Analysis of case 1

For the input range x = [0, 1], the equivalent affine form is $\hat{x} = 0.5 + 0.5\varepsilon_1$. For case 1, the intermediate and output signals are defined as

$$y_1 = -0.0550x + 0.2168, \quad y_2 = y_1 x - 0.4645, \quad y_3 = y_2 x + 0.9956, \quad y = y_3 x + 0.0001.$$
(70)

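Using AATRE, the affine forms of the intermediate and output signals are

$$\begin{aligned} y_1 &= 0.1893 - 0.0275\varepsilon_1, \\ y_2 &= -0.36985 + 0.0809\varepsilon_1 + 0.01375\varepsilon_2, \\ y_3 &= 0.81068 - 0.14448\varepsilon_1 + 0.00688\varepsilon_2 + 0.04733\varepsilon_3, \\ y &= 0.4054 + 0.3331\varepsilon_1 + 0.0034\varepsilon_2 + 0.0237\varepsilon_3 + 0.0993\varepsilon_4. \end{aligned}$$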

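Using AACHA, the affine forms of the intermediate and output signals are

$$\begin{aligned} y_1 &= 0.1893 - 0.0275\varepsilon_1, \\ y_2 &= -0.3768 + 0.0809\varepsilon_1 + 0.0069\varepsilon_2, \\ y_3 &= 0.8291 - 0.1479\varepsilon_1 + 0.0034\varepsilon_2 + 0.0220\varepsilon_3, \\ y &= 0.3761 + 0.3406\varepsilon_1 + 0.0017\varepsilon_2 + 0.0110\varepsilon_3 + 0.0436\varepsilon_4. \end{aligned}$$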

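Using AASEE, the affine forms of the intermediate and output signals are

$$\begin{aligned} y_1 &= 0.1893 - 0.0275\varepsilon_1, \\ y_2 &= -0.37673 + 0.0809\varepsilon_1 + 0.00688\varepsilon_2, \\ y_3 &= 0.84769 - 0.14791\varepsilon_1 + 0.00344\varepsilon_2 + 0.00344\varepsilon_3, \\ y &= 0.34999 + 0.34989\varepsilon_1 + 0.00172(\varepsilon_2 + \varepsilon_3) + 0.00344\varepsilon_4. \end{aligned}$$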

Table 1 shows the variable ranges and the range intervals, (ymax - ymin), of the intermediate and output signals obtained by the three methods. The true range of y is [0, 0.6931], so the true range interval of the output is 0.6931. Let R(T), R(C), and R(A) denote the ratios of the range intervals obtained by AATRE, AACHA, and AASEE, respectively, to the true range interval. The closer this ratio is to 1, the more accurate the method. In this case, R(T) = 1.33, R(C) = 1.15, and R(A) = 1.03, so the range obtained by AASEE is closer to the true range than those obtained by AATRE and AACHA.
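The three ratios can be recomputed from the output affine forms of Section 6.2. The sketch below assumes only that the range of an affine form is its centre plus or minus the sum of the absolute noise coefficients; the names `FORMS` and `affine_range` are illustrative.

```python
TRUE_INTERVAL = 0.6931  # width of the true output range [0, 0.6931]

# Output affine forms of y from Section 6.2: (centre, noise coefficients).
FORMS = {
    "AATRE": (0.4054, [0.3331, 0.0034, 0.0237, 0.0993]),
    "AACHA": (0.3761, [0.3406, 0.0017, 0.0110, 0.0436]),
    "AASEE": (0.34999, [0.34989, 0.00172, 0.00172, 0.00344]),
}

def affine_range(centre, coeffs):
    """Range of an affine form: centre +/- sum of absolute noise coefficients."""
    radius = sum(abs(c) for c in coeffs)
    return centre - radius, centre + radius

ratios = {}
for name, (centre, coeffs) in FORMS.items():
    lo, hi = affine_range(centre, coeffs)
    ratios[name] = (hi - lo) / TRUE_INTERVAL
    print(f"{name}: range = [{lo:.4f}, {hi:.4f}], ratio = {ratios[name]:.2f}")
```

Rounded to two decimals, the ratios are 1.33, 1.15, and 1.03, matching R(T), R(C), and R(A).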

Table 1 Comparison of ranges and range intervals for every variable of the three methods for case 1

6.3 Comparison of range and computational complexity by the three cases

The output ranges of case 2 and case 3 by the three methods are obtained by following the same process as in case 1.

Table 2 lists the ranges and integer word lengths obtained by AASEE and compares them with those of AATRE and AACHA. Column c.fun identifies the case study and the function of each row. The true output ranges, used as reference values, are obtained by a numerical method or a nonlinear programming technique; both are time-consuming and impractical for finding the true bounds of a large number of signals. The table shows that, for all the functions, the ranges derived by AASEE cover the true ranges and are smaller than those derived by AATRE. Of the thirteen functions, the ranges derived by AASEE are smaller than those by AACHA for nine functions and equal to those by AACHA for two. According to (1), the integer word length is determined by the range. The integer word length derived by AASEE is up to 2 bits shorter than that by AATRE and up to 1 bit shorter than that by AACHA. Compared with AATRE, AASEE and AACHA save 0.54 bits on average.
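The link between range width and integer word length can be sketched as follows. This is an assumption-laden illustration: it uses a common two's-complement convention (the smallest n such that -2^(n-1) ≤ min and max < 2^(n-1), sign bit included), which may differ in detail from the definition in (1). The function name `integer_word_length` and the example ranges are hypothetical.

```python
def integer_word_length(lo, hi):
    """Smallest signed width n with -2**(n-1) <= lo and hi < 2**(n-1).

    Assumed two's-complement convention, sign bit included; the paper's
    Eq. (1) may use a different convention.
    """
    n = 1
    while not (-(2 ** (n - 1)) <= lo and hi < 2 ** (n - 1)):
        n += 1
    return n

tight = integer_word_length(-36.0, 64.0)      # example range
doubled = integer_word_length(-72.0, 128.0)   # same range, twice as wide
fourfold = integer_word_length(-144.0, 256.0) # same range, four times as wide
print(tight, doubled, fourfold)
```

Under this convention, each doubling of the estimated range costs one extra integer bit, so an estimate four times wider than necessary costs two bits.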

Table 2 Comparison of analytical ranges and bits by the three methods

To calculate the estimated range of $d_f$, the condition $\exists\,\varepsilon_i = \pm 1$, $\forall i = 1, 2, \ldots, n$ in (27) is replaced by $\forall\,\varepsilon_i = \pm 1$ in AASEE. This approximation introduces the difference between the estimated range and the true range of $d_f$. In most applications, the estimated ranges computed by AASEE are closer to the true ranges than those computed by AACHA. However, the estimated minimum and maximum of $\hat{x}\hat{y}$ on the boundary of the polygon are independent of the value of $\varepsilon_i$. In some applications, such as functions f2 and f8 in Table 2, the results by AASEE are almost the same as those by AACHA.

In Table 3, the ratios of range intervals and the computational complexity are compared among AATRE, AACHA, and AASEE. The computational complexity is measured by the numbers of multiplications and additions. For AACHA, the extreme value of a quadratic function in one variable on a bounded interval also needs to be calculated. Nm, Na, and Ne denote the numbers of multiplications, additions, and extreme value computations of each case, respectively.

Table 3 shows that R(T) ranges from 1.04 to 281.2, R(C) from 1.03 to 233.7, and R(A) from 1.03 to 192.9. The ratios of R(A) to R(T) and of R(A) to R(C) show the accuracy of AASEE relative to AATRE and AACHA, respectively, and their averages can be used to evaluate the accuracy of the affine approximation methods. The ratios of R(A) to R(T) range from 0.18 to 0.99, with an average of 0.59. The ratios of R(A) to R(C) range from 0.33 to 1.17, with an average of 0.89. For these 13 cases, on average, the accuracy of AASEE is 1.69 times that of AATRE and 1.12 times that of AACHA (approximately the reciprocals of the average ratios 0.59 and 0.89).

The extreme value computation of the quadratic function, which is necessary only for AACHA, is the most complex and time-consuming of the operations. Hence, the computational complexity of AACHA is much higher than that of AATRE and AASEE. The increase rate of the number of multiplications, Nm, of AASEE relative to AATRE ranges from 0.091 to 1.75, with an average of 0.450; relative to AACHA, it ranges from 0.2 to 1.833, with an average of 0.567. The increase rate of the number of additions, Na, of AASEE relative to AATRE ranges from 0.05 to 3.4, with an average of 0.944; relative to AACHA, it ranges from 0 to 0.985, with an average of 0.157. The numbers of multiplications and additions of AASEE thus increase only moderately. As shown in Table 3, AACHA is slightly more accurate for functions c3.f2 and c3.f8, but the computational complexity of AACHA is much higher than that of AASEE.

Table 3 Comparison of range ratios and computational complexity by the three methods

6.4 Comparison of the design cost by the three methods

To compare the design cost (the system area) of the three methods, the fractional word lengths are obtained by the precision analysis in [11]. As a representative example, we select the random function of case 3, c3.f3, for this section. The design of c3.f3 is synthesized on a Xilinx Xc2vp30-7ff896 FPGA device (Xilinx, San Jose, CA, USA).

Figure 3 shows the area variation for c3.f3 with increasing target precision. The area obtained with AASEE is smaller than that obtained with AATRE and AACHA, and the area difference grows from 265 to 729 as the target precision increases. This shows that optimizing the integer word length can save area.

Figure 3 Area variation for c3.f3 with increasing target precision

Figure 4 shows the percentage area saving of AASEE over AATRE at different target precisions for c3.f3. The percentage area saving decreases from 14.34% to 5.62% as the target precision increases; that is, the relative saving is larger at lower precision.

Figure 4 Percentage area saving of AASEE over AATRE at different target precision for c3.f3

7 Conclusions

This paper presents a novel affine approximation method for multiplication, Approximation Affine based on Space Extreme Estimation (AASEE). In this method, an extra noise symbol is added to an approximated affine form.

To reduce the uncertainty in AA, we derive this method in the (n + 1)-dimensional space $E^{n+1}$. In $E^{n+1}$, an approximate affine form can be regarded as the tangent hyperplane at a certain point of a curved surface in (n + 1)-dimensional space. Using linear geometry, it is proven that the $f_z$ of AASEE is the closest to the result of multiplication among all the possible approximate affine forms. Taking $\varepsilon_i$ as the input arguments, all occurrences of the same noise symbols in different variables are taken into account together. Hence, the uncertainty of the $\hat{d}$ of AASEE is reduced. Based on the extreme value theory of multivariable functions, we can prove that the range of this $\hat{d}$ covers the true range of the difference introduced by the approximation and is much tighter than that by AATRE and AACHA.

The uncertainty in AASEE is much smaller than that in AATRE and AACHA on average. At the same time, the computational complexity of AASEE is the same as that of AATRE and lower than that of AACHA.

In the case studies, the accuracy of AASEE is, on average, 1.69 times that of AATRE and 1.12 times that of AACHA. The integer word length derived by AASEE is up to 2 bits shorter than that by AATRE and up to 1 bit shorter than that by AACHA. For the case of c3.f3, the area obtained with AASEE is smaller than that with AATRE and AACHA, and the percentage area saving of AASEE over AATRE decreases from 14.34% to 5.62% as the target precision increases.