1 Introduction

Verification methods prove that a given numerical problem is solvable and produce mathematically rigorous error bounds for its solution. For an overview of verification methods cf. [5, 8] and, in Japanese, [6].

When developing a new verification method, it is desirable to have some measure for the quality of an inclusion. We consider an inclusion interval X as error bounds for an unknown real quantity \(\hat{x}\), i.e., \(\hat{x} \in X\). Depending on the situation, we use synonymous notations for an inclusion interval, namely

$$\begin{aligned} X= & {} [\underline{x},\overline{x}]:= \{x \in \mathbb {R}: \underline{x}\le x \le \overline{x}\} \\= & {} \langle m,r \rangle := \{ x \in \mathbb {R}: m-r \le x \le m+r \} \ . \end{aligned}$$

A colloquial notation is \(\langle m,r \rangle = m \pm r\). Consider

$$\begin{aligned} X_1:= [-1,2], \quad X_2:= [-1,1], \quad \text{ and }\quad X_3:= [1,2] \ . \end{aligned}$$

At first sight none of the three intervals seems to give much information; only \(X_3\) at least proves that \(\hat{x}\) is positive. Now let A be a symmetric matrix with \(\Vert A\Vert _2 = 10^{10}\) and let the \(X_\nu \) be inclusions of an eigenvalue. Then each of the three inclusions \(X_\nu \) reveals that the condition number \(\frac{\sigma _{\max }(A)}{\sigma _{\min }(A)}\) of A is at least \(5\cdot 10^9\).
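To make the arithmetic explicit, the following Python sketch (an illustration, not part of the original text) checks the bound: for symmetric A the singular values are the absolute values of the eigenvalues, so an eigenvalue enclosed in \(X_\nu \) bounds \(\sigma _{\min }(A)\) from above by \(\max \{|x|: x \in X_\nu \}\).

```python
def mag(lo, hi):
    """Magnitude of the interval [lo, hi]: max{|x| : lo <= x <= hi}."""
    return max(abs(lo), abs(hi))

norm_A = 1e10  # ||A||_2 = sigma_max(A) for symmetric A

for lo, hi in [(-1, 2), (-1, 1), (1, 2)]:
    # an eigenvalue lambda in [lo, hi] gives sigma_min <= |lambda| <= mag,
    # hence cond(A) = sigma_max/sigma_min >= norm_A / mag
    print((lo, hi), norm_A / mag(lo, hi))
```

Each interval yields a lower bound of at least \(5 \cdot 10^9\) on the condition number.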

The quality of an interval inclusion depends on the context. Having said that, it may nevertheless be desirable to define a measure for the quality of an interval, knowing the pros and cons of such an attempt. There is some folklore about such measures; however, we found only one paper on the subject in the literature, see below.

In this note we develop some criteria for such a measure. We start with some theoretical considerations in the next section, and conclude with some practical remarks.

2 Theoretical considerations

Let \(\varrho : \mathbb {R}\times \mathbb {R}_{\ge 0} \rightarrow \mathbb {R}_{\ge 0}\) be a function measuring the quality \(\varrho (m,r)\) of \(\langle m,r \rangle \). The letter \(\varrho \) may be reminiscent of “relative error”; however, we prefer the wording “quality” because mathematically \(\varrho \) may be interpreted as a relative error only in a certain sense (see below). Note that \(\varrho (m,r)=0\) means best quality. We first list some desirable properties of such a function:

$$\begin{aligned} \begin{array}{rrl} \text{ I) } \; &{} \text{ non-negativity }\quad \; &{} \varrho (m,r) \ge 0 \\ \text{ II) } \;&{} \text{ zero } \text{ value } \quad \; &{} \varrho (m,r)=0 \; \Leftrightarrow \; r = 0 \\ \text{ III) } \;&{} \text{ scaling } \text{ invariance } \quad \; &{} \varrho (X) \; = \; \varrho (\alpha X) \quad \text{ for }\;\; 0 \ne \alpha \in \mathbb {R}\\ \text{ IV) } \;&{} \text{ monotonicity } \text{ for } \text{ fixed } \text{ m } \quad \; &{} r'> r \; \Rightarrow \; \varrho (m,r')> \varrho (m,r) \\ \text{ V) } \;&{} \text{ monotonicity } \text{ for } \text{ fixed } \text{ r } \quad \; &{} |m'| > |m| \; \Rightarrow \; \varrho (m',r) < \varrho (m,r) \\ \end{array} \end{aligned}$$

The rationale is as follows. Properties I) and II) are clear. As for III), the quality of an inclusion interval X may well depend on the scaling for different settings, see the above example. However, without knowing any setting, invariance with respect to scaling seems the only option. For the monotonicity, an interval with constant midpoint but increasing radius gives less information, and with constant radius but increasing absolute value of the midpoint the interval contains, in some sense, more information.

Moreover, we may demand \(\varrho \) to be continuous in m and r except for \(m=r=0\), because II) and III) imply \(\varrho (0,0) = 0 < \varrho (0,r) = \varrho (0,1)\) for all \(r>0\). As for differentiability, note that III) implies \(\varrho (m,r) = \varrho (-m,r)\), so that differentiability at \(m=0\) would force \(\frac{\partial \varrho }{\partial m}(0,r)=0\) for all \(r>0\), at odds with the strict monotonicity V). Therefore we require

$$\begin{aligned} \begin{array}{lll} \text{ VI) } &{} \text{ continuity } \quad &{} \varrho (m,r)\; \text{ is } \text{ everywhere } \text{ continuous } \text{ except } \text{ for }\; m=r=0\\ \text{ VII) } &{} \text{ differentiability } \; &{} \varrho (m,r)\; \text{ is } \text{ everywhere } \text{ differentiable } \text{ except } \text{ for }\; m=0\\ \end{array} \end{aligned}$$

Having listed the desired properties, we look for possible candidates. An obvious choice is to use the midpoint m of \(X = \langle m,r \rangle \) as an approximation and define \(\varrho (X)\) to be the largest relative error of \(x \in X\) with respect to m:

$$\begin{aligned} \varrho _1(m,r) := \max _{x \in X} \left| \dfrac{x-m}{m} \right| \quad \text{ implying }\quad \varrho _1(X) = \left| \dfrac{\overline{x}-\underline{x}}{\underline{x}+\overline{x}} \right| \ . \end{aligned}$$
(1)

All properties I) to VII) are satisfied; however, for a small or zero unknown real quantity \(\hat{x}\) the midpoint may be zero, causing an obvious problem: \(\varrho _1(0,r)\) is infinite no matter how small the radius r is.
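As a quick illustration, a Python sketch of \(\varrho _1\) (a hypothetical helper, not from this note) makes the zero-midpoint problem visible:

```python
# rho_1 from (1): the maximal relative error of x in X = [lo, hi]
# with respect to the midpoint m = (lo + hi)/2.

def rho1(lo, hi):
    if lo + hi == 0:                       # zero midpoint
        return 0.0 if lo == hi else float('inf')
    return abs((hi - lo) / (lo + hi))      # equals r/|m|

print(rho1(1, 2))         # 1/3: midpoint 1.5, radius 0.5
print(rho1(-1e-9, 1e-9))  # inf, however small the radius
```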

A remedy is to take the minimum over \(\tilde{x} \in X\) of the maximal relative error of \(x \in X\) against \(\tilde{x}\), i.e.,

$$\begin{aligned} \varrho _2(X) := \min _{\tilde{x} \in X} \max _{x \in X} \left| \dfrac{\tilde{x}-x}{\tilde{x}} \right| \ . \end{aligned}$$
(2)

This is the definition in [4], the only reference we found on the subject. There it is shown that

$$\begin{aligned} \varrho _2(m,r) = \; \left\{ \begin{array}{ll} \dfrac{r}{|m|} &{} \quad \text{ if } |m|-r \ge 0 \\ \dfrac{2r}{\max (|m-r|,m+r)} &{} \quad \text{ otherwise } \text{. } \end{array}\right. \end{aligned}$$

The properties I) to VI) are satisfied for \(\varrho _2\), however, differentiability VII) is not met:

$$\begin{aligned} \varrho _2(1,1+e) = \; \left\{ \begin{array}{ll} 1+e &{} \quad \text{ if } e \le 0 \\ \dfrac{1+e}{1+e/2} &{} \quad \text{ if } e \ge 0 \ .\\ \end{array}\right. \end{aligned}$$
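A Python sketch of the closed form above (again an illustration, not from this note) exhibits the kink at \(m=r\) numerically via one-sided difference quotients:

```python
# rho_2 from (2) via the closed form; the kink at m = r breaks VII).

def rho2(m, r):
    if abs(m) - r >= 0:                    # <m, r> does not contain 0 in its interior
        return r / abs(m)
    return 2 * r / max(abs(m - r), m + r)

e = 1e-6
slope_left  = (rho2(1, 1) - rho2(1, 1 - e)) / e   # tends to 1
slope_right = (rho2(1, 1 + e) - rho2(1, 1)) / e   # tends to 1/2
print(slope_left, slope_right)
```

The differing one-sided slopes 1 and 1/2 match the two branches of \(\varrho _2(1,1+e)\).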

As has been mentioned, there is some folklore about quality measures, in particular

$$\begin{aligned} \varrho _3(X) := \dfrac{\overline{x}-\underline{x}}{|\underline{x}|+|\overline{x}|} \end{aligned}$$
(3)

with \(0/0:=0\). That avoids the zero midpoint problem, but every interval X containing zero, i.e., \(\underline{x}\le 0 \le \overline{x}\), yields

$$\begin{aligned} 0 \in X: \quad \varrho _3(X) = \dfrac{\overline{x}+|\underline{x}|}{|\underline{x}|+\overline{x}} = 1 \ . \end{aligned}$$

The properties I) to VI) are satisfied, but \(\varrho _3\) is not differentiable if one endpoint is zero. Perturbing, for example, the left endpoint of [0, 1] gives

$$\begin{aligned} \varrho _3([e,1]) = \dfrac{1-e}{|e|+1} = \; \left\{ \begin{array}{ll} \dfrac{1-e}{1+e} &{} \quad \text{ if } e \ge 0 \\ 1 &{} \quad \text{ if } e \le 0 \ ,\\ \end{array}\right. \end{aligned}$$

with one-sided derivatives \(-2\) and 0 at \(e=0\).
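For completeness, a Python sketch of \(\varrho _3\) (an illustration, not from this note) shows the flat behaviour for intervals containing zero:

```python
# The folklore measure rho_3 from (3), with the convention 0/0 := 0.

def rho3(lo, hi):
    den = abs(lo) + abs(hi)
    return 0.0 if den == 0 else (hi - lo) / den

print(rho3(-1e-12, 1e-12))  # 1: any interval containing zero gives 1
print(rho3(1, 2))           # 1/3
```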

In order to find a function \(\varrho \) sharing all properties I) to VII) but avoiding the problems for zero midpoint we use, in view of \(\varrho (m,r)=\varrho (-m,r)\), the ansatz

$$\begin{aligned} \varrho (m,r) = \dfrac{\alpha |m| + \beta r}{\gamma |m| + \delta r} \end{aligned}$$

for constants \(\alpha ,\beta ,\gamma ,\delta \) to be determined. Property II) implies \(\alpha =0\) and \(\gamma \ne 0\), so that using III) and some scaling we can restrict our attention to

$$\begin{aligned} \varrho (m,r) = \psi \; \dfrac{r}{\varphi |m| + r} \end{aligned}$$

with a scaling factor \(\psi \) defining the maximum of \(\varrho \). Rewriting \(\varrho (m,r) = \psi \left( \varphi \frac{|m|}{r} + 1 \right) ^{-1}\) it is easy to verify that this definition satisfies all properties I) to VII) for any \(\varphi >0\). In order to find a suitable choice for \(\varphi \) we look at intervals with fixed left endpoint \(\underline{x}= -1\) and right endpoints \(-1 \le \overline{x}\le 1\), that is \(X_r:= \langle -1+r,r \rangle \) for \(0 \le r \le 1\). Then

$$\begin{aligned} \varrho (X_r) = \dfrac{\psi r}{\varphi (1-r) + r}. \end{aligned}$$

A good choice may be \(\varphi =1\), in which case \(\varrho (X_r) = \psi r\) grows linearly with r. Hence,

$$\begin{aligned} \varrho (m,r):= \dfrac{\psi r}{|m|+r} \ . \end{aligned}$$

Now it is a matter of taste to fix \(\psi \). We may feel that \(\varrho ([0,1])=1\) should hold. That implies \(\psi =2\), so that we define

$$\begin{aligned} \varrho _4(m,r) := \dfrac{2r}{|m|+r} \end{aligned}$$
(4)

implying \(\varrho _4(m,r) \le 2\) for all m, r. For \(X = [ \underline{x}, \overline{x}]\) it follows that

$$\begin{aligned} \varrho _4(X) = \min \left( \left| \dfrac{\overline{x}-\underline{x}}{\underline{x}} \right| , \left| \dfrac{\overline{x}-\underline{x}}{\overline{x}} \right| \right) \end{aligned}$$

with the convention \(\frac{0}{0}:=0\): the minimal relative error of the endpoints against each other. In verification methods \(\text{ mag }(X):= \max \{ |x|: x \in X \}\) is called the magnitude of an interval; hence \(\varrho _4(X) = \text{ diam }(X)/\text{mag }(X)\). An advantage over \(\varrho _3\) is that no case distinction is necessary in the computation. An almost identical formulation

$$\begin{aligned} \varrho _4'(X) = \dfrac{\overline{x}-\underline{x}}{\max (|\underline{x}|,|\overline{x}|,\eta )} \end{aligned}$$

was suggested by Demmel [1]. It equals \(\varrho _4\) except that it is tailored to the binary64 format of the IEEE 754 [3] arithmetic standard by using the gradual underflow unit, i.e., the smallest positive floating-point number \(\eta = 2^{-1074}\). If the endpoints \(\underline{x},\overline{x}\) are binary64 floating-point numbers, then \(\varrho _4(X) = \varrho _4'(X)\).
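Both formulas are easily sketched in Python (an illustration under the stated convention \(0/0:=0\), not INTLAB code); the assertion checks the claimed agreement on a few binary64 endpoints:

```python
# rho_4 = diam(X)/mag(X) from (4), and Demmel's variant rho_4' using the
# gradual underflow unit eta = 2**-1074, the smallest positive binary64 number.

ETA = 2.0 ** -1074

def rho4(lo, hi):
    mag = max(abs(lo), abs(hi))
    return 0.0 if mag == 0 else (hi - lo) / mag

def rho4_prime(lo, hi):
    return (hi - lo) / max(abs(lo), abs(hi), ETA)

for lo, hi in [(0.0, 0.0), (0.0, 1.0), (-1.0, 2.0), (1.0, 2.0)]:
    assert rho4(lo, hi) == rho4_prime(lo, hi)  # agree on binary64 endpoints
    print((lo, hi), rho4(lo, hi))
```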

Fig. 1: The functions \(\varrho _\nu \) for fixed midpoint \(m=1\) (left) and fixed left endpoint \(-1\) (right)

In Fig. 1 the four definitions \(\varrho _\nu \) are compared for fixed midpoint \(m=1\) and for fixed left endpoint \(\underline{x}= -1\).

The first function \(\varrho _1\) [relative error against the midpoint, red] behaves linearly for fixed midpoint and growing radius, and tends to infinity as the midpoint approaches zero. As discussed, the second function \(\varrho _2\) [Kreinovich’s definition, black with circles] is not differentiable at \(m=r\). The “folklore” function \(\varrho _3\) [green] is not differentiable for a zero endpoint and is constant at the maximal value 1 for intervals containing zero, with no discrimination between small and large radii; moreover, it is not concave. Finally, the new definition \(\varrho _4\) [blue] is, like \(\varrho _1\), linear for fixed midpoint and growing radius, and everywhere differentiable except for \(m=0\).

The first three definitions coincide in the left picture for \(X=\langle 1,r \rangle \) with \(r \in [0,1]\), and in the right picture for \(X = [-1,-1+d]\) with \(d \in [0,1]\). In both pictures Kreinovich’s definition \(\varrho _2\) and the proposed \(\varrho _4\) coincide for \(r \ge 1\) and \(d \ge 1\), respectively. So the proposed measure \(\varrho _4\) differs from the other definitions for \(r \in [0,1]\) and \(d \in [0,1]\) in the left and right picture, respectively. This ensures differentiability everywhere except zero midpoint.

The definition \(\varrho _4(X) = \frac{\text{ diam }(X)}{\text{ mag }(X)}\) with the interpretation \(\frac{0}{0}=0\) can be used for complex intervals as well. It replaced the function relerr in the latest Version 13 of INTLAB [7], the Matlab/Octave toolbox for reliable computing. Executable Matlab/INTLAB code is as follows:

[Listing: Matlab/INTLAB code of relerr]

The code works for scalar, vector and matrix input X, full or sparse, real or complex. The “if”-statement takes care of \(\frac{0}{0}\) and, for sparse input, avoids full output.
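The INTLAB listing itself is not reproduced here; as a rough, hypothetical NumPy analogue of the described elementwise behaviour one might write:

```python
import numpy as np

def relerr(lo, hi):
    """Elementwise diam/mag with 0/0 := 0, for real interval bounds."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    mag = np.maximum(np.abs(lo), np.abs(hi))
    diam = hi - lo
    # where mag == 0 the interval is [0, 0] and diam == 0, so return 0 there
    return np.divide(diam, mag, out=np.zeros_like(diam), where=mag != 0)

print(relerr([0, 0, 1], [0, 1, 2]))  # elementwise: 0, 1, 0.5
```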

3 Practical considerations

Our definition \(\varrho _4(X)\) seems a good theoretical measure for the relative error of an interval X. However, from a practical and numerical point of view, there is a drawback. Mathematically a small \(\varrho _4(Y)\) means a small forward error, i.e., a small relative error with respect to the true result. But numerically we can only hope for a small backward error, a concept introduced and popularized by Wilkinson [11, 12], see also [2]. The backward error of an approximation \(\tilde{x}\) is small if \(\tilde{x}\) is the true solution of the original problem after a small perturbation of the input data. Without further measures such as residual iteration, that is about the best we can expect.

Now consider, similar to our introductory problem, an approximation \(\tilde{x} = 1.23456 \cdot 10^{-10}\) to the true singular value \(\hat{x} = 1.23457 \cdot 10^{-10}\) of a matrix A with \(\Vert A\Vert _2 = 1\). Then \(\varrho _4(\tilde{x} \underline{\cup } \hat{x}) = 8.1 \cdot 10^{-6}\). Computing in binary64, which corresponds to some 16 decimal digits of precision, the accuracy of \(\tilde{x}\) might be considered not bad, but far from best possible. With the additional context information \(\Vert A\Vert _2 = 1\), however, we know that this is close to the best approximation we can hope for.

Therefore, from a practical and numerical point of view it seems reasonable to pass information about the context. We therefore propose a relative accuracy defined by

$$\begin{aligned} \alpha (X,\tau ) := \dfrac{\text{ diam }(X)}{\max (\text{ mag }(X),\tau )} \ , \end{aligned}$$
(5)

where \(\tau \) is the context information. That implies \(\alpha (X,\Vert A\Vert _2) = 10^{-15}\), a value we may expect from a practical, numerical point of view. In Version 13 of INTLAB the function relacc computes the relative accuracy. A typical call is
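A Python sketch (hypothetical, mirroring INTLAB's relacc only in spirit) reproduces the numbers of the singular value example:

```python
# Relative accuracy (5); tau carries context information such as ||A||_2.

def relacc(lo, hi, tau):
    return (hi - lo) / max(abs(lo), abs(hi), tau)

# the singular value example from the text, with ||A||_2 = 1:
x_app, x_true = 1.23456e-10, 1.23457e-10
print(relacc(x_app, x_true, 0.0))  # rho_4: about 8.1e-6
print(relacc(x_app, x_true, 1.0))  # relative accuracy: about 1e-15
```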

$$\begin{aligned} \texttt {alpha = relacc(X,'thresh',tau);} \end{aligned}$$

The following Fig. 2 illustrates this definition and compares it to the relative error \(\varrho _4\). We compute approximations \(s_k\) of the singular values of a square matrix with 1000 rows and condition number \(10^{12}\). The well-accepted rule of thumb says that the approximations \(s_k\) of the smallest singular values may be correct to some 4 decimal digits. The dotted green line in Fig. 2 displays the values \(\varrho _4(s_k \underline{\cup } \sigma _k)\), where \(\sigma _k\) are the true singular values of A. As expected, the relative error increases from \(10^{-14}\) for the largest to about \(10^{-6}\) for the smallest singular values. The dotted blue line displays the relative accuracy \(\alpha (X,\Vert A\Vert _2)\) and reflects what we would expect from a numerical point of view.

Fig. 2: Relative error and relative accuracy of singular value inclusions

Additionally we use INTLAB’s routine verifysingvalall to compute inclusions \(X_k\) of all singular values of A. The solid black line shows the relative error \(\varrho _4(X_k)\) of the inclusions, while the solid red line displays the relative accuracy \(\alpha (X_k,\Vert A\Vert _2)\). From the black line we might conclude that the inclusions are of reasonable, but not too good quality for the smallest singular values, whereas the red line shows that the inclusions are of almost best quality for an inclusion method without extra iterative refinement. For other problems the context may be passed similarly.

We want to stress that neither the function relerr nor relacc is a panacea. As noted at the beginning of this note, the judgement of the quality of an inclusion depends on the context. As an example let matrices R, A be given. Then \(\Vert I-RA\Vert < 1\) for any matrix norm proves that both R and A are nonsingular. Typically, a good choice for R is an approximate inverse of A. Denote by \(\textbf{X}\) the stacked columns of an inclusion of the residual \(I-RA\). For illustration, we display the first and last two elements in Table 1.

It is well known that one step of iterative refinement in working precision implies backward stability of the result of Gaussian elimination [9, 10]. A forward stable result, i.e., an approximation with close to maximum accuracy, can be achieved with residuals computed in twice the working precision.

The computed \(\textbf{X}\) may be applied in some iterative refinement. The intervals have relatively wide diameters but are small in magnitude. If that is true for all entries, the wide diameters show that a residual of that quality is not suited for iterative refinement; relerr provides that information. The small magnitudes, however, show that the residuals are good enough to prove that A is nonsingular; relacc provides that information.
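The interplay of the two measures can be sketched in Python with a hypothetical residual entry (the numbers are illustrative, not those of Table 1):

```python
# A residual enclosure entry that is wide relative to its magnitude,
# yet tiny in absolute size: relerr and relacc judge it differently.

def relerr(lo, hi):
    mag = max(abs(lo), abs(hi))
    return 0.0 if mag == 0 else (hi - lo) / mag

def relacc(lo, hi, tau):
    return (hi - lo) / max(abs(lo), abs(hi), tau)

lo, hi = -3e-16, 5e-16        # one entry of an inclusion X of I - R*A
print(relerr(lo, hi))         # 1.6: too wide to help iterative refinement
print(relacc(lo, hi, 1.0))    # about 8e-16: still proves the residual is tiny
```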