1 Introduction

The computational task that this paper deals with, when viewed from a high level of abstraction, is equivalent to solving univariate equations with guaranteed accuracy. Since roots of monotone functions are sought, the challenge is not in the root-finding, but in computing the function values. These intermediate computations are notoriously difficult to implement properly in floating-point arithmetic:

  • Under- and overflow problems were reported by Benton and Krishnamoorthy (2003), Ding (1997), Helstrom and Ritcey (1985) independently of us;

  • we also showed in Baharev and Kemény (2008) that the algorithms of Norton (1983) and Lenth (1987) are exposed to over- and underflow issues, and that Appendix 12 of Lorenzen and Anderson (1993, p. 374) is most likely bogus due to overflow;

  • catastrophic round-off errors were reported by Frick (1990);

  • drastically increasing computation time and hang-ups were observed by Chattamvelli (1995), Benton and Krishnamoorthy (2003);

  • other noncentral distributions are similarly challenging, see for example Oliveira and Ferreira (2012).

If some of the intermediate computations suffer catastrophic loss of precision, the root-finding method can still succeed, and the final result presented to the user may nevertheless seem plausible. This was the case with Appendix 12 of Lorenzen and Anderson (1993, p. 374): the errors in the intermediate computations went unnoticed, despite the great diligence of the authors of the book.

In theory, one could analyze the floating-point behavior of all the intermediate computations with pen and paper, and derive mathematically proven bounds on the final numerical error of the entire algorithm when implemented in floating-point arithmetic. However, as the failures listed above demonstrate, this task is too error-prone for humans to carry out for a non-trivial numerical algorithm. One way to mitigate this issue is to automate the numerical error analysis as much as possible: Interval arithmetic (to be discussed in Sect. 2) is a way to keep track of numerical errors automatically, and to guarantee that floating-point issues such as over- and underflow are noticed. However, interval arithmetic still requires that a human analyzes the floating-point behavior of all the basic operations and all the involved mathematical functions with pen and paper, and derives rigorous error bounds for all the possible floating-point inputs. It is still a huge win because “only” the smallest building blocks of the algorithm (basic operations and mathematical functions) have to be analyzed by a human, and not the entire algorithm as before: The numerical error analysis of algorithms built from the human-verified building blocks happens automatically because interval arithmetic keeps track of the floating-point errors as the algorithm is executed. In short, interval arithmetic reduces the amount of analysis that humans have to carry out with pen and paper, but it does not eliminate it. With interval arithmetic we essentially hand the rest of the error analysis over to the computer, which then completes it as it executes the algorithm. This obviously has a runtime cost; in the case of the proposed method, the runtime cost is negligible.

All of those algorithmic failures and inaccuracies that we can still reproduce today could have been prevented by simply cross-checking a sparse but sufficiently wide grid of values against the accurate values provided by the proposed method. By ‘sufficiently wide’ we mean that the grid of values spans the parameter range that is relevant for practical applications. In the case of the noncentral beta distribution, we consider this range to be roughly the range spanned by Appendix 12 of Lorenzen and Anderson (1993, p. 374), i.e., shape parameters \(a\in [0.5, 25]\) and \(b\in [0.5, 500]\), and type I and type II errors 0.05 and 0.10, respectively. If a numerical algorithm fails or is inaccurate, it typically does so not in isolated tiny regions of the parameter space but over a wide and continuous range, and especially at or near the extremes of the parameter ranges. Therefore, the grid of cross-checked values does not have to be dense; for cross-checking, it is sufficient if it has points near the extremes of the parameter ranges. We give numerical examples in Sect. 4 to support this claim.

The proposed method is limited to cases where the degrees of freedom of the denominator of the F test statistic is even. Although this is a limitation, it does not affect the usefulness of the proposed method significantly: All of those algorithmic failures and inaccuracies that we can still reproduce today could have been prevented by simply cross-checking against the proposed method.

To the best of our knowledge, the proposed method is the first that can compute the noncentrality parameter of the noncentral-F distribution with guaranteed accuracy over a wide parameter range that spans the range relevant for practical applications. In our literature research we found the related self-validating numerical methods of Wang and Kennedy (1990, 1992, 1994, 1995), out of which only Wang and Kennedy (1995) is directly relevant. These methods were published more than 20 years ago, and their source code is not publicly available. As we discuss in Baharev and Kemény (2008), the method in Wang and Kennedy (1995) is susceptible to over- and underflow: For example, it would over- or underflow due to the large value of the noncentrality parameter if we tried to compute the top right entry of Table 4, in contrast to the proposed method, which has no difficulties there.

1.1 Outline for the rest of the paper

The paper is structured as follows. The proposed method achieves guaranteed accuracy by applying interval arithmetic. In Sect. 2.1 we give an informal overview of interval arithmetic with an example, and the reader can compare and contrast it with ordinary floating-point arithmetic. Interval arithmetic is intentionally treated as a black box in Sect. 2.1: We only present what it delivers, but we do not discuss how. Then, in Sect. 2.2 we give a formal overview of how interval arithmetic guarantees rigorous error bounds. Sect. 3.1 derives the univariate equations that this paper is concerned with, Eqs. (9) and (11). Section 3.2 discusses how interval arithmetic and the interval Newton method can be applied to solve (9) and (11). The formal description of the proposed algorithm is presented in Sect. 3.3 with pseudo-code. We finally present in Sect. 4 two examples from the literature, where the intermediate computations involving the noncentral-F distribution went wrong silently, but the final result of the computations seemed nevertheless plausible, and erroneous results were published. Cross-checking against the proposed method would have caught the numerical errors in both cases.

2 Interval arithmetic

2.1 Automatic numerical error analysis with interval arithmetic: an example

The goal of this example is to demonstrate that interval arithmetic performs automatic numerical error analysis. The reader can think of interval arithmetic as a computationally cheap way to get guaranteed lower and upper bounds on the range of a function over a given domain of the variables. The obtained bounds are not necessarily sharp, but they are guaranteed to enclose the true range of the function even though the intermediate computations are carried out in floating-point arithmetic and may suffer catastrophic loss of precision. Interval arithmetic can safely work with infinity, division by zero, etc., and automatically keeps track of the propagation of numerical errors throughout the intermediate floating-point computations.

We examine the numerical behavior of the following two functions:

$$\begin{aligned} f(n) = \left( 1 + \frac{1}{n}\right) ^n, \quad g(n) = \left( 1 - \frac{1}{n}\right) ^{-n}, \quad n \ge 2. \end{aligned}$$
(1)

It is known, see e.g. Li and Yeh (2013), that f(n) is monotone increasing, g(n) is monotone decreasing, \({\lim \nolimits _{n\rightarrow \infty }}f(n)= {\lim \nolimits _{n\rightarrow \infty }}g(n)=e\), and as a consequence \(f(n)<e<g(n)\). If we carried out the computations in exact arithmetic, we would get tighter and tighter enclosures for e as n increases. However, we get the erratic results shown in Table 1 when the computations are carried out with 64 bit floating-point numbers on a computer. The source code of the example is available on GitHub at Baharev (2016) if the reader wishes to reproduce the numerical results or analyze the implementation. For \(k=9,10,11,12,15\), the supposed lower bounding f(n) values exceed the true value of e, where \(n=10^k\); these are the rows with negative entries in the column for \(e-f(n)\). For \(k=9,12,14,15,17\), the supposed upper bounding g(n) values fall below the true value of e; these are the rows with negative entries in the column for \(g(n)-e\). For \(k=9,12,15\), the supposed lower bounding values even exceed the supposed upper bounding values, meaning that we did not get an enclosure at all. The f(n) values are supposed to be increasing, but for \(k=13,14,16,17\) this is clearly not the case. Similarly, g(n) is supposed to be monotone decreasing, but this is violated, e.g., at \(k=13,16\).
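The erratic entries of Table 1 can be reproduced with a few lines of ordinary floating-point code. The following is a minimal sketch (not the implementation at Baharev (2016)) that evaluates f(n) and g(n) of (1) in 64-bit arithmetic and prints the signed differences shown in the last two columns of Table 1.

```cpp
// Minimal sketch: naive 64-bit floating-point evaluation of f(n) and g(n)
// of Eq. (1), as in Table 1. Negative printed values correspond to the
// violations discussed in the text.
#include <cmath>
#include <cstdio>

int main() {
    for (int k = 1; k <= 17; ++k) {
        const double n = std::pow(10.0, k);
        const double f = std::pow(1.0 + 1.0 / n, n);   // supposed lower bound on e
        const double g = std::pow(1.0 - 1.0 / n, -n);  // supposed upper bound on e
        const double e = std::exp(1.0);
        std::printf("k=%2d  e-f(n)=% .3e  g(n)-e=% .3e\n", k, e - f, g - e);
    }
    return 0;
}
```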

Table 1 The numerical values of f(n) and g(n) defined in (1) when evaluated with 64 bit floating-point numbers on a computer where \(n = 10^{k}\)

Since the f(n) and g(n) functions are fairly simple, one could carry out a rigorous error analysis of these functions with pen and paper, and figure out the accuracy of the table entries. We will do this automatically with interval arithmetic.

Interval arithmetic takes the function \(f(10^k)\) and the interval for k as input. (The interval for k is just the point interval \([k, k]\) for each row in Table 1.) The output of the interval function evaluation is a rigorous enclosure of the range of \(f(10^k)\) over \([k, k]\). In exact arithmetic, the range of \(f(10^k)\) over the point interval \([k, k]\) is obviously just a single real number, the value \(f(10^k)\). However, we carry out the computations with 64 bit floating-point numbers on a computer. Interval arithmetic automatically keeps track of the numerical errors that occur during these computations, and we do not get a single real number as a result but a pair of floating-point numbers, that is, an interval enclosing the value of \(f(10^k)\), the interval \([\underline{f}(10^k),\overline{f}(10^k)]\). These intervals are given in Table 2 for \(k=1,2,\dots ,17\), together with the similarly computed \([\underline{g}(10^k),\overline{g}(10^k)]\).

Table 2 Rigorous enclosures of f(n) and g(n) defined in (1) where \(n=10^k\)

The wide intervals in Table 2, for example the rows for \(k\ge 15\), are a clear sign of the intermediate computations suffering catastrophic loss of precision; it is guaranteed that with interval arithmetic we always get informed (through wide intervals) when this happens. We consider this one of the biggest advantages of this approach.

Despite the serious numerical difficulties for \(k\ge 9\), the above discussed properties of f(n) and g(n) are still preserved in some form: (i) \(\overline{f}(n)\) is monotone increasing, and \(\underline{g}(n)\) is monotone decreasing, (ii) \(\underline{f}(n)<e<\overline{g}(n)\) holds (unlike in Table 1, there are no negative entries in the last two columns of Table 2). The fact that these properties are preserved is not a coincidence either, but follows from the guaranteed behavior of interval arithmetic. However, note that we did not get tighter and tighter enclosures for e as k increased: The enclosure \([\underline{f}(10^k),\overline{g}(10^k)]\) reaches its minimum width at \(k=8\), then the width starts increasing. We cannot blame interval arithmetic for this: Interval arithmetic is implemented on top of 64 bit floating-point numbers, and unless one uses some extended precision arithmetic, e cannot be enclosed better with this simple approach. The tightest verified enclosure we got is [2.71828179, 2.71828186] for \(k=8\); indeed, the correct value is \(2.718281828\ldots \), and it is enclosed.

Let us emphasize again that for this simple example one could have derived bounds on the numerical errors of the entries in Table 1 with pen and paper. The advantage of interval arithmetic is that the numerical error analysis of the computations happens fully automatically, and therefore certain kinds of human errors are completely eliminated.

2.2 A formal overview of interval arithmetic

Interval arithmetic is an extension of real arithmetic defined on the set of real intervals, rather than the set of real numbers. According to a survey paper by Kearfott (1996), a form of interval arithmetic perhaps first appeared in Burkill (1924). Modern interval arithmetic was invented independently in the late 1950s by several researchers, including Warmus (1956), Sunaga (1958), and finally Moore (1959), who laid firm foundations for the field in his many publications, including the foundational book Moore (1966). Since then, interval arithmetic has been used to rigorously solve numerical problems.

Let \(\mathbb {IR}\) be the set of all real intervals, and take \(\mathbf {x}, \mathbf {y}\in \mathbb {IR}\). We set \(\underline{x} := \inf {\mathbf {x}}\) and \(\overline{x} := \sup {\mathbf {x}}\), such that \(\mathbf {x}=[\underline{x},\overline{x}]\). Furthermore, the width of \(\mathbf {x}\) is defined as \({\text {wid}}(\mathbf {x}) := \overline{x}-\underline{x}\), the radius of \(\mathbf {x}\) as \({\text {rad}}(\mathbf {x}) := \tfrac{1}{2}{\text {wid}}(\mathbf {x})\), the magnitude of \(\mathbf {x}\) as \(|\mathbf {x}| := \max \{|x|\mid x\in \mathbf {x}\}\), and the mignitude of \(\mathbf {x}\) as \(\langle \mathbf {x}\rangle := \min \{|x|\mid x\in \mathbf {x}\}\). For \(\mathbf {x}\) bounded we set the midpoint of \(\mathbf {x}\) as \(\check{x} := \tfrac{1}{2}(\underline{x}+\overline{x})\). We define the elementary operations for interval arithmetic by the rule \(\mathbf {x}\diamond \mathbf {y}= \Box \{x \diamond y \mid x \in \mathbf {x}, y \in \mathbf {y}\}, \forall \diamond \in \{+, -, \times , \div , \hat{}~\}\), where \(\Box S\) denotes the smallest interval containing the set S. (The symbol ‘\(\Box \)’ is a box, and it refers to the tightest interval hull, also called the box hull.) Thus, the ranges of the five elementary interval arithmetic operations are exactly the ranges of their real-valued counterparts. Although this rule characterizes these operations mathematically, the usefulness of interval arithmetic is due to the operational definitions based on interval bounds (Hickey et al. 2001). For example, for \(\mathbf {x}= [\underline{x}, \overline{x}]\) and \(\mathbf {y}= [\underline{y}, \overline{y}]\) it can easily be shown that

$$\begin{aligned} \mathbf {x}+ \mathbf {y}&= [\underline{x}+\underline{y},\,\overline{x}+\overline{y}],\\ \mathbf {x}- \mathbf {y}&= [\underline{x}-\overline{y},\,\overline{x}-\underline{y}],\\ \mathbf {x}\times \mathbf {y}&= [\min \{\underline{x}\underline{y}, \underline{x}\overline{y}, \overline{x}\underline{y}, \overline{x}\overline{y} \}, \max \{\underline{x}\underline{y}, \underline{x}\overline{y}, \overline{x}\underline{y}, \overline{x}\overline{y} \}],\\ \mathbf {x}\div \mathbf {y}&= \mathbf {x}\times (1/\mathbf {y}) \quad \text {if } 0 \notin \mathbf {y}, \text { where } 1/\mathbf {y}= [1/\overline{y},\, 1/\underline{y}],\\ \mathbf {x}^{\mathbf {y}}&= [\min \{\underline{x}^{\underline{y}}, \underline{x}^{\overline{y}}, \overline{x}^{\underline{y}}, \overline{x}^{\overline{y}} \}, \max \{\underline{x}^{\underline{y}}, \underline{x}^{\overline{y}}, \overline{x}^{\underline{y}}, \overline{x}^{\overline{y}} \}], \quad \underline{y}>0, ~~\underline{x}\ge 0. \end{aligned}$$

In addition, for a function \(\varphi :\mathbb {R}\rightarrow \mathbb {R}\) and an interval \(\mathbf {x}\) we define

$$\begin{aligned} \varphi (\mathbf {x}) := \Box \{\varphi (x)\mid x\in \mathbf {x}\}. \end{aligned}$$

Moreover, if a function f is composed of these elementary arithmetic operations and elementary functions \(\varphi \in \{\sin ,\cos ,\exp ,\log ,\ldots \}\), i.e., f is a factorable function, then bounds on the range of f can be obtained by replacing the real arithmetic operations and the real functions with their corresponding interval arithmetic counterparts.

The finite nature of computers precludes an exact representation of the reals. In practice, the real set, \(\mathbb {R}\), is approximated by a finite set \(\bar{\mathbb {F}} = \mathbb {F}\cup \{-\infty , +\infty \}\), where \(\mathbb {F}\) is the set of floating-point numbers. The set of real intervals is then approximated by the set \(\mathbb {I}\) of intervals with bounds in \(\bar{\mathbb {F}}\). The power of interval arithmetic lies in its implementation on computers. In particular, outwardly rounded interval arithmetic allows rigorous enclosures for the ranges of operations and functions. This makes a qualitative difference in scientific computations, since the results are now intervals in which the exact result is guaranteed to lie. Interval arithmetic can be carried out for virtually every expression that can be evaluated with floating-point arithmetic. However, two important points have to be considered: Interval arithmetic is only subdistributive, so expressions that are equivalent in real arithmetic may differ in interval arithmetic, giving different amounts of overestimation (the amount by which the enclosure computed by interval arithmetic exceeds the true range of the function over the given interval); therefore, computations should be arranged so that the overestimation of ranges is minimized. Readers are referred to Alefeld and Herzberger (1983), Neumaier (1990), Hickey et al. (2001), Jaulin et al. (2001) for more details on basic interval methods.
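To make the preceding discussion concrete, the following minimal sketch (our own illustration, with our own names such as Interval, down, and up) implements the four basic interval operations with outward rounding, and then evaluates two expressions that are equivalent in real arithmetic, x(1−x) and x−x·x, over x = [0, 1]; the two enclosures of the true range [0, 0.25] differ, illustrating overestimation due to subdistributivity. Production-quality libraries such as C-XSC switch the rounding mode of the floating-point unit; here each computed bound is simply widened by one unit in the last place, which is looser but keeps the sketch self-contained.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <limits>

// A deliberately simple interval type: outward rounding is emulated by
// widening every computed bound by one ulp with std::nextafter.
struct Interval { double lo, hi; };  // invariant: lo <= hi

static double down(double x) { return std::nextafter(x, -std::numeric_limits<double>::infinity()); }
static double up  (double x) { return std::nextafter(x,  std::numeric_limits<double>::infinity()); }

Interval operator+(Interval x, Interval y) { return {down(x.lo + y.lo), up(x.hi + y.hi)}; }
Interval operator-(Interval x, Interval y) { return {down(x.lo - y.hi), up(x.hi - y.lo)}; }

Interval operator*(Interval x, Interval y) {
    const double p[] = {x.lo * y.lo, x.lo * y.hi, x.hi * y.lo, x.hi * y.hi};
    return {down(*std::min_element(p, p + 4)), up(*std::max_element(p, p + 4))};
}

Interval operator/(Interval x, Interval y) {  // defined only if 0 is not in y
    const Interval inv = {down(1.0 / y.hi), up(1.0 / y.lo)};
    return x * inv;
}

int main() {
    const Interval x   = {0.0, 1.0};
    const Interval one = {1.0, 1.0};
    const Interval r1  = x * (one - x);  // enclosure of x*(1-x): roughly [0, 1]
    const Interval r2  = x - x * x;      // enclosure of x - x^2: roughly [-1, 1]
    std::printf("x*(1-x) over [0,1]: [%g, %g]\n", r1.lo, r1.hi);
    std::printf("x - x*x over [0,1]: [%g, %g]\n", r2.lo, r2.hi);
    return 0;
}
```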

Let \(f:\mathbf {x}\rightarrow \mathbb {R}\) be continuously differentiable, and assume the existence of \(\hat{x}\in \mathbf {x}\) with \(f(\hat{x}) = 0\), and let \(\tilde{x}\in \mathbf {x}\). Then by the mean value theorem we get

$$\begin{aligned} f(\hat{x}) = 0 = f(\tilde{x}) + f'(\xi )(\hat{x}-\tilde{x}), \end{aligned}$$

for some \(\xi \in \mathbf {x}\). Therefore,

$$\begin{aligned} \hat{x} = \tilde{x} - \frac{f(\tilde{x})}{f'(\xi )}. \end{aligned}$$

Now let \(\mathfrak f'\) be an interval extension of \(f'\) and \(\mathfrak f\) be an interval extension of f, i.e. a specific expression for computing an enclosure for the range of f over a given input interval \(\mathbf {x}\). Then by the properties of interval arithmetic we get

$$\begin{aligned} \hat{x} \in \tilde{x} - \frac{\mathfrak f(\tilde{x})}{\mathfrak f'(\mathbf {x})} =: N(\mathfrak f,\mathfrak f';\mathbf {x},\tilde{x}). \end{aligned}$$

The operator N is called the univariate interval Newton operator. Using this operator we can define the interval Newton iteration as

$$\begin{aligned} \mathbf {x}^{(k+1)} = \mathbf {x}^{(k)}\cap N(\mathfrak f,\mathfrak f';\mathbf {x}^{(k)},\check{x}^{(k)}), \end{aligned}$$
(2)

starting with \(\mathbf {x}^{(0)} = \mathbf {x}\). This iteration has the following properties: whenever \(\mathbf {x}^{(k)} = \emptyset \) for some k, then \(\mathbf {x}\) does not contain a zero of f; otherwise \(\hat{x}\in \mathbf {x}^{(k)}\) for all k, and \({\text {wid}}(\mathbf {x}^{(k+1)}) = O({\text {wid}}(\mathbf {x}^{(k)})^2)\) locally under mild assumptions on f and \(\mathbf {x}^{(0)}\). Furthermore, if for any k we find that \(\mathbf {x}^{(k+1)} \subseteq \text {int}(\mathbf {x}^{(k)})\), i.e., that the interval Newton operator maps the box \(\mathbf {x}^{(k)}\) into its interior, then \(\mathbf {x}^{(k)}\) contains a unique zero of f.
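A possible rendering of the iteration (2) in code is sketched below; it reuses the Interval type from the sketch earlier in this section (an assumption, as are the names intervalNewton, mid, wid, and intersect). For brevity, plain interval division is used, so this sketch requires that the enclosure of f′ does not contain zero over the initial box; a full implementation (e.g., the C-XSC module nlfzero used later) handles that case with extended interval division.

```cpp
#include <algorithm>
#include <functional>
#include <optional>
// Assumes the Interval type and its operators from the sketch in this section.

double mid(Interval x) { return 0.5 * (x.lo + x.hi); }
double wid(Interval x) { return x.hi - x.lo; }

// Intersection of two intervals; an empty result proves that f has no zero in x.
std::optional<Interval> intersect(Interval x, Interval y) {
    const Interval r = {std::max(x.lo, y.lo), std::min(x.hi, y.hi)};
    if (r.lo > r.hi) return std::nullopt;
    return r;
}

// Univariate interval Newton iteration (2). F and dF are interval extensions
// of f and f'; in this sketch dF(x) must not contain zero on the initial box.
std::optional<Interval> intervalNewton(const std::function<Interval(Interval)>& F,
                                       const std::function<Interval(Interval)>& dF,
                                       Interval x, double tol) {
    while (wid(x) > tol) {
        const double c = mid(x);
        const Interval C = {c, c};             // the point interval [c, c]
        const Interval N = C - F(C) / dF(x);   // N(f, f'; x, c)
        const auto next = intersect(x, N);
        if (!next) return std::nullopt;        // rigorous proof: no zero in x
        if (wid(*next) >= wid(x)) break;       // no further contraction
        x = *next;
    }
    return x;  // enclosure of the zero, if one exists in the initial box
}
```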

The interval Newton operator requires an interval extension \(\mathfrak {f}'\) of the derivative of f. For every factorable function f such an extension can be constructed using algorithmic differentiation techniques, e.g., see Berz et al. (1996), Griewank and Corliss (1991), Griewank and Walther (2008). For univariate functions the most efficient approach is via the algebra of differential numbers \(\mathcal {D}_1:=\mathbb {R}\times \mathbb {R}\), equipped with the following basic operations: Let \(df:=(f,f'), dg:=(g,g')\in \mathcal {D}_1\), and \(\varphi :\mathbb {R}\rightarrow \mathbb {R}\). Define

$$\begin{aligned} \begin{aligned} df\pm dg&:= (f\pm g,f'\pm g'),\\ df\cdot dg&:= (f\cdot g, f'\cdot g + f\cdot g'),\\ df/dg&:= (f/g, (f'\cdot g - f\cdot g')/g^2),\quad g\ne 0,\\ df^{dg}&:= (f^g, f^g\cdot (g'\cdot \log (f)+g\cdot f'/f)),\quad f>0,\\ \varphi (df)&:= (\varphi (f),\varphi '(f)\cdot f'). \end{aligned} \end{aligned}$$
(3)

The set of real numbers is embedded in \(\mathcal {D}_1\) by \(r\mapsto (r,0)\).

If \(\tilde{f}(x)\) is an expression representing f using arithmetic operations and elementary functions, we can use \(\tilde{f}\) to calculate \((y,y') = \tilde{f}((x,1))\) on \(\mathcal {D}_1\) by replacing the operations and elementary functions in \(\tilde{f}\) by their counterparts on \(\mathcal {D}_1\), and then \(y' = f'(x)\).
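As a concrete illustration of (3), the following sketch (with our own name Diff for the differential numbers) implements the algebra \(\mathcal {D}_1\) over doubles and differentiates the arbitrarily chosen expression \(x\,e^x\) by evaluating it at (x, 1); the power rule and the remaining elementary functions are omitted for brevity.

```cpp
// Sketch of the differential-number algebra D1 of (3) over doubles: each
// value carries its derivative, so evaluating an expression at (x, 1)
// yields (f(x), f'(x)) without symbolic differentiation.
#include <cmath>
#include <cstdio>

struct Diff { double v, d; };  // the pair (f, f')

Diff operator+(Diff a, Diff b) { return {a.v + b.v, a.d + b.d}; }
Diff operator-(Diff a, Diff b) { return {a.v - b.v, a.d - b.d}; }
Diff operator*(Diff a, Diff b) { return {a.v * b.v, a.d * b.v + a.v * b.d}; }
Diff operator/(Diff a, Diff b) { return {a.v / b.v, (a.d * b.v - a.v * b.d) / (b.v * b.v)}; }
Diff exp(Diff a) { return {std::exp(a.v), std::exp(a.v) * a.d}; }  // the phi(df) rule of (3)

int main() {
    // f(x) = x * exp(x); evaluate at (x, 1) with x = 2 to get f(2) and f'(2).
    const Diff x = {2.0, 1.0};
    const Diff y = x * exp(x);
    std::printf("f(2) = %.6f, f'(2) = %.6f\n", y.v, y.d);  // f'(x) = (1 + x) e^x
    return 0;
}
```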

This approach can be generalized to compute an interval extension \(\mathfrak {f}'\) of \(f'\) by defining the algebra of interval differential numbers \(\mathbb {I}\mathcal {D}_1:=\mathbb {IR}\times \mathbb {IR}\) and introducing again the operations (3) on \(\mathbb {I}\mathcal {D}_1\) now using interval arithmetic operations in the components of the interval differential numbers. Using this algebra and an expression \(\tilde{f}\) for f, we get by computing \((\mathbf {y},\mathbf {y}') = \tilde{f}((\mathbf {x},1))\) an enclosure \(\mathbf {y}'\supseteq f'(\mathbf {x})\) and thereby an interval extension \(\mathfrak {f}'\) of \(f'\).

3 The proposed method

3.1 Derivation of the formulas

The F test statistic used in the analysis of variance (ANOVA) problems follows the non-central F distribution when the null hypothesis is false, i.e., \(F_0 = F_{nc}(\nu _1,\nu _2,\lambda )\), where \(\nu _1\) and \(\nu _2\) are the degrees of freedom of the numerator and the denominator, respectively, of the F test statistic; \(\lambda \) is the non-centrality parameter. This non-centrality parameter is related to the size of effects. The probability of the Type II error \(\beta \) (not detecting an effect) is

$$\begin{aligned} \beta = P\left[ F_0 \le F_{\alpha }(\nu _1,\nu _2)\right] = P\left[ F_{nc}(\nu _1,\nu _2,\lambda ) \le F_{\alpha }(\nu _1,\nu _2)\right] , \end{aligned}$$
(4)

where \(\alpha \) is the significance level of the test. The power calculation consists of calculating the power (\(Power=1-\beta \)) for certain effect sizes. When used the other way around, the power is fixed (e.g. \(Power=0.9\)) and the size of the effect is calculated; this is considered the effect of detectable size, see Johnson and Leone (1977, p. 170) and Lorenzen and Anderson (1993). Another application where the noncentral F-distribution arises is in coverages of Clopper-Pearson confidence intervals for binomial proportions with misspecified parameter p (Puza and O’Neill 2006a, b).

The cdf \(F(w,\nu _1,\nu _2,\lambda )\) of the noncentral F-distribution with \(\nu _1\), \(\nu _2\) degrees of freedom and noncentrality parameter \(\lambda \), and the cdf \(I_x(a,b;\lambda )\) of the noncentral beta distribution with shape parameters a and b and noncentrality parameter \(\lambda \) are related as follows:

$$\begin{aligned} F(w,\nu _1,\nu _2,\lambda )=I_x(a,b;\lambda ), \end{aligned}$$
(5)

where \(a=\frac{\nu _1}{2}\), \(b=\frac{\nu _2}{2}\), and \(x=\frac{\nu _1w}{\nu _1w+\nu _2}\). We will work with the noncentral beta distribution from now on.
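For reference, a minimal sketch of the reparameterization (5); the names BetaParams and fToBeta are ours and used only for illustration.

```cpp
// Reparameterization (5): map the argument w and degrees of freedom nu1, nu2
// of the noncentral F-distribution to a = nu1/2, b = nu2/2 and
// x = nu1*w / (nu1*w + nu2), so that F(w, nu1, nu2, lambda) = I_x(a, b; lambda).
struct BetaParams { double a, b, x; };

BetaParams fToBeta(double w, double nu1, double nu2) {
    return {0.5 * nu1, 0.5 * nu2, nu1 * w / (nu1 * w + nu2)};
}
```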

The incomplete noncentral beta function ratio \(I_x(a,b;\lambda )\) for \(0\le x\le 1, a>0, b>0, \lambda \ge 0\) is defined as

$$\begin{aligned} I_x(a,b;\lambda )= \sum _{i=0}^{\infty }\frac{e^{-(\lambda /2)}(\lambda /2)^i}{i!}I_x(a+i,b), \end{aligned}$$
(6)

where \(I_x(a,b)\) is the usual incomplete beta function ratio,

$$\begin{aligned} I_x(a,b) =\frac{{\varGamma }(a+b)}{\varGamma (a)\varGamma (b)} \int \limits _{0}^{x} t^{a-1} (1-t)^{b-1}\mathrm {d}t , \end{aligned}$$
(7)

and \(\varGamma (a)\) is the (complete) gamma function

$$\begin{aligned} \varGamma (a)=\int \limits _{0}^{\infty } t^{a-1} e^{-t}\mathrm {d}t,~~ a>0, \end{aligned}$$
(8)

see for example Johnson et al. (1995).

Equation (6) cannot be used directly for actual numerical computations. The proposed method uses the closed formula

$$\begin{aligned} I_x(a,b)=x^a\left( 1+\sum _{n=1}^{b-1}\left( \prod _{m=1}^n \frac{a+m-1}{m}\right) (1-x)^n\right) \end{aligned}$$
(9)

by Singh and Relyea (1992) (misprint corrected by Chattamvelli (1995)) for the computation of the cdf of the central beta distribution, and the closed formula

$$\begin{aligned} I_x(a,b;\lambda )=e^{-(\lambda /2)(1-x)} \sum _{i=0}^{b-1}\frac{\left( (\lambda /2)(1-x)\right) ^i}{i!}I_x(a+i,b-i) \end{aligned}$$
(10)

by Sibuya (1967) (and later published by Johnson et al. (1995)) for the cdf of the noncentral beta distribution. The shape parameter b must be an integer. It is very interesting that the evaluation of (10), using (9), is possible in finitely many steps, requiring only the four arithmetic operations, the power function, and the exponential function; the formulas do not depend on any function for computing statistical distributions or on other special functions.
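To show how little machinery (9) and (10) actually require, the following sketch (with our own function names centralBeta and noncentralBeta) evaluates them in plain double precision for integer b. The proposed method evaluates the very same expressions, but in interval arithmetic, which is what turns them into guaranteed enclosures and flags the over- and underflow that this naive floating-point version silently suffers for extreme parameters.

```cpp
#include <cmath>

// Central incomplete beta function ratio I_x(a, b), Eq. (9), integer b >= 1.
double centralBeta(double x, double a, int b) {
    double sum = 1.0, prod = 1.0;
    for (int n = 1; n <= b - 1; ++n) {
        prod *= (a + n - 1) / n;              // running product of (a+m-1)/m
        sum += prod * std::pow(1.0 - x, n);
    }
    return std::pow(x, a) * sum;
}

// Noncentral incomplete beta function ratio I_x(a, b; lambda), Eq. (10).
double noncentralBeta(double x, double a, int b, double lambda) {
    const double y = 0.5 * lambda * (1.0 - x);
    double term = 1.0;                        // y^i / i!, starting at i = 0
    double sum = 0.0;
    for (int i = 0; i <= b - 1; ++i) {
        sum += term * centralBeta(x, a + i, b - i);
        term *= y / (i + 1);
    }
    return std::exp(-y) * sum;
}
```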

Detectable differences for a specified type II error probability \(\beta \) are determined by the noncentrality parameter \(\lambda \) for which the cdf value of the noncentral beta distribution equals \(\beta \):

$$\begin{aligned} I_{x_{1-\alpha }}(a, b;\lambda ) = \beta , \end{aligned}$$
(11)

where \(x_{1-\alpha }\) is the upper \(\alpha \) quantile of the central beta distribution with shape parameters \(a=\nu _1/2\) and \(b=\nu _2/2\); \(\alpha \) denotes the allowed type I error probability. When minimal detectable differences are sought, one solves (11) for \(\lambda \), given \(a,b,\alpha \) and \(\beta \).

3.2 Overview of the proposed method

To summarize Sect. 3.1, one can compute minimal detectable differences for a specified type II error probability \(\beta \) in the traditional setting as follows. Given the shape parameters \(a=\nu _1/2\) and \(b=\nu _2/2\), and the allowed type I error probability \(\alpha \), the upper \(\alpha \) quantile \(x_{1-\alpha }\) of the central beta distribution is computed first by solving \(I_x(a,b)=1-\alpha \) for x, using (9) and the traditional Newton method. Then, (11) is solved for the noncentrality parameter \(\lambda \) with the traditional Newton method, given \(x_{1-\alpha }, a, b\), and \(\beta \).
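A sketch of this traditional two-step procedure is given below; it uses the centralBeta and noncentralBeta functions from the sketch after (10), and, for brevity, bisection instead of Newton's method (the names bisect and detectable are ours). The bracket for \(\lambda \) is a crude guess, adequate only for moderate parameter values; no accuracy guarantee is attached to the result, and this is precisely the kind of floating-point computation that the proposed method is designed to cross-check.

```cpp
#include <functional>
#include <utility>
// Assumes centralBeta(x, a, b) and noncentralBeta(x, a, b, lambda) from the
// sketch after Eq. (10). Bisection replaces Newton's method for brevity.

double bisect(const std::function<double(double)>& h, double lo, double hi) {
    for (int i = 0; i < 200; ++i) {      // h(lo) and h(hi) must have opposite signs
        const double m = 0.5 * (lo + hi);
        if (h(lo) * h(m) <= 0.0) hi = m; else lo = m;
    }
    return 0.5 * (lo + hi);
}

// Given a = nu1/2, integer b = nu2/2, alpha and beta, return the upper alpha
// quantile x_{1-alpha} and the noncentrality parameter lambda solving (11).
std::pair<double, double> detectable(double a, int b, double alpha, double beta) {
    const double x = bisect(
        [&](double t) { return centralBeta(t, a, b) - (1.0 - alpha); }, 0.0, 1.0);
    const double lambda = bisect(
        [&](double l) { return noncentralBeta(x, a, b, l) - beta; }, 0.0, 1.0e4);
    return {x, lambda};
}
```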

The proposed method follows these steps of the traditional setting, but all the computations are carried out with interval arithmetic instead of ordinary floating-point arithmetic, and the interval Newton method is used instead of the traditional Newton method. As a consequence, there are notable differences. The traditional Newton method has two outcomes: It either reaches convergence or it does not. In contrast, the interval Newton method has three possible outcomes, as discussed formally in Sect. 2.2; these outcomes are enumerated below informally as cases (a), (b), and (c).

(a) All solutions are rigorously enclosed, and each enclosure contains a unique zero. The result is the list of these enclosures. (In our case, there can be at most one zero, i.e., the list has at most one element.)

(b) It is proved with mathematical rigor that the function cannot have any zeros in the initial interval.

(c) There is at least one enclosure among the resulting enclosures of zeros which may contain a zero, but verification of existence and/or uniqueness of a zero in that particular enclosure failed.

The primary use case of the proposed method is cross-checking the correctness of existing statistical software. The user first computes \(x_{1-\alpha }\) and \(\lambda \) with the statistical software to be checked, given \(a=\nu _1/2\), \(b=\nu _2/2\), \(\alpha \), and \(\beta \). The \(x_{1-\alpha }\) and \(\lambda \) values are floating-point numbers, or in other words, zero-width intervals. The initial intervals for the interval Newton method are then constructed by inflating these point intervals \(x_{1-\alpha }\) and \(\lambda \) such that they remain centered around \(x_{1-\alpha }\) and \(\lambda \), respectively, but have non-zero widths. (In this context, inflation refers to the width of the interval: The width of the point interval \(x_{1-\alpha }\) is increased from zero to a strictly positive value.) If these initial intervals contain the true values, and the interval Newton method succeeds in proving it (case a), then in the studied case the algorithm under test is at least as accurate as the radius of the initial intervals. Analogously, if the initial intervals do not contain the theoretically correct values, and the interval Newton method reliably proves that (case b), then the error of the algorithm under test exceeds the radius of the initial intervals. The very rare but unfortunate case c, in which no reliable conclusion can be drawn and further investigation may be needed, can usually be remedied by simply increasing the radius of the initial intervals. We have not experienced case c when computing in the parameter range relevant for practical applications, e.g., when computing Tables 4 and 5; we only experienced case c when insane parameters were set.

The proposed method also works with, for example, [0, 1] as the initial interval for \(x_{1-\alpha }\). In other words, the proposed method works even in the complete absence of an approximate value for \(x_{1-\alpha }\); however, this is not the anticipated use case.

3.3 Formal description of the proposed algorithm

Input: The input data of the proposed algorithm are \(x_{1-\alpha }\) and \(\lambda \), the values that the user wants to cross-check. In the anticipated use case, \(x_{1-\alpha }\) and \(\lambda \) come from an existing statistical software whose correctness is being checked.

Step 1: The initial intervals \(x_0\) and \(\lambda _0\) are obtained from the inputs \(x_{1-\alpha }\) and \(\lambda \) by inflating them as follows:

$$\begin{aligned} x_0 = \left[ (1-\epsilon _x)x_{1-\alpha },~ (1+\epsilon _x)x_{1-\alpha }\right] , \quad \text {and} \quad \lambda _0 = \left[ (1-\epsilon _{\lambda })\lambda ,~ (1+\epsilon _{\lambda })\lambda \right] , \end{aligned}$$
(12)

where the inflation parameters \(\epsilon _x\) and \(\epsilon _{\lambda }\) are sufficiently small user-defined real numbers, for example \(10^{-6}\). (A code skeleton of Steps 1–3 is given after this description.)

Step 2: A narrow interval containing the theoretically correct value of \(x_{1-\alpha }\) is computed with the interval Newton method, using (9). If the interval Newton method proves that \(x_0\) is guaranteed not to contain a solution, or if the verification of a unique solution in \(x_0\) fails, the algorithm exits with the corresponding error message.

Step 3: Equation (11) is solved for \(\lambda \) with the interval Newton method, using (10) and (9). The possible outcomes are: a rigorous enclosure of the true value of \(\lambda \) is obtained, or \(\lambda _0\) is proved not to contain the correct value, or the verification of a unique solution in \(\lambda _0\) fails. The algorithm finishes here.

Output: The rigorous enclosures for \(x_{1-\alpha }\) and \(\lambda \) are printed if the interval Newton iteration is successful in both Step 2 and Step 3; otherwise the corresponding error message is printed for whichever step fails.

We implemented this algorithm in C++; the source code is available on GitHub at Baharev (2016).
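The skeleton referenced in Step 1 is shown below. It is only an outline, not the C-XSC implementation at Baharev (2016): the Interval type is the one sketched in Sect. 2.2, and solveCentral and solveNoncentral are placeholder names (ours, not part of any library) for the interval Newton method applied to \(I_x(a,b)=1-\alpha \) and to Eq. (11), respectively.

```cpp
#include <optional>
// Assumes the Interval type sketched in Sect. 2.2. The two solver functions
// are placeholders: each stands for an interval Newton solve and returns a
// verified enclosure, or nothing if the initial interval is proved to contain
// no solution or if verification fails.
std::optional<Interval> solveCentral(double a, double b, double alpha, Interval x0);
std::optional<Interval> solveNoncentral(double a, double b, double beta,
                                        Interval x, Interval lambda0);

// Step 1, Eq. (12): inflate a point value v > 0 into an interval of relative
// half-width eps, centered around v.
Interval inflate(double v, double eps) {
    return {(1.0 - eps) * v, (1.0 + eps) * v};
}

// Steps 1-3: cross-check the point values x_alpha and lambda supplied by the
// statistical software under test.
bool crossCheck(double a, double b, double alpha, double beta,
                double x_alpha, double lambda, double eps_x, double eps_lambda) {
    const Interval x0 = inflate(x_alpha, eps_x);
    const Interval l0 = inflate(lambda, eps_lambda);
    const auto x = solveCentral(a, b, alpha, x0);        // Step 2
    if (!x) return false;                                // error message in the real code
    const auto l = solveNoncentral(a, b, beta, *x, l0);  // Step 3
    return l.has_value();                                // Output: enclosures verified
}
```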

3.4 Implementation details

The above algorithm is implemented in C++ using the C-XSC module nlfzero. C-XSC is available from http://www.xsc.de, and it is documented in the book of Hammer et al. (1995). C-XSC implements the interval Newton method in one variable using automatic differentiation. No higher precision internal data format is used; all computations are done in the IEEE double format (64 bit). The evaluation of (9) and (10) could be sped up significantly since it involves several redundant operations. For example, it is possible to reduce the number of switches between rounding modes by exploiting that \(a>0\), \(b>0\), and \(0\le x\le 1\) always hold. However, (9) and (10) are used directly, and no effort was made to decrease the computation time because we found it to be satisfactory.

The parameters are assumed to lie in the domain that is relevant for practical applications, roughly: \(a \le 25, b \le 500\), and \(0.01 \le \alpha , \beta \le 0.99\); the inflation parameters are also assumed to be sane, say \(< 10^{-4}\). Violating these assumptions may cause performance degradation and the algorithm may start reporting verification failures, but incorrect results will never appear in the output.

4 Numerical results

As we claimed in the introduction, all of those algorithmic failures and inaccuracies that we can still reproduce today could have been prevented by simply cross-checking against the proposed method; we now give two such examples. In both examples, the intermediate computations suffer significant loss of precision, but the final results presented to the user seem nevertheless plausible, making these kinds of numerical errors particularly harmful. A by-product of the first example is that the algorithm of Baharev and Kemény (2008), implemented on top of the built-in functions of R, is proved with mathematical certainty to be accurate to 6 significant digits in the investigated cases.

The entire source code is available on GitHub at Baharev (2016). The computations have been carried out with the following hardware and software configuration. Processor: Intel(R) Core(TM) i5-3320M CPU at 2.60GHz; operating system: Ubuntu 14.04.3 LTS with 3.13.0-67-generic kernel; compiler: gcc 4.8.4, compiler optimization flag: -O3; C-XSC 2.5.4 configuration left on the default values given by the install script.

4.1 Example 1: minimal detectable differences for general ANOVA models

Appendix 12 of Lorenzen and Anderson (1993, p. 374) tabulates the minimal detectable differences for general ANOVA models as a function of \(\nu _1\) and \(\nu _2\), with the type I and type II error probabilities fixed at \(\alpha =0.05\) and \(\beta =0.10\), respectively. All entries in Appendix 12 seem plausible; there is no obvious sign that some of the entries have no correct significant digits (for example, the entry in Appendix 12 that corresponds to the top right entry of Table 4 of the present paper). The entries of Appendix 12 are most likely bogus due to floating-point overflow (Baharev and Kemény 2008).

The error could have been caught by simply cross-checking against the proposed method. The corrected form of Appendix 12 of Lorenzen and Anderson (1993, p. 374), i.e., Table 3 of Baharev and Kemény (2008), fully spans the parameter range that is relevant for practical applications. This table (except the rows for which b is not an integer, i.e., \(b=0.5, 1.5, 2.5, 3.5\)) is recomputed with the proposed method, and it is given as Table 4. The input values of \(x_{0.95}\) and \(\lambda \) are computed by the algorithm of Baharev and Kemény (2008), implemented in R (R Development Core Team 2015) and available as the package fpow; the inflation parameters \(\epsilon _x\) and \(\epsilon _{\lambda }\) are both set to \(10^{-6}\).

Table 3 contains the solution of \(I_x(a,b)=1-\alpha \) for x, given a, b and \(\alpha =0.05\). Table 4 contains the solution of (11), where the verified \(x_{0.95}\) is obtained from the previous step, that is, from Table 3. The overall computation required less than 9 seconds. The output of the proposed algorithm is \(x_{0.95}\) verified up to 12 significant digits, and \(\lambda \) verified up to 10 significant digits. (Only 6 digits are shown in Tables 3 and 4.) The algorithm of Baharev and Kemény (2008), implemented on top of the built-in functions of R, is thereby proved with mathematical certainty to be accurate to 6 significant digits in the investigated cases.

Table 3 The upper \(\alpha = 0.05\) quantiles, the solution of \(I_x(a,b)=1-\alpha \) for x
Table 4 The noncentrality parameter \(\lambda \), the solution to Eq. (11)

4.2 Example 2: comparing the accuracy of numerical algorithms

One of the goals of Table 1 of Chattamvelli and Shanmugam (1997) was to illustrate that their algorithm gives more accurate results than Frick’s algorithm (Frick 1990) for large values of \(\lambda \). We reproduced selected rows of their table in Table 5, and extended it with an extra column showing the correct values up to 7 significant digits. This extra column was computed with the proposed method. The correct values (correct up to 7 significant digits) enabled us to give the number of correct significant digits in parentheses after each table entry. Table 5 proves, with mathematical rigor, that the algorithm of Chattamvelli and Shanmugam (1997) is actually slightly less accurate than Frick’s algorithm (Frick 1990) for those large values of \(\lambda \) that are shown in Table 5. Since the precise details of the computations are not given in Chattamvelli and Shanmugam (1997), we cannot tell where their algorithm suffers from a loss of precision.

Table 5 The cdf of the non-central beta random variable