1 Introduction

Fisher (1930) introduced the concept of a fiducial distribution. Fisher’s first example is the fiducial density π(ρ | r) for the correlation ρ of the binormal distribution. It is given by

$$ \pi (\rho {{\mid}} r) = -\partial_{\rho} F (r {{\mid}} \rho), $$
(1.1)

where F(r | ρ) is the cumulative distribution function for the empirical correlation r of a random sample of size n from the binormal distribution.

Earlier, Fisher (1915) had derived an explicit formula for the probability density \(f(r \mid \rho) = \partial_{r} F(r \mid \rho)\) of the empirical correlation. Fisher’s formula is

$$ f (r {{\mid}} \rho) = \frac{(1 - \rho^2)^{\frac{\nu}{2}} \cdot (1 - r^2)^{\frac{\nu - 3}{2}} }{\pi (\nu - 2)!} \partial_{\rho r}^{\nu - 1} \left\{\frac{\theta}{\sin\theta} \right\}, $$
(1.2)

where \(\cos\theta = -\rho r\), 0 < 𝜃 < π, and ν = n − 1 is the degrees of freedom. In principle, formulas (1.1) and (1.2) give a method for deriving a more explicit formula for π(ρ | r). This is, however, not straightforward since a convenient closed-form expression for F(r | ρ) is missing.

The problem of deriving a more explicit formula for π(ρ | r) was solved by C. R. Rao. It is stated as the very last formula in the classical book Statistical Methods and Scientific Inference by Fisher (1973, eq. (234)). Rao’s formula is

$$ \pi (\rho {{\mid}} r) = \frac{(1 - r^2)^{\frac{\nu - 1}{2}} \cdot (1 - \rho^2)^{\frac{\nu - 2}{2}} }{\pi (\nu - 2)!} \partial_{\rho r}^{\nu - 2} \left\{\frac{\theta - \frac{1}{2}\sin 2\theta}{\sin^3 \theta} \right\}. $$
(1.3)

Unfortunately, this elegant formula of Rao’s is less well known than the corresponding formula (1.2). The density π(ρ | r) is more important in applications than the density f(r | ρ) since it directly represents the resulting state of knowledge for the unknown correlation ρ based on the observed correlation r.
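
Rao’s formula can be evaluated by carrying out the repeated differentiation symbolically. The following minimal sketch is not from the paper; it assumes sympy and scipy are available, and the function name rao_density is hypothetical:

```python
# Evaluate Rao's formula (1.3) by symbolic differentiation; a sketch only.
import sympy as sp
from scipy.integrate import quad

rho, r, x = sp.symbols('rho r x')  # x stands for the product rho*r

def rao_density(nu):
    """Rao's formula (1.3) as a sympy expression; nu = n - 1 must exceed 1."""
    theta = sp.acos(-x)                        # cos(theta) = -rho*r, 0 < theta < pi
    core = (theta - sp.sin(2 * theta) / 2) / sp.sin(theta)**3
    deriv = sp.diff(core, x, nu - 2)           # (nu - 2)-fold derivative in x = rho*r
    return ((1 - r**2)**sp.Rational(nu - 1, 2) * (1 - rho**2)**sp.Rational(nu - 2, 2)
            / (sp.pi * sp.factorial(nu - 2)) * deriv.subs(x, rho * r))

# Sanity check for n = 4 (nu = 3): the density should integrate to one over (-1, 1).
pi3 = sp.lambdify((rho, r), rao_density(3))
print(quad(lambda p: pi3(p, 0.9849), -1, 1)[0])  # approximately 1.0
```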

The Rao formula (1.3) for the density π of ρ is similar in form to the formula derived by Fisher (1915) for the density f of r defined in Eq. 1.2. Historically, it took quite some time to arrive at an alternative formula suitable for the numerical calculation of f(r | ρ). Hotelling (1953) arrived at a formula using hypergeometric functions for the calculation of f(r | ρ), and this solution is now advocated by Stuart and Ord (1994, p.559–65) and Anderson (2003, p.122–26). Theorem 2.1 proved in Section 2 gives a similar explicit formula for π(ρ | r) using hypergeometric functions.

The concept of a confidence distribution is possibly unfamiliar to many readers. A brief introduction is given in the next section. Then the main result in Theorem 2.1 is stated and proved. This gives the exact confidence density corresponding to the fiducial distribution defined by Fisher (1930). The final section contains a brief discussion of some consequences with some additional remarks on the relevance of the fiducial argument in current statistics. An Appendix explains in more detail the connection of the derived confidence distribution to the corresponding Bayesian posterior.

2 Theory

According to Cox (1958, p.363) a confidence distribution for a parameter can be defined directly or introduced as the set of all confidence intervals at different levels. A direct definition can be given starting with a random cumulative distribution function C depending on the data Y. For completeness the mathematical details are specified next.

It is assumed that \(({\varOmega }, {\mathcal {E}}, {\text {P}})\) is the underlying probability space following Kolmogorov (1933). The data Y is given by a measurable function \(Y: {\varOmega } \rightarrow {\varOmega }_{Y}\) where ΩY is the sample space of the data. A statistical model is defined by assuming that P is unknown and depends on an unknown model parameter \(\theta \in {\varOmega }_{\Theta }\). A parameter γ = ψ(𝜃) is defined by a measurable function \(\psi : {\varOmega }_{\Theta } \rightarrow {\varOmega }_{{{\varGamma }}}\). Let \(U \sim {\text {U}} (0,1)\) mean \({\text{P}}(U \le u) = u\) for all 0 < u < 1.

Definition 2.1.

A statistic C is an exact confidence distribution for a real parameter ρ if:

  1. \(\rho \mapsto C(\rho; y)\) is a cumulative distribution function for all y in the sample space of the data Y.

  2. \(C (\rho ; Y) \sim {\text {U}} (0,1)\)

If the cumulative distribution function is differentiable with \(\pi(\rho; y) = \partial_{\rho} C(\rho; y)\), then π is an exact confidence density for ρ.

It must be observed in Definition 2.1 that both the law PY of the data Y and the parameter ρ depend on the model. Definition 2.1 is as given by Schweder and Hjort (2016, Definition 3.1, p.58). It is equivalent to demanding that the fractile \(W_{p} = C^{-1}(p)\) of C defines an exact confidence interval \((-\infty , W_{p}]\) for ρ. The proof is given by

$$ {\text{P}} (\rho \le W_p) = {\text{P}} (C(\rho) \le p) = p. $$
(2.4)

Again, it is important to notice that both probabilities depend on the law of the data Y, and that ρ is a parameter of that law.
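
Property 2 of Definition 2.1 is easy to check by simulation in simple models. A minimal sketch, assuming numpy and scipy, for the mean μ of a normal sample with known standard deviation, where C(μ; y) = Φ(√n(μ − ȳ)/σ); the model and all names are illustrative and not from the paper:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 2.0, 25  # illustrative true values

def C(mu_val, y):
    """Confidence distribution for a normal mean with known sigma.
    Property 1 holds since this is a cumulative distribution function in mu_val."""
    return norm.cdf(np.sqrt(n) * (mu_val - y.mean()) / sigma)

# Property 2 of Definition 2.1: C(mu; Y) ~ U(0,1) under repeated sampling.
u = np.array([C(mu, rng.normal(mu, sigma, n)) for _ in range(10_000)])
print(np.quantile(u, [0.1, 0.5, 0.9]))  # should be close to 0.1, 0.5, 0.9
```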

In the particular case given by Eq. 1.1

$$ C (\rho; y) = 1 - F(r {{\mid}} \rho) $$
(2.5)

where r is the empirical correlation of the random sample y = ((y₁, y₂)₁, …, (y₁, y₂)ₙ) from the binormal law with correlation ρ. The unknown model is given by the model parameter 𝜃 = (μ₁, μ₂, σ₁, σ₂, ρ) corresponding to the means μ₁, μ₂, the standard deviations σ₁, σ₂, and the correlation ρ.

For this case it follows that \(\pi(\rho \mid r) = \partial_{\rho} C(\rho; y)\) is an exact confidence density as explained originally by Fisher (1930, p.532–5). The proof follows by observing that the law of the empirical correlation r depends only on the correlation ρ, that \(\rho \mapsto 1 - F(r \mid \rho)\) is a differentiable cumulative distribution function, and generally that \(F (X) \sim {\text {U}} (0,1)\) for a continuous random variable X with cumulative distribution function F.

An explicit formula for the density π(ρ | r) does not follow from the above arguments, but it is derived below along an alternative path.

Theorem 2.1.

Let r be the empirical correlation of a random sample of size n from the binormal. The exact confidence density for the correlation ρ is

$$ \pi (\rho {{\mid}} r) = \frac{(1 - r^2)^{\frac{\nu - 1}{2}} \cdot (1 - \rho^2)^{\frac{\nu - 2}{2}} \cdot (1 - r \rho )^{\frac{1-2\nu}{2}}} {\sqrt{2} B(\nu + \frac{1}{2}, \frac{1}{2})} \cdot F\left( \frac{3}{2},-\frac{1}{2}; \nu + \frac{1}{2}; \frac{1 + r \rho}{2}\right) $$

where B is the Beta function, F is the Gaussian hypergeometric function, and ν = n − 1 > 1. The one-sided confidence intervals from π(ρ | r) are uniformly most accurate invariant with respect to the location-scale groups on the two coordinates.

Proof.

The idea is to use the Elfving equation

$$ \sqrt{u} \frac{\rho}{\sqrt{1-\rho^2}} - \sqrt{v} \frac{r}{\sqrt{1-r^2}} = z. $$
(2.6)

This relation was also obtained by Professor David Sprott as explained and proved by Fraser (1964, p.853, eq. (5.1)). Here \(u \sim \chi ^{2} (\nu )\), \(v \sim \chi ^{2} (\nu - 1)\), and \(z \sim {\text {N}}(0,1)\) are independent. Equation 2.6 gives the law of ρ when r is known. The degrees of freedom ν equals n − 1 for sample size n; if the means are known, then ν = n. Equation 2.6 is due to Elfving (1947) according to Lee (1971, p.117).
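
Equation 2.6 also gives a direct Monte Carlo method: solving for s = ρ/√(1 − ρ²) and inverting yields samples from the fiducial distribution of ρ given r. A minimal sketch, assuming numpy (the function name is hypothetical):

```python
import numpy as np

def sample_fiducial_rho(r, n, size=100_000, seed=None):
    """Sample rho given r by solving the Elfving equation (2.6) for rho."""
    rng = np.random.default_rng(seed)
    nu = n - 1
    u = rng.chisquare(nu, size)
    v = rng.chisquare(nu - 1, size)
    z = rng.standard_normal(size)
    t = r / np.sqrt(1 - r**2)
    s = (z + np.sqrt(v) * t) / np.sqrt(u)  # s = rho / sqrt(1 - rho^2)
    return s / np.sqrt(1 + s**2)           # invert back to rho
```

A histogram of such samples provides an independent numerical check of the closed-form density derived below.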

Equation 2.6 gives the conditional density of ρ given u,v. The marginal density of ρ then follows by integration over u,v. This integration is done by a change of variables resulting in a gamma integral, and it yields the density π(ρ | r). The details are as follows.

The conditional density of \(s = \rho / \sqrt {1 - \rho ^{2}}\) given u,v is normal by Eq. 2.6 with \((s {{\mid }} u,v) \sim {\text {N}}(\sqrt {\frac {v}{u}} t, 1/u)\) where \(t = r/\sqrt {1 - r^{2}}\). Using this, the law of u,v, and \(ds = (1 - \rho^{2})^{-3/2}\, d\rho\) give the joint density of ρ,u,v as

$$ (1 - \rho^2)^{-3/2} \cdot \frac{u^{\frac{\nu}{2}-1} e^{-\frac{u}{2}}}{2^{\frac{\nu}{2}} {{\varGamma}}(\frac{\nu}{2})} \cdot \frac{v^{\frac{\nu-1}{2}-1} e^{-\frac{v}{2}}}{2^{\frac{\nu-1}{2}} {{\varGamma}}(\frac{\nu-1}{2})} \cdot \sqrt{\frac{u}{2\pi}} e^{-\frac{u}{2} (s - \sqrt{\frac{v}{u}} t)^2}. $$
(2.7)

The terms in the exponential are

$$ -\frac{1}{2} \left[ \frac{u}{1 - \rho^2} - \frac{2\sqrt{u v} \rho r}{\sqrt{(1 - \rho^2)(1 - r^2)}} + \frac{v}{1 - r^2} \right] = -\frac{\nu (s_1^2 - 2 s_1 s_2 r \rho + s_2^2)}{2 (1 - r^2)} $$
(2.8)

using new coordinates (s1, s2) defined by \(\nu {s_{1}^{2}} = u (1 - r^{2})/(1 - \rho ^{2})\) and \(\nu {s_{2}^{2}} = v\). Let \(s_{1} = \sqrt {\alpha } \exp (-\beta /2)\) and \(s_{2} = \sqrt {\alpha } \exp (\beta /2)\). The density for ρ,α,β from Eq. 2.7 is

$$ \frac{2^{1 - \nu} \nu^{\nu}}{\sqrt{\pi}{{\varGamma}}(\frac{\nu}{2}) {{\varGamma}}(\frac{\nu-1}{2})} (1 - r^2)^{-\frac{\nu + 1}{2}} (1 - \rho^2)^{\frac{\nu - 2}{2}} e^{-\beta} \alpha^{\nu - 1} e^{\frac{-\nu \alpha (\cosh (\beta) - \rho r)}{1 - r^2}}. $$
(2.9)

Integration over α gives π(ρ | r) using the identity \(\pi (\nu - 2)! = \sqrt {\pi } 2^{\nu - 2} {{\varGamma }}(\frac {\nu }{2}) {{\varGamma }}(\frac {\nu - 1}{2})\) and adjusting an integral representation of F (Olver et al. 2010, 14.3.9, 14.12.4). The normalization factor from this is \([\nu (\nu - 1){{\varGamma }}(\nu -1)]/[\sqrt {2\pi } {{\varGamma }}(\nu + \frac {1}{2})]\), which simplifies into the stated \(1/[\sqrt {2} B(\nu + \frac {1}{2}, \frac {1}{2})]\). This ends the proof of the formula. The optimality claim is a consequence of Problem 6.68 presented by Lehmann and Romano (2005, p.273). □
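
The change of variables in the proof can be verified mechanically. A small numerical sketch, assuming sympy and purely illustrative, checking the exponent identity (2.8) under the stated substitutions for u and v:

```python
import sympy as sp

rho, r, s1, s2, nu = sp.symbols('rho r s1 s2 nu', positive=True)
u = nu * s1**2 * (1 - rho**2) / (1 - r**2)  # from nu*s1^2 = u*(1 - r^2)/(1 - rho^2)
v = nu * s2**2
lhs = -(u / (1 - rho**2)
        - 2 * sp.sqrt(u * v) * rho * r / sp.sqrt((1 - rho**2) * (1 - r**2))
        + v / (1 - r**2)) / 2
rhs = -nu * (s1**2 - 2 * s1 * s2 * rho * r + s2**2) / (2 * (1 - r**2))
vals = {rho: 0.3, r: 0.7, s1: 1.1, s2: 0.4, nu: 3}
print(sp.N(lhs.subs(vals) - rhs.subs(vals)))  # approximately 0
```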

The formula for π(ρ | r) can be reformulated using Legendre functions similarly to the formula for f(r | ρ) obtained by Fisher (1915, p.511). Interesting recursion relations can also be established. The details are not given here since Theorem 2.1 is of a form more suitable for numerical calculations.
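
Indeed, the formula of Theorem 2.1 translates directly into a short numerical routine. A minimal sketch assuming numpy and scipy (the function name confidence_density is hypothetical):

```python
import numpy as np
from scipy.special import beta, hyp2f1

def confidence_density(rho, r, n):
    """Exact confidence density pi(rho | r) of Theorem 2.1; nu = n - 1 > 1."""
    nu = n - 1
    return ((1 - r**2)**((nu - 1) / 2)
            * (1 - rho**2)**((nu - 2) / 2)
            * (1 - r * rho)**((1 - 2 * nu) / 2)
            / (np.sqrt(2) * beta(nu + 0.5, 0.5))
            * hyp2f1(1.5, -0.5, nu + 0.5, (1 + r * rho) / 2))
```

Integrating this density over (−1, 1) with scipy.integrate.quad returns one up to numerical error, and a histogram of the Elfving samples from the sketch above should match it; both serve as quick checks.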

The method of proof is also of independent interest since it gives an alternative and simpler derivation of the known exact formulas for f(r | ρ) derived by Fisher (1915, p.507–11), Hotelling (1953, p.197–200), Stuart and Ord (1994, p.559–65), and Anderson (2003, p.122–6). It also gives a possible path for the derivation of exact confidence distributions for partial correlations.

The formula for π(ρ | r) seems difficult to derive directly from the formula for f(r | ρ) and Eq. 1.1. This is possibly the reason why an alternative explicit formula for π(ρ | r) has been absent for so long. Another reason is the quality of the approximation given by the Fisher (1921) z-transformation:

$$ \frac{1}{2} \ln\left( \frac{1+\rho}{1-\rho}\right) - \frac{1}{2} \ln\left( \frac{1+r}{1-r}\right) \approx z/\sqrt{\nu - 2}. $$
(2.10)

Replacing Eq. 2.6 with Eq. 2.10 and solving with respect to ρ gives the z-transform confidence density

$$ \tilde{\pi} (\rho {{\mid}} r) = \sqrt{\frac{\nu-2}{2\pi}} (1-\rho^2)^{-1} e^{\frac{2-\nu}{8}\left[\ln\left( \frac{(1+\rho)(1-r)}{(1-\rho)(1+r)}\right) \right]^2} $$
(2.11)

for ν = n − 1 > 2. The density \(\tilde {\pi }\) is a good approximation to π for large sample size n, and is surprisingly accurate also for moderate sample sizes.
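
Equation 2.11 is equally direct to implement; a sketch under the same assumptions as above:

```python
import numpy as np

def z_transform_density(rho, r, n):
    """Approximate confidence density from the Fisher z-transform, Eq. (2.11)."""
    nu = n - 1  # requires nu > 2
    w = np.log((1 + rho) * (1 - r) / ((1 - rho) * (1 + r)))
    return (np.sqrt((nu - 2) / (2 * np.pi)) / (1 - rho**2)
            * np.exp((2 - nu) / 8 * w**2))
```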

3 Examples

The result of an experiment is given by four points with (x,y) coordinates (773,727), (777,735), (284,286), and (519,573). There are reasons a priori for assuming a linear relationship. This is further supported by Fig. 1 and a high value of the coefficient of determination R² = 97.00%. The R² equals the square of the empirical correlation r = 98.49%. This is an example of linear regression used extensively in applied sciences. A natural question is then: What about uncertainty? The focus in the following is the correlation, but other parameters are of course of possible interest depending on the concrete application.

Figure 1: A sample of size 4 with a regression line corresponding to an example by Fisher (1930).

An approximate 95% one-sided confidence interval for the correlation ρ based on the Fisher (1921) z-transformation is [66.08,100]%. Linear interpolation in the table presented by Fisher (1930) gives an exact 95% confidence interval [67.42,100]%. Our Theorem 2.1, without linear interpolation, gives the true exact 95% confidence interval [67.39,100]%. This demonstrates that the z-transformation can be quite good even for small sample sizes.
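
These intervals can be reproduced, up to rounding, by inverting the cumulative confidence distribution numerically. A sketch building on the hypothetical confidence_density routine from Section 2:

```python
from scipy.integrate import quad
from scipy.optimize import brentq

def lower_confidence_limit(r, n, level=0.95):
    """Find rho_L with cumulative confidence 1 - level, so [rho_L, 1] has the stated level."""
    cdf = lambda x: quad(lambda p: confidence_density(p, r, n), -1, x)[0]
    return brentq(lambda x: cdf(x) - (1 - level), -1 + 1e-9, 1 - 1e-9)

print(lower_confidence_limit(0.9849, 4))  # should print approximately 0.6739
```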

More complete information is given by the confidence densities shown in Fig. 2. The exact confidence density in Fig. 2 illustrates the uncertainty corresponding to all possible confidence intervals at all possible confidence levels. The density is also the Bayesian posterior from a standard prior. It hence represents all the information available for the correlation based on the observations. Figure 2 also shows the approximate confidence density \(\tilde {\pi }\) from the z-transform. The two densities are similar, but \(\tilde {\pi }\) clearly differs from π.

Figure 2: The confidence density and the z-transform density for the Fisher (1930, p.534) example.

Figure 3 shows the CD4 counts for 20 HIV-positive subjects (Efron 1998, p.101). The x-axis gives the baseline count and the y-axis gives the count after one year of treatment with an experimental antiviral drug. The empirical correlation is r = 0.7232, and the equitail z-transform 90% approximate confidence interval is [47.41,86.51]%.
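
The quoted z-transform interval can be reproduced directly from Eq. 2.10; a sketch assuming numpy and scipy:

```python
import numpy as np
from scipy.stats import norm

r, n = 0.7232, 20
zr = np.arctanh(r)                      # Fisher z-transform of r
half = norm.ppf(0.95) / np.sqrt(n - 3)  # nu - 2 = n - 3 in Eq. 2.10
print(np.tanh(zr - half), np.tanh(zr + half))  # approx. 0.4741 and 0.8651
```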

Figure 3: The HIV data of DiCiccio and Efron (1996, Table 1).

Figure 4 shows the closeness of the confidence density π and the z-transform density \(\tilde {\pi }\). The exact equitail 90% confidence interval from Theorem 2.1 is [46.54,85.74]%. It is shifted to the left, as can also be inferred from Fig. 4. Efron (1998, p.101) discusses this example in more detail using bootstrap techniques.

Figure 4: The confidence density and the z-transform density for the HIV data of DiCiccio and Efron (1996, Table 1).

As a final example, consider certain pairs of measurements with r = 0.534 taken on n = 8 children at age 24 months in connection with a study at a university hospital in Hong Kong. Figure 5 shows again the closeness of the confidence density and the z-transform density. Schweder and Hjort (2016, p.227, Figure 7.8) discuss this example in much more detail, including different bootstrap approaches. Using the method of Fisher (1930), they arrive at the same plot of the confidence density using the exact distribution for the empirical correlation. This provides additional verification of the exact result in Theorem 2.1.

Figure 5: The confidence density and the z-transform density for the data from 8 children (Schweder and Hjort 2016, p.227, Figure 7.8).

An alternative method for all the examples is density estimation based on Markov chain Monte Carlo (MCMC) sampling from the standard prior for the binormal with five unknown parameters, but the explicit formula is preferable. The explicit formula can also be used as a benchmark test case for a general MCMC posterior implementation.

4 Discussion

The fiducial density π(ρ | r) coincides with the Bayesian posterior from the standard prior for the five unknown parameters in the binormal and gives optimal inference. It is an exact confidence density. This is explained by Taraldsen and Lindqvist (2013). The explicit formulas for f(r | ρ) and π(ρ | r) prove that the fiducial is not obtainable from a prior π(ρ). This marginalization paradox is known, but the previously known proof is complicated (Berger and Sun 2008, p.966–7).

In current mathematical statistics the problem of the choice of a prior is central, and the current revival of the fiducial argument presents an alternative solution to this problem. In a non-parametric problem the choice of a Bayesian prior can be justified by establishing good asymptotic frequentist coverage properties (Castillo and Nickl, 2013; Ghosal and van der Vaart, 2017). Cui and Hannig (2019) demonstrate that this can also be done by a fiducial argument for a non-parametric problem without a Bayesian prior.

Schweder and Hjort (2016) present recent developments in the theory of confidence distributions and advocate these as an alternative to the calculation of Bayesian posteriors. Taraldsen and Lindqvist (2019) explain that the problem of choosing a prior, including in non-parametric problems, can be solved by not choosing a prior, but rather by using the information contained in a data generating equation. Xie and Singh (2013) explain how the concept of a confidence distribution can be seen as a frequentist distribution estimator of a parameter.

In his initial work on the fiducial argument Fisher (1930, p.532–5) used a frequency interpretation for its justification. In later works, Fisher (1973, p.54–5) insisted on a more general interpretation: When there is no prior information, then the interpretation of the fiducial is exactly as for a Bayesian posterior. It is the state of knowledge of the parameter given the observations. The modern view is that the knowledge given by a prior is replaced by the knowledge inherent in a particular data generating equation. Taraldsen and Lindqvist (2013, p.331) demonstrate that this can give optimal inference in a non-parametric Hilbert space problem, and Cui and Hannig (2019) demonstrate superiority of a fiducial distribution in a non-parametric problem for survival functions under censoring.

Neyman (1937, eq. 20) is usually credited with the invention of the theory of confidence intervals. Cox (1958, p.363–6) can likewise be credited with suggesting the use of confidence distributions in statistical inference. Actually, Fisher (1930, p.532–5) defines both concepts precisely, and uses the correlation coefficient and Eq. 1.1 as a concrete example with numerical calculations. Combining his initial results gives an algorithm for the numerical calculation of exact confidence intervals for the correlation. Fisher (1930, Table, p.533) computed a table with exact 5% and 95% percentiles for sample size n = 4. His table, up to numerical rounding, is consistent with a more direct approach based on Theorem 2.1. This gives an independent check of the claim in Theorem 2.1, and also of the table calculated by Fisher.

Inference for correlation, even if of a seemingly elementary kind, is of central importance in applied statistics. It is almost impossible to find a linear regression plot without the accompanying R². The exact solution by Fisher (1930) is rarely used. Standard software gives an approximate solution using the Fisher (1921) z-transform. An example using the z-transform is given by Efron (1998, p.101). The density in Theorem 2.1 gives the uncertainty associated with the estimated correlation, and hence also with the squared correlation. Approximate inference using the Fisher (1921) z-transform can, and should, be replaced by exact inference.

The fiducial density π(ρ | r) corresponds to the very first example used by Fisher (1930) when he introduced his fiducial argument. Fisher justified this fiducial distribution by proving that the corresponding quantiles give exact confidence intervals. This was a starting point for Neyman (1937) when formulating a general theory of confidence intervals. It can safely be concluded that the seminal paper by Fisher (1930) has been pivotal in the historical development of mathematical statistics, and it still is.