1 Introduction

A common activity in statistics is that of testing the null hypothesis, \(H_0\), that the true value of the parameter \(\omega \) lies in a specified subspace of the parameter space \(\Omega \). The two main general tests are the likelihood ratio test (LRT) and the score test. The LRT rejects \(H_0\) for large values of \(w = 2 \left\{ l( \hat{\omega } ; x_1, \dots , x_n) - l( \tilde{\omega } ; x_1, \dots , x_n) \right\} \), where \(l( \cdot ; x_1, \dots , x_n)\) denotes the log-likelihood based on observations \(x_1, \dots , x_n\), and \(\hat{\omega }\) and \(\tilde{\omega }\) are the maximum likelihood estimate and the restricted maximum likelihood estimate under \(H_0\), respectively. The score test rejects \(H_0\) for large values of

$$\begin{aligned} S = {\tilde{U}}_h^{\top } {\tilde{i}}_h ^{-1} {\tilde{U}}_h, \end{aligned}$$
(1)

where \(U_h\) is the score for the interest parameter, \(i _h{}^{-1}\) is the interest part of the inverse Fisher information, \(U_h^{\top }\) denotes the transpose of \(U_h\), and each tilde indicates evaluation at \(\tilde{\omega }\).
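As a concrete illustration of the two statistics, the following minimal sketch computes w and S for a simple null hypothesis \(H_0: \omega = \omega _0\). The model (independent Poisson observations with mean \(\omega \)) and the function names are assumptions chosen here for illustration only; under a simple null the restricted estimate \(\tilde{\omega }\) is just \(\omega _0\) and (1) reduces to the squared score divided by the Fisher information.

```python
import math


def poisson_log_lik(omega, xs):
    """Log-likelihood of i.i.d. Poisson(omega) data, omitting the log(x!) terms."""
    return sum(x * math.log(omega) - omega for x in xs)


def lrt_statistic(xs, omega0):
    """w = 2{l(omega_hat) - l(omega0)} for the simple null omega = omega0."""
    omega_hat = sum(xs) / len(xs)  # unrestricted MLE; assumed strictly positive
    return 2.0 * (poisson_log_lik(omega_hat, xs) - poisson_log_lik(omega0, xs))


def score_statistic(xs, omega0):
    """S = U^2 / i, with score U and Fisher information i evaluated at omega0."""
    n = len(xs)
    score = sum(xs) / omega0 - n   # derivative of the log-likelihood at omega0
    fisher_info = n / omega0       # expected information for n observations
    return score ** 2 / fisher_info


if __name__ == "__main__":
    data = [4, 4, 4, 4]            # hypothetical counts
    print(lrt_statistic(data, 2.0), score_statistic(data, 2.0))
```

Both statistics vanish when the sample mean equals \(\omega _0\) and would be referred to \(\chi _1^2\) in this one-parameter example.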

Under mild regularity conditions and under independent sampling, the asymptotic null distributions of w and S are \(\chi _p^2\) with error of order \(O(n^{-1})\), where p is the dimension of the interest parameter. For w, there is a Bartlett-adjusted version, \(w^{*}\), given by

$$\begin{aligned} w^{*} = w (1+R/p) \end{aligned}$$
(2)

for some constant R chosen so that the null distribution of \(w^{*}\) is \(\chi _p^2\) with error of order \(O(n^{-2})\) [3, 4]. The scalar R can be expressed in terms of some tensors [3, 6] that arise from the geometry of the model. For S there is no analogous linear Bartlett adjustment but there is a cubic modification [9] of S such that its null distribution is \(\chi _p^2\) with error of order \(O(n^{-3/2})\). The coefficients of the cubic are linear functions of coefficients in the expansion [11] to order \(O(n^{-1})\) of the moment generating function of S. These coefficients (and so the cubic modification) depend on the choice of parameterisation of the nuisance parameters, i.e., on the way in which the parameter space is written locally as a product of the spaces of interest and nuisance parameters. Even after correction of a misprint noted by [9], the coefficients of the cubic given in [11] are not invariant under re-parameterisation [10]. Further, there are no obvious geometric interpretations of the coefficients. For the case of simple null hypotheses, there is [8] a parameterisation-invariant version, \(S^{\ddag }\), of S such that the null distribution of \(S^{\ddag }\) is \(\chi _p^2\) with error of order \(O(n^{-2})\). Whereas the cubic correction, \(S^{*}\), of S introduced in Sect. 3.3 below is a cubic function of S, the statistic \(S^{\ddag }\) has the form \(S^{\ddag } = ({\tilde{U}}_h^{\ddag })^{\top } {\tilde{i}}_h ^{-1} {{\tilde{U}}_h}^{\ddag }\), where \({{\tilde{U}}_h}^{\ddag }\) is a cubic function of \({{\tilde{U}}_h}\). Even in some simple models (such as that in [14, Sect. 3]), the cubic giving \({{\tilde{U}}_h}^{\ddag }\) in terms of \({{\tilde{U}}_h}\) can be quite complicated. There are no obvious geometric interpretations of the coefficients of this cubic.

The aim of this paper is to provide a parameterisation-invariant expansion to order \(O(n^{-1})\) of S in which the coefficients have geometric interpretations. A cubic correction, \(S^{*}\), of S is introduced, such that the null distribution of \(S^{*}\) is \(\chi _p^2\) with error of order \(O(n^{-2})\). Index notation has two serious disadvantages: (i) it is vulnerable to misprints, and (ii) it can obscure concepts by concentrating on the details of calculations. The approach here therefore largely avoids explicit parameterisations and the use of index notation. For readers who prefer index notation, Appendix A contains expressions in that language for the coefficients of the cubic.

Section 2 recalls material on yokes, introduces fibred yokes, and shows how they give rise to decomposition of tensors. In Sect. 3 the asymptotic moment generating function of S is derived, the coefficients of the cubic giving \(S^{*}\) are given, and these coefficients are related to appropriate tensors.

2 Yokes and fibred yokes

An appropriate geometric setting for parametric models in which nuisance parameters can be present is that of submersions from one smooth manifold to another. More precisely, \(\pi : \Omega \rightarrow \Psi \) is a smooth map from the full parameter space, \( \Omega \), to the space, \(\Psi \), of parameters of interest, and at each point \(\omega \) of \(\Omega \) the tangent map \(\pi _{*}\) maps the tangent space \(T \Omega _{\omega }\) onto \(T \Psi _{\pi ( \omega )}\). The submersion condition implies that each fibre \(\pi ^{-1}(\psi )\) is a submanifold of \(\Omega \) and that around each \(\omega \) small portions of \(\Omega \) look like \(\Psi \times \pi ^{-1} (\pi (\omega ))\) with \(\pi \) being identified locally with the projection of \(\Psi \times \pi ^{-1} (\pi (\omega ))\) onto \(\Psi \). Nevertheless, in general \( \Omega \) is not such a product and it is conceptually not helpful to think of \(\Omega \) in this way.

2.1 Yokes

The coordinate-free definition of a yoke is as follows. For a vector field X on a manifold \(\Omega \), define the vector fields \(\bar{X}\) and \(\bar{X} '\) on \(\Omega \times \Omega \) by \(\bar{X} = (X,0)\) and \(\bar{X}' = (0,X)\), i.e.,

$$\begin{aligned} Tp_1(\bar{X}) = X,&\qquad Tp_2(\bar{X}) = 0, \\ Tp_1(\bar{X}') = 0,&\qquad Tp_2(\bar{X}') = X, \end{aligned}$$

where \(p_k : \Omega \times \Omega \rightarrow \Omega \) is the projection onto the \(k^\textrm{th}\) factor for \(k = 1,2\). Then, for vector fields X and Y on \(\Omega \) and a smooth function \(g : \Omega \times \Omega \rightarrow \mathbb {R}\), we define \(g(X \vert Y) : \Omega \rightarrow \mathbb {R}\) by

$$\begin{aligned} g(X \vert Y)(\omega ) = \bar{X}\bar{Y}' g(\omega ,\omega ). \end{aligned}$$

A yoke on \(\Omega \) may now be characterised as a smooth function \(g : \Omega \times \Omega \rightarrow \mathbb {R}\) such that

  (i) \(\bar{X}g(\omega ,\omega ) = 0\) for all \(\omega \) in \(\Omega \),

  (ii) the (0,2)-tensor \((X,Y) \mapsto g(X \vert Y)\) is non-singular.

An alternative way of expressing (i) and (ii) is that on the diagonal \(\Delta _\Omega = \left\{ (\omega , \omega ) : \omega \in \Omega \right\} \),

  (i) \(d_1g = 0\),

  (ii) \(d_1d_2g\) is non-singular,

where \(d_1\) and \(d_2\) denote exterior differentiation along the first and second factor, respectively, in \(\Omega \times \Omega \).

The two main yokes of interest in statistics are the likelihood yokes. Consider a parametric statistical model with parameter space \(\Omega \), sample space \(\mathcal {X}\) and log-likelihood function \(l : \Omega \times \mathcal {X} \rightarrow \mathbb {R}\). The expected likelihood yoke on \(\Omega \) is the function g on \(\Omega \times \Omega \) given by

$$\begin{aligned} g(\omega , \omega ') = E_{\omega '} [l(\omega ;x) - l(\omega '; x)] . \end{aligned}$$
(3)

Suppose that an auxiliary statistic a is given, such that the statistic \(({\hat{\omega }}, a)\) is minimal sufficient for \(\omega \), where \({\hat{\omega }}\) denotes the maximum likelihood estimator. Then the corresponding observed likelihood yoke on \(\Omega \) is the function g on \(\Omega \times \Omega \) given by

$$\begin{aligned} g(\omega , \omega ') = l(\omega ; \omega ',a) - l(\omega '; \omega ',a). \end{aligned}$$
(4)

Properties and applications of expected and observed likelihood yokes can be found in [1, 3].
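As a worked illustration of (3), consider a single observation x from a Poisson distribution with mean \(\omega > 0\); this model is an assumption made here for illustration and is not taken from the references above. Then \(l(\omega ; x) = x \log \omega - \omega \) up to a term not involving \(\omega \) and, since \(E_{\omega '}[x] = \omega '\),

$$\begin{aligned} g(\omega , \omega ') = \omega ' \log (\omega /\omega ') - \omega + \omega ' , \qquad \frac{\partial g}{\partial \omega }(\omega , \omega ') = \frac{\omega '}{\omega } - 1 , \qquad \frac{\partial ^2 g}{\partial \omega \, \partial \omega '}(\omega , \omega ') = \frac{1}{\omega } . \end{aligned}$$

The first derivative vanishes on the diagonal \(\omega ' = \omega \), which is property (i) of a yoke, and the mixed derivative is the Fisher information \(1/\omega \), which is non-zero, so that property (ii) holds.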

A key property of yokes is that they give rise naturally to preferred coordinate charts (called extended normal coordinates) taking values in appropriate cotangent spaces. Given any point \(\omega \) of \(\Omega \), the function \({\Gamma }_{\omega }\) from \(\Omega \) to the cotangent space \(T^{*}\Omega _{\omega }\) to \(\Omega \) at \(\omega \) is defined by

$$\begin{aligned} \Gamma _{\omega }(\omega ') = d_1 g (\omega , \omega ') . \end{aligned}$$
(5)

In terms of local coordinates \(\omega ^1, \dots , \omega ^d\) on \(\Omega \),

$$\begin{aligned} \Gamma _{\omega }(\omega ') = \frac{\partial g(\omega , \omega ')}{\partial \omega ^u} d \omega ^u , \end{aligned}$$

where the Einstein summation convention is used. It follows from property (i) of a yoke that \(\Gamma _{\omega }(\omega ) = 0\) and from property (ii) that the restriction \(\Gamma _{\omega } \vert U\) of \(\Gamma _{\omega }\) to some neighbourhood U of \(\omega \) in \(\Omega \) is a coordinate chart on U taking values in \(T^{*}\Omega _{\omega }\). Note that the space \(T^{*} \Omega _{\omega }\) depends on \(\omega \). It has been customary [3, Sect. 5.6], [6, Sect. 4], [16] to use the metric given by the yoke to ‘raise’ the \({\Gamma }_{\omega }\), in order to obtain extended normal coordinates with values in the tangent space \(T \Omega _{\omega }\) rather than in its dual, the cotangent space \(T^{*} \Omega _{\omega }\). The \({\Gamma }_{\omega }\) defined in (5) are used here because they can be regarded as more basic. In the language of strings, the coordinate expressions for the ‘raised’ versions of the derivatives of \({\Gamma }_{\omega }\) form the costring [6].
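For instance, under the illustrative assumption of a single observation from \(N(\omega , 1)\), the expected likelihood yoke (3) is \(g(\omega , \omega ') = -(\omega - \omega ')^2/2\), and (5) gives

$$\begin{aligned} \Gamma _{\omega }(\omega ') = \frac{\partial g(\omega , \omega ')}{\partial \omega } \, d \omega = (\omega ' - \omega ) \, d \omega . \end{aligned}$$

This vanishes at \(\omega ' = \omega \) and is a chart on the whole of \(\Omega = \mathbb {R}\); in this example the extended normal coordinate of \(\omega '\) is simply its displacement from \(\omega \).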

For any smooth function f on \(\Omega \), the composition \(f \circ \Gamma _{\omega }^{-1}\) is a function on an open neighbourhood of 0 in the vector space \(T^{*} \Omega _{\omega }\), and so its derivatives are symmetric tensors on \(T^{*} \Omega _{\omega }\). Combining these tensors with \(\Gamma _{\omega }\) gives an invariant Taylor expansion (a parameterisation-invariant analogue of a Taylor expansion) of f. Expressions in index notation for (‘lowered’ versions of) these invariant Taylor expansions are given in [5, Sect. 3.3], [3, Sect. 5.6], [16, Sect. 4]. Similarly, for any smooth function h on \(\Omega \times \Omega \), the composition \(h \circ ( \Gamma _{\omega }^{-1} \times \Gamma _{\omega }^{-1})\) is a function on an open neighbourhood of 0 in the vector space \(T^{*} \Omega _{\omega } \times T^{*} \Omega _{\omega }\), and so its derivatives are symmetric tensors on \(T^{*} \Omega _{\omega } \otimes T^{*} \Omega _{\omega }\). In the language of strings, these tensors are said to be obtained by intertwining [1]. In particular, Taylor expansion of g in the corresponding product coordinate charts on a neighbourhood of \((\omega , \omega )\) in \(\Omega \times \Omega \) yields a family of tensors \(T _{r_1, \dots , r_p; s_1, \dots , s_q}\) on \(\omega \) [6].

Remark 1

Extended normal coordinates, \(\Gamma _{\omega }\), can be defined also in the more general setting of pre-contrast functions, meaning functions \(h: \Omega \times \Omega \rightarrow T^{*}\Omega \) such that

  (o) \(h(\omega ,\omega ') \in T^{*}\Omega _{\omega }\),

  (i) \(h(\omega ,\omega ) = 0\),

  (ii) \(d_2 h\) is non-degenerate on the diagonal, \(\Delta _\Omega \), where \(d_2\) denotes the exterior derivative along \(\{ \omega \} \times \Omega \).

(In the language of vector bundles, h is a section of the pull-back of the cotangent bundle of \(\Omega \) by the projection \(p_1: \Omega \times \Omega \rightarrow \Omega \) onto the first factor, such that \(h = 0\) on the diagonal and its derivative is non-degenerate there.) The original definition [12] of pre-contrast functions required the restriction of \(- d_2 h\) to the diagonal to be a semi-Riemannian metric on \(\Omega \).

The general mathematical concept that underlies the results in this paper is that of a fibred yoke, i.e., a submersion \(\pi : \Omega \rightarrow \Psi \), together with a yoke on \(\Omega \). In the current context, \(\pi \) maps parameters to interest parameters, and the yoke is a likelihood yoke (3) or (4).

2.2 Decomposition of tangent spaces

In the tangent space \(T \Omega _{\omega }\) to \(\Omega \) at \(\omega \) the vertical subspace \(V_{\omega }\) is defined as \(V_{\omega } = \left\{ X \in T \Omega _{\omega }: \pi _{*}(X) = 0 \right\} \). Given a Riemannian metric \(\phi \) on \(\Omega \), the horizontal subspace \(H_{\omega }\) is the orthogonal complement of \(V_{\omega }\) in \(T \Omega _{\omega }\). Thus \(\phi \) decomposes \(T \Omega _{\omega }\) as the orthogonal direct sum

$$\begin{aligned} T \Omega _{\omega } = V_{\omega } \oplus H_{\omega } . \end{aligned}$$
(6)

The decomposition (6) varies smoothly with \(\omega \), in the sense that \(\omega \mapsto (V_{\omega } , H_{\omega })\) is a smooth map from \(\Omega \) to \(V_q(T \Omega ) \times V_p(T \Omega )\), where \(V_r(T \Omega )\) denotes the manifold \(\{ (\omega , E_{\omega }) : E_{\omega } \text{ is an } r\text{-dimensional subspace of } T\Omega _{\omega } \}\), and p and q are the dimensions of the interest and nuisance parameters, respectively. The smoothness of the decomposition (6) implies that \(Y_h, Y_v, Y_{hv}\), and \(Y_{vv}\) defined in Sect. 3.1 depend smoothly on \(\omega \), and so, under mild regularity conditions, the tensors defined in (9) below exist. The tangent mapping \(\pi _{*}\) identifies \(H_{\omega }\) with \(T \Psi _{\pi ( \omega )}\).

The inner product \(\pi _{\omega } \phi \) on \(T\Psi _{\pi (\omega )}\) is defined by

$$\begin{aligned} \pi _{\omega } \phi (X, Y) = \phi (\tilde{X}, \tilde{Y}) \quad X, Y \in T\Psi _{\pi (\omega )} , \nonumber \end{aligned}$$

where \(\tilde{X}\) and \(\tilde{Y}\) are the horizontal lifts to \(T \Omega _{\omega }\) of X and Y, i.e., they are the unique elements of \( H_{\omega }\) such that \(\pi _{*} (\tilde{X}) = X\) and \(\pi _{*} (\tilde{Y}) = Y\). The dual of the decomposition (6) of the tangent space \(T \Omega _{\omega }\) to \(\Omega \) at \(\omega \) is the decomposition

$$\begin{aligned} T ^* \Omega _{\omega } = V^*_{\omega } \oplus H^*_{\omega } \end{aligned}$$
(7)

of the cotangent space \(T ^* \Omega _{\omega }\) to \(\Omega \) at \(\omega \). Taking the r-fold tensor product of the decomposition (7) of \(T^* \Omega _{\omega }\) leads to the decomposition

$$\begin{aligned} \otimes ^r T^* \Omega _{\omega } = \oplus _{s = 0}^r \left( ( \otimes ^s V^*_{\omega }) \otimes \left( \otimes ^{r-s} H^*_{\omega } \right) \right) \end{aligned}$$
(8)

of the space of r-fold tensors on \(T^* \Omega _{\omega }\).

The projection of the score onto \(H^*_{\omega }\) using the decomposition (7) is the horizontal score, \(U_h\), used in (1). It is the score for the interest parameter, \(\psi \), and is also known as the orthogonal score [13, 17].
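In coordinates this takes a familiar form. Suppose, purely for illustration, that \((\psi , \chi )\) are local coordinates on \(\Omega \) with \(\psi \) the interest and \(\chi \) the nuisance coordinates, and that the metric \(\phi \) is the Fisher information i, partitioned into blocks \(i_{\psi \psi }, i_{\psi \chi }, i_{\chi \psi }, i_{\chi \chi }\) (this block notation is introduced here and is not used elsewhere in the paper). Then

$$\begin{aligned} U_h = U_{\psi } - i_{\psi \chi } \, i_{\chi \chi }^{-1} \, U_{\chi } , \qquad i_h = i_{\psi \psi } - i_{\psi \chi } \, i_{\chi \chi }^{-1} \, i_{\chi \psi } , \end{aligned}$$

so that \(i_h^{-1}\) is the \((\psi , \psi )\) block of \(i^{-1}\). When the restricted maximum likelihood estimate \(\tilde{\omega }\) is a stationary point of the likelihood within the fibre, \({\tilde{U}}_{\chi } = 0\) and hence \({\tilde{U}}_h = {\tilde{U}}_{\psi }\), so that (1) reduces to \(S = {\tilde{U}}_{\psi }^{\top } {\tilde{i}}_h^{-1} {\tilde{U}}_{\psi }\).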

3 Higher-order behaviour of S

3.1 Tensors from log-likelihood derivatives

Denote by \(Z_1, Z_2, Z_3\) the first, second and third derivatives of the log-likelihood, centred and scaled by \(n^{-1/2}\) to have order \(O_p(1)\). Expressing \(Z_1, Z_2, Z_3\) in the coordinates \(\Gamma _{\omega }\) around \(\omega \) given by (5) with the expected likelihood yoke (3) yields random tensors \(Y_1, Y_2, Y_3\). Decomposing \(Y_1, Y_2, Y_3\) by (8) gives \(Y_h\) in \(H^{*}_{\omega }\), \(Y_v\) in \(V^{*}_{\omega }\), \(Y_{hv}\) in \(H^{*}_{\omega } \otimes V^{*}_{\omega }\), \(Y_{vv}\) in \(\otimes ^2 V^{*}_{\omega }\) and \(Y_{hvv}\) in \(H^{*}_{\omega } \otimes (\otimes ^2 V^{*}_{\omega })\). The tensors \(\tau _{h, h, h} \) in \(\otimes ^3 H^{*}_{\omega }\), \(\tau _{h, h, v} \) in \((\otimes ^2 H^{*}_{\omega }) \otimes V^{*}_{\omega }\), \(\tau _{h, v, v} \) in \(H^{*}_{\omega } \otimes (\otimes ^2 V^{*}_{\omega })\), \(\tau _{hv, hv}\) in \(\otimes ^2(H^{*}_{\omega } \otimes V^{*}_{\omega })\), \(\tau _{h,h,vv}\) in \((\otimes ^2 H^{*}_{\omega }) \otimes (\otimes ^2 V^{*}_{\omega })\), \(\tau _{h,v, hv}\) in \(\otimes ^2(H^{*}_{\omega } \otimes V^{*}_{\omega })\) and \(\tau _{h,h, h,h}\) in \(\otimes ^4 H^{*}_{\omega }\) are defined by

$$\begin{aligned} \tau _{h, h, h} = E[ \otimes ^3 Y_h ] , \quad \tau _{h, h, v}&= E[ (\otimes ^2 Y_h) \otimes Y_v ] , \quad \tau _{h, v, v} = E[ Y_h \otimes (\otimes ^2 Y_v) ] , \nonumber \\ \tau _{hv, hv} = E[ Y_{hv} \otimes Y_{hv} ] , \, \, \tau _{h, h,vv}&= E[ Y_h \otimes Y_h \otimes Y_{vv} ] , \, \, \tau _{h, v, hv} = E[ Y_h \otimes Y_{v} \otimes Y_{hv} ] , \nonumber \\ \tau _{h, h, h,h}&= E[ \otimes ^4Y_h ] . \end{aligned}$$
(9)

Remark 2

The tensors (9) can be obtained from the expected likelihood yoke (3). There are analogous tensors [3, Sect. 5.5] arising from the observed likelihood yoke (4). Under ordinary repeated sampling, corresponding tensors differ by \(O(n^{-1/2})\).

3.2 Moment generating function of S

One way [3, Sect. 5.3] of deriving the constant R in the expression (2) for \(w^{*}\) is based on expanding w to order \(O(n^{-1})\) as a quartic in the score. There is an analogous expansion of S as

$$\begin{aligned} S = S_0 + n^{-1/2} S_1 + n^{-1} S_2 + O(n^{-3/2}), \nonumber \end{aligned}$$

where \(S_0, S_1, S_2\) are \(O_p(1)\), \(S_0\) is a homogeneous quadratic in \(Y_1\), \(S_1\) is a homogeneous cubic in \(Y_1, Y_2\), and \(S_2\) is a homogeneous quartic in \(Y_1, Y_2, Y_3\). Calculation of some low-order moments of products of \(S_0, S_1\) and \(S_2\) leads to the following theorem.

Theorem 1

Suppose that (a) the sample space is continuous, (b) the log-likelihood function is finite and its derivatives of order 4 or less are continuous in some neighbourhood of \(\omega \), (c) the Fisher information at \(\omega \) is non-singular. Then the moment generating function \(M_S(t)\) of S has the form

$$\begin{aligned} M_S(t) = {(1 - 2t)}^{-\frac{p}{2}} \left\{ 1 + \frac{1}{24 n} \left( A_{1}d + A_{2}d^{2} + A_{3} d^{3} + O(d^4) \right) \right\} + O(n^{-3/2}), \end{aligned}$$
(10)

where \(d = 2t/(1-2t)\) and

$$\begin{aligned} A_1&= 12 \, \textrm{tr}_h \, \textrm{tr}_v \, ( \tau _{hv,hv}) + 3 \, \langle \textrm{tr}_v \, (\tau _{h,v,v}) , \textrm{tr}_h \, (\tau _{h,h,h}) \rangle _h\nonumber \\&+ 6 \, \Vert \tau _{h,v,v} \Vert ^2 + 6 \,\textrm{tr}_h \, \textrm{tr}_v (\tau _{h,h,vv}) + 36 \,\textrm{tr}_h \, \textrm{tr}_v (\tau _{h,v,hv}) + 6 \, \Vert \textrm{tr}_v \, \tau _{h,v,v }\Vert ^2 , \end{aligned}$$
(11)
$$\begin{aligned} A_2&= 3 \, \textrm{tr}_h \, \textrm{tr}_h \, ( \tau _{h,h,h,h}) - 6 \Vert \tau _{h,h,v} \Vert ^2 - 3 \Vert \textrm{tr}_{h} \, (\tau _{h,h,v}) \Vert _v^2 \nonumber \\&- 6 \,\langle \textrm{tr}_v \, (\tau _{h,v,v}) , \textrm{tr}_h \, (\tau _{h,h,v}) \rangle _h \, , \end{aligned}$$
(12)
$$\begin{aligned} A_3&= 3 \, \Vert \textrm{tr} \,(\tau _{h,h,h}) \Vert ^2 + 2 \, \Vert \tau _{h,h,h} \Vert ^2 , \end{aligned}$$
(13)

where \(\textrm{tr}_h\) and \( \textrm{tr}_v\) indicate traces taken over pairs of factors in \(H_{\omega }^{*}\) and \(V_{\omega }^{*}\), respectively, while inner products and norms on the tensor spaces \(\otimes H_{\omega }^{*}\), etc. are those given by tensor products of inverse Fisher information.

If the null hypothesis, \(H_0\), is simple then

$$\begin{aligned} A_1&= 0 , \end{aligned}$$
(14)
$$\begin{aligned} A_2&= 3 \, \textrm{tr} \, \textrm{tr} \, ( \tau _4) \, , \end{aligned}$$
(15)
$$\begin{aligned} A_3&= 3 \, \Vert \textrm{tr} \,(\tau _3) \Vert ^2 + 2 \, \Vert \tau _3 \Vert ^2 , \end{aligned}$$
(16)

where \(\tau _3 = \tau _{h,h,h}\), \(\tau _4 = \tau _{h,h,h,h}\), and the expressions given in [11, (3)] agree with (14)–(16). Further, in this case of a simple \(H_0\), the constant R in the definition (2) of the Bartlett adjusted version \(w^{*}\) of w can be expressed as

$$\begin{aligned} R = \frac{1}{12} \left\{ 12 \, \textrm{tr} \, \textrm{tr} \, (\tau _{2,2} ) + A_2 + A_3 \right\} \end{aligned}$$

with \(A_2\) and \(A_3\) as in (15)–(16) and \(\tau _{2,2}\) in \(\otimes ^4 T^{*} \Omega _{\omega }\) defined with components in [6, (5.22)]. There is also an expression [3, 6] for R in terms of analogous tensors (mentioned in Remark 2) arising from the observed likelihood yoke (4).
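The form of (10) can be unpacked as follows; this is a formal sketch that ignores the remainder terms. Writing \(u = (1-2t)^{-1}\), so that \(d = u - 1\), the polynomial in d becomes

$$\begin{aligned} A_1 d + A_2 d^2 + A_3 d^3 = A_3 u^3 + (A_2 - 3 A_3) u^2 + (A_1 - 2 A_2 + 3 A_3) u - (A_1 - A_2 + A_3) . \end{aligned}$$

Since \((1-2t)^{-(p+2j)/2}\) is the moment generating function of the \(\chi ^2_{p+2j}\) distribution, termwise inversion of (10) suggests that, to this order, the null distribution function of S is

$$\begin{aligned} G_p(x) + \frac{1}{24 n} \left\{ A_3 G_{p+6}(x) + (A_2 - 3 A_3) G_{p+4}(x) + (A_1 - 2 A_2 + 3 A_3) G_{p+2}(x) - (A_1 - A_2 + A_3) G_p(x) \right\} , \end{aligned}$$

where \(G_k\) denotes the distribution function of \(\chi ^2_k\) (a notation introduced here for convenience). The coefficients of the correction term sum to zero, as they must for the total mass to remain 1.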

3.3 Cubic modification of S

Put

$$\begin{aligned} c&= \frac{A_1- A_2 + A_3}{12 p}, \qquad b = \frac{A_2 - 2 A_3}{12 p(p+2)}, \qquad a = \frac{A_3}{12 p(p+2)(p+4)} , \end{aligned}$$

where p is the dimension of \(\Psi \), and define the cubic modification \(S^{*}\) of S by

$$\begin{aligned} S^{*} = \left\{ 1 - \frac{1}{n}(c + b S + a S^2) \right\} S . \end{aligned}$$

Then [9] the null distribution of \(S^{*}\) is \(\chi ^2_p\) with error of order \(O(n^{-3/2})\). A slight extension of the symmetry argument in [4] for the Bartlett-corrected likelihood ratio test shows that the error is of order \(O(n^{-2})\).
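The definitions above translate directly into a few lines of code. The following is a minimal sketch, with hypothetical function names, that takes the scalars \(A_1, A_2, A_3\) of (11)–(13) as given inputs (computing them from a model is not attempted here) and returns the coefficients a, b, c and the modified statistic \(S^{*}\).

```python
def cubic_coefficients(A1, A2, A3, p):
    """Return (a, b, c) for the cubic modification, given A1, A2, A3 and dimension p."""
    c = (A1 - A2 + A3) / (12 * p)
    b = (A2 - 2 * A3) / (12 * p * (p + 2))
    a = A3 / (12 * p * (p + 2) * (p + 4))
    return a, b, c


def corrected_score_statistic(S, n, A1, A2, A3, p):
    """S* = {1 - (c + b S + a S^2) / n} S, with n the sample size."""
    a, b, c = cubic_coefficients(A1, A2, A3, p)
    return (1.0 - (c + b * S + a * S ** 2) / n) * S


if __name__ == "__main__":
    # Hypothetical inputs: a simple null (A1 = 0) with p = 1 and n = 10.
    print(cubic_coefficients(0.0, 12.0, 0.0, 1))
    print(corrected_score_statistic(3.0, 10, 0.0, 12.0, 0.0, 1))
```

When \(A_1 = A_2 = A_3 = 0\) the function returns S unchanged, which gives a quick sanity check on any implementation.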