A parameterisation-invariant modification of the score test

Jupp, P. E.

doi:10.1007/s41884-023-00101-4

Research Paper
Open access
Published: 07 March 2023

Volume 7, pages 429–439, (2024)

Information Geometry

P. E. Jupp ORCID: orcid.org/0000-0003-0973-8434¹

Abstract

The null distribution of the score test statistic is asymptotically chi-squared for large samples. The error in this approximation is improved greatly by a cubic modification. The coefficients of this cubic that are given in the literature depend on the parameterisation. This paper provides parameterisation-invariant versions of the coefficients, expresses them in terms of appropriate tensors, and provides geometric interpretations.

1 Introduction

A common activity in statistics is that of testing the null hypothesis, $H_0$, that the true value of the parameter $\omega $ lies in a specified subspace of the parameter space $\Omega $. The two main general tests are the likelihood ratio test (LRT) and the score test. The LRT rejects $H_0$ for large values of $w = 2 \left\{ l( \hat{\omega } ; x_1, \dots , x_n) - l( \tilde{\omega } ; x_1, \dots , x_n) \right\} $, where $l( \cdot ; x_1, \dots , x_n)$ denotes the log-likelihood based on observations $x_1, \dots , x_n$, and $\hat{\omega }$ and $\tilde{\omega }$ are the maximum likelihood estimate and the restricted maximum likelihood estimate under $H_0$, respectively. The score test rejects $H_0$ for large values of

$$\begin{aligned} S = {\tilde{U}}_h^{\top } {\tilde{i}}_h ^{-1} {\tilde{U}}_h, \end{aligned}$$

(1)

where $U_h$ is the score for the interest parameter, $i _h{}^{-1}$ is the interest part of the inverse Fisher information, $U_h^{\top }$ denotes the transpose of $U_h$, and each tilde indicates evaluation at $\tilde{\omega }$.
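As a concrete illustration of (1), consider a normal sample in which the mean $\mu $ is the interest parameter and the variance $\sigma ^2$ is the nuisance parameter, with $H_0: \mu = \mu _0$. This model, the sample values and the function name `score_statistic_normal_mean` are illustrative assumptions and are not taken from the paper. In this model the Fisher information is diagonal in $(\mu , \sigma ^2)$, so the interest part of the inverse information is $\sigma ^2/n$, and the restricted maximum likelihood estimate of $\sigma ^2$ is the mean squared deviation from $\mu _0$. A minimal sketch under those assumptions:

```python
def score_statistic_normal_mean(xs, mu0):
    """Score statistic S of (1) for H0: mu = mu0 in a N(mu, sigma^2) model.

    Illustrative assumption: i.i.d. normal data, sigma^2 is the nuisance parameter.
    """
    n = len(xs)
    # Restricted MLE of the nuisance parameter sigma^2 under H0.
    sigma2_tilde = sum((x - mu0) ** 2 for x in xs) / n
    # Score for mu, evaluated at the restricted MLE.
    u_tilde = sum(x - mu0 for x in xs) / sigma2_tilde
    # Interest part of the inverse Fisher information (information is diagonal here).
    inv_info_tilde = sigma2_tilde / n
    return u_tilde * inv_info_tilde * u_tilde


# Hypothetical data, for illustration only.
sample = [1.2, 0.7, 1.9, 1.4, 0.8]
S = score_statistic_normal_mean(sample, mu0=1.0)
```

In this special case S reduces to $n ({\bar{x}} - \mu _0)^2 / {\tilde{\sigma }}^2$, which gives a quick check on the calculation.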

Under mild regularity conditions and under independent sampling, the large-sample asymptotic null distributions of w and S are $\chi _p^2$ with error of order $O(n^{-1})$, where p is the dimension of the interest parameter. For w, there is a Bartlett adjusted version, $w^{*}$, of w given by

$$\begin{aligned} w^{*} = w (1+R/p) \end{aligned}$$

(2)

for some constant R and so that the null distribution of $w^{*}$ is $\chi _p^2$ with error of order $O(n^{-2})$ [3, 4]. The scalar R can be expressed in terms of some tensors [3, 6] that arise from the geometry. For S there is no analogous linear Bartlett adjustment but there is a cubic modification [9] of S such that its null distribution is $\chi _p^2$ with error of order $O(n^{-3/2})$. The coefficients of the cubic are linear functions of coefficients in the expansion [11] to order $O(n^{-1})$ of the moment generating function of S. These coefficients (and so the cubic modification) depend on the choice of parameterisation of the nuisance parameters, i.e., on the way in which the parameter space is written locally as a product of the spaces of interest and nuisance parameters. Even after correction of a misprint noted by [9], the coefficients of the cubic given in [11] are not invariant under re-parameterisation [10]. Further, there are no obvious geometric interpretations of the coefficients. For the case of simple null hypotheses, there is [8] a parameterisation-invariant version, $S^{\ddag }$, of S such that the null distribution of $S^{\ddag }$ is $\chi _p^2$ with error of order $O(n^{-2})$. Whereas the cubic correction, $S^{*}$, of S introduced in Sect. 3.3 below is a cubic function of S, $S^{\ddag } = ({\tilde{U}}_h^{\ddag })^{\top } {\tilde{i}}_h ^{-1} {{\tilde{U}}_h}^{\ddag }$, where ${{\tilde{U}}_h}^{\ddag }$ is a cubic function of ${{\tilde{U}}_h}$. Even in some simple models (such as that in [14, Sect. 3]), the cubic giving ${{\tilde{U}}_h}^{\ddag }$ in terms of ${{\tilde{U}}_h}$ can be quite complicated. There are no obvious geometric interpretations of the coefficients of this cubic.

The aim of this paper is to provide a parameterisation-invariant expansion to order $O(n^{-1})$ of S in which the coefficients have geometric interpretations. A cubic correction, $S^{*}$, of S is introduced, such that the null distribution of $S^{*}$ is $\chi _p^2$ with error of order $O(n^{-2})$. Index notation has two serious disadvantages: (i) it is vulnerable to misprints, and (ii) it can obscure concepts by concentrating on the details of calculations. The approach here therefore largely avoids explicit parameterisations and the use of index notation. For readers who prefer index notation, Appendix A contains expressions in that language for the coefficients of the cubic.

Section 2 recalls material on yokes, introduces fibred yokes, and shows how they give rise to decomposition of tensors. In Sect. 3 the asymptotic moment generating function of S is derived, the coefficients of the cubic giving $S^{*}$ are given, and these coefficients are related to appropriate tensors.

2 Yokes and fibred yokes

An appropriate geometric setting for parametric models in which nuisance parameters can be present is that of submersions from one smooth manifold to another. More precisely, $\pi : \Omega \rightarrow \Psi $ is a smooth map from the full parameter space, $ \Omega $, to the space, $\Psi $, of parameters of interest, and at each point $\omega $ of $\Omega $ the tangent map $\pi _{*}$ maps the tangent space $T \Omega _{\omega }$ onto $T \Psi _{\pi ( \omega )}$. The submersion condition implies that each fibre $\pi ^{-1}(\psi )$ is a submanifold of $\Omega $ and that around each $\omega $ small portions of $\Omega $ look like $\Psi \times \pi ^{-1} (\pi (\omega ))$ with $\pi $ being identified locally with the projection of $\Psi \times \pi ^{-1} (\pi (\omega ))$ onto $\Psi $. Nevertheless, in general $ \Omega $ is not such a product and it is conceptually not helpful to think of $\Omega $ in this way.

2.1 Yokes

The coordinate-free definition of a yoke is as follows. For a vector field X on a manifold $\Omega $, define the vector fields $\bar{X}$ and $\bar{X} '$ on $\Omega \times \Omega $ by $\bar{X} = (X,0)$ and $\bar{X}' = (0,X)$, i.e.,

$$\begin{aligned} Tp_1(\bar{X}) = X,&\qquad Tp_2(\bar{X}) = 0, \\ Tp_1(\bar{X}') = 0,&\qquad Tp_2(\bar{X}') = X, \end{aligned}$$

where $p_k : \Omega \times \Omega \rightarrow \Omega $ is the projection onto the $k^\textrm{th}$ factor for $k = 1,2$. Then, for vector fields X and Y on $\Omega $ and a smooth function $g : \Omega \times \Omega \rightarrow \mathbb {R}$, we define $g(X \vert Y) : \Omega \rightarrow \mathbb {R}$ by

$$\begin{aligned} g(X \vert Y)(\omega ) = \bar{X}\bar{Y}' g(\omega ,\omega ). \end{aligned}$$

A yoke on $\Omega $ may now be characterised as a smooth function $g : \Omega \times \Omega \rightarrow \mathbb {R}$ such that

(i) $\bar{X}g(\omega ,\omega ) = 0$ for all $\omega $ in $\Omega $,
(ii) the (0,2)-tensor $(X,Y) \mapsto g(X \vert Y)$ is non-singular.

An alternative way of expressing (i) and (ii) is that on the diagonal $\Delta _\Omega = \left\{ (\omega , \omega ) : \omega \in \Omega \right\} $,

(i) $d_1g = 0$,
(ii) $d_1d_2g$ is non-singular,

where $d_1$ and $d_2$ denote exterior differentiation along the first and second factor, respectively, in $\Omega \times \Omega $.

The two main yokes of interest in statistics are the likelihood yokes. Consider a parametric statistical model with parameter space $\Omega $, sample space $\mathcal {X}$ and log-likelihood function $l : \Omega \times \mathcal {X} \rightarrow \mathbb {R}$. The expected likelihood yoke on $\Omega $ is the function g on $\Omega \times \Omega $ given by

$$\begin{aligned} g(\omega , \omega ') = E_{\omega '} [l(\omega ;x) - l(\omega '; x)] . \end{aligned}$$

(3)

Suppose that an auxiliary statistic a is given, such that the statistic $({\hat{\omega }}, a)$ is minimal sufficient for $\omega $, where ${\hat{\omega }}$ denotes the maximum likelihood estimator. Then the corresponding observed likelihood yoke on $\Omega $ is the function g on $\Omega \times \Omega $ given by

$$\begin{aligned} g(\omega , \omega ') = l(\omega ; \omega ',a) - l(\omega '; \omega ',a). \end{aligned}$$

(4)

Properties and applications of expected and observed likelihood yokes can be found in [1, 3].

A key property of yokes is that they give rise naturally to preferred coordinate charts (called extended normal coordinates) taking values in appropriate cotangent spaces. Given any point $\omega $ of $\Omega $, the function ${\Gamma }_{\omega }$ from $\Omega $ to the cotangent space $T^{*}\Omega _{\omega }$ to $\Omega $ at $\omega $ is defined by

$$\begin{aligned} \Gamma _{\omega }(\omega ') = d_1 g (\omega , \omega ') . \end{aligned}$$

(5)

In terms of local coordinates $\omega ^1, \dots , \omega ^d$ on $\Omega $,

$$\begin{aligned} \Gamma _{\omega }(\omega ') = \frac{\partial g(\omega , \omega ')}{\partial \omega ^u} d \omega ^u , \end{aligned}$$

where the Einstein summation convention is used. It follows from property (i) of a yoke that $\Gamma _{\omega }(\omega ) = 0$ and from property (ii) that the restriction $\Gamma _{\omega } \vert U$ of $\Gamma _{\omega }$ to some neighbourhood U of $\omega $ in $\Omega $ is a coordinate chart on U taking values in $T^{*}\Omega _{\omega }$. Note that the space $T^{*} \Omega _{\omega }$ depends on $\omega $. It has been customary [3, Sect. 5.6], [6, Sect. 4], [16] to use the metric given by the yoke to ‘raise’ the ${\Gamma }_{\omega }$, in order to obtain extended normal coordinates with values in the tangent space $T \Omega _{\omega }$ rather than in its dual, the cotangent space $T^{*} \Omega _{\omega }$. The ${\Gamma }_{\omega }$ defined in (5) are used here because they can be regarded as more basic. In the language of strings, the coordinate expressions for the ‘raised’ versions of the derivatives of ${\Gamma }_{\omega }$ form the costring [6].
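As a simple worked illustration of the definitions above (the Poisson model is chosen here for convenience and is not taken from the paper), consider a single Poisson observation with mean $\omega $, so that $l(\omega ; x) = x \log \omega - \omega $ up to an additive constant. Since $E_{\omega '}[x] = \omega '$, the expected likelihood yoke (3), its derivative along the first factor, and its mixed derivative are

$$\begin{aligned} g(\omega , \omega ')&= \omega ' \log (\omega / \omega ') - \omega + \omega ' , \\ \frac{\partial g}{\partial \omega }(\omega , \omega ')&= \frac{\omega '}{\omega } - 1 , \\ \frac{\partial ^2 g}{\partial \omega \, \partial \omega '}(\omega , \omega ')&= \frac{1}{\omega } . \end{aligned}$$

The first derivative vanishes on the diagonal, as property (i) of a yoke requires, and the mixed derivative is the Fisher information $1/\omega $, which is non-zero, as property (ii) requires. In this example (5) gives $\Gamma _{\omega }(\omega ') = (\omega '/\omega - 1) \, d\omega $, which vanishes at $\omega ' = \omega $ and is an affine function of $\omega '$.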

For any smooth function f on $\Omega $, the composition $f \circ \Gamma _{\omega }^{-1}$ is a function on an open neighbourhood of 0 in the vector space $T^{*} \Omega _{\omega }$, and so its derivatives are symmetric tensors on $T^{*} \Omega _{\omega }$. Combining these tensors with $\Gamma _{\omega }$ gives an invariant Taylor expansion (a parameterisation-invariant analogue of a Taylor expansion) of f. Expressions in index notation for (‘lowered’ versions of) these invariant Taylor expansions are given in [5, Sect. 3.3], [3, Sect. 5.6], [16, Sect. 4]. Similarly, for any smooth function h on $\Omega \times \Omega $, the composition $h \circ ( \Gamma _{\omega }^{-1} \times \Gamma _{\omega }^{-1})$ is a function on an open neighbourhood of 0 in the vector space $T^{*} \Omega _{\omega } \times T^{*} \Omega _{\omega }$, and so its derivatives are symmetric tensors on $T^{*} \Omega _{\omega } \otimes T^{*} \Omega _{\omega }$. In the language of strings, these tensors are said to be obtained by intertwining [1]. In particular, Taylor expansion of g in the corresponding product coordinate charts on a neighbourhood of $(\omega , \omega )$ in $\Omega \times \Omega $ yields a family of tensors $T _{r_1, \dots , r_p; s_1, \dots , s_q}$ on $\Omega $ [6].

Remark 1

Extended normal coordinates, $\Gamma _{\omega }$, can be defined also in the more general setting of pre-contrast functions, meaning functions $h: \Omega \times \Omega \rightarrow T^{*}\Omega $ such that

(o) $h(\omega ,\omega ') \in T^{*}\Omega _{\omega }$,
(i) $h(\omega ,\omega ) = 0$,
(ii) $d_2 h$ is non-degenerate on the diagonal, $\Delta _\Omega $, where $d_2$ denotes the exterior derivative along $\{ \omega \} \times \Omega $.

(In the language of vector bundles, h is a section of the pull-back of the cotangent bundle of $\Omega $ by the projection $\pi _1: \Omega \times \Omega \rightarrow \Omega $ onto the first factor, such that $h = 0$ on the diagonal and its derivative is non-degenerate there.) The original definition [12] of pre-contrast functions required the restriction of $- d_2 h$ to the diagonal to be a semi-Riemannian metric on $\Omega $.

The general mathematical concept that underlies the results in this paper is that of a fibred yoke, i.e., a submersion $\pi : \Omega \rightarrow \Psi $, together with a yoke on $\Omega $. In the current context, $\pi $ maps parameters to interest parameters, and the yoke is a likelihood yoke (3) or (4).

2.2 Decomposition of tangent spaces

In the tangent space $T \Omega _{\omega }$ to $\Omega $ at $\omega $ the vertical subspace $V_{\omega }$ is defined as $V_{\omega } = \left\{ X \in T \Omega _{\omega }: \pi _{*}(X) = 0 \right\} $. Given a Riemannian metric $\phi $ on $\Omega $, the horizontal subspace $H_{\omega }$ is the orthogonal complement of $V_{\omega }$ in $T \Omega _{\omega }$. Thus $\phi $ decomposes $T \Omega _{\omega }$ as the orthogonal direct sum

$$\begin{aligned} T \Omega _{\omega } = V_{\omega } \oplus H_{\omega } . \end{aligned}$$

(6)

The decomposition (6) varies smoothly with $\omega $, in the sense that $\omega \mapsto (V_{\omega } , H_{\omega })$ is a smooth map from $\Omega $ to $V_q(T \Omega ) \times V_p(T \Omega )$, where $V_r(T \Omega )$ denotes the manifold $\{ (\omega , E_{\omega }) : E_{\omega } \text{ is an } r\text{-dimensional subspace of } T\Omega _{\omega } \}$, and p and q are the dimensions of the interest and nuisance parameters, respectively. The smoothness of the decomposition (6) implies that $Y_h, Y_v, Y_{hv}$, and $Y_{vv}$ defined in Subsection 3.1 depend smoothly on $\omega $, and so, under mild regularity conditions, the tensors defined in (9) below exist. The tangent mapping $\pi _{*}$ identifies $H_{\omega }$ with $T \Psi _{\pi ( \omega )}$.

The inner product $\pi _{\omega } \phi $ on $T\Psi _{\pi (\omega )}$ is defined by

$$\begin{aligned} \pi _{\omega } \phi (X, Y) = \phi (\tilde{X}, \tilde{Y}) \quad X, Y \in T\Psi _{\pi (\omega )} , \nonumber \end{aligned}$$

where $\tilde{X}$ and $\tilde{Y}$ are the horizontal lifts to $T \Omega _{\omega }$ of X and Y, i.e., they are the unique elements of $ H_{\omega }$ such that $\pi _{*} (\tilde{X}) = X$ and $\pi _{*} (\tilde{Y}) = Y$. The dual of the decomposition (6) of the tangent space $T \Omega _{\omega }$ to $\Omega $ at $\omega $ is the decomposition

$$\begin{aligned} T ^* \Omega _{\omega } = V^*_{\omega } \oplus H^*_{\omega } \end{aligned}$$

(7)

of the cotangent space $T ^* \Omega _{\omega }$ to $\Omega $ at $\omega $. Taking the r-fold tensor product of the decomposition (7) of $T^* \Omega _{\omega }$ leads to the decomposition

$$\begin{aligned} \otimes ^r T^* \Omega _{\omega } = \oplus _{s = 0}^r \left( ( \otimes ^s V^*_{\omega }) \otimes \left( \otimes ^{r-s} H^*_{\omega } \right) \right) \end{aligned}$$

(8)

of the space of r-fold tensors on $T^* \Omega _{\omega }$.

The projection of the score onto $H^*_{\omega }$ using the decomposition (7) is the horizontal score, $U_h$, used in (1). It is the score for the interest parameter, $\psi $, and is also known as the orthogonal score [13, 17].
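In coordinates, the horizontal score can be computed from the partitioned Fisher information. The sketch below treats the simplest case $p = q = 1$, assuming coordinates $(\psi , \lambda )$ with $\psi $ the interest parameter and taking the Fisher information as the metric; the function name `horizontal_score`, its arguments and the numerical values are hypothetical. It uses the standard coordinate form of the orthogonal score, in which the interest score has its regression on the nuisance score removed.

```python
def horizontal_score(u_psi, u_lam, i_pp, i_pl, i_ll):
    """Horizontal (orthogonal) score and its information when p = q = 1.

    Assumes coordinates (psi, lam) with Fisher information
    [[i_pp, i_pl], [i_pl, i_ll]]; all names here are illustrative.
    """
    # Remove from the interest score its regression on the nuisance score.
    u_h = u_psi - (i_pl / i_ll) * u_lam
    # Information for u_h; its reciprocal is the interest part of the inverse information.
    i_h = i_pp - i_pl ** 2 / i_ll
    return u_h, i_h


# Hypothetical values; u_lam = 0 mimics evaluation at the restricted MLE.
u_h, i_h = horizontal_score(u_psi=1.5, u_lam=0.0, i_pp=4.0, i_pl=1.0, i_ll=2.0)
S = u_h ** 2 / i_h  # the statistic (1) in this two-parameter setting
```

At the restricted maximum likelihood estimate the nuisance score vanishes, so the horizontal score coincides there with the ordinary score for $\psi $, while the information `i_h` still accounts for the nuisance parameter.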

3 Higher-order behaviour of S

3.1 Tensors from log-likelihood derivatives

Denote by $Z_1, Z_2, Z_3$ the 1st, 2nd and 3rd derivatives of the log-likelihood, centred and scaled by $n^{-1/2}$ to have order $O_p(1)$. Expressing $Z_1, Z_2, Z_3$ in the functions $\Gamma _{\omega }$ around $\omega $ given by (5) with the expected likelihood yoke (3) yields random tensors $Y_1, Y_2, Y_3$. Decomposing $Y_1, Y_2, Y_3$ by (8) gives $Y_h$ in $H^{*}_{\omega }$, $Y_v$ in $V^{*}_{\omega }$, $Y_{hv}$ in $H^{*}_{\omega } \otimes V^{*}_{\omega }$, $Y_{vv}$ in $\otimes ^2 V^{*}_{\omega }$ and $Y_{hvv}$ in $H^{*}_{\omega } \otimes (\otimes ^2 V^{*}_{\omega })$. The tensors $\tau _{h, h, h} $ in $\otimes ^3 H^{*}_{\omega }$, $\tau _{h, h, v} $ in $(\otimes ^2 H^{*}_{\omega }) \otimes V^{*}_{\omega }$, $\tau _{h, v, v} $ in $H^{*}_{\omega } \otimes (\otimes ^2 V^{*}_{\omega })$, $\tau _{hv, hv}$ in $\otimes ^2(H^{*}_{\omega } \otimes V^{*}_{\omega })$, $\tau _{h,h,vv}$ in $(\otimes ^2 H^{*}_{\omega }) \otimes (\otimes ^2 V^{*}_{\omega })$, $\tau _{h,v, hv}$ in $\otimes ^2(H^{*}_{\omega } \otimes V^{*}_{\omega })$ and $\tau _{h,h, h,h}$ in $\otimes ^4 H^{*}_{\omega }$ are defined by

$$\begin{aligned} \tau _{h, h, h} = E[ \otimes ^3 Y_h ] , \quad \tau _{h, h, v}&= E[ (\otimes ^2 Y_h) \otimes Y_v ] , \quad \tau _{h, v, v} = E[ Y_h \otimes (\otimes ^2 Y_v) ] , \nonumber \\ \tau _{hv, hv} = E[ Y_{hv} \otimes Y_{hv} ] , \, \, \tau _{h, h,vv}&= E[ Y_h \otimes Y_h \otimes Y_{vv} ] , \, \, \tau _{h, v, hv} = E[ Y_h \otimes Y_{v} \otimes Y_{hv} ] , \nonumber \\ \tau _{h, h, h,h}&= E[ \otimes ^4Y_h ] . \end{aligned}$$

(9)

Remark 2

The tensors (9) can be obtained from the expected yoke (3). There are analogous tensors [3, Sect. 5.5] arising from the observed likelihood yoke (4). Under ordinary repeated sampling, corresponding tensors differ by $O(n^{-1/2})$.

3.2 Moment generating function of S

One way [3, Sect. 5.3] of deriving the constant R in the expression (2) for $w^{*}$ is based on expanding w to order $O(n^{-1})$ as a quartic in the score. There is an analogous expansion of S as

$$\begin{aligned} S = S_0 + n^{-1/2} S_1 + n^{-1} S_2 + O(n^{-3/2}), \nonumber \end{aligned}$$

where $S_0, S_1, S_2$ are $O_p(1)$, $S_0$ is a homogeneous quadratic in $Y_1$, $S_1$ is a homogeneous cubic in $Y_1, Y_2$, and $S_2$ is a homogeneous quartic in $Y_1, Y_2, Y_3$. Calculation of some low-order moments of products of $S_0, S_1$ and $S_2$ leads to the following theorem.

Theorem 1

Suppose that (a) the sample space is continuous, (b) the log-likelihood function is finite and its derivatives of order 4 or less are continuous in some neighbourhood of $\omega $, (c) the Fisher information at $\omega $ is non-singular. Then the moment generating function $M_S(t)$ of S has the form

$$\begin{aligned} M_S(t) = {(1 - 2t)}^{-\frac{p}{2}} \left\{ 1 + \frac{1}{24 n} \left( A_{1}d + A_{2}d^{2} + A_{3} d^{3} + O(d^4) \right) \right\} + O(n^{-3/2}), \end{aligned}$$

(10)

where $d = 2t/(1-2t)$ and

$$\begin{aligned} A_1&= 12 \, \textrm{tr}_h \, \textrm{tr}_v \, ( \tau _{hv,hv}) + 3 \, \langle \textrm{tr}_v \, (\tau _{h,v,v}) , \textrm{tr}_h \, (\tau _{h,h,h}) \rangle _h\nonumber \\&+ 6 \, \Vert \tau _{h,v,v} \Vert ^2 + 6 \,\textrm{tr}_h \, \textrm{tr}_v (\tau _{h,h,vv}) + 36 \,\textrm{tr}_h \, \textrm{tr}_v (\tau _{h,v,hv}) + 6 \, \Vert \textrm{tr}_v \, \tau _{h,v,v }\Vert ^2 , \end{aligned}$$

(11)

$$\begin{aligned} A_2&= 3 \, \textrm{tr}_h \, \textrm{tr}_h \, ( \tau _{h,h,h,h}) - 6 \Vert \tau _{h,h,v} \Vert ^2 - 3 \Vert \textrm{tr}_{h} \, (\tau _{h,h,v}) \Vert _v^2 \nonumber \\&- 6 \,\langle \textrm{tr}_v \, (\tau _{h,v,v}) , \textrm{tr}_h \, (\tau _{h,h,v}) \rangle _h \, , \end{aligned}$$

(12)

$$\begin{aligned} A_3&= 3 \, \Vert \textrm{tr} \,(\tau _{h,h,h}) \Vert ^2 + 2 \, \Vert \tau _{h,h,h} \Vert ^2 , \end{aligned}$$

(13)

where $\textrm{tr}_h$ and $ \textrm{tr}_v$ indicate traces taken over pairs of factors in $H_{\omega }^{*}$ and $V_{\omega }^{*}$, respectively, while inner products and norms on the tensor spaces $\otimes H_{\omega }^{*}$, etc. are those given by tensor products of inverse Fisher information.

If the null hypothesis, $H_0$, is simple then

$$\begin{aligned} A_1&= 0 , \end{aligned}$$

(14)

$$\begin{aligned} A_2&= 3 \, \textrm{tr} \, \textrm{tr} \, ( \tau _4) \, , \end{aligned}$$

(15)

$$\begin{aligned} A_3&= 3 \, \Vert \textrm{tr} \,(\tau _3) \Vert ^2 + 2 \, \Vert \tau _3 \Vert ^2 , \end{aligned}$$

(16)

where $\tau _3 = \tau _{h,h,h}$, $\tau _4 = \tau _{h,h,h,h}$, and the expressions given in [11, (3)] agree with (14)–(16). Further, in this case of a simple $H_0$, the constant R in the definition (2) of the Bartlett adjusted version $w^{*}$ of w can be expressed as

$$\begin{aligned} R = \frac{1}{12} \left\{ 12 \, \textrm{tr} \, \textrm{tr} \, (\tau _{2,2} ) + A_2 + A_3 \right\} \end{aligned}$$

with $A_2$ and $A_3$ as in (15)–(16) and $\tau _{2,2}$ in $\otimes ^4 T^{*} \Omega _{\omega }$ defined with components in [6, (5.22)]. There is also an expression [3, 6] for R in terms of analogous tensors (mentioned in Remark 2) arising from the observed likelihood yoke (4).
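As a quick consistency check on (10) (a formal calculation added here for illustration, treating the expansion as differentiable term by term), note that $d = 0$ and $\textrm{d} d / \textrm{d} t = 2$ at $t = 0$. Differentiating (10) at $t = 0$ then gives

$$\begin{aligned} E[S] = M_S'(0) = p + \frac{A_1}{12 n} + O(n^{-3/2}) . \end{aligned}$$

For a simple null hypothesis, S is the quadratic form $U^{\top } i^{-1} U$ evaluated at the fixed null value, so that $E[S] = p$ exactly, in agreement with $A_1 = 0$ in (14).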

3.3 Cubic modification of S

Put

$$\begin{aligned} c&= \frac{A_1- A_2 + A_3}{12 p}, \qquad b = \frac{A_2 - 2 A_3}{12 p(p+2)}, \qquad a = \frac{A_3}{12 p(p+2)(p+4)} , \end{aligned}$$

where p is the dimension of $\Psi $, and define the cubic modification $S^{*}$ of S by

$$\begin{aligned} S^{*} = \left\{ 1 - \frac{1}{n}(c + b S + a S^2) \right\} S . \end{aligned}$$

Then [9] the null distribution of $S^{*}$ is $\chi ^2_p$ with error of order $O(n^{-3/2})$. A slight extension of the symmetry argument in [4] for the Bartlett-corrected likelihood ratio test shows that the error is of order $O(n^{-2})$.
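The definition of $S^{*}$ translates directly into code. The following is a minimal sketch, assuming that the coefficients $A_1, A_2, A_3$ have already been computed for the model at hand; the function name `cubic_modified_score` and the numerical inputs are hypothetical.

```python
def cubic_modified_score(S, n, p, A1, A2, A3):
    """Cubic modification S* of the score statistic S, as defined in Sect. 3.3.

    S: observed score statistic; n: sample size; p: dimension of the
    interest parameter; A1, A2, A3: coefficients from Theorem 1.
    """
    c = (A1 - A2 + A3) / (12 * p)
    b = (A2 - 2 * A3) / (12 * p * (p + 2))
    a = A3 / (12 * p * (p + 2) * (p + 4))
    return (1 - (c + b * S + a * S ** 2) / n) * S


# Hypothetical inputs, for illustration only.
S_star = cubic_modified_score(S=3.0, n=50, p=1, A1=0.0, A2=6.0, A3=10.0)
```

The modified statistic `S_star` is then referred to the $\chi ^2_p$ distribution in place of S. When all three coefficients vanish the function returns S unchanged.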

Data availability

Not applicable.

References

1. Barndorff-Nielsen, O.E., Blæsild, P.: Strings: mathematical theory and statistical examples. Proc. R. Soc. Lond. A 411, 155–176 (1987)
2. Barndorff-Nielsen, O.E., Cox, D.R.: Asymptotic techniques for use in statistics. Chapman & Hall, London (1989)
3. Barndorff-Nielsen, O.E., Cox, D.R.: Inference and asymptotics. Chapman & Hall, London (1994)
4. Barndorff-Nielsen, O.E., Hall, P.: On the level-error after Bartlett adjustment of the likelihood ratio statistic. Biometrika 75, 374–388 (1988)
5. Barndorff-Nielsen, O.E., Jupp, P.E., Kendall, W.S.: Stochastic calculus, statistical asymptotics, Taylor strings and phyla. Ann. Fac. Sci. Toulouse 3, 5–62 (1994)
6. Blæsild, P.: Yokes and tensors derived from yokes. Ann. Inst. Stat. Math. 43, 95–113 (1991)
7. Chandra, T.K., Ghosh, J.K.: Valid asymptotic expansions for the likelihood ratio statistic and other perturbed chi-square variables. Sankhyā 41, 22–47 (1979)
8. Chandra, T.K., Mukerjee, R.: Bartlett-type modification for Rao’s efficient score statistic. J. Multivariate Anal. 36, 103–112 (1991)
9. Cordeiro, G.M., Ferrari, S.L.P.: A modified score test statistic having chi-squared distribution to order $n^{-1}$. Biometrika 78, 573–582 (1991)
10. Emberson, E.A.: The asymptotic distribution and robustness of the likelihood ratio and score test statistics. Doctoral dissertation, University of St Andrews (1994)
11. Harris, P.: An asymptotic expansion for the null distribution of the efficient score statistic. Biometrika 72, 653–659 (1985) Correction: Biometrika 74, 667 (1987)
12. Henmi, M., Matsuzoe, H.: Geometry of pre-contrast functions and non-conservative estimating functions. In: International Workshop on Complex Structures, Integrability and Vector Fields. AIP Conf. Proc. 1340, pp. 32–41. https://doi.org/10.1063/1.3567122 (2011)
13. Hudson, S., Vos, P.W.: Marginal information for expectation parameters. Canad. J. Statist. 28, 875–886 (2000)
14. Jupp, P.E.: Modifications of the Rayleigh and Bingham tests for uniformity of directions. J. Multivariate Anal. 77, 1–20 (2001)
15. McCullagh, P.: Tensor methods in statistics. Chapman & Hall, London (1987)
16. Pace, L., Salvan, A.: The geometric structure of the expected/observed likelihood expansions. Ann. Inst. Stat. Math. 46, 649–666 (1994)
17. Zhu, Y., Reid, N.: Information, ancillarity and sufficiency in the presence of nuisance parameters. Canad. J. Statist. 22, 111–123 (1994)

Acknowledgements

I am indebted to Eleanor Emberson for discussions and for detailed calculations in [10] that led to many of the results presented here. I am grateful to Pia Veldt Larsen for discussions on the null asymptotic distributions of the score statistic and geometric Wald statistics. I thank two referees for their helpful comments.

Funding

This research was not funded.

Author information

Authors and Affiliations

School of Mathematics and Statistics, University of St Andrews, St Andrews, KY16 9SS, UK
P. E. Jupp

Authors

P. E. Jupp

Contributions

Not applicable.

Corresponding author

Correspondence to P. E. Jupp.

Ethics declarations

Conflict of interest

There are no competing interests.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Communicated by Hiroshi Matsuzoe.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Coefficients $A_1$–$A_3$ in terms of cumulants of log-likelihood derivatives

The coefficients $A_1$–$A_3$ can be expressed in terms of the cumulants of log-likelihood derivatives (for a single observation). In terms of local coordinates $\omega ^1, \dots , \omega ^{p+q}$ on $\Omega $, these cumulants have components

$$\begin{aligned} \kappa _{ij}&= E \left[ \frac{\partial ^2 l}{\partial \omega ^i \partial \omega ^j} (\omega ; x) \right] , \\ \kappa _{ijk}&= E \left[ \frac{\partial ^3 l}{\partial \omega ^i \partial \omega ^j \partial \omega ^k} (\omega ; x) \right] , \\ \kappa _{i,j}&= E \left[ \frac{\partial l}{\partial \omega ^i} (\omega ; x) \frac{\partial l}{\partial \omega ^j } (\omega ; x) \right] , \\ \kappa _{i,j,k}&= E \left[ \frac{\partial l}{\partial \omega ^i} ( \omega ; x) \frac{\partial l}{\partial \omega ^j } ( \omega ; x) \frac{\partial l}{\partial \omega ^k } ( \omega ; x) \right] , \end{aligned}$$

etc.

Suppose that $\omega ^1, \dots , \omega ^{p+q}$ are chosen such that $\omega ^1, \dots , \omega ^p$ are interest parameters, whereas $\omega ^{p+1}, \dots , \omega ^{p+q}$ are nuisance parameters. Let

$$\begin{aligned} K = \left( \begin{array}{cc} K _{1,1} &{} K _{1,2} \\ K _{2,1} &{} K_{2,2} \end{array} \right) \end{aligned}$$

be the $(p+q) \times (p+q)$ matrix of the $\kappa _{i,j}$, partitioned into blocks corresponding to the interest and nuisance parameters, respectively. Put

$$\begin{aligned} A&= \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} K_{2,2}^{-1} \end{array} \right) , \nonumber \\ M&= K^{-1} - A . \end{aligned}$$

(A1)

Then expressions (11)–(13) can be written in index notation as

$$\begin{aligned} A_1&= 12 \, (\kappa _{ij,kl} - \kappa ^{m,n} \kappa _{m,ij} \kappa _{n,kl} ) m^{ik} m^{jl} \\&+ 3 \, \kappa _{i,j,k} \kappa _{l,m,n} m^{il} a^{jk} a^{mn} + 6 \, \kappa _{i,j,k} \kappa _{l,m,n} m^{il} a^{jm} a^{kn} \\&+ 6 \, (\kappa _{i,j,kl} - \kappa ^{m,n} \kappa _{m,i,j} \kappa _{n,kl} ) m^{ij} a^{kl} \\&+ 36 \, (\kappa _{i,j,kl} - \kappa ^{m,n}\kappa _{m,i,j} \kappa _{n,kl} ) m^{ik} a^{jl} \\&+ 6 \, \kappa _{i,j,k} \kappa _{l,m,n} a^{il} m^{jk} a^{mn} \\ A_2&= 3 \, \kappa _{i,j,k,l} m^{ij} m^{kl} - 6 \, \kappa _{i,j,k} \kappa _{l,m,n} a^{il} m^{jm} m^{kn} \\&- 3 \, \kappa _{i,j,k} \kappa _{l,m,n} a^{il} m^{jk} m^{mn} - 6 \, \kappa _{i,j,k} \kappa _{l,m,n} m^{il} a^{jk} m^{mn} \\ A_3&= 3 \, \kappa _{i,j,k} \kappa _{l,m,n} m^{il} m^{jk} m^{mn} + 2 \, \kappa _{i,j,k} \kappa _{l,m,n} m^{il} m^{jm} m^{kn} , \end{aligned}$$

where indices run over $1, \dots , p+q$, the $\kappa ^{i,j}$ are the elements of $K^{-1}$, and the Einstein summation convention is used.
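For a one-parameter model with a simple null hypothesis there are no nuisance parameters ($q = 0$), so that $M = K^{-1}$ and the contractions above reduce to scalars, with $A_1 = 0$ by (14). The sketch below evaluates $A_2$ and $A_3$ in that case. It assumes that $\kappa _{i,j,k,l}$ denotes the fourth joint cumulant of the score for a single observation, as the word "cumulants" suggests; the function name `simple_null_coefficients` and the Poisson example are illustrative and not taken from the paper.

```python
def simple_null_coefficients(k2, k3, k4):
    """A1, A2, A3 for a one-parameter model with a simple null hypothesis.

    k2, k3, k4: second, third and fourth cumulants of the score for one
    observation (assumption: kappa_{i,j,k,l} is the fourth joint cumulant).
    """
    m = 1.0 / k2                   # M = K^{-1} when there are no nuisance parameters
    A1 = 0.0                       # (14)
    A2 = 3.0 * k4 * m ** 2         # 3 kappa_{i,j,k,l} m^{ij} m^{kl}
    A3 = 5.0 * k3 ** 2 * m ** 3    # the two contractions (weights 3 and 2) coincide when p = 1
    return A1, A2, A3


# Poisson model with mean omega: the score x/omega - 1 has r-th cumulant omega**(1 - r).
omega = 0.5
A1, A2, A3 = simple_null_coefficients(omega ** -1, omega ** -2, omega ** -3)
```

Under these assumptions the Poisson model gives $A_2 = 3/\omega $ and $A_3 = 5/\omega $, so the correction is larger when the mean is small.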

Appendix B Proof of Theorem 1

The proof of Theorem 1 proceeds along the lines of the derivation of the expression for the Bartlett correction factor given in [15, Sect. 7.4] and [3, Sect. 5.3]. Only an outline of the proof is given here; full details can be found in [10].

Step 1: S in terms of polynomials in $Y_v, Y_h, Y_{hv}, Y_{vv}$ and $Y_{hvv}$

Ordinary Taylor series expansion (in any coordinate system on the full parameter space $\Omega $) of $Z_1$ and $i_h ^{-1}$ about $\omega $ gives (in index notation)

$$\begin{aligned} {\tilde{Z}}_i&= Z_i + \kappa _{ij} {\tilde{\delta }}^j + n^{-1/2} \left( Z_{ij} {\tilde{\delta }}^j + \frac{1}{2} \kappa _{ijk} {\tilde{\delta }}^j {\tilde{\delta }}^k \right) \nonumber \\&\qquad + n^{-1} \left( \frac{1}{2} Z_{ijk} {\tilde{\delta }}^j {\tilde{\delta }}^k + \frac{1}{6} \kappa _{ijkl} {\tilde{\delta }}^j {\tilde{\delta }}^k {\tilde{\delta }}^l \right) + O(n^{-3/2}) \end{aligned}$$

(B1)

and

$$\begin{aligned} {\tilde{\kappa }}^{i,j}&= \kappa ^{i,j} - n^{-1/2} \kappa ^{i,r} \left[ \left( \frac{\partial }{\partial \omega ^a} \kappa _{r,s} \right) {\tilde{\delta }}^a + n^{-1/ 2} \frac{1}{2} \left( \frac{\partial ^2}{\partial \omega ^a \partial \omega ^b } \kappa _{r,s} \right) {\tilde{\delta }}^a {\tilde{\delta }}^b \right] \kappa ^{s,j} \nonumber \\&+ n^{-1} \kappa ^{i,r} \left( \frac{\partial }{\partial \omega ^a} \kappa _{r,s} \right) {\tilde{\delta }}^a \kappa ^{s,t} \left( \frac{\partial }{\partial \omega ^b} \kappa _{t,u} \right) {\tilde{\delta }}^b \kappa ^{u,j} +O(n^{-3/2}) , \end{aligned}$$

(B2)

where ${\tilde{\delta }}^i = n^{1/2} ({\tilde{\omega }} - \omega )^i$. Since ${\tilde{Z}}_i = 0$ if $\omega _i$ is a nuisance parameter, (B1) can be solved to give $Z_i$ (up to $O(n^{-3/2})$) as a cubic in the ${\tilde{\delta }}^j$. Substituting (B1) and (B2) in (1) then gives S (up to $O(n^{-3/2})$) as a cubic in the $Z_j$. For general coordinate systems the coefficients of this cubic are very complicated expressions in the first four cumulants of the score but if the coordinate charts $\Gamma _{\omega }$ are used then the coefficients take a much simpler form and

$$\begin{aligned} S = S_0 + n^{-1/2} S_1 + n^{-1} S_2 + O(n^{-3/2}), \end{aligned}$$

where $ S_0, S_1, S_2$ are polynomials (of degrees 2, 3 and 4, respectively) in $Y_v, Y_h, Y_{hv}, Y_{vv}$ and $Y_{hvv}$.

Step 2: The moment generating function of S.

The randomness in S comes from Y, where $Y = (Y_1, Y_2, Y_3)$. An approximation to order $O(n^{-1})$ to the probability density function of Y is obtained by Edgeworth expansion in terms of tensorial Hermite polynomials [2, Sect. 5.7] of orders 3 and 4. The regularity conditions in Theorem 1 ensure that this Edgeworth expansion is valid (see [7, Sect. 5]). Then the moment generating function $M_S$ of S satisfies

$$\begin{aligned} M_S(t)&= \frac{\vert 2 \pi V \vert ^{-1/2}}{\vert 2 \pi W \vert ^{-1/2}} \int \vert 2 \pi W \vert ^{-1/2} \exp \left\{ -\frac{1}{2} y^{\top }W^{-1} y \right\} P(y) d y + O(n^{-3/2}) , \end{aligned}$$

(B3)

where V is the variance matrix of Y, $W = (I - 2t V U ) ^{-1} V$ with

$$\begin{aligned} U = \left( \begin{array}{ccc} M &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 \end{array} \right) , \end{aligned}$$

M being given by (A1), and P is a function of $S_0, S_1, S_2$ and tensorial Hermite polynomials in $Y_1, Y_2, Y_3$. Equation (B3) can be written in terms of moments of $S_0, S_1, S_2$ and the tensorial Hermite polynomials. Calculation of these moments, together with some manipulation, then yields (10) and (11)–(13).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jupp, P.E. A parameterisation-invariant modification of the score test. Info. Geo. 7 (Suppl 1), 429–439 (2024). https://doi.org/10.1007/s41884-023-00101-4

Received: 31 August 2022
Revised: 10 December 2022
Accepted: 04 February 2023
Published: 07 March 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s41884-023-00101-4
