Abstract
We bring together several strands of development concerning restricted likelihood ratio estimation and testing, including boundary hypothesis testing, taking the pioneering papers of Aitchison, Silvey and Chernoff as motivation. Thus we consider cases where the parameters are connected by a number of functional relationships, which may involve natural restrictions on the parameters and/or restrictions imposed by a null hypothesis, as well as situations where the null and alternative hypotheses place the true parameter at the boundary of disjoint subsets of the parameter space. Our asymptotic results are proved under clearly specified and minimal assumptions, which are probably close to the weakest possible. We illustrate with an example for distributions defined on the unit sphere in \(\mathbb {R}^{d}\).
Notes
Throughout, vectors and matrices are depicted in boldface. A bold \(\textbf{0}\) will denote a zero vector or matrix whose dimension depends on the context; sometimes a subscript is used to denote the dimension. A superscript “T” denotes a vector or matrix transpose.
\(O_P(1)\) means bounded in probability (tight), equivalently, relatively compact in distribution.
We take the norm of a matrix \(\textbf{M}\) to be \(||\textbf{M}||= \sup _{\textbf{u}: |\textbf{u}|=1} |\textbf{u}^T\textbf{M}|\).
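For reference (a standard fact we record here): since \(|\textbf{u}^T\textbf{M}|=|\textbf{M}^T\textbf{u}|\), this is the spectral norm,
\[
||\textbf{M}||=\sigma _{\max }(\textbf{M}),\qquad ||\textbf{M}||^2={\lambda }_{\max }(\textbf{M}\textbf{M}^T),
\]
the largest singular value of \(\textbf{M}\).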
Vu et al. (1998) assume the analogue of our matrix \(\textbf{B}_n\) is positive definite a.s., but this fact is not used in their proof. It suffices that \(\textbf{B}_n\) be symmetric and nonsingular a.s.
In what follows we translate others’ notation to our usage where convenient.
The restriction (8.8) can also be taken into account by eliminating one component of \({\varvec{\mu }}\), say, \(\mu _d\), by solving for it in terms of the remaining \(d-1\) components. But this would destroy the symmetry of the setup and make interpretation difficult.
The matrix \(\textbf{F}_n({\varvec{\theta }})\) has determinant equal to \(n a(\kappa )\textrm{det}(-\overline{\textbf{X}}_n\overline{\textbf{X}}_n^T)\), which is 0 because the matrix \(\overline{\textbf{X}}_n\overline{\textbf{X}}_n^T\) has rank \(1<d\).
References
Aitchison, J. and Silvey, S.D. (1958). Maximum-likelihood estimation of parameters subject to restraints. Ann. Math. Statist., 29, 813–828.
Aitchison, J. and Silvey, S.D. (1960). Maximum-Likelihood estimation procedures and associated tests of significance. J. Roy. Statist. Soc. B (Methodological), 22, 154–171.
Andersen, P.K. and Gill, R.D. (1982). Cox’s regression model for counting processes: A large sample study. Ann. Statist., 10, 1100–1120.
Andrews, D.W.K. (1998). Hypothesis testing with a restricted parameter space. J. Econometrics, 84, 155–199.
Andrews, D.W.K. (1999). Estimation when a parameter is on a boundary. Econometrica, 67, 1341–1383.
Andrews, D.W.K. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica, 69, 683–734.
Breusch, T.W. (1986). Hypothesis testing in unidentified models. Rev. Econ. Stud., 53, 635–651.
Chant, D. (1974). On asymptotic tests of composite hypotheses in nonstandard conditions. Biometrika, 61, 291–298.
Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Statist., 25, 573–578.
Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. London: Chapman & Hall.
Drton, M. (2009). Likelihood ratio tests and singularities. Ann. Statist., 37, 979–1012.
Drton, M. and Sullivant, S. (2007). Algebraic statistical models. Statist. Sinica, 17, 1273–1297.
Eicker, F. (1963). Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Statist., 34, 447–456.
Eicker, F. (1965). Limit theorems for regressions with unequal and dependent errors. Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1965/66, Vol. I, Univ. of California Press, Berkeley, CA, pp. 59–82.
Fahrmeir, L. and Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist., 13, 342–368.
Feder, P.I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is “near” the boundaries of the hypothesis regions. Ann. Math. Statist., 39, 2044–2055.
Geyer, C.J. (1991). Constrained maximum likelihood exemplified by isotonic convex logistic regression. J. Amer. Statist. Assoc., 86, 717–724.
Geyer, C.J. (1994). On the asymptotics of constrained M-estimation. Ann. Statist., 22, 1993–2010.
Grömping, U. (2010). Inference with linear equality and inequality constraints using R: The package ic.infer. J. Stat. Softw., 33(10).
Klüppelberg, C., Maller, R.A., Van De Vyver, M. and Wee, D. (2002). Testing for reduction to random walk in autoregressive conditional heteroskedasticity models. Econom. J., 5, 387–416.
Kuiper, R.M., Hoijtink, H. and Silvapulle, M. (2011). An Akaike-type information criterion for model selection under inequality constraints. Biometrika, 98, 495–501.
Lehmann, E.L. (1983). Theory of Point Estimation. John Wiley & Sons, New York.
Lehmann, E.L. and Casella, G. (1998). Theory of Point Estimation, 2nd Ed. Springer Texts in Statistics, Springer, New York.
Maller, R.A. (2003). Asymptotics of regressions with stationary and nonstationary residuals. Stoch. Proc. Appl., 105, 33–67.
Maller, R.A. and Zhou, X. (2002). Analysis of parametric models for competing risks. Statist. Sinica, 12, 725–750.
McDonald, J.B. and Newey, W.K. (1988). Partially adaptive estimation of regression models via the generalized T distribution. Econ. Theory, 4, 428–457.
Mitchell, D.J., Allman, E.S. and Rhodes, J.A. (2019). Hypothesis testing near singularities and boundaries. Elect. J. Statist., 13, 2150–2193.
Newey, W.K. and McFadden, D. (1994). Large sample estimation and hypothesis testing. Ch. 36 in Handbook of Econometrics, 4, 2111–2245.
Rotnitzky, A., Cox, D.R., Bottai, M. and Robins, J. (2000). Likelihood-based inference with singular information matrix. Bernoulli, 6, 243–284.
Self, S.G. and Liang, K.Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Amer. Statist. Assoc., 82, 605–610.
Silvapulle, M.J. and Sen, P.K. (2005). Constrained Statistical Inference. Inequality, Order, and Shape Restrictions. Wiley, Hoboken, NJ.
Silvapulle, M.J. and Silvapulle, P. (1995). A score test against one-sided alternatives. J. Amer. Statist. Assoc., 90, 342–349.
Silvey, S.D. (1959). The Lagrangian multiplier test. Ann. Math. Statist., 30, 389–407.
Vanbrabant, L., Van de Schoot, R. and Rosseel, Y. (2015). Constrained statistical inference: Sample-size tables for ANOVA and regression. Front. Psychol., 5, 1565.
Vu, H.T.V., Maller, R.A. and Klass, M.J. (1996). On the studentisation of random vectors. J. Multivariate Anal., 57, 142–155.
Vu, H.T.V., Maller, R.A. and Zhou, X. (1998). Asymptotic properties of a class of mixture models for failure data: The interior and boundary cases. Ann. Instit. Statist. Math., 50, 627–653.
Vu, H.T.V. and Zhou, S. (1997). Generalization of likelihood ratio tests under nonstandard conditions. Ann. Statist., 25, 897–916.
Watson, G.S. (1983). Statistics on Spheres. University of Arkansas Lecture Notes in the Mathematical Sciences, John Wiley & Sons, New York.
Watson, G.S. (1984). The theory of concentrated Langevin distributions. J. Multivariate Anal., 14, 74–82.
Acknowledgements
We are very grateful to two referees who read the paper extremely closely and carefully and gave detailed and constructive suggestions which helped us improve it.
Funding
This research was partially supported by Australian Research Council Discovery Grant DP0664603.
Ethics declarations
Conflict of Interest
There is no conflict of interest.
Appendices
Appendix 1: Proofs of Theorems
Before proving Theorems 3.1 and 3.2 we mention some further preliminaries. We can suppose without loss of generality that the \(h_j({\varvec{\theta }})\) have been numbered so that the \(\mathbf{{H}}({\varvec{\theta }})\) in Eq. 2.6 satisfy
For \({\varvec{\theta }}\in \Theta \), \({\varvec{\lambda }}\in \mathbb {R}^s\), \(n=1,2,\ldots \), define
where \((g_j(n))\) are the diagonal elements from the \(\mathbf{{G}}_n\) in Eq. 2.6. Differentiating \(\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }})\) with respect to \({\varvec{\theta }}\) gives
Differentiating with respect to \({\varvec{\theta }}\) again gives
Since \(\mathbf{{h}}({\varvec{\theta }})=\textbf{0}\) on \(\Theta ^h\), we see that, for \({\varvec{\theta }}\in \Theta ^h\),
It follows that, for \({\varvec{\theta }}\in \Theta ^h\) and \({\varvec{\lambda }}\in \mathbb {R}^s\), minus the second derivative matrix of \(\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }})\) with respect to \({\varvec{\theta }}\) is the matrix in Eq. 2.6:
Proof of Theorem 3.1
Assume (A1) and Eqs. 3.2–3.4, and let \({\varvec{\lambda }}_0\in \mathbb {R}^s\) be the particular value of \({\varvec{\lambda }}\) specified in Eqs. 3.3 and 3.4. Equation 3.4 implies that both probabilities
tend to 1 as \(n\rightarrow \infty \) then \(A\rightarrow \infty \) for arbitrary \(K>0\). Eq. 7.2 together with Eq. 3.2 implies that the \(\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }})\) are positive definite on \(N_n^h(A)\), WPA1 as \(n \rightarrow \infty \) then \(A\rightarrow \infty \). Thus
Now when \({\varvec{\theta }}\in \Theta ^h\), \(\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }})=\mathcal{L}_n^{{\lambda }}({\varvec{\theta }})=\mathcal{L}_n({\varvec{\theta }})\) for any \({\varvec{\lambda }}\in \mathbb {R}^s\), so we have
For \(A>0\), \(n=1,2,\ldots \), define \( M^h_n(A)\) as the boundary of \(N_n^h(A)\), thus
By definition, \( M^h_n(A)\subseteq \Theta ^h\). We now show that
This is done as follows. Take \(A>1\), and \({\varvec{\theta }}\in N_n^h(A)\). Then \({\varvec{\theta }}\in \Theta ^h\). It follows from a Taylor expansion in \({\varvec{\theta }}\) that
where \(\overline{{\varvec{\theta }}}=\alpha {\varvec{\theta }}+(1-\alpha ){\varvec{\theta }}_0\) for some \(\alpha \in [0,1]\). Since \({\varvec{\theta }}\) and \({\varvec{\theta }}_0\) are in \(N_n^h(A)\), we have \(\overline{{\varvec{\theta }}}\in N_n(A)\) (but not necessarily \(\overline{{\varvec{\theta }}}\in N_n^h(A)\), because \(\overline{{\varvec{\theta }}}\) may not satisfy \(\mathbf{{h}}(\overline{{\varvec{\theta }}})=\textbf{0}\)). Let
Observe that \(\textbf{S}_n^{{\lambda }_0 *}({\varvec{\theta }})=\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }})\) when \({\varvec{\theta }}\in \Theta ^h\) (by Eq. 7.1). So for \(c>0\) we have by Eq. 7.6
When \({\varvec{\theta }}\in M^h_n(A)\), \(\mathbf{{v}}_n({\varvec{\theta }})\) is a unit vector (by Eq. 7.4). Thus by Eq. 3.3 the first probability on the righthand side of Eq. 7.7 converges to 0 as \(n\rightarrow \infty \) then \(A\rightarrow \infty \), because
For the second probability on the righthand side of Eq. 7.7, we have, when \({\varvec{\theta }}\in M^h_n(A)\),
From the last inequality it follows that
By Eq. 3.4, the righthand side here tends to 0 as \(n\rightarrow \infty \), then \(A\rightarrow \infty \), then \(c\rightarrow 0\). As a result, we get Eq. 7.5 from Eq. 7.7.
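(In outline, the expansion used above is the standard second-order one: schematically, in the notation of Eqs. 2.4–2.6,
\[
\mathcal{L}_n^{{\lambda }_0*}({\varvec{\theta }})-\mathcal{L}_n^{{\lambda }_0*}({\varvec{\theta }}_0) =({\varvec{\theta }}-{\varvec{\theta }}_0)^T\textbf{S}_n^{{\lambda }_0*}({\varvec{\theta }}_0) -\tfrac{1}{2}({\varvec{\theta }}-{\varvec{\theta }}_0)^T\textbf{F}_n^{{\lambda }_0*}(\overline{{\varvec{\theta }}})({\varvec{\theta }}-{\varvec{\theta }}_0),
\]
with \(\overline{{\varvec{\theta }}}\) the intermediate point above; we record this sketch for readability.)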
Next, for \(A>0\), \(n=1,2,\ldots \), define events
Take \(n=1,2,\ldots \) and \(A>0\), and suppose \(E_n(A)\) occurs. Then, since \(\mathcal{L}_n({\varvec{\theta }})\) is continuous and strictly concave on the closed, convex neighbourhood \(N_n^h(A)\), \(\mathcal{L}_n({\varvec{\theta }})\) has a unique maximum point on \(N_n^h(A)\). Define \(\widehat{\varvec{\theta }}_n(A)\) to be this unique maximum point (on the complement of \(E_n(A)\), \(\widehat{\varvec{\theta }}_n(A)\) need not be defined). In detail, Eq. 7.8 tells us that for each \(\varepsilon >0\) there is an \(A_0(\varepsilon )>0\) such that for each \(A \ge A_0(\varepsilon )>0\) there exists \(n_1(\varepsilon , A)\) with
whenever \(n\ge n_1(\varepsilon , A)\). Note that the maximum may occur at a boundary point of \(\Theta ^h\).
Now \(\widehat{\varvec{\theta }}_n(A)\) does not depend on \({\varvec{\lambda }}_0\) but it may depend on A. We can remove this dependence as follows. Given any positive integer m, by Eq. 7.8 there exists \(A_m>0\) such that
So there is a positive sequence \(f_m\uparrow \infty \) such that
By Eq. 3.2, there is a sequence \(g_m\uparrow \infty \) such that, for all \(n\ge g_m\),
Let \(h_m=\max (f_m, g_m)\). Then \(h_m\uparrow \infty \) and for any \(m\ge 1\) and \(n\ge h_m\) we have
For each \(n\ge h_1\), we can find \(m=m(n)\) such that \(h_{m(n)}\le n <h_{m(n)+1}\). Suppose \(E_n(A_{m(n)})\) occurs and let \(\widehat{\varvec{\theta }}_n =\widehat{\varvec{\theta }}_n(A_{m(n)})\). Note that \(\widehat{\varvec{\theta }}_n\) depends on n only and \(\widehat{\varvec{\theta }}_n \in N_n^h(A_{m(n)})\). Now, given any \(\varepsilon >0\), let \(m_0=m_0(\varepsilon )\) be an integer greater than \(2/\varepsilon +1\). When \(n\ge h_{m_0}\), then \(m(n)\ge m_0-1\) and
Hence, as \(\widehat{\varvec{\theta }}_n \in N_n^h(A_{m(n)})\), on \(E_n(A_{m(n)})\cap \{{\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)\ge m(n)A_{m(n)}^2\}\) we have
Thus the \(\widehat{\varvec{\theta }}_n\) we have constructed is locally unique on \(\Theta ^h\), WPA1, and is consistent for \({\varvec{\theta }}_0\).
Since \(A_m\rightarrow \infty \) as \(m\rightarrow \infty \), and \(\widehat{\varvec{\theta }}_n =\widehat{\varvec{\theta }}_n(A_{m(n)})\), it may seem that we will no longer have \(\widehat{\varvec{\theta }}_n \in N_n^h(A)\) WPA1 as \(n\rightarrow \infty \) then \(A\rightarrow \infty \). However, in fact we can show Eq. 3.5. To see this, take \(\varepsilon >0\). Choose \(A_0(\varepsilon )\) and \(n_1(\varepsilon , A)\) so that Eq. 7.9 holds for \(A\ge A_0(\varepsilon )\) and \(n\ge n_1(\varepsilon , A)\). Also choose \(n_2(A)\) such that \(A_{m(n)} \ge A\) for all \(n\ge n_2(A)\). Then \(N_n^h(A) \subseteq N_n^h(A_{m(n)})\). When \(\widehat{\varvec{\theta }}_n(A)\) exists uniquely in \(N_n^h(A)\) and \(E_n(A_{m(n)})\) occurs, then \(\mathcal {L}_n({\varvec{\theta }})\) is concave on \(N_n^h(A_{m(n)})\), so \(\widehat{\varvec{\theta }}_n(A)\) must maximise \(\mathcal {L}_n({\varvec{\theta }})\) over \(N_n^h(A_{m(n)})\) as well. This implies \(\widehat{\varvec{\theta }}_n(A)=\widehat{\varvec{\theta }}_n(A_{m(n)})=\widehat{\varvec{\theta }}_n\), and so \(\widehat{\varvec{\theta }}_n \in N_n^h(A)\). As a result, for each \(A\ge A_0(\varepsilon )\) and all \(n\ge \max (h_{m_0(\varepsilon )},n_1(\varepsilon ,A),n_2(A))\), by Eqs. 7.9 and 7.10 we have
Letting \(n\rightarrow \infty \), then \(A\rightarrow \infty \), then \(\varepsilon \rightarrow 0\), proves Eq. 3.5.
To complete the proof of Theorem 3.1, we stress that \(\widehat{\varvec{\theta }}_n\) does not depend on the choice of \({\varvec{\lambda }}_0\) in Eq. 2.2, \(\mathbf{{G}}_n\) in Eq. 2.6, or \(\textbf{D}_n\) in Eqs. 3.2–3.4. Retracing the argument, we saw that \(\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }})=\mathcal{L}_n^{{\lambda }}({\varvec{\theta }})=\mathcal{L}_n({\varvec{\theta }})\) for any \({\varvec{\lambda }}\in \mathbb {R}^s\) when \({\varvec{\theta }}\in \Theta ^h\), and, in Eq. 7.3, that \(\mathcal{L}_n({\varvec{\theta }})\) is strictly concave for \({\varvec{\theta }}\in N_n^h(A)\) WPA1 as \(n\rightarrow \infty \), then \(A\rightarrow \infty \). By showing in Eq. 7.5 that \(\mathcal{L}_n({\varvec{\theta }})\) is smaller than \(\mathcal{L}_n({\varvec{\theta }}_0)\) for \({\varvec{\theta }}\) in the boundary set \(M_n^h(A)\), we established the existence of a unique maximum of \(\mathcal{L}_n({\varvec{\theta }})\) in \(N_n^h(A)\), WPA1 as \(n\rightarrow \infty \), then \(A\rightarrow \infty \). These considerations do not depend on the choice of \({\varvec{\lambda }}_0\) or \(\mathbf{{G}}_n\). The neighbourhoods \(N_n(A)\) and \(N_n^h(A)\) in Eq. 3.1 do depend on the choice of \(\textbf{D}_n\), so conceivably two choices \(\textbf{D}_n^1\) or \(\textbf{D}_n^2\) satisfying (3.2)–(3.4) with corresponding neighbourhoods \(N_n^{1,h}(A)\) and \(N_n^{2,h}(A)\) may give different estimators \(\widehat{\varvec{\theta }}_n^1\in N_n^{1,h}(A)\) and \(\widehat{\varvec{\theta }}_n^2\in N_n^{2,h}(A)\). But by the strict concavity of \(\mathcal{L}_n({\varvec{\theta }})\) in each neighbourhood, \(\widehat{\varvec{\theta }}_n^1\) and \(\widehat{\varvec{\theta }}_n^2\) must be equal, and in fact lie in \(N_n^{1,h}(A)\cap N_n^{2,h}(A)\). So \(\widehat{\varvec{\theta }}_n\) does not depend on the choice of \({\varvec{\lambda }}_0\), \(\mathbf{{G}}_n\), or \(\textbf{D}_n\).
\(\square \)
Proof of Theorem 3.2
Assume (A1), that \(\widehat{\varvec{\theta }}_n\) is a consistent estimator for \({\varvec{\theta }}_0\), and that (3.7) is a consistent system of equations for \({\varvec{\lambda }}\) with a unique solution \(\widehat{\varvec{\lambda }}_n\). Then
and by Eqs. 3.9 and 3.11 we have
Since \(\widehat{\varvec{\theta }}_n\) is consistent for \({\varvec{\theta }}_0\), the matrix \(\mathbf{{H}}({\varvec{\theta }})\) is a continuous function of \({\varvec{\theta }}\) at \({\varvec{\theta }}_0\), and \(\mathbf{{H}}({\varvec{\theta }}_0)\) is of full rank, we deduce from Eqs. 7.11 and 7.12 that \(\mathbf{{C}}_n\widehat{\varvec{\lambda }}_n/a_n=O_P(1)\) as \(n\rightarrow \infty \). Assuming Eq. 3.8 as well, we conclude that \(\widehat{\varvec{\lambda }}_n=O_P(1)\) as \(n\rightarrow \infty \), and then letting \(n\rightarrow \infty \) in Eq. 7.11 through a subsequence for which \(\widehat{\varvec{\lambda }}_n\) has a finite limit in distribution, \({\varvec{\nu }}_0\), say, shows that
of which the unique solution is, by Eq. 3.10, \({\varvec{\nu }}_0={\varvec{\lambda }}_0\). So \(\widehat{\varvec{\lambda }}_n\overset{\textrm{P}}{\longrightarrow } {\varvec{\lambda }}_0\) as \(n\rightarrow \infty \). \(\square \)
Proof of Theorem 4.1
Throughout, restrict the sample space to an event on which the \((d+s)\times (d+s)\) negative second derivative matrix \(\mathbf{{U}}_n^{\lambda }({\varvec{\theta }})\) in Eq. 4.2 is nonsingular for all \(({\varvec{\theta }},{\varvec{\lambda }})\) in a neighbourhood of \(({\varvec{\theta }}_0,{\varvec{\lambda }}_0)\) with high probability, as is possible by Eq. 4.5. Keep \(\widehat{\varvec{\theta }}_n\in \Theta ^h\) and \(\widehat{\varvec{\lambda }}_n\) in this neighbourhood throughout the proof. Use Taylor’s theorem to write
where \(\overline{{\varvec{\theta }}}_n=\alpha \widehat{\varvec{\theta }}_n+(1-\alpha ){\varvec{\theta }}_0\) and \(\overline{{\varvec{\lambda }}}_n=\beta \widehat{\varvec{\lambda }}_n+(1-\beta ){\varvec{\lambda }}_0\) for some \(\alpha \in [0,1]\), \(\beta \in [0,1]\). Recalling that \(\partial \mathcal{L}_n^{\lambda }({\varvec{\theta }})/\partial {\varvec{\theta }}=\textbf{S}_n^{\lambda }({\varvec{\theta }})\), in this interior case we have \( \partial \mathcal{L}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n)/ \partial {\varvec{\theta }}= \textbf{S}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n)=\textbf{0}_d\) since \((\widehat{\varvec{\theta }}_n,\widehat{\varvec{\lambda }}_n)\) satisfies (4.1), while from Eq. 2.2 we have
because \(\widehat{\varvec{\theta }}_n\in \Theta ^h\) and \({\varvec{\theta }}_0\in \Theta ^h\). Equation 7.13 now gives
which we can solve to get
Then write
in which \(\textbf{J}_n^T\big (\mathbf{{U}}_n^{\overline{{\lambda }}_n}(\overline{{\varvec{\theta }}}_n)\big )^{-1} \textbf{J}_n\overset{\textrm{P}}{\longrightarrow } \textbf{U}_0^{-1}\) by Eq. 4.5 (noting that \((\overline{{\varvec{\theta }}}_n, \overline{{\lambda }}_n) \overset{\textrm{P}}{\longrightarrow } ({\varvec{\theta }}_0, {\lambda }_0)\)), while by Eq. 4.3
Then Eq. 4.6 follows from Eqs. 7.14 and 7.15. \(\square \)
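(In outline, the proof just given is a Newton-type inversion: writing the stacked parameter as \({\varvec{\xi }}=({\varvec{\theta }},{\varvec{\lambda }})\), the expansion is solved, schematically, as
\[
\begin{bmatrix} \widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0\\ \widehat{\varvec{\lambda }}_n-{\varvec{\lambda }}_0 \end{bmatrix} =\big (\mathbf{{U}}_n^{\overline{{\lambda }}_n}(\overline{{\varvec{\theta }}}_n)\big )^{-1}\, \textbf{T}_n^{{\lambda }_0}({\varvec{\theta }}_0),
\]
with \(\textbf{T}_n^{{\lambda }_0}({\varvec{\theta }}_0)\) the stacked derivative vector defined in the proof of Theorem 5.1 below; this is a schematic summary rather than a verbatim display.)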
Proof of Theorem 5.1
Assume (A1)–(A3), Eqs. 3.6, 5.4 and 5.5, and that \(\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\overset{\textrm{D}}{\longrightarrow } \textbf{Z}\). For \({\varvec{\theta }}\in \Theta \) and \({\varvec{\lambda }}\in \mathbb {R}^s\) define the \((d+s)\times 1\) vectors
and define \(\mathbf{{U}}_n^{{\lambda }}({\varvec{\theta }})\) as in Eq. 4.2. By Eq. 2.4, \(\partial \mathcal{L}_n^{{\lambda }}({\varvec{\theta }})/\partial {\varvec{\theta }}= \textbf{S}_n^{\lambda }({\varvec{\theta }})\). We assume (3.6), so by Eq. 2.2,
and note that this equals 0 when \({\varvec{\theta }}={\varvec{\theta }}_0\).
Now keep \({\varvec{\theta }}\in N_n^h(A)\). Since \(\mathcal{L}_n({\varvec{\theta }})\) and \(\mathcal{L}_n^{{\lambda }}({\varvec{\theta }})\) agree on \(N_n^h(A)\) for any \({\varvec{\lambda }}\), a Taylor expansion as in Eq. 7.13 gives
where \((\overline{{\varvec{\theta }}}, \overline{{\varvec{\lambda }}})=(\alpha {\varvec{\theta }}+(1-\alpha ){\varvec{\theta }}_0,\beta {\varvec{\lambda }}+(1-\beta ){\varvec{\lambda }}_0)=: \overline{{\varvec{\xi }}}\) for some \(\alpha , \beta \in [0,1]\), \(\mathbf{{U}}_0\) is the limit matrix in Eq. 5.7, and
Let \(\textbf{Y}_n:= \textbf{J}_n^{-1} \textbf{T}_n^{{\lambda }_0}({\varvec{\theta }}_0)\), and now set \({\varvec{\lambda }}={\varvec{\lambda }}_0\). Then we can rewrite (7.16) as
where \({\varvec{\xi }}-{\varvec{\xi }}_0= [({\varvec{\theta }}-{\varvec{\theta }}_0)^T\ \textbf{0}^T]^T\) and \(q_n({\varvec{\theta }})\) and \(t_n({\varvec{\theta }})\) depend on \({\varvec{\theta }}\) but not on \({\varvec{\lambda }}\). By assumption there are \(\widehat{\varvec{\theta }}_n^\Omega \in \Omega \) and \(\widehat{\varvec{\theta }}_n^\tau \in \tau \), not depending on \({\varvec{\lambda }}_0\), both consistent for \({\varvec{\theta }}_0\), such that Eq. 3.5 holds with \(\widehat{\varvec{\theta }}_n^\Omega \) and \(\widehat{\varvec{\theta }}_n^\tau \) substituted for \(\widehat{\varvec{\theta }}_n\). Recall the matrices \(\textbf{D}_n\) and \(\textbf{E}_n\) in Eq. 4.4. Since \(\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\overset{\textrm{D}}{\longrightarrow } \textbf{Z}\), given \(\varepsilon \in (0,1)\), we can find \(A_0(\varepsilon )\) and for each \(A\ge A_0(\varepsilon )\) an \(n_0(A,\varepsilon )\) such that, for \(n\ge n_0\), the event
has probability exceeding \(1-\varepsilon \). (Recall that \(\textbf{F}_0\) and \(\textbf{P}\) are defined in the statement of Theorem 5.1.) In what follows we suppose event \(E_n\) occurs.
We work at first with \(\widehat{\varvec{\theta }}_n^\Omega \). Let \(\breve{\varvec{\theta }}_n^\Omega \) denote the value that maximizes the quadratic function \(q_n({\varvec{\theta }})\) in Eq. 7.18 on the closed convex set \(N_n(A)\cap \Omega \). We show that \(\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\Omega )\) can be replaced to sufficient accuracy with \(\mathcal{L}_n(\breve{\varvec{\theta }}_n^\Omega )\), which in turn is well approximated by the quadratic function \(q_n(\breve{\varvec{\theta }}_n^\Omega )\). The latter quantity has an asymptotic distribution which can be expressed in terms of the first limit random variable in Eq. 5.6.
To carry out this program use Eq. 4.5, the uniform convergence in Eq. 5.5, and Eq. 7.17 (with \({\varvec{\lambda }}={\varvec{\lambda }}_0\)) to see that \(t_n(\breve{\varvec{\theta }}_n^\Omega )\overset{\textrm{P}}{\longrightarrow } 0\) and \(t_n(\widehat{\varvec{\theta }}_n^\Omega )\overset{\textrm{P}}{\longrightarrow } 0\) as \(n\rightarrow \infty \). Since \(q_n(\widehat{\varvec{\theta }}_n^\Omega )\le q_n(\breve{\varvec{\theta }}_n^\Omega )\), Eq. 7.18 implies
Hence, because \(\breve{\varvec{\theta }}_n^\Omega \) maximizes \(q_n({\varvec{\theta }})\) on \(N_n(A)\cap \Omega \),
Recalling the definition of \(q_n({\varvec{\theta }})\) in Eq. 7.18, write the last expression as
Recall that the auxiliary matrices \(\textbf{T}_n\) introduced in Eq. 5.2 are orthogonal, so \(||\textbf{T}_n\textbf{D}_n||= ||\textbf{D}_n||\). Transform from \(({\varvec{\theta }},{\varvec{\lambda }})\) to \(\widetilde{\varvec{\xi }}= (\widetilde{{\varvec{\theta }}}, \widetilde{{\varvec{\lambda }}})= \big (\textbf{T}_n\textbf{D}_n({\varvec{\theta }}-{\varvec{\theta }}_0), \textbf{0}\big )\). Since \(C_\Omega \cap \mathcal{N}= \Omega \cap \mathcal{N}\), by Eq. 5.2, \({\varvec{\theta }}\in N_n(A)\cap \Omega \) iff \(|\widetilde{{\varvec{\theta }}}|\le A\) and \(\widetilde{{\varvec{\theta }}}\in \widetilde{C}_{\Omega _n}\). Then the \(\inf \) in Eq. 7.20 can be replaced by an \(\inf \) over \(|{\varvec{\theta }}|\le A\), \({\varvec{\theta }}\in \widetilde{C}_{\Omega _n}\) and we get
Note that since \({\varvec{\theta }}_0\in \Theta ^h\) and \(\textbf{h}({\varvec{\theta }})\) has a bounded second derivative on a neighborhood of \({\varvec{\theta }}_0\) we have, for any \({\varvec{\theta }}\in \Theta ^h\), \(\textbf{0}=\textbf{h}({\varvec{\theta }}) -\textbf{h}({\varvec{\theta }}_0) =\textbf{H}({\varvec{\theta }}_0)({\varvec{\theta }}-{\varvec{\theta }}_0)+O(|{\varvec{\theta }}-{\varvec{\theta }}_0|^2)\). So \(\textbf{H}({\varvec{\theta }}_0)({\varvec{\theta }}-{\varvec{\theta }}_0)= o_p(1)\) for \({\varvec{\theta }}\in N_n^h(A)\). Observe that the expression on the RHS of Eq. 7.21 is, apart from an \(o_p(1)\) term,
Since \(\textbf{Y}_n^T\mathbf{{U}}_0^{-1}\textbf{Y}_n\) does not depend on \({\varvec{\theta }}\) we can ignore it for the time being. Then to evaluate (7.21) set \(\widetilde{\varvec{\eta }}=\mathbf{{U}}_0\widetilde{{\varvec{\xi }}}= [\textbf{F}_0^T\ -\textbf{C}^T\textbf{H}^T({\varvec{\theta }}_0)]^T\widetilde{{\varvec{\xi }}} = [(\textbf{F}_0{\varvec{\theta }})^T\ \textbf{0}^T]^T+o_p(1)\) and calculate
In the last step we transformed from \(\widetilde{C}_{\Omega _n}\) to the set \(\breve{C}_{\Omega _n}= \{\textbf{F}_0{\varvec{\theta }}: {\varvec{\theta }}\in \widetilde{C}_{\Omega _n}\}\), where \(\textbf{F}_0\) is the nonsingular limit matrix in Eq. 5.7. Then \(\widetilde{\varvec{\eta }}\) transforms to \( {\varvec{\eta }}=[{\varvec{\theta }}^T\ \textbf{0}^T]^T\). Since \(\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\overset{\textrm{D}}{\longrightarrow } \textbf{Z}\) we have
and the expression in the RHS of Eq. 7.22 can be written, apart from an \(o_p(1)\) term, as
Here \(\mathbf{{P}}=\mathbf{{P}}^{1/2} \mathbf{{P}}^{T/2}\) is a square root decomposition of the positive semi-definite matrix \(\mathbf{{P}}\). Transform from \(\mathbf{{P}}^{T/2}{\varvec{\theta }}\) to \({\varvec{\theta }}\), so that, referring to Eq. 7.22,
where \( \dot{C}_{\Omega _n}=\{\mathbf{{P}}^{T/2}{\varvec{\theta }}:{\varvec{\theta }}\in \breve{C}_{\Omega _n}\} =\{\mathbf{{P}}^{T/2}\textbf{F}_0{\varvec{\theta }}:{\varvec{\theta }}\in \widetilde{C}_{\Omega _n}\}\).
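(The reduction here is the familiar completing-the-square step: for any \({\varvec{\theta }}\),
\[
2{\varvec{\theta }}^T\mathbf{{P}}^{T/2}\textbf{Z}_n-|{\varvec{\theta }}|^2 =|\mathbf{{P}}^{T/2}\textbf{Z}_n|^2-\big |{\varvec{\theta }}-\mathbf{{P}}^{T/2}\textbf{Z}_n\big |^2,
\]
so maximising the left side over \({\varvec{\theta }}\in \dot{C}_{\Omega _n}\) is the same as minimising the squared distance from \(\mathbf{{P}}^{T/2}\textbf{Z}_n\) to \(\dot{C}_{\Omega _n}\); we record this standard identity for readability.)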
Since \(\dot{C}_{\Omega _n}\) contains 0 we have
where the last inequality follows from Eq. 7.19 because \(\textbf{Z}_n=\textbf{D}_n^{-1} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\) and we assumed the event \(E_n\) occurs so \(|\mathbf{{P}}^{T/2}\textbf{Z}_n|\le A||\textbf{F}_0||/2\). There exists \(\dot{\varvec{\theta }}_n\in \dot{C}_{\Omega _n}\) such that
so \(|\dot{\varvec{\theta }}_n|\le A||\textbf{F}_0||\). Hence
Putting back in the omitted term \(\textbf{Y}_n^T\mathbf{{U}}_0^{-1}\textbf{Y}_n\), it follows from Eqs. 7.24 and 7.25 that
The same analysis, with \(\tau \) replacing \(\Omega \), gives
Now Eqs. 7.23, 7.26 and 7.27 and the continuous mapping theorem imply
where \( \dot{C}_{\Omega }=\{\mathbf{{P}}^{T/2}\textbf{F}_0{\varvec{\theta }}:{\varvec{\theta }}\in \widetilde{C}_{\Omega }\}\), \( \dot{C}_{\tau }=\{\mathbf{{P}}^{T/2}\textbf{F}_0{\varvec{\theta }}:{\varvec{\theta }}\in \widetilde{C}_{\tau }\}\), and the convergence follows from Eq. 5.3, which implies
uniformly in \({\varvec{\beta }}\), and similarly for \(\dot{C}_{\tau }\). This gives (5.6) and completes the proof of Theorem 5.1. \(\square \)
Appendix 2: Application: The Langevin (von Mises-Fisher) Distribution
This distribution, defined on the unit sphere in \(\mathbb {R}^d\), displays in a striking way some of the features we wish to illustrate. Let \(\mathcal {S}^d\) be the unit sphere in \(\mathbb {R}^d\), \(d\ge 2\), and suppose the random vector \(\textbf{X}\in \mathcal {S}^d\) has density
where \(\kappa >0\), \({\varvec{\mu }}\in \mathbb {R}^d\) and
Here \(\textrm{d}\omega (\cdot )\) is the area element on \(\mathcal {S}^d\) such that
It is easy to show that \(c(\kappa )\) does not in fact depend on \({\varvec{\mu }}\), despite appearances in Eq. 8.2. The formulae
where \(a(\kappa )=(c_{\kappa \kappa }c(\kappa )-c_\kappa ^2)/c^2(\kappa )>0\), with \(c_\kappa =c'(\kappa )\) and \(c_{\kappa \kappa }=c''(\kappa )\), for \(\kappa >0\), can be found in Watson (1983), together with the following useful relations. We have
and the function \(A(\kappa ):=c_\kappa /c(\kappa )\), \(\kappa >0\), satisfies
and
Let \(\textbf{X}_1,\ldots , \textbf{X}_n\) be n i.i.d. observations on \(\textbf{X}\), with log-likelihood
where \({\varvec{\theta }}=[\kappa \ {\varvec{\mu }}^T]^T\in \Theta := (0,\infty )\times \mathbb {R}^d\) and \(\overline{\textbf{X}}_n=n^{-1}\sum _{i=1}^n\textbf{X}_i\). (We alter notation slightly here by setting \({\varvec{\theta }}\) in \(\mathbb {R}^{d+1}\), because it is convenient to have \({\varvec{\mu }}\) in \(\mathbb {R}^d\).) By Eq. 8.3
for any \(\kappa >0\), \({\varvec{\mu }}\in \mathcal{S}^d\). Since
for all \(\textbf{u}\in \mathcal{S}^{d}\), \( \textrm{Var}(\textbf{X})\) and \( \textrm{Var}(\overline{\textbf{X}}_n)\) are positive definite matrices.
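For numerical work it is convenient to have \(A(\kappa )\) and \(a(\kappa )\) in closed form. A minimal computational sketch, assuming the standard Bessel-function expressions for the Langevin (von Mises–Fisher) family — namely \(c(\kappa )=(2\pi )^{d/2}\kappa ^{1-d/2}I_{d/2-1}(\kappa )\), hence \(A(\kappa )=I_{d/2}(\kappa )/I_{d/2-1}(\kappa )\) — which are well known but not quoted explicitly above:

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel I, numerically stable

def A(kappa, d):
    """A(kappa) = c_kappa / c(kappa) for the Langevin distribution on the
    unit sphere in R^d, assuming A(kappa) = I_{d/2}(kappa)/I_{d/2-1}(kappa);
    the exp(-kappa) scaling factors in `ive` cancel in the ratio."""
    return ive(d / 2, kappa) / ive(d / 2 - 1, kappa)

def a(kappa, d):
    """a(kappa) = A'(kappa) > 0, via the standard recurrence
    A'(kappa) = 1 - A(kappa)**2 - (d - 1)*A(kappa)/kappa."""
    Ak = A(kappa, d)
    return 1.0 - Ak**2 - (d - 1) * Ak / kappa

if __name__ == "__main__":
    d, kappa = 3, 2.0
    print(A(kappa, d))   # lies in (0, 1) and increases in kappa, as in Eq. 8.4
    print(a(kappa, d))   # positive for all kappa > 0, as asserted below Eq. 8.3
```

That \(A(\kappa )\) increases from 0 towards 1 (Eqs. 8.4–8.5) is what makes the estimating equation for \(\widehat{\kappa }_n\) below uniquely solvable.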
Let \({\varvec{\theta }}_0\in \mathbb {R}^{d+1}\) denote the true value of \({\varvec{\theta }}\), for which \(\textbf{X}\) has the density (8.1) with \([\kappa _0\ {\varvec{\mu }}_0^T]^T={\varvec{\theta }}_0\). The log-likelihood is to be maximised subject to (see footnote 7)
Thus we have a single restriction, corresponding to \(s=1\) in Eq. 2.3, and we write \(h({\varvec{\theta }})\) rather than \(\mathbf{{h}}({\varvec{\theta }})\). The restricted parameter space is \(\Theta ^h=\{{\varvec{\theta }}\in \Theta :|{\varvec{\mu }}|=1\}= (0,\infty )\times \mathcal {S}^d\).
We can calculate
and
Observe that the expected first derivative \(E(\textbf{S}_n({\varvec{\theta }}))=[0\ n\kappa A(\kappa ){\varvec{\mu }}^T]^T\) is not equal to 0 for any \({\varvec{\theta }}\in \Theta \), and \(\textbf{F}_n({\varvec{\theta }}) \) is singular (see footnote 8) for all \({\varvec{\theta }}\in \Theta \). So “standard” asymptotic theory for MLEs does not apply.
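Schematically, differentiating the log-likelihood \(\mathcal{L}_n({\varvec{\theta }})=n\big (\kappa {\varvec{\mu }}^T\overline{\textbf{X}}_n-\log c(\kappa )\big )\) gives derivatives of the form (a sketch, to be read as consistent with the expression for \(E(\textbf{S}_n({\varvec{\theta }}))\) above and with footnote 8, not as a quotation of Eqs. 8.9–8.10):
\[
\textbf{S}_n({\varvec{\theta }})= n\begin{bmatrix} {\varvec{\mu }}^T\overline{\textbf{X}}_n-A(\kappa )\\ \kappa \overline{\textbf{X}}_n \end{bmatrix}, \qquad \textbf{F}_n({\varvec{\theta }})= n\begin{bmatrix} a(\kappa ) & -\overline{\textbf{X}}_n^T\\ -\overline{\textbf{X}}_n & \textbf{0}_{d\times d} \end{bmatrix}.
\]
With \(|{\varvec{\mu }}|=1\) and \(E\overline{\textbf{X}}_n=A(\kappa ){\varvec{\mu }}\), the first component of \(E(\textbf{S}_n({\varvec{\theta }}))\) vanishes and the second equals \(n\kappa A(\kappa ){\varvec{\mu }}\), as stated.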
We apply the theory in Sections 2–4. Choose \({\lambda }_n=C_n{\lambda }\), where \(C_n>0\); \({\lambda }_n\) and \({\lambda }\) are scalars here. So we analyse (8.8) in conjunction with
We aim first to verify conditions (3.2), (3.3) and (3.4) of Theorem 3.1 so as to establish existence and consistency of an estimator \(\widehat{\varvec{\theta }}_n\) for \({\varvec{\theta }}_0\). To this end, calculate
and
We use these to augment \(\textbf{S}_n({\varvec{\theta }})\) and \(\textbf{F}_n({\varvec{\theta }})\) to \(\textbf{S}_n^{\lambda }({\varvec{\theta }})\) and \(\textbf{F}_n^{\lambda }({\varvec{\theta }})\), as in Eqs. 2.4 and 2.5.
As expected in this i.i.d. setup, a multiplier n appears in Eqs. 8.9 and 8.10, and prompts choosing \(\textbf{D}_n=\sqrt{n}\textbf{I}_{d+1}\) in Eqs. 3.1–3.4 and \(a_n=C_n=an\), \(a>0\), in Eq. 3.8. Then \(C=1\), a scalar, in Eq. 3.8. Accordingly, for \({\lambda }\in \mathbb {R}\), we set
and
By Eqs. 8.7, 8.9, and the weak law of large numbers,
for \({\varvec{\theta }}\in \Theta \). Taking \({\varvec{\theta }}={\varvec{\theta }}_0\) shows that (3.9) holds with \(a_n=an\) and that \(\textbf{L}_0:= [0\ \kappa _0 A(\kappa _0) {\varvec{\mu }}_0^T/a]^T\).
The equations (3.10) in this situation are, for general \({\varvec{\theta }}\),
Noting that, by Eq. 8.3, \(E\textbf{X}=c_{\kappa }{\varvec{\mu }}/c(\kappa ) =A(\kappa ){\varvec{\mu }}\), these give
It follows from Eq. 8.12 that
and then from Eq. 8.16 we find
The \((d+1)\times (d+1)\) matrix \(\textbf{V}({\varvec{\theta }})\) is positive semidefinite, having rank d. To see this, we can write, explicitly,
Let \(\mathbf{{u}}\) be a unit vector in \(\mathbb {R}^{d+1}\) partitioned as \([u_1\ \mathbf{{u}}_R^T]^T\). Then
is 0 iff \(u_1=-\kappa \) and \(\mathbf{{u}}_R={\varvec{\mu }}\). Since \(\textrm{Var}(\textbf{X})\) is positive definite, \(\mathbf{{u}}^T\textbf{V}({\varvec{\theta }})\mathbf{{u}}>0\) except when \(\mathbf{{u}}=[-\kappa \ {\varvec{\mu }}^T]^T\), and so \(\textrm{Var}(\textbf{S}_n^{{\lambda }}({\varvec{\theta }}))\) is positive semidefinite with rank d.
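Writing \({\varvec{\Sigma }}=\textrm{Var}(\textbf{X})\), the variance calculation just made corresponds, schematically, to the block form (a sketch consistent with the quadratic form \(\mathbf{{u}}^T\textbf{V}({\varvec{\theta }})\mathbf{{u}}=\textrm{Var}\big ((u_1{\varvec{\mu }}+\kappa \mathbf{{u}}_R)^T\textbf{X}\big )\)):
\[
\textbf{V}({\varvec{\theta }})= \begin{bmatrix} {\varvec{\mu }}^T{\varvec{\Sigma }}{\varvec{\mu }} & \kappa {\varvec{\mu }}^T{\varvec{\Sigma }}\\ \kappa {\varvec{\Sigma }}{\varvec{\mu }} & \kappa ^2{\varvec{\Sigma }} \end{bmatrix}.
\]
The quadratic form vanishes iff \(u_1{\varvec{\mu }}+\kappa \mathbf{{u}}_R=\textbf{0}\), i.e. iff \(\mathbf{{u}}\) is proportional to \([-\kappa \ {\varvec{\mu }}^T]^T\), in line with the rank-\(d\) statement.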
Formulae Eqs. 8.12–8.19 hold for all \({\varvec{\theta }}\in \Theta \), \({\lambda }\in \mathbb {R}\). In particular, choosing \({\varvec{\theta }}={\varvec{\theta }}_0\), and letting \(E_0\) and \(\textrm{Var}_0\) denote expectation and variance when \({\varvec{\theta }}={\varvec{\theta }}_0\), Eq. 8.15 gives for the true value of \({\lambda }\)
Recall that \(\textbf{D}_n:=\sqrt{n}\textbf{I}_{d+1}\). Then \({\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)=n\rightarrow \infty \) as \(n\rightarrow \infty \), so Eq. 3.2 holds. We also have
where \(\textbf{V}_0\) is the finite matrix \(\textbf{V}({\varvec{\theta }})\) defined in Eq. 8.18 evaluated at \({\varvec{\theta }}={\varvec{\theta }}_0\). So Eq. 3.3 holds by Eq. 8.17 and Chebyshev’s inequality.
Next we have to check Eq. 3.4. It turns out that the \(\textbf{F}_n^{\lambda }({\varvec{\theta }})\) are not positive definite so we will need to make choices for \(\mathbf{{H}}({\varvec{\theta }})\) and \(\mathbf{{G}}_n\) in Eq. 2.6. Note that, for all \((\kappa , {\varvec{\mu }})\) and \({\lambda }\), by Eqs. 8.13 and 8.15,
as \(n\rightarrow \infty \). This matrix has determinant
where we let
with \(A(\kappa )=c_\kappa /c(\kappa )\), and \(A'(\kappa )=a(\kappa )\). The function \(g(\kappa )\) is positive for all \(\kappa >0\). This follows from Eqs. 8.4 and 8.5, which imply \(g(0)=0\) and \(g'(\kappa )=-\kappa A''(\kappa )>0\). Thus \(g(\kappa )\) is strictly increasing, hence positive, for all \(\kappa >0\). It follows that \(\textrm{det}(\textbf{F}^{\lambda }({\varvec{\theta }}))<0\) when \({\lambda }\) satisfies Eq. 8.15.
It follows that \(\textbf{F}_n^{\lambda }({\varvec{\theta }})\) is nonsingular but not definite near \({\lambda }_0\). Set \(\mathbf{{G}}_n:= \sqrt{bn}\textbf{I}_{d+1}/\sqrt{2}\), where \(b>0\), and, following Eqs. 2.6, 8.11 and 8.13, define
This is positive definite with high probability for n large enough in a neighbourhood of \((\kappa _0, {\varvec{\mu }}_0)\) and \({\lambda }_0\) provided b is chosen large enough. To see this, consider the limit in probability
A necessary and sufficient condition for this to be positive definite is that \(\textbf{u}^T\textbf{F}^{{\lambda }*}({\varvec{\theta }})\textbf{u}>0\) for all \(\textbf{u}\in \mathcal{S}^{d+1}\). Let \(\textbf{u}\) be such a unit vector, partitioned as \([u_1\ \mathbf{{u}}_R^T]^T\). Then we can calculate
This is positive for any choice of \(b\ge A^2(\kappa )/a(\kappa )\). Then \(\textbf{F}^{{\lambda }_0*}({\varvec{\theta }})\) is positive definite for any such choice of b, and it follows by continuity considerations that \({\lambda }_{\min }(\textbf{F}^{{\lambda }*}({\varvec{\theta }}))>0\) for all \({\varvec{\theta }}\in \Theta \) (not just \({\varvec{\theta }}\in \Theta ^h\)) when \({\lambda }\) is near \({\lambda }_0\). Since \(n^{-1}{\lambda }_{\min }(\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }}))\overset{\textrm{P}}{\longrightarrow } {\lambda }_{\min }(\textbf{F}^{{\lambda }_0 *}({\varvec{\theta }}))\) for all \({\varvec{\theta }}\in \Theta \) and we have chosen \(\textbf{D}_n\) proportional to \(\sqrt{n}\) we see that (3.4) holds.
We have now verified that Eqs. 3.2, 3.3, and 3.4 hold for the Langevin with the choice \(b= A^2(\kappa )/a(\kappa )\), so we can deduce from Theorem 3.1 that there is a consistent estimator \(\widehat{\varvec{\theta }}_n=[\widehat{\kappa }_n\ \widehat{\varvec{\mu }}_n^T]^T\) for \({\varvec{\theta }}_0\), satisfying (3.5). Recall that \(\widehat{\varvec{\theta }}_n\) does not depend on \({\lambda }_0\) or on the choice of a, b or \(\textbf{D}_n\).
Next we look for a consistent estimator \(\widehat{\lambda }_n\) for \({\lambda }_0\). The system (3.7) here has the form, by Eq. 8.12,
with a unique solution \(\widehat{\lambda }_n\) obtained from \(2\widehat{\lambda }_n a\widehat{\varvec{\mu }}_n=-\widehat{\kappa }_n\overline{\textbf{X}}_n\) (via the second equation in Eq. 8.22), which implies
(via the first equation in Eq. 8.22). Then (via the second equation in Eq. 8.22)
and \(\widehat{\kappa }_n\) is the solution to
(via the first equation in Eq. 8.22). It is already clear from Eqs. 8.20 and 8.23 and the consistency of \((\widehat{\kappa }_n,\widehat{\varvec{\mu }}_n)\) for \((\kappa _0,{\varvec{\mu }}_0)\) that \(\widehat{\lambda }_n\) is consistent for \({\lambda }_0\), but to complete the example just note that we established Eq. 3.9 in Eq. 8.14, while Eq. 3.11 follows from
by the weak law of large numbers, and since we already know that \(\widehat{\varvec{\theta }}_n\overset{\textrm{P}}{\longrightarrow } {\varvec{\theta }}_0\). Thus all conditions of Theorem 3.2 are satisfied and we can conclude that \(\widehat{\lambda }_n\) is consistent for \({\lambda }_0\).
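Concretely, the estimating equations of Eq. 8.22 lead to \(\widehat{\varvec{\mu }}_n\) proportional to \(\overline{\textbf{X}}_n\), and \(\widehat{\kappa }_n\) solves a scalar equation in \(\kappa \). A minimal numerical sketch, assuming (as is standard for this likelihood) the solution \(\widehat{\varvec{\mu }}_n=\overline{\textbf{X}}_n/|\overline{\textbf{X}}_n|\) and \(A(\widehat{\kappa }_n)=|\overline{\textbf{X}}_n|\):

```python
import numpy as np
from scipy.special import ive
from scipy.optimize import brentq

def A(kappa, d):
    # A(kappa) = I_{d/2}(kappa)/I_{d/2-1}(kappa); `ive` scaling cancels in the ratio
    return ive(d / 2, kappa) / ive(d / 2 - 1, kappa)

def langevin_mle(X):
    """Restricted MLE (kappa_hat, mu_hat) from i.i.d. rows of X lying on the
    unit sphere in R^d, assuming mu_hat = Xbar/|Xbar| and A(kappa_hat) = |Xbar|
    (the standard solution of the estimating equations, cf. Eq. 8.22)."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    r = np.linalg.norm(xbar)              # mean resultant length, in (0, 1) a.s.
    mu_hat = xbar / r
    # A increases strictly from 0 towards 1 (Eqs. 8.4-8.5), so the root is unique
    kappa_hat = brentq(lambda k: A(k, d) - r, 1e-8, 1e6)
    return kappa_hat, mu_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # crude test data: Gaussians around a fixed direction, projected to the sphere
    # (not an exact Langevin sample, but enough to exercise the estimator)
    d, n, mu0 = 3, 2000, np.array([0.0, 0.0, 1.0])
    Z = mu0 + 0.5 * rng.standard_normal((n, d))
    X = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    print(langevin_mle(X))                # mu_hat should be close to mu0
```

The consistency of \((\widehat{\kappa }_n,\widehat{\varvec{\mu }}_n)\) guaranteed by Theorem 3.1 then transfers to \(\widehat{\lambda }_n\) through Eq. 8.23, exactly as argued above.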
Now we find the asymptotic distribution of \((\widehat{\varvec{\theta }}_n,\widehat{\lambda }_n)\) by applying Theorem 4.1. Checking the conditions, Eq. 4.1 follows from Eqs. 8.22 and 8.23, and Eq. 4.3 follows immediately since \(\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\) is a sum of i.i.d. random vectors with expectation \(\textbf{0}\) and variance matrix \(n\textbf{V}_0\); thus, recalling that \(\textbf{D}_n=\sqrt{n}\textbf{I}_{d+1}\),
Since \(\textbf{V}_0\) has rank \(d<d+1\), the normal rv \(\textbf{Z}\) is concentrated on a \(d\)-dimensional subspace of \(\mathbb {R}^{d+1}\).
For Eq. 4.5, choose \(\textbf{J}_n=\sqrt{n}\textbf{I}_{d+2}\). Using Eqs. 4.2, 8.11 and 8.13, we calculate
Then the \((d+2)\times (d+2)\) matrix \(\mathbf{{U}}_n^{{\lambda }}({\varvec{\theta }})\) is nonsingular with inverse satisfying
where \(\textbf{P}\) is a \((d+1)\times (d+1)\) matrix, \(\textbf{Q}\) is a vector in \(\mathbb {R}^{d+1}\), and \(\textrm{R}\) is a scalar, all given by
and
The claimed inverse in Eq. 8.25 can be verified by direct multiplication.
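That verification rests on the standard block-inverse (Schur complement) identity, recorded here generically for convenience: if \(\mathbf{{A}}\) and \(\mathbf{{S}}=\mathbf{{B}}^T\mathbf{{A}}^{-1}\mathbf{{B}}\) are nonsingular, then
\[
\begin{bmatrix}\mathbf{{A}} & \mathbf{{B}}\\ \mathbf{{B}}^T & \textbf{0}\end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{{A}}^{-1}-\mathbf{{A}}^{-1}\mathbf{{B}}\mathbf{{S}}^{-1}\mathbf{{B}}^T\mathbf{{A}}^{-1} & \mathbf{{A}}^{-1}\mathbf{{B}}\mathbf{{S}}^{-1}\\ \mathbf{{S}}^{-1}\mathbf{{B}}^T\mathbf{{A}}^{-1} & -\mathbf{{S}}^{-1}\end{bmatrix}.
\]
Applied to the bordered matrix \(\mathbf{{U}}_n^{{\lambda }}({\varvec{\theta }})\) of Eq. 4.2, this yields blocks of the shape of \(\textbf{P}\), \(\textbf{Q}\) and \(\textrm{R}\), with the specific instances given in Eq. 8.26.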
Applying Theorem 4.1 and recalling \(\textbf{W}\) in Eq. 8.25 we obtain, as \(n\rightarrow \infty \),
From Eqs. 8.18, 8.19 and 8.26 we can calculate
(of course \(\textbf{P}^T= \textbf{P}\) here), and
Next we check the conditions in Theorem 5.1. First, \(\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\overset{\textrm{D}}{\longrightarrow } \textbf{Z}\) was shown in Eq. 8.24, and Eq. 5.4 follows from Eq. 8.25. Next, to check Eq. 5.5: recall that \(\textbf{J}_n=\sqrt{n}\textbf{I}\). The matrix \(\textbf{J}_n^{-1}\big (\textbf{U}_n^{{\lambda }_0}({\varvec{\theta }})-\textbf{U}_n^{{\lambda }_0}({\varvec{\theta }}_0)\big )\textbf{J}_n^{-T}\) in this case has upper left diagonal submatrix \(\big (\textbf{F}_n^{{\lambda }_0}({\varvec{\theta }})- \textbf{F}_n^{{\lambda }_0}({\varvec{\theta }}_0)\big )/n\) and lower right diagonal element 0. The off-diagonal elements are also zero. Eqs. 8.13 and 8.20 give
and \({\varvec{\theta }}\in N_n(A)\) implies \(|{\varvec{\theta }}-{\varvec{\theta }}_0|\le An^{-1/2}\), so the RHS of Eq. 8.28 tends to 0 in probability as \(n\rightarrow \infty \) uniformly in a neighbourhood of \({\varvec{\theta }}_0\). So Eq. 5.5 holds.
For the limit random variable appearing in Eq. 5.6, we have \(\textbf{Z}\sim \textbf{N}(\textbf{0},\textbf{V}_0)\), where \(\textbf{V}_0\) is as in Eq. 8.19 with \({\varvec{\theta }}={\varvec{\theta }}_0\), so
where \(\mathbf{{P}}\) is in Eq. 8.26. We calculate
and, with \(\textbf{F}_0\) as the limit in probability of \(n^{-1}\textbf{F}_n({\varvec{\theta }}_0)\) given by Eq. 8.10,
With this we can calculate the \( \dot{C}_{\Omega }\) and \(\dot{C}_{\tau }\) appearing in Theorem 5.1 for any desired hypothesis tests. Since we have i.i.d. observations we get a normal limit in Eq. 8.27 and corresponding \(\chi ^2\) distributions for \(d_n\).
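To illustrate how the \(\chi ^2\) calibration of \(d_n\) would be used, here is a minimal sketch for one convenient special case, the test of \(H_0:{\varvec{\mu }}={\varvec{\mu }}_0\) (with \(\kappa \) unrestricted) against \({\varvec{\mu }}\) free on the sphere. Both the Bessel form of \(\log c(\kappa )\) and the \(\chi ^2_{d-1}\) reference distribution are standard facts assumed for the sketch:

```python
import numpy as np
from scipy.special import ive
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import chi2

def A(kappa, d):
    return ive(d / 2, kappa) / ive(d / 2 - 1, kappa)

def log_c(kappa, d):
    # log c(kappa), assuming c(kappa) = (2*pi)^{d/2} I_{d/2-1}(kappa) / kappa^{d/2-1}
    nu = d / 2 - 1
    return (d / 2) * np.log(2 * np.pi) + np.log(ive(nu, kappa)) + kappa - nu * np.log(kappa)

def loglik(kappa, mu, X):
    # the Langevin log-likelihood of Appendix 2: n*(kappa * mu'Xbar - log c(kappa))
    return X.shape[0] * (kappa * mu @ X.mean(axis=0) - log_c(kappa, X.shape[1]))

def lr_test_mean_direction(X, mu0):
    """LR statistic d_n for H0: mu = mu0 (kappa free) versus mu free on the
    sphere, with the chi^2_{d-1} asymptotic p-value of the interior case."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    r = np.linalg.norm(xbar)
    mu_hat = xbar / r                                   # MLE over Theta^h
    kap_hat = brentq(lambda k: A(k, d) - r, 1e-8, 1e6)  # solves A(kappa) = |Xbar|
    # profile out kappa under H0 by direct one-dimensional maximisation
    kap0 = minimize_scalar(lambda k: -loglik(k, mu0, X),
                           bounds=(1e-8, 1e6), method="bounded").x
    dn = 2.0 * (loglik(kap_hat, mu_hat, X) - loglik(kap0, mu0, X))
    return dn, chi2.sf(dn, d - 1)
```

Here the fixed-direction null keeps \({\varvec{\theta }}_0\) in the interior of the restricted parameter space, so the plain \(\chi ^2\) limit applies; for boundary nulls of the kind treated in Section 5, the reference distribution is instead determined by the cones \(\dot{C}_{\Omega }\) and \(\dot{C}_{\tau }\) of Theorem 5.1.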