
A General Formulation for the Large-Sample Behaviour of a Class of Hypothesis Test Statistics

Published in: Sankhya A

Abstract

We bring together some strands of development concerning restricted likelihood ratio estimation and testing, including boundary hypothesis testing, going back for motivation to pioneering papers of Aitchison, Silvey and Chernoff. Thus we consider cases where the parameters are connected by a number of functional relationships, which may involve natural restrictions on the parameters and/or restrictions imposed by a null hypothesis, as well as situations where the null and alternative hypotheses place the true parameter at the boundary of disjoint subsets of the parameter space. Our asymptotic results are proved under clearly specified and minimal assumptions, which are probably close to the weakest possible. We illustrate with an example for distributions defined on the unit sphere in \(\mathbb {R}^{d}\).


Notes

  1. Throughout, vectors and matrices are depicted in boldface. A bold \(\textbf{0}\) will denote a zero vector or matrix whose dimension depends on the context; sometimes a subscript is used to denote the dimension. A superscript “T” denotes a vector or matrix transpose.

  2. \(O_P(1)\) means bounded in probability; equivalently, tight (relatively compact in distribution).

  3. We take the norm of a matrix \(\textbf{M}\) to be \(||\textbf{M}||= \sup _{\textbf{u}: |\textbf{u}|=1} |\textbf{u}^T\textbf{M}|\).

  4. Vu et al. (1998) assume the analogue of our matrix \(\textbf{B}_n\) is positive definite a.s., but this fact is not used in their proof. It suffices that \(\textbf{B}_n\) be symmetric and nonsingular a.s.

  5. In what follows we translate others’ notation to our usage where convenient.

  6. In fact they reference Lehmann (1983), of which Lehmann and Casella (1998) is an update. We use the latter.

  7. The restriction (8.8) can also be taken into account by eliminating one component of \({\varvec{\mu }}\), say, \(\mu _d\), by solving for it in terms of the remaining \(d-1\) components. But this would destroy the symmetry of the setup and make interpretation difficult.

  8. The matrix \(\textbf{F}_n({\varvec{\theta }})\) has determinant equal to \(n^{d+1} a^{1-d}(\kappa )\,\textrm{det}(-\overline{\textbf{X}}_n\overline{\textbf{X}}_n^T)\), which is 0 because the matrix \(\overline{\textbf{X}}_n\overline{\textbf{X}}_n^T\) has rank \(1<d\).

References

  • Aitchison, J. and Silvey, S.D. (1958). Maximum-likelihood estimation of parameters subject to restraints. Ann. Math. Statist., 29, 813–828.

  • Aitchison, J. and Silvey, S.D. (1960). Maximum-likelihood estimation procedures and associated tests of significance. J. Roy. Statist. Soc. B (Methodological), 22, 154–171.

  • Andersen, P.K. and Gill, R.D. (1982). Cox’s regression model for counting processes: A large sample study. Ann. Statist., 10, 1100–1120.

  • Andrews, D.W.K. (1998). Hypothesis testing with a restricted parameter space. J. Econometrics, 84, 155–199.

  • Andrews, D.W.K. (1999). Estimation when a parameter is on a boundary. Econometrica, 67, 1341–1383.

  • Andrews, D.W.K. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica, 69, 683–734.

  • Breusch, T.W. (1986). Hypothesis testing in unidentified models. Rev. Econ. Stud., 53, 635–651.

  • Chant, D. (1974). On asymptotic tests of composite hypotheses in nonstandard conditions. Biometrika, 61, 291–298.

  • Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Statist., 25, 573–578.

  • Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman & Hall, London.

  • Drton, M. (2009). Likelihood ratio tests and singularities. Ann. Statist., 37, 979–1012.

  • Drton, M. and Sullivant, S. (2007). Algebraic statistical models. Statist. Sinica, 17, 1273–1297.

  • Eicker, F. (1963). Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Statist., 34, 447–456.

  • Eicker, F. (1965). Limit theorems for regressions with unequal and dependent errors. Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1965/66, Vol. I. University of California Press, Berkeley, CA, pp. 59–82.

  • Fahrmeir, L. and Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist., 13, 342–368.

  • Feder, P.I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is “near” the boundaries of the hypothesis regions. Ann. Math. Statist., 39, 2044–2055.

  • Geyer, C.J. (1991). Constrained maximum likelihood exemplified by isotonic convex logistic regression. J. Amer. Statist. Assoc., 86, 717–724.

  • Geyer, C.J. (1994). On the asymptotics of constrained M-estimation. Ann. Statist., 22, 1993–2010.

  • Grömping, U. (2010). Inference with linear equality and inequality constraints using R: The package ic.infer. J. Stat. Softw., 33(10).

  • Klüppelberg, C., Maller, R.A., Van De Vyver, M. and Wee, D. (2002). Testing for reduction to random walk in autoregressive conditional heteroskedasticity models. Econom. J., 5, 387–416.

  • Kuiper, R.M., Hoijtink, H. and Silvapulle, M. (2011). An Akaike-type information criterion for model selection under inequality constraints. Biometrika, 98, 495–501.

  • Lehmann, E.L. (1983). Theory of Point Estimation. John Wiley & Sons, New York.

  • Lehmann, E.L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer Texts in Statistics, Springer, New York.

  • Maller, R.A. (2003). Asymptotics of regressions with stationary and nonstationary residuals. Stoch. Proc. Appl., 105, 33–67.

  • Maller, R.A. and Zhou, X. (2002). Analysis of parametric models for competing risks. Statist. Sinica, 12, 725–750.

  • McDonald, J.B. and Newey, W.K. (1988). Partially adaptive estimation of regression models via the generalized T distribution. Econ. Theory, 4, 428–457.

  • Mitchell, D.J., Allman, E.S. and Rhodes, J.A. (2019). Hypothesis testing near singularities and boundaries. Electron. J. Statist., 13, 2150–2193.

  • Newey, W.K. and McFadden, D. (1994). Large sample estimation and hypothesis testing. Ch. 36 in Handbook of Econometrics, Vol. 4, pp. 2111–2245.

  • Rotnitzky, A., Cox, D.R., Bottai, M. and Robins, J. (2000). Likelihood-based inference with singular information matrix. Bernoulli, 6, 243–284.

  • Self, S.G. and Liang, K.Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Amer. Statist. Assoc., 82, 605–610.

  • Silvapulle, M.J. and Sen, P.K. (2005). Constrained Statistical Inference: Inequality, Order, and Shape Restrictions. Wiley, Hoboken, NJ.

  • Silvapulle, M.J. and Silvapulle, P. (1995). A score test against one-sided alternatives. J. Amer. Statist. Assoc., 90, 342–349.

  • Silvey, S.D. (1959). The Lagrangian multiplier test. Ann. Math. Statist., 30, 389–407.

  • Vanbrabant, L., Van de Schoot, R. and Rosseel, Y. (2015). Constrained statistical inference: Sample-size tables for ANOVA and regression. Front. Psychol., 5, 1565.

  • Vu, H.T.V., Maller, R.A. and Klass, M.J. (1996). On the studentisation of random vectors. J. Multivariate Anal., 57, 142–155.

  • Vu, H.T.V., Maller, R.A. and Zhou, X. (1998). Asymptotic properties of a class of mixture models for failure data: The interior and boundary cases. Ann. Inst. Statist. Math., 50, 627–653.

  • Vu, H.T.V. and Zhou, S. (1997). Generalization of likelihood ratio tests under nonstandard conditions. Ann. Statist., 25, 897–916.

  • Watson, G.S. (1983). Statistics on Spheres. University of Arkansas Lecture Notes in the Mathematical Sciences, John Wiley & Sons, New York.

  • Watson, G.S. (1984). The theory of concentrated Langevin distributions. J. Multivariate Anal., 14, 74–82.

Acknowledgements

We are very grateful to two referees who read the paper extremely closely and carefully and gave detailed and constructive suggestions which helped us improve it.

Funding

This research was partially supported by Australian Research Council Discovery Grant DP0664603.

Author information


Corresponding author

Correspondence to Maryam Ghodsi.

Ethics declarations

Conflict of Interest

There is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Theorems

Before proving Theorems 3.1 and 3.2 we mention some further preliminaries. We can suppose without loss of generality that the \(h_j({\varvec{\theta }})\) have been numbered so that the matrix \(\mathbf{{H}}({\varvec{\theta }})\) in Eq. 2.6 satisfies

$$ \mathbf{{H}}({\varvec{\theta }})=\begin{bmatrix} \displaystyle {{\partial h_1\over \partial {\varvec{\theta }}}}&\cdots&\displaystyle {\partial h_{s}\over \partial {\varvec{\theta }}}\end{bmatrix}_{d\times s}. $$

For \({\varvec{\theta }}\in \Theta \), \({\varvec{\lambda }}\in \mathbb {R}^s\), \(n=1,2,\ldots \), define

$$\begin{aligned} \mathcal{L}_n^{{\lambda }*}({\varvec{\theta }}):=\mathcal{L}_n^{\lambda }({\varvec{\theta }})-{1\over 2} \sum _{j=1}^{s}g_j^2(n)h_j^2({\varvec{\theta }}), \end{aligned}$$

where \((g_j(n))\) are the diagonal elements from the \(\mathbf{{G}}_n\) in Eq. 2.6. Differentiating \(\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }})\) with respect to \({\varvec{\theta }}\) gives

$$\begin{aligned} \textbf{S}_n^{{\lambda }*}({\varvec{\theta }}):={\partial \over \partial {\varvec{\theta }}}\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }}) =\textbf{S}_n^{{\lambda }}({\varvec{\theta }}) -\sum _{j=1}^{s} g_j^2(n) h_j({\varvec{\theta }}) {\partial h_j\over \partial {\varvec{\theta }}}({\varvec{\theta }}). \end{aligned}$$
(7.1)

Differentiating with respect to \({\varvec{\theta }}\) again gives

$$ {\partial \over \partial {\varvec{\theta }}}\sum _{j=1}^{s} g_j^2(n)h_j({\varvec{\theta }}) {\partial h_j\over \partial {\varvec{\theta }}^T}({\varvec{\theta }}) =\sum _{j=1}^{s} g_j^2(n)\left\{ \left( {\partial h_j\over \partial {\varvec{\theta }}}\right) \left( {\partial h_j\over \partial {\varvec{\theta }}^T}\right) +h_j({\varvec{\theta }}) {\partial ^2 h_j\over \partial {\varvec{\theta }}\, \partial {\varvec{\theta }}^T}({\varvec{\theta }})\right\} . $$

Since \(\mathbf{{h}}({\varvec{\theta }})=\textbf{0}\) on \(\Theta ^h\), we see that, for \({\varvec{\theta }}\in \Theta ^h\),

$$\begin{aligned} {\partial ^2 \over \partial {\varvec{\theta }}\partial {\varvec{\theta }}^T} \Big (\sum _{j=1}^{s} g_j^2(n) h^2_j({\varvec{\theta }})\Big )&=\sum _{j=1}^{s} g_j^2(n)\Big ({\partial h_j\over \partial {\varvec{\theta }}}\Big ) \Big ({\partial h_j\over \partial {\varvec{\theta }}}\Big )^T\\&=\mathbf{{H}}({\varvec{\theta }})\mathbf{{G}}_n\mathbf{{G}}_n^T\mathbf{{H}}^T({\varvec{\theta }}). \end{aligned}$$

It follows that, for \({\varvec{\theta }}\in \Theta ^h\) and \({\varvec{\lambda }}\in \mathbb {R}^s\), minus the second derivative matrix of \(\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }})\) with respect to \({\varvec{\theta }}\) is the matrix in Eq. 2.6:

$$ \textbf{F}_n^{\lambda }({\varvec{\theta }})+\mathbf{{H}}({\varvec{\theta }})\mathbf{{G}}_n\mathbf{{G}}_n^T\mathbf{{H}}^T({\varvec{\theta }}) =\textbf{F}_n^{{\lambda }*}({\varvec{\theta }}). $$
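
As a quick sanity check of the identity just displayed, the following minimal numerical sketch evaluates both sides at a point satisfying \(h({\varvec{\theta }})=0\), using an invented toy log-likelihood with a single restriction (\(s=1\)); the toy functions and the values of \({\lambda }\), \(C_n\) and the penalty weight \(g\) are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

# Finite-difference check of -d^2 L_n^{lambda*}/dtheta dtheta^T
#   = F_n^lambda(theta) + H(theta) G_n G_n^T H^T(theta)  on  {h(theta) = 0},
# with an invented toy log-likelihood (all numbers illustrative).
d = 3                          # parameter dimension
lam, Cn, g = 0.7, 5.0, 2.0     # multiplier, scaling C_n, penalty weight g(n)

L = lambda th: -0.5 * th @ th                         # toy concave log-likelihood
h = lambda th: th[0] + th[1] - 1.0                    # single restriction (s = 1)
Llam  = lambda th: L(th) + lam * Cn * h(th)           # L_n^lambda, cf. Eq. 2.2
Lstar = lambda th: Llam(th) - 0.5 * g**2 * h(th)**2   # L_n^{lambda*}

def hessian(f, th, eps=1e-4):                         # central-difference Hessian
    I = np.eye(len(th))
    return np.array([[(f(th + eps*(I[i] + I[j])) - f(th + eps*(I[i] - I[j]))
                       - f(th - eps*(I[i] - I[j])) + f(th - eps*(I[i] + I[j])))
                      / (4 * eps**2) for j in range(len(th))]
                     for i in range(len(th))])

th0 = np.array([0.5, 0.5, 2.0])                       # satisfies h(th0) = 0
Hvec = np.array([1.0, 1.0, 0.0])                      # H(theta) = dh/dtheta
lhs = -hessian(Lstar, th0)
rhs = -hessian(Llam, th0) + g**2 * np.outer(Hvec, Hvec)
print(np.allclose(lhs, rhs, atol=1e-5))               # True
```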

Proof of Theorem 3.1

Assume (A1) and Eqs. 3.2–3.4, and let \({\varvec{\lambda }}_0\in \mathbb {R}^s\) be the particular value of \({\varvec{\lambda }}\) specified in Eqs. 3.3 and 3.4. Equation 3.4 implies that both probabilities

$$\begin{aligned} P\{\inf _{{\varvec{\theta }}\in N_n^h(A)} {\lambda }_{\min }&\big (\textbf{D}_n^{-1}\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }})\textbf{D}_n^{-T}\big ) \ge K\}\nonumber \\&\ge P\{\inf _{{\varvec{\theta }}\in N_n(A)} {\lambda }_{\min }\big (\textbf{D}_n^{-1}\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }})\textbf{D}_n^{-T}\big ) \ge K\} \end{aligned}$$
(7.2)

tend to 1 as \(n\rightarrow \infty \) then \(A\rightarrow \infty \) for arbitrary \(K>0\). Eq. 7.2 together with Eq. 3.2 implies that the \(\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }})\) are positive definite on \(N_n^h(A)\), WPA1 as \(n \rightarrow \infty \) then \(A\rightarrow \infty \). Thus

$$\lim _{A\rightarrow \infty } \liminf _{n\rightarrow \infty } P\{\mathcal{L}_n^{{\lambda }_0 *}({\varvec{\theta }})\ \mathrm{is\ strictly\ concave\ for\ }{\varvec{\theta }}\in N_n^h(A)\}=1. $$

Now when \({\varvec{\theta }}\in \Theta ^h\), \(\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }})=\mathcal{L}_n^{{\lambda }}({\varvec{\theta }})=\mathcal{L}_n({\varvec{\theta }})\) for any \({\varvec{\lambda }}\in \mathbb {R}^s\), so we have

$$\begin{aligned} \lim _{A\rightarrow \infty }\liminf _{n\rightarrow \infty }P\{\mathcal{L}_n({\varvec{\theta }})\ \mathrm{is\ strictly\ concave\ for\ }{\varvec{\theta }}\in N_n^h(A)\}=1. \end{aligned}$$
(7.3)

For \(A>0\), \(n=1,2,\ldots \), define \( M^h_n(A)\) as the boundary of \(N_n^h(A)\), thus

$$\begin{aligned} M^h_n(A) = \partial N_n^h(A)=\{{\varvec{\theta }}\in \Theta ^h: ({\varvec{\theta }}-{\varvec{\theta }}_0)^T\textbf{D}_n\textbf{D}_n^T({\varvec{\theta }}-{\varvec{\theta }}_0)=A^2\}. \end{aligned}$$
(7.4)

By definition, \( M^h_n(A)\subseteq \Theta ^h\). We now show that

$$\begin{aligned} \lim _{A\rightarrow \infty }\liminf _{n\rightarrow \infty } P\{\sup _{{\varvec{\theta }}\in M^h_n(A)}\mathcal{L}_n({\varvec{\theta }})<\mathcal{L}_n({\varvec{\theta }}_0)\} =1. \end{aligned}$$
(7.5)

This is done as follows. Take \(A>1\), and \({\varvec{\theta }}\in N_n^h(A)\). Then \({\varvec{\theta }}\in \Theta ^h\). It follows from a Taylor expansion in \({\varvec{\theta }}\) that

$$\begin{aligned} \mathcal{L}_n({\varvec{\theta }})-\mathcal{L}_n({\varvec{\theta }}_0)&= \mathcal{L}_n^{{\lambda }_0 *}({\varvec{\theta }})-\mathcal{L}_n^{{\lambda }_0 *}({\varvec{\theta }}_0) \\&= ({\varvec{\theta }}-{\varvec{\theta }}_0)^T \textbf{S}_n^{{\lambda }_0 *}({\varvec{\theta }}_0) -{1\over 2}({\varvec{\theta }}-{\varvec{\theta }}_0)^T \textbf{F}_n^{{\lambda }_0 *}(\overline{{\varvec{\theta }}}) ({\varvec{\theta }}-{\varvec{\theta }}_0), \end{aligned}$$
(7.6)

where \(\overline{{\varvec{\theta }}}=\alpha {\varvec{\theta }}+(1-\alpha ){\varvec{\theta }}_0\) for some \(\alpha \in [0,1]\). Since \({\varvec{\theta }}\) and \({\varvec{\theta }}_0\) are in \(N_n^h(A)\), we have \(\overline{{\varvec{\theta }}}\in N_n(A)\) (but not necessarily \(\overline{{\varvec{\theta }}}\in N_n^h(A)\), because \(\overline{{\varvec{\theta }}}\) may not satisfy \(\mathbf{{h}}(\overline{{\varvec{\theta }}})=\textbf{0}\)). Let

$$Q_n({\varvec{\theta }}, \overline{{\varvec{\theta }}})={1\over 2}({\varvec{\theta }}-{\varvec{\theta }}_0)^T\textbf{F}_n^{{\lambda }_0 *}(\overline{{\varvec{\theta }}})({\varvec{\theta }}-{\varvec{\theta }}_0) \quad \textrm{and}\quad \mathbf{{v}}_n({\varvec{\theta }})=(1/A)\textbf{D}_n^T({\varvec{\theta }}-{\varvec{\theta }}_0). $$

Observe that \(\textbf{S}_n^{{\lambda }_0 *}({\varvec{\theta }})=\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }})\) when \({\varvec{\theta }}\in \Theta ^h\) (by Eq. 7.1). So for \(c>0\) we have by Eq. 7.6

$$\begin{aligned} P\{\mathcal{L}_n({\varvec{\theta }})\ge \mathcal{L}_n({\varvec{\theta }}_0)\ \mathrm{for\ some}\ {\varvec{\theta }}\in M^h_n(A)\}&\le P\{\mathbf{{v}}_n^T({\varvec{\theta }})\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0) \ge cA\ \mathrm{for\ some}\ {\varvec{\theta }}\in M^h_n(A)\} \\&\quad +P\{Q_n({\varvec{\theta }}, \overline{{\varvec{\theta }}})\le {cA^2} \ \mathrm{for\ some}\ {\varvec{\theta }}\in M^h_n(A)\}. \end{aligned}$$
(7.7)

When \({\varvec{\theta }}\in M^h_n(A)\), \(\mathbf{{v}}_n({\varvec{\theta }})\) is a unit vector (by Eq. 7.4). Thus by Eq. 3.3 the first probability on the righthand side of Eq. 7.7 converges to 0 as \(n\rightarrow \infty \) then \(A\rightarrow \infty \), because

$$\begin{aligned} P\{\mathbf{{v}}_n^T({\varvec{\theta }})\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0) \ge cA\ \mathrm{for\ some}\ {\varvec{\theta }}\in M^h_n(A)\}&\le P\{\sup _{\mathbf{{u}}\in \mathbb {R}^d,\, |\mathbf{{u}}|=1}|\mathbf{{u}}^T\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)|\ge cA \} \\&\le P\{|\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)|\ge cA\}. \end{aligned}$$

For the second probability on the righthand side of Eq. 7.7, we have, when \({\varvec{\theta }}\in M^h_n(A)\),

$$\begin{aligned} Q_n({\varvec{\theta }},\overline{{\varvec{\theta }}})&=\frac{1}{2} A^2\mathbf{{v}}_n^T({\varvec{\theta }})\textbf{D}_n^{-1}\textbf{F}_n^{{\lambda }_0 *}(\overline{{\varvec{\theta }}})\textbf{D}_n^{-T} \mathbf{{v}}_n({\varvec{\theta }})\\&\ge \frac{1}{2}A^2 \inf _{{\varvec{\theta }}\in N_n(A)} {\lambda }_{\min }(\textbf{D}_n^{-1}\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }})\textbf{D}_n^{-T}). \end{aligned}$$

From the last inequality it follows that

$$\begin{aligned} P\{Q_n({\varvec{\theta }}, \overline{{\varvec{\theta }}})\le {cA^2} \ \mathrm{for\ some}\ {\varvec{\theta }}\in M^h_n(A)\} \le P\{\inf _{{\varvec{\theta }}\in N_n(A)}{\lambda }_{\min }(\textbf{D}_n^{-1}\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }})\textbf{D}_n^{-T})\le 2c\}. \end{aligned}$$

By Eq. 3.4, the righthand side here tends to 0 as \(n\rightarrow \infty \), then \(A\rightarrow \infty \), then \(c\rightarrow 0\). As a result, we get Eq. 7.5 from 7.7.

Next, for \(A>0\), \(n=1,2,\ldots \), define events

$$ E_n(A)=\left\{ \mathcal{L}_n({\varvec{\theta }})<\mathcal{L}_n({\varvec{\theta }}_0)\ \textrm{for}\ {\varvec{\theta }}\in M^h_n(A)\ \textrm{and}\ \mathcal{L}_n({\varvec{\theta }})\ \mathrm{is\ concave\ on}\ N_n^h(A)\right\} . $$

By Eqs. 7.3 and 7.5 we have

$$\begin{aligned} \lim _{A\rightarrow \infty }\liminf _{n\rightarrow \infty }P\{E_n(A)\}=1. \end{aligned}$$
(7.8)

Take \(n=1,2,\ldots \) and \(A>0\), and suppose \(E_n(A)\) occurs. Then \(\mathcal{L}_n({\varvec{\theta }})\), being continuous and concave on the closed, convex neighbourhood \(N_n^h(A)\), has a unique maximum point on \(N_n^h(A)\). Define \(\widehat{\varvec{\theta }}_n(A)\) to be this unique maximum point (on the complement of \(E_n(A)\), \(\widehat{\varvec{\theta }}_n(A)\) need not be defined). In detail, Eq. 7.8 tells us that for each \(\varepsilon >0\) there is an \(A_0(\varepsilon )>0\) such that for each \(A \ge A_0(\varepsilon )\) there exists \(n_1(\varepsilon , A)\) with

$$\begin{aligned} P\{\widehat{\varvec{\theta }}_n(A) \ \mathrm{exists\ uniquely\ in}\ N_n^h(A)\} \ge 1-\varepsilon , \end{aligned}$$
(7.9)

whenever \(n\ge n_1(\varepsilon , A)\). Note that the maximum may occur at a boundary point of \(\Theta ^h\).

Now \(\widehat{\varvec{\theta }}_n(A)\) does not depend on \({\varvec{\lambda }}_0\) but it may depend on A. We can remove this dependence as follows. Given any positive integer m, by Eq. 7.8 there exists \(A_m>0\) such that

$$\begin{aligned} \liminf _{n\rightarrow \infty } P\{E_n(A_m)\} >1-{1\over 4m}. \end{aligned}$$

So there is a positive sequence \(f_m\uparrow \infty \) such that

$$P\{E_n(A_m)\}>1-{1\over 2 m} \qquad \mathrm{for\ all}\ n\ge f_m.$$

By Eq. 3.2, there is a sequence \(g_m\uparrow \infty \) such that, for all \(n\ge g_m\),

$$P\{{\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)\ge mA_m^2\}>1-{1\over 2m}.$$

Let \(h_m=\max (f_m, g_m)\). Then \(h_m\uparrow \infty \) and for any \(m\ge 1\) and \(n\ge h_m\) we have

$$P\{E_n(A_m)\ \textrm{and} \ {\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)\ge mA_m^2\} >1-{1\over m}.$$

For each \(n\ge h_1\), we can find \(m=m(n)\) such that \(h_{m(n)}\le n <h_{m(n)+1}\). Suppose \(E_n(A_{m(n)})\) occurs and let \(\widehat{\varvec{\theta }}_n =\widehat{\varvec{\theta }}_n(A_{m(n)})\). Note that \(\widehat{\varvec{\theta }}_n\) depends on n only and \(\widehat{\varvec{\theta }}_n \in N_n^h(A_{m(n)})\). Now, given any \(\varepsilon >0\), let \(m_0=m_0(\varepsilon )\) be an integer greater than \(2/\varepsilon +1\). When \(n\ge h_{m_0}\), then \(m(n)\ge m_0-1\) and

$$\begin{aligned} P\{E_n(A_{m(n)})\ \textrm{and} \ {\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)\ge m(n)A_{m(n)}^2\} > 1-{1\over m(n)} \ge 1-{1\over m_0-1}>1-\varepsilon . \end{aligned}$$
(7.10)

Hence, as \(\widehat{\varvec{\theta }}_n \in N_n^h(A_{m(n)})\), on \(E_n(A_{m(n)})\cap \{{\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)\ge m(n)A_{m(n)}^2\}\) we have

$$\begin{aligned} |\widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0|^2\le {(\widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0)^T \textbf{D}_n\textbf{D}_n^T(\widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0)\over {\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)}&\le {A_{m(n)}^2\over {\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)}\le {1\over m(n)}\\&\le {1\over m_0-1}<\varepsilon . \end{aligned}$$

Thus the \(\widehat{\varvec{\theta }}_n\) we have constructed is locally unique on \(\Theta ^h\), WPA1, and is consistent for \({\varvec{\theta }}_0\).

Since \(A_m\rightarrow \infty \) as \(m\rightarrow \infty \), and \(\widehat{\varvec{\theta }}_n =\widehat{\varvec{\theta }}_n(A_{m(n)})\), it may seem that we will no longer have \(\widehat{\varvec{\theta }}_n \in N_n^h(A)\) WPA1 as \(n\rightarrow \infty \) then \(A\rightarrow \infty \). However, in fact we can show Eq. 3.5. To see this, take \(\varepsilon >0\). Choose \(A_0(\varepsilon )\) and \(n_1(\varepsilon , A)\) so that Eq. 7.9 holds for \(A\ge A_0(\varepsilon )\) and \(n\ge n_1(\varepsilon , A)\). Also choose \(n_2(A)\) such that \(A_{m(n)} \ge A\) for all \(n\ge n_2(A)\). Then \(N_n^h(A) \subseteq N_n^h(A_{m(n)})\). When \(\widehat{\varvec{\theta }}_n(A)\) exists uniquely in \(N_n^h(A)\) and \(E_n(A_{m(n)})\) occurs, then \(\mathcal {L}_n({\varvec{\theta }})\) is concave on \(N_n^h(A_{m(n)})\), so \(\widehat{\varvec{\theta }}_n(A)\) must maximise \(\mathcal {L}_n({\varvec{\theta }})\) over \(N_n^h(A_{m(n)})\) as well. This implies \(\widehat{\varvec{\theta }}_n(A)=\widehat{\varvec{\theta }}_n(A_{m(n)})=\widehat{\varvec{\theta }}_n\), and so \(\widehat{\varvec{\theta }}_n \in N_n^h(A)\). As a result, for each \(A\ge A_0(\varepsilon )\) and all \(n\ge \max (h_{m_0(\varepsilon )},n_1(\varepsilon ,A),n_2(A))\), by Eqs. 7.9 and 7.10 we have

$$\begin{aligned} P\{\widehat{\varvec{\theta }}_n \in N_n^h(A)\} \ge P\{\widehat{\varvec{\theta }}_n \ \mathrm{exists\ uniquely\ on}\ N_n^h(A), \ E_n(A_{m(n)})\} \ge 1-2\varepsilon . \end{aligned}$$

Letting \(n\rightarrow \infty \), then \(A\rightarrow \infty \), then \(\varepsilon \rightarrow 0\), proves Eq. 3.5.

To complete the proof of Theorem 3.1, we stress that \(\widehat{\varvec{\theta }}_n\) does not depend on the choice of \({\varvec{\lambda }}_0\) in Eq. 2.2, \(\mathbf{{G}}_n\) in Eq. 2.6, or \(\textbf{D}_n\) in Eqs. 3.2–3.4. Retracing the argument, we saw that \(\mathcal{L}_n^{{\lambda }*}({\varvec{\theta }})=\mathcal{L}_n^{{\lambda }}({\varvec{\theta }})=\mathcal{L}_n({\varvec{\theta }})\) for any \({\varvec{\lambda }}\in \mathbb {R}^s\) when \({\varvec{\theta }}\in \Theta ^h\), and, in Eq. 7.3, that \(\mathcal{L}_n({\varvec{\theta }})\) is strictly concave for \({\varvec{\theta }}\in N_n^h(A)\) WPA1 as \(n\rightarrow \infty \), then \(A\rightarrow \infty \). By showing in Eq. 7.5 that \(\mathcal{L}_n({\varvec{\theta }})\) is smaller than \(\mathcal{L}_n({\varvec{\theta }}_0)\) for \({\varvec{\theta }}\) in the boundary set \(M_n^h(A)\), we established the existence of a unique maximum of \(\mathcal{L}_n({\varvec{\theta }})\) in \(N_n^h(A)\), WPA1 as \(n\rightarrow \infty \), then \(A\rightarrow \infty \). These considerations do not depend on the choice of \({\varvec{\lambda }}_0\) or \(\mathbf{{G}}_n\). The neighbourhoods \(N_n(A)\) and \(N_n^h(A)\) in Eq. 3.1 do depend on the choice of \(\textbf{D}_n\), so conceivably two choices \(\textbf{D}_n^1\) and \(\textbf{D}_n^2\) satisfying (3.2)–(3.4) with corresponding neighbourhoods \(N_n^{1,h}(A)\) and \(N_n^{2,h}(A)\) may give different estimators \(\widehat{\varvec{\theta }}_n^1\in N_n^{1,h}(A)\) and \(\widehat{\varvec{\theta }}_n^2\in N_n^{2,h}(A)\). But by the strict concavity of \(\mathcal{L}_n({\varvec{\theta }})\) in each neighbourhood, \(\widehat{\varvec{\theta }}_n^1\) and \(\widehat{\varvec{\theta }}_n^2\) must be equal, and in fact lie in \(N_n^{1,h}(A)\cap N_n^{2,h}(A)\). So \(\widehat{\varvec{\theta }}_n\) does not depend on the choice of \({\varvec{\lambda }}_0\), \(\mathbf{{G}}_n\), or \(\textbf{D}_n\).

\(\square \)

Proof of Theorem 3.2

Assume (A1), that \(\widehat{\varvec{\theta }}_n\) is a consistent estimator for \({\varvec{\theta }}_0\), and that (3.7) is a consistent system of equations for \({\varvec{\lambda }}\) with a unique solution \(\widehat{\varvec{\lambda }}_n\). Then

$$\begin{aligned} \frac{1}{a_n} \mathbf{{H}}(\widehat{\varvec{\theta }}_n)\mathbf{{C}}_n\widehat{\varvec{\lambda }}_n =-\frac{1}{a_n} \textbf{S}_n(\widehat{\varvec{\theta }}_n), \end{aligned}$$
(7.11)

and by Eqs. 3.9 and 3.11 we have

$$\begin{aligned} \frac{1}{a_n} \textbf{S}_n(\widehat{\varvec{\theta }}_n)= \frac{1}{a_n} \textbf{S}_n({\varvec{\theta }}_0)+o_P(1) = \textbf{L}_0+o_P(1). \end{aligned}$$
(7.12)

Since \(\widehat{\varvec{\theta }}_n\) is consistent for \({\varvec{\theta }}_0\), the matrix \(\mathbf{{H}}({\varvec{\theta }})\) is a continuous function of \({\varvec{\theta }}\) at \({\varvec{\theta }}_0\), and \(\mathbf{{H}}({\varvec{\theta }}_0)\) is of full rank, we deduce from Eqs. 7.11 and 7.12 that \(\mathbf{{C}}_n\widehat{\varvec{\lambda }}_n/a_n=O_P(1)\) as \(n\rightarrow \infty \). Assuming Eq. 3.8 as well, we conclude that \(\widehat{\varvec{\lambda }}_n=O_P(1)\) as \(n\rightarrow \infty \), and then letting \(n\rightarrow \infty \) in Eq. 7.11 through a subsequence along which \(\widehat{\varvec{\lambda }}_n\) has a finite limit in distribution, \({\varvec{\nu }}_0\), say, shows that

$$ \textbf{L}_0+\mathbf{{H}}({\varvec{\theta }}_0)\mathbf{{C}}{\varvec{\nu }}_0=\textbf{0}, $$

of which the unique solution is, by Eq. 3.10, \({\varvec{\nu }}_0={\varvec{\lambda }}_0\). So \(\widehat{\varvec{\lambda }}_n\overset{\textrm{P}}{\longrightarrow } {\varvec{\lambda }}_0\) as \(n\rightarrow \infty \). \(\square \)

Proof of Theorem 4.1

Throughout, we restrict the sample space to an event, of probability arbitrarily close to 1 (as is possible by Eq. 4.5), on which the \((d+s)\times (d+s)\) negative second derivative matrix \(\mathbf{{U}}_n^{\lambda }({\varvec{\theta }})\) in Eq. 4.2 is nonsingular for all \(({\varvec{\theta }},{\varvec{\lambda }})\) in a neighbourhood of \(({\varvec{\theta }}_0,{\varvec{\lambda }}_0)\). Keep \(\widehat{\varvec{\theta }}_n\in \Theta ^h\) and \(\widehat{\varvec{\lambda }}_n\) in this neighbourhood throughout the proof. Use Taylor's theorem to write

$$\begin{aligned} \begin{bmatrix} \displaystyle {{\partial \mathcal{L}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n)\over \partial {\varvec{\theta }}}} \\ \displaystyle {{\partial \mathcal{L}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n)\over \partial {\varvec{\lambda }}}} \end{bmatrix} = \begin{bmatrix} \displaystyle {{\partial \mathcal{L}_n^{{\lambda }_0}({\varvec{\theta }}_0)\over \partial {\varvec{\theta }}}} \\ \displaystyle {{\partial \mathcal{L}_n^{{\lambda }_0}({\varvec{\theta }}_0)\over \partial {\varvec{\lambda }}}} \end{bmatrix} -\mathbf{{U}}_n^{\overline{{\lambda }}_n}(\overline{{\varvec{\theta }}}_n) \begin{bmatrix} \widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0\\ \widehat{\varvec{\lambda }}_n-{\varvec{\lambda }}_0 \end{bmatrix} \end{aligned}$$
(7.13)

where \(\overline{{\varvec{\theta }}}_n=\alpha \widehat{\varvec{\theta }}_n+(1-\alpha ){\varvec{\theta }}_0\) and \(\overline{{\varvec{\lambda }}}_n=\beta \widehat{\varvec{\lambda }}_n+(1-\beta ){\varvec{\lambda }}_0\) for some \(\alpha \in [0,1]\), \(\beta \in [0,1]\). Recalling that \(\partial \mathcal{L}_n^{\lambda }({\varvec{\theta }})/\partial {\varvec{\theta }}=\textbf{S}_n^{\lambda }({\varvec{\theta }})\), in this interior case we have \( \partial \mathcal{L}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n)/ \partial {\varvec{\theta }}= \textbf{S}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n)=\textbf{0}_d\) since \((\widehat{\varvec{\theta }}_n,\widehat{\varvec{\lambda }}_n)\) satisfies (4.1), while from Eq. 2.2 we have

$$ {\partial \mathcal{L}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n)\over \partial {\varvec{\lambda }}}= \textbf{C}_n^T\mathbf{{h}}(\widehat{\varvec{\theta }}_n)=\textbf{0}_s \quad \textrm{and}\quad {\partial \mathcal{L}_n^{{\lambda }_0}({\varvec{\theta }}_0)\over \partial {\varvec{\lambda }}} = \textbf{C}_n^T\mathbf{{h}}({\varvec{\theta }}_0)=\textbf{0}_s, $$

because \(\widehat{\varvec{\theta }}_n\in \Theta ^h\) and \({\varvec{\theta }}_0\in \Theta ^h\). Equation 7.13 now gives

$$\begin{aligned} \begin{bmatrix} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\\ \textbf{0}_s \\ \end{bmatrix} =\mathbf{{U}}_n^{\overline{{\lambda }}_n}(\overline{{\varvec{\theta }}}_n) \begin{bmatrix} \widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0 \\ \widehat{\varvec{\lambda }}_n-{\varvec{\lambda }}_0 \end{bmatrix} \end{aligned}$$

which we can solve to get

$$\begin{aligned} \begin{bmatrix} \widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0 \\ \widehat{\varvec{\lambda }}_n-{\varvec{\lambda }}_0 \end{bmatrix} =\big (\mathbf{{U}}_n^{\overline{{\lambda }}_n}(\overline{{\varvec{\theta }}}_n)\big )^{-1} \begin{bmatrix} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\\ \textbf{0}_s \\ \end{bmatrix}. \end{aligned}$$

Then write

$$\begin{aligned} \textbf{J}_n^T\begin{bmatrix} \widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0 \\ \widehat{\varvec{\lambda }}_n-{\varvec{\lambda }}_0 \end{bmatrix} =\Big ( \textbf{J}_n^T\big (\mathbf{{U}}_n^{\overline{{\lambda }}_n}(\overline{{\varvec{\theta }}}_n)\big )^{-1} \textbf{J}_n\Big ) \textbf{J}_n^{-1} \begin{bmatrix} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\\ \textbf{0}_s \\ \end{bmatrix} \end{aligned}$$
(7.14)

in which \(\textbf{J}_n^T\big (\mathbf{{U}}_n^{\overline{{\lambda }}_n}(\overline{{\varvec{\theta }}}_n)\big )^{-1} \textbf{J}_n\overset{\textrm{P}}{\longrightarrow } \textbf{U}_0^{-1}\) by Eq. 4.5 (noting that \((\overline{{\varvec{\theta }}}_n, \overline{{\lambda }}_n) \overset{\textrm{P}}{\longrightarrow } ({\varvec{\theta }}_0, {\lambda }_0)\)), while by Eq. 4.3

$$\begin{aligned} \textbf{J}_n^{-1} \begin{bmatrix} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\\ \textbf{0} \\ \end{bmatrix}= & \begin{bmatrix} \textbf{D}_n^{-1} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\\ \textbf{0} \\ \end{bmatrix} \overset{\textrm{D}}{\longrightarrow } \begin{bmatrix} \textbf{Z}\\ \textbf{0} \\ \end{bmatrix}. \end{aligned}$$
(7.15)

Then Eq. 4.6 follows from Eqs. 7.14 and 7.15. \(\square \)

Proof of Theorem 5.1

Assume (A1)–(A3), Eqs. 3.6, 5.4 and 5.5, and that \(\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\overset{\textrm{D}}{\longrightarrow } \textbf{Z}\). For \({\varvec{\theta }}\in \Theta \) and \({\varvec{\lambda }}\in \mathbb {R}^s\) define the \((d+s)\times 1\) vectors

$$\begin{aligned} {\varvec{\xi }}= {\varvec{\xi }}({\varvec{\theta }}, {\varvec{\lambda }}):= \begin{bmatrix} {\varvec{\theta }}\\ {\varvec{\lambda }}\end{bmatrix} \quad \textrm{and}\quad \textbf{T}_n^{\lambda }({\varvec{\theta }}):= \begin{bmatrix} \displaystyle {{\partial \mathcal{L}_n^{{\lambda }}({\varvec{\theta }})\over \partial {\varvec{\theta }}}} \\ \displaystyle {{\partial \mathcal{L}_n^{{\lambda }}({\varvec{\theta }})\over \partial {\varvec{\lambda }}}} \end{bmatrix}, \end{aligned}$$

and define \(\mathbf{{U}}_n^{{\lambda }}({\varvec{\theta }})\) as in Eq. 4.2. By Eq. 2.4, \(\partial \mathcal{L}_n^{{\lambda }}({\varvec{\theta }})/\partial {\varvec{\theta }}= \textbf{S}_n^{\lambda }({\varvec{\theta }})\). We assume (3.6), so by Eq. 2.2,

$$\begin{aligned} \displaystyle {{\partial \mathcal{L}_n^{{\lambda }}({\varvec{\theta }})\over \partial {\varvec{\lambda }}}} = \displaystyle {{\partial \over \partial {\varvec{\lambda }}}} \left( \mathcal{L}_n({\varvec{\theta }})+{\varvec{\lambda }}^T \textbf{C}_n^T\mathbf{{h}}({\varvec{\theta }})\right) = \textbf{C}_n^T\mathbf{{h}}({\varvec{\theta }}), \end{aligned}$$

and note that this equals 0 when \({\varvec{\theta }}={\varvec{\theta }}_0\).

Now keep \({\varvec{\theta }}\in N_n^h(A)\). Since \(\mathcal{L}_n({\varvec{\theta }})\) and \(\mathcal{L}_n^{{\lambda }}({\varvec{\theta }})\) agree on \(N_n^h(A)\) for any \({\varvec{\lambda }}\), a Taylor expansion as in Eq. 7.13 gives

$$\begin{aligned} & 2\left( \mathcal{L}_n({\varvec{\theta }})-\mathcal{L}_n({\varvec{\theta }}_0)\right) = 2\big (\mathcal{L}_n^{{\lambda }}({\varvec{\theta }})-\mathcal{L}_n^{{\lambda }_0}({\varvec{\theta }}_0)\big )\\ & = 2({\varvec{\xi }}-{\varvec{\xi }}_0)^T \textbf{T}_n^{{\lambda }_0}({\varvec{\theta }}_0) -({\varvec{\xi }}-{\varvec{\xi }}_0)^T \mathbf{{U}}_n^{\overline{{\lambda }}}(\overline{{\varvec{\theta }}}) ({\varvec{\xi }}-{\varvec{\xi }}_0) \\ & = 2({\varvec{\xi }}-{\varvec{\xi }}_0)^T \textbf{T}_n^{{\lambda }_0}({\varvec{\theta }}_0) -({\varvec{\xi }}-{\varvec{\xi }}_0)^T\textbf{J}_n\mathbf{{U}}_0\textbf{J}_n^T({\varvec{\xi }}-{\varvec{\xi }}_0)-t_n(\overline{{\varvec{\theta }}}), \end{aligned}$$
(7.16)

where \((\overline{{\varvec{\theta }}}, \overline{{\varvec{\lambda }}})=(\alpha {\varvec{\theta }}+(1-\alpha ){\varvec{\theta }}_0,\beta {\varvec{\lambda }}+(1-\beta ){\varvec{\lambda }}_0)=: \overline{{\varvec{\xi }}}\) for some \(\alpha , \beta \in [0,1]\), \(\mathbf{{U}}_0\) is the limit matrix in Eq. 5.7, and

$$\begin{aligned} t_n(\overline{{\varvec{\theta }}}):=({\varvec{\xi }}-{\varvec{\xi }}_0)^T\textbf{J}_n \big (\textbf{J}_n^{-1} \mathbf{{U}}_n^{\overline{{\lambda }}}(\overline{{\varvec{\theta }}})\textbf{J}_n^{-T}-\mathbf{{U}}_0\big )\textbf{J}_n^T({\varvec{\xi }}-{\varvec{\xi }}_0). \end{aligned}$$
(7.17)

Let \(\textbf{Y}_n:= \textbf{J}_n^{-1} \textbf{T}_n^{{\lambda }_0}({\varvec{\theta }}_0)\), and now set \({\varvec{\lambda }}={\varvec{\lambda }}_0\). Then we can rewrite (7.16) as

$$\begin{aligned} & 2\left( \mathcal{L}_n({\varvec{\theta }})-\mathcal{L}_n({\varvec{\theta }}_0)\right) \\ & = 2({\varvec{\xi }}-{\varvec{\xi }}_0)^T\textbf{J}_n \textbf{Y}_n -({\varvec{\xi }}-{\varvec{\xi }}_0)^T\textbf{J}_n\mathbf{{U}}_0\textbf{J}_n^T({\varvec{\xi }}-{\varvec{\xi }}_0) -t_n(\overline{{\varvec{\theta }}}) \\ & =: q_n({\varvec{\theta }}) -t_n(\overline{{\varvec{\theta }}}), \end{aligned}$$
(7.18)

where \({\varvec{\xi }}-{\varvec{\xi }}_0= [({\varvec{\theta }}-{\varvec{\theta }}_0)^T\ \textbf{0}^T]^T\) and \(q_n({\varvec{\theta }})\) and \(t_n({\varvec{\theta }})\) depend on \({\varvec{\theta }}\) but not on \({\varvec{\lambda }}\). By assumption there are \(\widehat{\varvec{\theta }}_n^\Omega \in \Omega \) and \(\widehat{\varvec{\theta }}_n^\tau \in \tau \), not depending on \({\varvec{\lambda }}_0\), both consistent for \({\varvec{\theta }}_0\), such that Eq. 3.5 holds with \(\widehat{\varvec{\theta }}_n^\Omega \) and \(\widehat{\varvec{\theta }}_n^\tau \) substituted for \(\widehat{\varvec{\theta }}_n\). Recall the matrices \(\textbf{D}_n\) and \(\textbf{E}_n\) in Eq. 4.4. Since \(\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\overset{\textrm{D}}{\longrightarrow } \textbf{Z}\), given \(\varepsilon \in (0,1)\), we can find \(A_0(\varepsilon )\) and for each \(A\ge A_0(\varepsilon )\) an \(n_0(A,\varepsilon )\) such that, for \(n\ge n_0\), the event

$$\begin{aligned} E_n:= \{\widehat{\varvec{\theta }}_n^\Omega \in N_n^h(A),\ \widehat{\varvec{\theta }}_n^\tau \in N_n^h(A), \ |\textbf{P}^{T/2}\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)| \le A||\textbf{F}_0||/2\} \end{aligned}$$
(7.19)

has probability exceeding \(1-\varepsilon \). (Recall that \(\textbf{F}_0\) and \(\textbf{P}\) are defined in the statement of Theorem 5.1.) In what follows we suppose event \(E_n\) occurs.

We work at first with \(\widehat{\varvec{\theta }}_n^\Omega \). Let \(\breve{\varvec{\theta }}_n^\Omega \) denote the value that maximises the quadratic function \(q_n({\varvec{\theta }})\) in Eq. 7.18 on the closed convex set \(N_n(A)\cap \Omega \). We show that \(\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\Omega )\) can be replaced to sufficient accuracy by \(\mathcal{L}_n(\breve{\varvec{\theta }}_n^\Omega )\), which in turn is well approximated by the quadratic function \(q_n(\breve{\varvec{\theta }}_n^\Omega )\). The latter quantity has an asymptotic distribution which can be expressed in terms of the first limit random variable in Eq. 5.6.

To carry out this programme, use Eq. 4.5, the uniform convergence in Eq. 5.5, and Eq. 7.17 (with \({\varvec{\lambda }}={\varvec{\lambda }}_0\)) to see that \(t_n(\breve{\varvec{\theta }}_n^\Omega )\overset{\textrm{P}}{\longrightarrow } 0\) and \(t_n(\widehat{\varvec{\theta }}_n^\Omega )\overset{\textrm{P}}{\longrightarrow } 0\) as \(n\rightarrow \infty \). Since \(q_n(\widehat{\varvec{\theta }}_n^\Omega )\le q_n(\breve{\varvec{\theta }}_n^\Omega )\), Eq. 7.18 implies

$$\begin{aligned} 0\le 2\big (\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\Omega )-\mathcal{L}_n(\breve{\varvec{\theta }}_n^\Omega )\big )&= q_n(\widehat{\varvec{\theta }}_n^\Omega )- q_n(\breve{\varvec{\theta }}_n^\Omega ) - t_n(\widehat{\varvec{\theta }}_n^\Omega )+t_n(\breve{\varvec{\theta }}_n^\Omega ) \\&\le -t_n(\widehat{\varvec{\theta }}_n^\Omega )+t_n(\breve{\varvec{\theta }}_n^\Omega ) = o_P(1),\ \textrm{as}\ n\rightarrow \infty . \end{aligned}$$

Hence, because \(\breve{\varvec{\theta }}_n^\Omega \) maximises \(q_n({\varvec{\theta }})\) on \(N_n(A)\cap \Omega \),

$$\begin{aligned} 2\big (\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\Omega )-\mathcal{L}_n({\varvec{\theta }}_0)\big )&= 2\big (\mathcal{L}_n(\breve{\varvec{\theta }}_n^\Omega )-\mathcal{L}_n({\varvec{\theta }}_0)\big ) +o_p(1) = q_n(\breve{\varvec{\theta }}_n^\Omega ) +o_p(1) \\&= \sup _{{\varvec{\theta }}\in N_n(A)\cap \Omega }q_n({\varvec{\theta }})+o_p(1) = -\inf _{{\varvec{\theta }}\in N_n(A)\cap \Omega }(-q_n({\varvec{\theta }})) +o_p(1). \end{aligned}$$

Recalling the definition of \(q_n({\varvec{\theta }})\) in Eq. 7.18, write the last expression as

$$\begin{aligned} -\inf _{{\varvec{\theta }}\in N_n(A)\cap \Omega } \Big (-2({\varvec{\xi }}-{\varvec{\xi }}_0)^T \textbf{J}_n\textbf{Y}_n + ({\varvec{\xi }}-{\varvec{\xi }}_0)^T\textbf{J}_n\mathbf{{U}}_0\textbf{J}_n^T({\varvec{\xi }}-{\varvec{\xi }}_0)\Big ) +o_p(1). \end{aligned}$$
(7.20)

Recall that the auxiliary matrices \(\textbf{T}_n\) introduced in Eq. 5.2 are orthogonal, so \(||\textbf{T}_n\textbf{D}_n||= ||\textbf{D}_n||\). Transform from \(({\varvec{\theta }},{\varvec{\lambda }})\) to \(\widetilde{\varvec{\xi }}= (\widetilde{{\varvec{\theta }}}, \widetilde{{\varvec{\lambda }}})= \big (\textbf{T}_n\textbf{D}_n({\varvec{\theta }}-{\varvec{\theta }}_0), \textbf{0}\big )\). Since \(C_\Omega \cap \mathcal{N}= \Omega \cap \mathcal{N}\), by Eq. 5.2, \({\varvec{\theta }}\in N_n(A)\cap \Omega \) iff \(|\widetilde{{\varvec{\theta }}}|\le A\) and \(\widetilde{{\varvec{\theta }}}\in \widetilde{C}_{\Omega _n}\). Then the \(\inf \) in Eq. 7.20 can be replaced by an \(\inf \) over \(|{\varvec{\theta }}|\le A\), \({\varvec{\theta }}\in \widetilde{C}_{\Omega _n}\) and we get

$$\begin{aligned} 2\big (\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\Omega )-\mathcal{L}_n({\varvec{\theta }}_0)\big ) = -\inf _{{\varvec{\theta }}\in \widetilde{C}_{\Omega _n}, |{\varvec{\theta }}|\le A} \big (-2\widetilde{{\varvec{\xi }}}^T \textbf{Y}_n +\widetilde{{\varvec{\xi }}}^T\mathbf{{U}}_0\widetilde{{\varvec{\xi }}}\big ) + o_p(1). \end{aligned}$$
(7.21)

Note that since \({\varvec{\theta }}_0\in \Theta ^h\) and \(\textbf{h}({\varvec{\theta }})\) has a bounded second derivative on a neighbourhood of \({\varvec{\theta }}_0\) we have, for any \({\varvec{\theta }}\in \Theta ^h\), \(\textbf{0}=\textbf{h}({\varvec{\theta }}) -\textbf{h}({\varvec{\theta }}_0) =\textbf{H}^T({\varvec{\theta }}_0)({\varvec{\theta }}-{\varvec{\theta }}_0)+O(|{\varvec{\theta }}-{\varvec{\theta }}_0|^2)\). So \(\textbf{H}^T({\varvec{\theta }}_0)({\varvec{\theta }}-{\varvec{\theta }}_0)= o_p(1)\) for \({\varvec{\theta }}\in N_n^h(A)\). Observe that the expression on the RHS of Eq. 7.21 is, apart from an \(o_p(1)\) term,

$$\begin{aligned} -2\widetilde{{\varvec{\xi }}}^T \textbf{Y}_n +\widetilde{{\varvec{\xi }}}^T\mathbf{{U}}_0\widetilde{{\varvec{\xi }}} =\big (\textbf{Y}_n-\mathbf{{U}}_0\widetilde{{\varvec{\xi }}}\big )^T\mathbf{{U}}_0^{-1}\ \big (\textbf{Y}_n-\mathbf{{U}}_0\widetilde{{\varvec{\xi }}}\big ) -\textbf{Y}_n^T\mathbf{{U}}_0^{-1}\textbf{Y}_n. \end{aligned}$$

Since \(\textbf{Y}_n^T\mathbf{{U}}_0^{-1}\textbf{Y}_n\) does not depend on \({\varvec{\theta }}\) we can ignore it for the time being. Then to evaluate (7.21) set \(\widetilde{\varvec{\eta }}=\mathbf{{U}}_0\widetilde{{\varvec{\xi }}}= [\textbf{F}_0^T\ -\textbf{C}^T\textbf{H}^T({\varvec{\theta }}_0)]^T\widetilde{{\varvec{\xi }}} = [(\textbf{F}_0{\varvec{\theta }})^T\ \textbf{0}^T]^T+o_p(1)\) and calculate

$$\begin{aligned} -\inf _{{\varvec{\theta }}\in \widetilde{C}_{\Omega _n}, |{\varvec{\theta }}|\le A} \big (\textbf{Y}_n-\widetilde{\varvec{\eta }}\big )^T\mathbf{{U}}_0^{-1} \big (\textbf{Y}_n-\widetilde{\varvec{\eta }}\big )+o_p(1) = -\inf _{{\varvec{\theta }}\in \breve{C}_{\Omega _n}, |\textbf{F}_0{\varvec{\theta }}|\le A} \left( \textbf{Y}_n-{\varvec{\eta }}\right) ^T\mathbf{{U}}_0^{-1} \left( \textbf{Y}_n-{\varvec{\eta }}\right) +o_p(1). \end{aligned}$$
(7.22)

In the last step we transformed from \(\widetilde{C}_{\Omega _n}\) to the set \(\breve{C}_{\Omega _n}= \{\textbf{F}_0{\varvec{\theta }}: {\varvec{\theta }}\in \widetilde{C}_{\Omega _n}\}\), where \(\textbf{F}_0\) is the nonsingular limit matrix in Eq. 5.7. Then \(\widetilde{\varvec{\eta }}\) transforms to \( {\varvec{\eta }}=[{\varvec{\theta }}^T\ \textbf{0}^T]^T\). Since \(\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\overset{\textrm{D}}{\longrightarrow } \textbf{Z}\) we have

$$\begin{aligned} \textbf{Y}_n= \textbf{J}_n^{-1} \textbf{T}_n^{{\lambda }_0}({\varvec{\theta }}_0)= \begin{bmatrix} \textbf{D}_n^{-1} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\\ \textbf{0} \\ \end{bmatrix} =: \begin{bmatrix} \textbf{Z}_n\\ \textbf{0} \\ \end{bmatrix} \overset{\textrm{D}}{\longrightarrow } \begin{bmatrix} \textbf{Z}\\ \textbf{0} \\ \end{bmatrix}, \end{aligned}$$
(7.23)

and the expression in the RHS of Eq. 7.22 can be written, apart from an \(o_p(1)\) term, as

$$\begin{aligned} \left( \textbf{Y}_n-{\varvec{\eta }}\right) ^T\mathbf{{U}}_0^{-1} \left( \textbf{Y}_n-{\varvec{\eta }}\right)&= \begin{bmatrix} \textbf{Z}_n-{\varvec{\theta }}\\ \textbf{0} \end{bmatrix}^T \begin{bmatrix} \textbf{P} & \textbf{Q}\\ \textbf{Q}^T & \mathbf{{R}}\end{bmatrix} \begin{bmatrix} \textbf{Z}_n-{\varvec{\theta }}\\ \textbf{0} \end{bmatrix} \\&= \left( \textbf{Z}_n-{\varvec{\theta }}\right) ^T \textbf{P} \left( \textbf{Z}_n-{\varvec{\theta }}\right) = |\mathbf{{P}}^{T/2}\textbf{Z}_n-\mathbf{{P}}^{T/2}{\varvec{\theta }}|^2. \end{aligned}$$
(7.24)

Here \(\mathbf{{P}}=\mathbf{{P}}^{1/2} \mathbf{{P}}^{T/2}\) is a square root decomposition of the positive semi-definite matrix \(\mathbf{{P}}\). Transform from \(\mathbf{{P}}^{T/2}{\varvec{\theta }}\) to \({\varvec{\theta }}\), so that, referring to Eq. 7.22,

$$\begin{aligned} -\inf _{{\varvec{\theta }}\in \breve{C}_{\Omega _n},\, |\textbf{F}_0{\varvec{\theta }}|\le A} \left( \textbf{Y}_n-{\varvec{\eta }}\right) ^T\mathbf{{U}}_0^{-1} \left( \textbf{Y}_n-{\varvec{\eta }}\right)&= -\inf _{{\varvec{\theta }}\in \breve{C}_{\Omega _n},\, |\textbf{F}_0{\varvec{\theta }}|\le A} |\mathbf{{P}}^{T/2}\textbf{Z}_n-\mathbf{{P}}^{T/2}{\varvec{\theta }}|^2 \\&= -\inf _{{\varvec{\theta }}\in \dot{C}_{\Omega _n},\, |\textbf{P}^{T/2}\textbf{F}_0{\varvec{\theta }}|\le A} |\mathbf{{P}}^{T/2}\textbf{Z}_n-{\varvec{\theta }}|^2, \end{aligned}$$

where \( \dot{C}_{\Omega _n}=\{\mathbf{{P}}^{T/2}{\varvec{\theta }}:{\varvec{\theta }}\in \breve{C}_{\Omega _n}\} =\{\mathbf{{P}}^{T/2}\textbf{F}_0{\varvec{\theta }}:{\varvec{\theta }}\in \widetilde{C}_{\Omega _n}\}\).

Since \(\dot{C}_{\Omega _n}\) contains 0 we have

$$\begin{aligned} \inf _{{\varvec{\theta }}\in \dot{C}_{\Omega _n}} |\mathbf{{P}}^{T/2}\textbf{Z}_n-{\varvec{\theta }}|^2 \le |\mathbf{{P}}^{T/2}\textbf{Z}_n|^2 \le A^2||\textbf{F}_0||^2/4, \end{aligned}$$

where the last inequality follows from Eq. 7.19, because \(\textbf{Z}_n=\textbf{D}_n^{-1} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\) and we assumed the event \(E_n\) occurs, so that \(|\mathbf{{P}}^{T/2}\textbf{Z}_n|\le A||\textbf{F}_0||/2\). There exists \(\dot{\varvec{\theta }}_n\in \dot{C}_{\Omega _n}\) such that

$$ |\mathbf{{P}}^{T/2}\textbf{Z}_n - \dot{\varvec{\theta }}_n| = \inf _{{\varvec{\theta }}\in \dot{C}_{\Omega _n}}|\mathbf{{P}}^{T/2}\textbf{Z}_n-{\varvec{\theta }}| $$

so \(|\dot{\varvec{\theta }}_n|\le A||\textbf{F}_0||\). Hence

$$\begin{aligned} \inf _{{\varvec{\theta }}\in \dot{C}_{\Omega _n}, \, |\dot{\varvec{\theta }}|\le A||\textbf{F}_0||} |\mathbf{{P}}^{T/2}\textbf{Z}_n-{\varvec{\theta }}| = \inf _{{\varvec{\theta }}\in \dot{C}_{\Omega _n}} |\mathbf{{P}}^{T/2}\textbf{Z}_n-{\varvec{\theta }}|. \end{aligned}$$
(7.25)

Reinstating the omitted term \(\textbf{Y}_n^T\mathbf{{U}}_0^{-1}\textbf{Y}_n\), it follows from Eqs. 7.24 and 7.25 that

$$\begin{aligned} 2\big (\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\Omega )-\mathcal{L}_n({\varvec{\theta }}_0)\big ) =-\inf _{{\varvec{\theta }}\in \dot{C}_{\Omega _n}} |\mathbf{{P}}^{T/2}\textbf{Z}_n-{\varvec{\theta }}|^2- \textbf{Y}_n^T\mathbf{{U}}_0^{-1}\textbf{Y}_n +o_P(1). \end{aligned}$$
(7.26)

The same analysis, with \(\tau \) replacing \(\Omega \), gives

$$\begin{aligned} 2\big (\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\tau )-\mathcal{L}_n({\varvec{\theta }}_0)\big ) =-\inf _{{\varvec{\theta }}\in \dot{C}_{\tau _n}} |\mathbf{{P}}^{T/2}\textbf{Z}_n-{\varvec{\theta }}|^2- \textbf{Y}_n^T\mathbf{{U}}_0^{-1}\textbf{Y}_n +o_P(1). \end{aligned}$$
(7.27)

Now Eqs. 7.23, 7.26 and 7.27 and the continuous mapping theorem imply

$$\begin{aligned} d_n= 2\big (\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\tau )-\mathcal{L}_n(\widehat{\varvec{\theta }}_n^\Omega )\big ) \overset{\textrm{D}}{\longrightarrow } \inf _{{\varvec{\theta }}\in \dot{C}_{\Omega }}|\mathbf{{P}}^{T/2}\textbf{Z}-{\varvec{\theta }}|^2 - \inf _{{\varvec{\theta }}\in \dot{C}_{\tau }}|\mathbf{{P}}^{T/2}\textbf{Z}-{\varvec{\theta }}|^2, \end{aligned}$$

where \( \dot{C}_{\Omega }=\{\mathbf{{P}}^{T/2}\textbf{F}_0{\varvec{\theta }}:{\varvec{\theta }}\in \widetilde{C}_{\Omega }\}\), \( \dot{C}_{\tau }=\{\mathbf{{P}}^{T/2}\textbf{F}_0{\varvec{\theta }}:{\varvec{\theta }}\in \widetilde{C}_{\tau }\}\), and the convergence follows from Eq. 5.3, which implies

$$ \inf _{{\varvec{\theta }}\in \dot{C}_{\Omega _n}}|{\varvec{\beta }}-{\varvec{\theta }}|^2 \rightarrow \inf _{{\varvec{\theta }}\in \dot{C}_{\Omega }}|{\varvec{\beta }}-{\varvec{\theta }}|^2 $$

uniformly in \({\varvec{\beta }}\), and similarly for \(\dot{C}_{\tau }\). This gives (5.6) and completes the proof of Theorem 5.1. \(\square \)
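
The limit in (5.6) is the difference of squared distances from the Gaussian point \(\textbf{P}^{T/2}\textbf{Z}\) to the two limit cones. The following Monte Carlo sketch illustrates it in the simplest boundary situation (a line inside a half-plane in \(\mathbb {R}^2\), with \(\textbf{P}^{T/2}\textbf{Z}\) taken standard normal); the cones and covariance are invented for illustration and are not the paper's general setting. The sketch reproduces the classical 50:50 mixture of \(\chi ^2_0\) and \(\chi ^2_1\) familiar from Chernoff (1954) and Self and Liang (1987).

```python
import numpy as np

# Monte Carlo sketch of the limit law in (5.6): squared distance from a
# Gaussian point to C_tau minus squared distance to C_Omega, for invented
# cones: C_tau a line, C_Omega a half-plane (identity covariance assumed).
rng = np.random.default_rng(0)
Z = rng.standard_normal((200_000, 2))       # plays the role of P^{T/2} Z

d2_tau   = Z[:, 1]**2                       # distance^2 to the line {y = 0}
d2_omega = np.minimum(Z[:, 1], 0.0)**2      # distance^2 to {y >= 0}: nonzero only if y < 0

d_lim = d2_tau - d2_omega                   # limit of d_n; equals y^2 * 1{y >= 0}
print((d_lim == 0).mean())                  # ~0.5: probability mass 1/2 at zero
print(np.quantile(d_lim, 0.95))             # ~2.71, the chi^2_1(0.90) point
```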

Appendix 2: Application: The Langevin (von Mises-Fisher) Distribution

This distribution, defined on the unit sphere in \(\mathbb {R}^d\), displays in a striking way some of the features we wish to illustrate. Let \(\mathcal {S}^d\) be the unit sphere in \(\mathbb {R}^d\), \(d\ge 2\), and suppose the random vector \(\textbf{X}\in \mathcal {S}^d\) has density

$$\begin{aligned} f_\textbf{X}(\textbf{x})=\frac{1}{c(\kappa )}e^{\kappa {\varvec{\mu }}^T\textbf{x}},\ \textbf{x}\in \mathcal {S}^d, \end{aligned}$$
(8.1)

where \(\kappa >0\), \({\varvec{\mu }}\in \mathbb {R}^d\) and

$$\begin{aligned} c(\kappa )=\int _{\textbf{x}\in \mathcal {S}^d} e^{\kappa {\varvec{\mu }}^T\textbf{x}}\textrm{d}\omega (\textbf{x}). \end{aligned}$$
(8.2)

Here \(\textrm{d}\omega (\cdot )\) is the area element on \(\mathcal {S}^d\) such that

$$\begin{aligned} \int _{\textbf{x}\in \mathcal {S}^d} \textrm{d}\omega (\textbf{x}) =\omega (\mathcal {S}^d)=\mathrm{area\ of}\ \mathcal {S}^d = \frac{2\pi ^{d/2}}{\Gamma (d/2)}. \end{aligned}$$

It is easy to show that \(c(\kappa )\) does not in fact depend on \({\varvec{\mu }}\), despite appearances in Eq. 8.2. The formulae

$$\begin{aligned} E\textbf{X}=\frac{c_\kappa }{c(\kappa )}{\varvec{\mu }}\quad \textrm{and}\quad \textrm{Var}(\textbf{X})= \frac{c_\kappa }{\kappa c(\kappa )}\left( \textbf{I}_d-{\varvec{\mu }}{\varvec{\mu }}^T\right) +a(\kappa ){\varvec{\mu }}{\varvec{\mu }}^T, \end{aligned}$$
(8.3)

where \(a(\kappa )=(c_{\kappa \kappa }c(\kappa )-c_\kappa ^2)/c^2(\kappa )>0\), with \(c_\kappa =c'(\kappa )\) and \(c_{\kappa \kappa }=c''(\kappa )\), for \(\kappa >0\), can be found in Watson (1983), together with the following useful relations. We have

$$\begin{aligned} c(\kappa )>0,\ c'(\kappa )>0,\ c''(\kappa )>0, \end{aligned}$$
(8.4)

and the function \(A(\kappa ):=c_\kappa /c(\kappa )\), \(\kappa >0\), satisfies

$$\begin{aligned} A(0)=0,\ A(\infty )=1,\ A'(0)=a(0)=\frac{1}{d},\ A'(\infty )=0 \end{aligned}$$

and

$$\begin{aligned} \ A'(\kappa )=a(\kappa )> 0,\ A''(\kappa )=a'(\kappa )<0,\ \mathrm{for\ all}\ \kappa >0. \end{aligned}$$
(8.5)
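
For numerical work these functions are easy to compute. The sketch below checks the relations in Eqs. 8.4 and 8.5, assuming the standard von Mises-Fisher closed form \(c(\kappa )=(2\pi )^{d/2}I_{d/2-1}(\kappa )/\kappa ^{d/2-1}\) (with \(I_\nu \) the modified Bessel function of the first kind), whence \(A(\kappa )=I_{d/2}(\kappa )/I_{d/2-1}(\kappa )\); this closed form is supplied by us and is not taken from the text above.

```python
import numpy as np
from scipy.special import ive   # exponentially scaled modified Bessel I_nu

# Numerical check of (8.4)-(8.5), assuming the standard von Mises-Fisher
# closed form c(kappa) = (2 pi)^{d/2} I_{d/2-1}(kappa) / kappa^{d/2-1},
# whence A(kappa) = c'(kappa)/c(kappa) = I_{d/2}(kappa)/I_{d/2-1}(kappa).
d = 3

def c(k):                       # ive(v, k) = I_v(k) * exp(-k), so rescale
    return (2*np.pi)**(d/2) * ive(d/2 - 1, k) * np.exp(k) / k**(d/2 - 1)

def A(k):                       # the exp(-k) factors cancel in the ratio
    return ive(d/2, k) / ive(d/2 - 1, k)

def a(k, eps=1e-6):             # a(kappa) = A'(kappa), central difference
    return (A(k + eps) - A(k - eps)) / (2*eps)

ks = np.linspace(0.05, 30, 300)
assert np.all(np.diff(c(ks)) > 0)           # c'(kappa) > 0, part of (8.4)
assert np.all(a(ks) > 0)                    # A'(kappa) = a(kappa) > 0, (8.5)
print(A(1e-3) / 1e-3)                       # ~1/3 = 1/d = A'(0)
print(A(100.0))                             # ~1 = A(infinity)
```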

Let \(\textbf{X}_1,\ldots , \textbf{X}_n\) be n i.i.d. observations on \(\textbf{X}\), with log-likelihood

$$\begin{aligned} \mathcal{L}_n({\varvec{\theta }})=n\left( \kappa {\varvec{\mu }}^T\overline{\textbf{X}}_n-\log c(\kappa )\right) , \end{aligned}$$
(8.6)

where \({\varvec{\theta }}=[\kappa \ {\varvec{\mu }}^T]^T\in \Theta := (0,\infty )\times \mathbb {R}^d\) and \(\overline{\textbf{X}}_n=n^{-1}\sum _{i=1}^n\textbf{X}_i\). (We alter notation slightly here by setting \({\varvec{\theta }}\) in \(\mathbb {R}^{d+1}\), because it is convenient to have \({\varvec{\mu }}\) in \(\mathbb {R}^d\).) By Eq. 8.3

$$\begin{aligned} E\overline{\textbf{X}}_n=E\textbf{X}=A(\kappa ) {\varvec{\mu }}\quad \textrm{and}\quad \textrm{Var}(\overline{\textbf{X}}_n)= \frac{1}{n}\Big (\frac{A(\kappa )}{\kappa }\big (\textbf{I}_d-{\varvec{\mu }}{\varvec{\mu }}^T\big ) +a(\kappa ){\varvec{\mu }}{\varvec{\mu }}^T\Big ), \end{aligned}$$
(8.7)

for any \(\kappa >0\), \({\varvec{\mu }}\in \mathcal{S}^d\). Since

$$\begin{aligned}\textbf{u}^T \textrm{Var}(\textbf{X})\textbf{u}= \frac{A(\kappa )}{\kappa }(1-|\textbf{u}^T{\varvec{\mu }}|^2) +a(\kappa )|\textbf{u}^T{\varvec{\mu }}|^2>0 \end{aligned}$$

for all unit vectors \(\textbf{u}\in \mathcal{S}^d\), \( \textrm{Var}(\textbf{X})\), and hence also \( \textrm{Var}(\overline{\textbf{X}}_n)=\textrm{Var}(\textbf{X})/n\), is a positive definite matrix.

Let \({\varvec{\theta }}_0\in \mathbb {R}^{d+1}\) denote the true value of \({\varvec{\theta }}\), for which \(\textbf{X}\) has the density (8.1) with \([\kappa _0\ {\varvec{\mu }}_0^T]^T={\varvec{\theta }}_0\). The log-likelihood is to be maximised subject to the restriction (see Footnote 7)

$$\begin{aligned} h({\varvec{\theta }})={\varvec{\mu }}^T{\varvec{\mu }}-1=0. \end{aligned}$$
(8.8)

Thus we have a single restriction, corresponding to \(s=1\) in Eq. 2.3, and we write \(h({\varvec{\theta }})\) rather than \(\mathbf{{h}}({\varvec{\theta }})\). The restricted parameter space is \(\Theta ^h=\{{\varvec{\theta }}\in \Theta :|{\varvec{\mu }}|=1\}= (0,\infty )\times \mathcal {S}^d\).
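
For orientation, the restricted maximisation of Eq. 8.6 subject to Eq. 8.8 has a familiar closed-form solution: \(\widehat{\varvec{\mu }}_n=\overline{\textbf{X}}_n/|\overline{\textbf{X}}_n|\), with \(\widehat{\kappa }_n\) solving \(A(\kappa )=|\overline{\textbf{X}}_n|\). A minimal sketch follows; the Bessel-ratio form of \(A\) and the crude data generator (which is not an exact sampler from the density (8.1)) are our assumptions.

```python
import numpy as np
from scipy.special import ive
from scipy.optimize import brentq

# Sketch of the MLE maximising (8.6) subject to (8.8): mu_hat = Xbar/|Xbar|,
# kappa_hat solving A(kappa) = |Xbar|, with the assumed Bessel-ratio form
# A(kappa) = I_{d/2}(kappa) / I_{d/2-1}(kappa).
d, n = 3, 1000
rng = np.random.default_rng(1)

def A(k):
    return ive(d/2, k) / ive(d/2 - 1, k)

# crude data for illustration only: perturb a fixed direction and
# renormalise (NOT an exact draw from the density (8.1))
mu0 = np.array([0.0, 0.0, 1.0])
X = mu0 + 0.4 * rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

Xbar = X.mean(axis=0)
R = np.linalg.norm(Xbar)                    # mean resultant length |Xbar_n|
mu_hat = Xbar / R                           # satisfies h(theta) = 0 exactly
kappa_hat = brentq(lambda k: A(k) - R, 1e-8, 1e4)
print(mu_hat, kappa_hat)
```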

We can calculate

$$\begin{aligned} \textbf{S}_n({\varvec{\theta }}):= {\partial \mathcal{L}_n({\varvec{\theta }})\over \partial {\varvec{\theta }}}= n\begin{bmatrix} {\varvec{\mu }}^T\overline{\textbf{X}}_n- \frac{c_\kappa }{c(\kappa )}\\ \kappa \overline{\textbf{X}}_n\\ \end{bmatrix} = n\begin{bmatrix} {\varvec{\mu }}^T\left( \overline{\textbf{X}}_n- E\textbf{X}\right) \\ \kappa \overline{\textbf{X}}_n\\ \end{bmatrix}_{(d+1)\times 1}, \end{aligned}$$
(8.9)

and

$$\begin{aligned} \textbf{F}_n({\varvec{\theta }})= -\frac{\partial ^2 \mathcal{L}_n({\varvec{\theta }})}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T} =n\begin{bmatrix} a(\kappa ) & -\overline{\textbf{X}}_n^T \\ -\overline{\textbf{X}}_n& \textbf{0}_{d\times d} \end{bmatrix}_{(d+1)\times (d+1)}. \end{aligned}$$
(8.10)

Observe that the expected first derivative \(E(\textbf{S}_n({\varvec{\theta }}))=[0\ n\kappa A(\kappa ){\varvec{\mu }}^T]^T\) is not equal to \(\textbf{0}\) for any \({\varvec{\theta }}\in \Theta \), and \(\textbf{F}_n({\varvec{\theta }}) \) is singular for all \({\varvec{\theta }}\in \Theta \) (see Footnote 8). So “standard” asymptotic theory for MLEs does not apply.

We apply the theory in Sections 2–4. Choose \({\lambda }_n=C_n{\lambda }\), where \(C_n>0\); \({\lambda }_n\) and \({\lambda }\) are scalars here. So we analyse (8.8) in conjunction with

$$\begin{aligned} \textbf{S}_n^{\lambda }({\varvec{\theta }})= {\partial \mathcal{L}_n({\varvec{\theta }})\over \partial {\varvec{\theta }}}+{\lambda }C_n{\partial h({\varvec{\theta }})\over \partial {\varvec{\theta }}}=0. \end{aligned}$$

We aim first to verify conditions (3.2), (3.3) and (3.4) of Theorem 3.1 so as to establish existence and consistency of an estimator \(\widehat{\varvec{\theta }}_n\) for \({\varvec{\theta }}_0\). To this end, calculate

$$\begin{aligned} \mathbf{{H}}({\varvec{\theta }})= {\partial h({\varvec{\theta }})\over \partial {\varvec{\theta }}}= 2 \begin{bmatrix} 0\\ {\varvec{\mu }}\\ \end{bmatrix}_{(d+1)\times 1}, \end{aligned}$$
(8.11)

and

$$\begin{aligned} \frac{\partial ^2 h({\varvec{\theta }})}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}= 2\begin{bmatrix} 0 & \textbf{0}_d^T \\ \textbf{0}_d & \textbf{I}_d \end{bmatrix}_{(d+1)\times (d+1)}. \end{aligned}$$

We use these to augment \(\textbf{S}_n({\varvec{\theta }})\) and \(\textbf{F}_n({\varvec{\theta }})\) to \(\textbf{S}_n^{\lambda }({\varvec{\theta }})\) and \(\textbf{F}_n^{\lambda }({\varvec{\theta }})\), as in Eqs. 2.4 and 2.5.

As expected in this i.i.d. setup, a multiplier n appears in Eqs. 8.9 and 8.10, and prompts choosing \(\textbf{D}_n=\sqrt{n}\textbf{I}_{d+1}\) in Eqs. 3.1–3.4 and \(a_n=C_n=an\), \(a>0\), in Eq. 3.8. Then \(C=1\), a scalar, in Eq. 3.8. Accordingly, for \({\lambda }\in \mathbb {R}\), we set

$$\begin{aligned} \textbf{S}_n^{\lambda }({\varvec{\theta }})=\textbf{S}_n({\varvec{\theta }})+{\lambda }a n\mathbf{{H}}({\varvec{\theta }}) =n\begin{bmatrix} {\varvec{\mu }}^T(\overline{\textbf{X}}_n-E\textbf{X})\\ \kappa \overline{\textbf{X}}_n+2{\lambda }a {\varvec{\mu }}\\ \end{bmatrix}_{(d+1)\times 1} \end{aligned}$$
(8.12)

and

$$\begin{aligned} \textbf{F}_n^{\lambda }({\varvec{\theta }})= \textbf{F}_n({\varvec{\theta }})-{\lambda }a n\frac{\partial ^2 h({\varvec{\theta }})}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T} =n\begin{bmatrix} a(\kappa ) & -\overline{\textbf{X}}_n^T \\ -\overline{\textbf{X}}_n& -2{\lambda }a \textbf{I}_{d} \end{bmatrix}_{(d+1)\times (d+1)}. \end{aligned}$$
(8.13)
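
A small numerical illustration of conditions (3.2)–(3.4) at the true point can be obtained by evaluating the scaled matrix \(\textbf{D}_n^{-1}\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }}_0)\textbf{D}_n^{-T}\) with \(\overline{\textbf{X}}_n\) replaced by its limit \(E\textbf{X}=A(\kappa _0){\varvec{\mu }}_0\). The choices \(d=3\), \(\kappa _0=2\), \(a=1\) and penalty weight \(g^2(n)=an\) below are illustrative assumptions, not quantities fixed by the paper.

```python
import numpy as np

# Check that D_n^{-1} F_n^{lambda_0 *}(theta_0) D_n^{-T} is positive
# definite even though F_n(theta_0) itself is singular (Footnote 8).
# Assumptions: d = 3, kappa_0 = 2, a = 1, g(n)^2 = a*n, and Xbar_n
# replaced by its limit E X = A(kappa_0) mu_0.
d, kappa0, a = 3, 2.0, 1.0
A0 = 1/np.tanh(kappa0) - 1/kappa0           # A(kappa) = coth(kappa) - 1/kappa, d = 3
a0 = 1/kappa0**2 - 1/np.sinh(kappa0)**2     # a(kappa) = A'(kappa), d = 3
mu0 = np.array([1.0, 0.0, 0.0])
Xbar = A0 * mu0

F = np.zeros((d+1, d+1))                    # F_n(theta_0)/n, cf. Eq. 8.10
F[0, 0] = a0
F[0, 1:] = -Xbar
F[1:, 0] = -Xbar
print(np.linalg.eigvalsh(F))                # zero eigenvalues: singular

M = F.copy()                                # scaled F_n^{lambda_0 *}, cf. Eq. 8.13
M[1:, 1:] += kappa0*A0*np.eye(d)            # -2 lambda_0 a = kappa_0 A(kappa_0), Eq. 8.20
M[1:, 1:] += 4*a*np.outer(mu0, mu0)         # penalty H G_n G_n^T H^T / n
print(np.linalg.eigvalsh(M).min())          # ~0.116 > 0: positive definite
```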

By Eqs. 8.7, 8.9, and the weak law of large numbers,

$$\begin{aligned} \frac{1}{an}\textbf{S}_n({\varvec{\theta }})\overset{\textrm{P}}{\longrightarrow } \frac{1}{a} \begin{bmatrix} 0\\ \kappa E\textbf{X}\end{bmatrix}=:\textbf{L}, \ \textrm{as}\ n\rightarrow \infty , \end{aligned}$$
(8.14)

for \({\varvec{\theta }}\in \Theta \). Taking \({\varvec{\theta }}={\varvec{\theta }}_0\) shows that (3.9) holds with \(a_n=an\) and that \(\textbf{L}_0:= [0\ \kappa _0 A(\kappa _0) {\varvec{\mu }}_0^T/a]^T\).

The equations (3.10) in this situation are, for general \({\varvec{\theta }}\),

$$\begin{aligned} \textbf{L}+{\lambda }\mathbf{{H}}({\varvec{\theta }})= \frac{1}{a}\begin{bmatrix} 0\\ \kappa E\textbf{X}\end{bmatrix} +{\lambda }\begin{bmatrix} 0\\ 2{\varvec{\mu }}\end{bmatrix}=\textbf{0}. \end{aligned}$$

Noting that, by Eq. 8.3, \(E\textbf{X}=c_{\kappa }{\varvec{\mu }}/c(\kappa ) =A(\kappa ){\varvec{\mu }}\), these give

$$\begin{aligned} \frac{1}{a} \kappa A(\kappa ) {\varvec{\mu }}+2{\lambda }{\varvec{\mu }}=\textbf{0}, \ \textrm{hence} \ 2{\lambda }a{\varvec{\mu }}= -\kappa E\textbf{X}\ \textrm{and}\ -2{\lambda }a=\kappa A(\kappa ). \end{aligned}$$
(8.15)

It follows from Eq. 8.12 that

$$\begin{aligned} \textbf{S}_n^{{\lambda }}({\varvec{\theta }}) = n\begin{bmatrix} {\varvec{\mu }}^T\left( \overline{\textbf{X}}_n- E\textbf{X}\right) \\ \kappa \left( \overline{\textbf{X}}_n-E\textbf{X}\right) \\ \end{bmatrix} = n\begin{bmatrix} {\varvec{\mu }}^T\overline{\textbf{X}}_n- A(\kappa )\\ \kappa \overline{\textbf{X}}_n-\kappa A(\kappa ){\varvec{\mu }}\\ \end{bmatrix}, \end{aligned}$$
(8.16)

and then from Eq. 8.16 we find

$$\begin{aligned} E(\textbf{S}_n^{{\lambda }}({\varvec{\theta }}))=\textbf{0}, \end{aligned}$$
(8.17)

and (using Eqs. 8.3 and 8.7)

$$\begin{aligned} \textrm{Var}(\textbf{S}_n^{{\lambda }}({\varvec{\theta }}))= & n\begin{bmatrix} a(\kappa ) & \kappa a(\kappa ){\varvec{\mu }}^T \\ \kappa a(\kappa ){\varvec{\mu }}& \kappa ^2\textrm{Var}(\textbf{X}) \end{bmatrix}_{(d+1)\times (d+1)} =: n\textbf{V}({\varvec{\theta }}). \end{aligned}$$
(8.18)

The \((d+1)\times (d+1)\) matrix \(\textbf{V}({\varvec{\theta }})\) is positive semidefinite, having rank d. To see this, we can write, explicitly,

$$\begin{aligned} \textbf{V}({\varvec{\theta }}) = \begin{bmatrix} {\varvec{\mu }}^T\textrm{Var}(\textbf{X}){\varvec{\mu }}& \kappa a(\kappa ){\varvec{\mu }}^T \\ \kappa a(\kappa ){\varvec{\mu }}& \kappa ^2\textrm{Var}(\textbf{X}) \end{bmatrix} = \begin{bmatrix} {\varvec{\mu }}^T\\ \kappa \textbf{I}_d \end{bmatrix}\textrm{Var}(\textbf{X}) \begin{bmatrix} {\varvec{\mu }}&\kappa \textbf{I}_d \end{bmatrix}. \end{aligned}$$
(8.19)

Let \(\mathbf{{u}}\) be a unit vector in \(\mathbb {R}^{d+1}\) partitioned as \([u_1\ \mathbf{{u}}_R^T]^T\). Then

$$ [u_1\ \mathbf{{u}}_R^T]\begin{bmatrix} {\varvec{\mu }}^T \\ \kappa \textbf{I}_d \end{bmatrix}= u_1{\varvec{\mu }}^T+\kappa \mathbf{{u}}_R^T $$

vanishes iff \(\mathbf{{u}}\) is proportional to \([-\kappa \ {\varvec{\mu }}^T]^T\); for a unit vector this means \(\mathbf{{u}}=\pm [-\kappa \ {\varvec{\mu }}^T]^T/\sqrt{1+\kappa ^2}\). Since \(\textrm{Var}(\textbf{X})\) is positive definite, \(\mathbf{{u}}^T\textbf{V}({\varvec{\theta }})\mathbf{{u}}>0\) except along this one direction, and so \(\textrm{Var}(\textbf{S}_n^{{\lambda }}({\varvec{\theta }}))\) is positive semidefinite with rank d.
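
This rank argument can be confirmed numerically. A sketch for the \(d=3\) Langevin case, assuming the standard von Mises–Fisher covariance \(\textrm{Var}(\textbf{X})=a(\kappa ){\varvec{\mu }}{\varvec{\mu }}^T+(A(\kappa )/\kappa )(\textbf{I}_d-{\varvec{\mu }}{\varvec{\mu }}^T)\), which is consistent with Eqs. 8.18 and 8.19:

```python
# Check that V(theta) of Eq. 8.19 has rank d, with null direction
# proportional to [-kappa mu^T]^T.  Assumes the d = 3 Langevin case.
import numpy as np

d, kappa = 3, 2.0
A = 1.0 / np.tanh(kappa) - 1.0 / kappa           # A(kappa) = coth(kappa) - 1/kappa
a = 1.0 / kappa**2 - 1.0 / np.sinh(kappa) ** 2   # a(kappa) = A'(kappa)
mu = np.array([0.0, 0.0, 1.0])

varX = a * np.outer(mu, mu) + (A / kappa) * (np.eye(d) - np.outer(mu, mu))
B = np.vstack([mu, kappa * np.eye(d)])           # the (d+1) x d factor in Eq. 8.19
V = B @ varX @ B.T

w, U = np.linalg.eigh(V)
print(np.round(w, 12))                           # one zero eigenvalue, d positive
print(U[:, 0])                                   # prop. to [-kappa mu^T]^T, up to sign
```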

The formulae in Eqs. 8.12–8.19 hold for all \({\varvec{\theta }}\in \Theta \), \({\lambda }\in \mathbb {R}\). In particular, choosing \({\varvec{\theta }}={\varvec{\theta }}_0\), and letting \(E_0\) and \(\textrm{Var}_0\) denote expectation and variance when \({\varvec{\theta }}={\varvec{\theta }}_0\), Eq. 8.15 gives for the true value of \({\lambda }\)

$$\begin{aligned} {\lambda }_0= -\frac{\kappa _0c_{\kappa _0}}{2ac(\kappa _0)}=-\frac{\kappa _0}{2a}|E_0\textbf{X}|, \ \textrm{so}\ -2a {\lambda }_0 = \kappa _0 A(\kappa _0). \end{aligned}$$
(8.20)

Recall that \(\textbf{D}_n:=\sqrt{n}\textbf{I}_{d+1}\). Then \({\lambda }_{\min }(\textbf{D}_n\textbf{D}_n^T)=n\rightarrow \infty \) as \(n\rightarrow \infty \), so Eq. 3.2 holds. We also have

$${\textrm{Var}}_0\big(\mathbf{D}_n^{-1}\mathbf{S}_n^{\lambda_0}(\varvec{\theta}_0)\big)=\mathbf{V}_0,$$

where \(\textbf{V}_0\) is the finite matrix \(\textbf{V}({\varvec{\theta }})\) defined in Eq. 8.18 evaluated at \({\varvec{\theta }}={\varvec{\theta }}_0\). So Eq. 3.3 holds by Eq. 8.17 and Chebyshev’s inequality.

Next we have to check Eq. 3.4. It turns out that the \(\textbf{F}_n^{\lambda }({\varvec{\theta }})\) are not positive definite, so we will need to make choices for \(\mathbf{{H}}({\varvec{\theta }})\) and \(\mathbf{{G}}_n\) in Eq. 2.6. Note that, for all \((\kappa , {\varvec{\mu }})\) and \({\lambda }\), by Eqs. 8.13 and 8.15,

$$\begin{aligned} n^{-1}\textbf{F}_n^{\lambda }({\varvec{\theta }})\overset{\textrm{P}}{\longrightarrow } \textbf{F}^{\lambda }({\varvec{\theta }}): =\begin{bmatrix} a(\kappa ) & -|E\textbf{X}|{\varvec{\mu }}^T \\ -|E\textbf{X}|{\varvec{\mu }}& -2{\lambda }a \textbf{I}_{d} \end{bmatrix} =\begin{bmatrix} a(\kappa ) & -A(\kappa ){\varvec{\mu }}^T \\ -A(\kappa ){\varvec{\mu }}& \kappa A(\kappa ) \textbf{I}_{d} \end{bmatrix}, \end{aligned}$$
(8.21)

as \(n\rightarrow \infty \). Evaluating via the Schur complement of the lower right block, this matrix has determinant

$$\begin{aligned} \textrm{det}(\textbf{F}^{\lambda }({\varvec{\theta }}))= & ( \kappa A(\kappa ) )^d\Big (a(\kappa )-\frac{A^2(\kappa )}{ \kappa A(\kappa ) }\Big ) \\= & (\kappa A(\kappa ))^d\Big (a(\kappa )-\frac{A(\kappa )}{\kappa }\Big ) \\= & -\kappa ^{-1}(\kappa A(\kappa ))^d g(\kappa ), \end{aligned}$$

where we let

$$\begin{aligned} g(\kappa )=A(\kappa )-\kappa a(\kappa ) =A(\kappa )-\kappa A'(\kappa ), \end{aligned}$$

with \(A(\kappa )=c_\kappa /c(\kappa )\) and \(A'(\kappa )=a(\kappa )\). The function \(g(\kappa )\) is positive for all \(\kappa >0\): Eqs. 8.4 and 8.5 imply \(g(0)=0\) and \(g'(\kappa )=-\kappa A''(\kappa )>0\), so \(g(\kappa )\) is strictly increasing, hence \(g(\kappa )>0\) for all \(\kappa >0\). Thus \(\textrm{det}(\textbf{F}^{\lambda }({\varvec{\theta }}))<0\) when \({\lambda }\) satisfies Eq. 8.15.
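
A quick numerical confirmation for the \(d=3\) case, where \(A(\kappa )=\coth \kappa -1/\kappa \) (an illustrative check, not a proof):

```python
# Check g(kappa) = A(kappa) - kappa * A'(kappa) > 0 on a grid (d = 3 case).
import numpy as np

kappa = np.linspace(0.01, 20.0, 2000)
A = 1.0 / np.tanh(kappa) - 1.0 / kappa
a = 1.0 / kappa**2 - 1.0 / np.sinh(kappa) ** 2   # a(kappa) = A'(kappa)
print((A - kappa * a).min() > 0.0)               # True
```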

It follows that \(\textbf{F}_n^{\lambda }({\varvec{\theta }})\) is nonsingular but not definite near \({\lambda }_0\). Set \(\mathbf{{G}}_n:= \tfrac{1}{2}\sqrt{bn}\), where \(b>0\) (\(\mathbf{{G}}_n\) is a scalar here, there being a single restriction), and, following Eqs. 2.6, 8.11 and 8.13, and writing \(-2{\lambda }a=\kappa A(\kappa )\) from Eq. 8.15, define

$$\begin{aligned} \textbf{F}_n^{{\lambda }*}({\varvec{\theta }})= \textbf{F}_n^{\lambda }({\varvec{\theta }})+\mathbf{{H}}({\varvec{\theta }})\mathbf{{G}}_n\mathbf{{G}}_n^T\mathbf{{H}}^T({\varvec{\theta }}) = n\begin{bmatrix} a(\kappa ) & -\overline{\textbf{X}}_n^T \\ -\overline{\textbf{X}}_n& \kappa A(\kappa ) \textbf{I}_{d}+b{\varvec{\mu }}{\varvec{\mu }}^T \end{bmatrix}. \end{aligned}$$

This is positive definite with high probability for n large enough in a neighbourhood of \((\kappa _0, {\varvec{\mu }}_0)\) and \({\lambda }_0\) provided b is chosen large enough. To see this, consider the limit in probability

$$\begin{aligned} \frac{1}{n}\textbf{F}_n^{{\lambda }*}({\varvec{\theta }}) \overset{\textrm{P}}{\longrightarrow } \textbf{F}^{{\lambda }*}({\varvec{\theta }}) := \begin{bmatrix} a(\kappa ) & -A(\kappa ) {\varvec{\mu }}^T \\ -A(\kappa ){\varvec{\mu }}& \kappa A(\kappa ) \textbf{I}_{d}+b{\varvec{\mu }}{\varvec{\mu }}^T \end{bmatrix}. \end{aligned}$$

A necessary and sufficient condition for this to be positive definite is that \(\textbf{u}^T\textbf{F}^{{\lambda }*}({\varvec{\theta }})\textbf{u}>0\) for all \(\textbf{u}\in S^{d+1}\). Let \(\textbf{u}\) be such a unit vector, partitioned as \([u_1\ \mathbf{{u}}_R^T]^T\). Then we can calculate

$$\begin{aligned}\textbf{u}^T\textbf{F}^{{\lambda }*}({\varvec{\theta }})\textbf{u}= a(\kappa ) \left( u_1 -\frac{A(\kappa )}{a(\kappa )} \textbf{u}_R^T{\varvec{\mu }}\right) ^2 +\kappa A(\kappa )|\textbf{u}_R|^2 + \left( b -\frac{A^2(\kappa )}{a(\kappa )}\right) | \textbf{u}_R^T{\varvec{\mu }}|^2. \end{aligned}$$

This is positive for any choice of \(b\ge A^2(\kappa )/a(\kappa )\). Then \(\textbf{F}^{{\lambda }_0*}({\varvec{\theta }})\) is positive definite for any such choice of b, and it follows by continuity considerations that \({\lambda }_{\min }(\textbf{F}^{{\lambda }*}({\varvec{\theta }}))>0\) for all \({\varvec{\theta }}\in \Theta \) (not just \({\varvec{\theta }}\in \Theta ^h\)) when \({\lambda }\) is near \({\lambda }_0\). Since \(n^{-1}{\lambda }_{\min }(\textbf{F}_n^{{\lambda }_0 *}({\varvec{\theta }}))\overset{\textrm{P}}{\longrightarrow } {\lambda }_{\min }(\textbf{F}^{{\lambda }_0 *}({\varvec{\theta }}))\) for all \({\varvec{\theta }}\in \Theta \), and we have chosen \(\textbf{D}_n\) proportional to \(\sqrt{n}\), we see that (3.4) holds.
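
Numerically, for the \(d=3\) case and the threshold choice \(b=A^2(\kappa )/a(\kappa )\), the limit matrix is indeed positive definite (again an illustrative sketch under the same assumptions as the earlier snippets):

```python
# Check positive definiteness of F^{lambda*}(theta) with b = A^2/a (d = 3 case).
import numpy as np

d, kappa = 3, 2.0
A = 1.0 / np.tanh(kappa) - 1.0 / kappa
a = 1.0 / kappa**2 - 1.0 / np.sinh(kappa) ** 2
mu = np.array([0.0, 0.0, 1.0])
b = A**2 / a                                     # the threshold choice of b

Fstar = np.zeros((d + 1, d + 1))
Fstar[0, 0] = a
Fstar[0, 1:] = -A * mu
Fstar[1:, 0] = -A * mu
Fstar[1:, 1:] = kappa * A * np.eye(d) + b * np.outer(mu, mu)

print(np.linalg.eigvalsh(Fstar).min() > 0.0)     # True
```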

We have now verified that Eqs. 3.2, 3.3, and 3.4 hold for the Langevin with the choice \(b= A^2(\kappa )/a(\kappa )\), so we can deduce from Theorem 3.1 that there is a consistent estimator \(\widehat{\varvec{\theta }}_n=[\widehat{\kappa }_n\ \widehat{\varvec{\mu }}_n^T]^T\) for \({\varvec{\theta }}_0\), satisfying (3.5). Recall that \(\widehat{\varvec{\theta }}_n\) does not depend on \({\lambda }_0\) or on the choice of a, b or \(\textbf{D}_n\).

Next we look for a consistent estimator \(\widehat{\lambda }_n\) for \({\lambda }_0\). The system (3.7) here has the form, by Eq. 8.12,

$$\begin{aligned} \textbf{S}_n^{\lambda }(\widehat{\varvec{\theta }}_n)=n\begin{bmatrix} \widehat{\varvec{\mu }}_n^T\overline{\textbf{X}}_n- A(\widehat{\kappa }_n)\\ \widehat{\kappa }_n\overline{\textbf{X}}_n+2{\lambda }a \widehat{\varvec{\mu }}_n\\ \end{bmatrix}=\textbf{0} \end{aligned}$$
(8.22)

with a unique solution \(\widehat{\lambda }_n\) obtained from \(2\widehat{\lambda }_n a\widehat{\varvec{\mu }}_n=-\widehat{\kappa }_n\overline{\textbf{X}}_n\) (via the second equation in Eq. 8.22), which implies

$$\begin{aligned} -2a\widehat{\lambda }_n =\widehat{\kappa }_n |\overline{\textbf{X}}_n| =\widehat{\kappa }_n A(\widehat{\kappa }_n) \end{aligned}$$
(8.23)

(via the first equation in Eq. 8.22). Then (via the second equation in Eq. 8.22)

$$\begin{aligned} \widehat{\varvec{\mu }}_n = \frac{\widehat{\kappa }_n\overline{\textbf{X}}_n}{-2\widehat{\lambda }_na} =\frac{\overline{\textbf{X}}_n}{|\overline{\textbf{X}}_n|} \end{aligned}$$

and \(\widehat{\kappa }_n\) is the solution to

$$\begin{aligned} A(\widehat{\kappa }_n) = \widehat{\varvec{\mu }}_n^T\overline{\textbf{X}}_n=|\overline{\textbf{X}}_n| \end{aligned}$$

(via the first equation in Eq. 8.22). It is already clear from Eqs. 8.20 and 8.23 and the consistency of \((\widehat{\kappa }_n,\widehat{\varvec{\mu }}_n)\) for \((\kappa _0,{\varvec{\mu }}_0)\) that \(\widehat{\lambda }_n\) is consistent for \({\lambda }_0\); but, to complete the example, just note that we established Eq. 3.9 in Eq. 8.14, while Eq. 3.11 follows from

$$\begin{aligned}\frac{1}{n}\big (\textbf{S}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n)-\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\big )= \begin{bmatrix} (\widehat{\varvec{\mu }}_n- {\varvec{\mu }}_0)^T\overline{\textbf{X}}_n-\big ( A(\widehat{\kappa }_n) -A(\kappa _0)\big ) \\ (\widehat{\kappa }_n-\kappa _0)\overline{\textbf{X}}_n-\big (\widehat{\kappa }_n A(\widehat{\kappa }_n)\widehat{\varvec{\mu }}_n- \kappa _0 A(\kappa _0){\varvec{\mu }}_0\big ) \\ \end{bmatrix} \overset{\textrm{P}}{\longrightarrow } \textbf{0}, \end{aligned}$$

by the weak law of large numbers, and since we already know that \(\widehat{\varvec{\theta }}_n\overset{\textrm{P}}{\longrightarrow } {\varvec{\theta }}_0\). Thus all conditions of Theorem 3.2 are satisfied and we can conclude that \(\widehat{\lambda }_n\) is consistent for \({\lambda }_0\).
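
The closed-form estimators just derived are easily computed in practice. The following end-to-end sketch (ours, for the \(d=3\) Langevin case; the inverse-CDF formula for \(w=\cos \theta \) is the standard von Mises–Fisher sampler on the sphere, `a_scale` plays the role of the constant a in \(C_n=an\), and the root-finder bracket is an assumption) simulates data and recovers \((\widehat{\varvec{\mu }}_n, \widehat{\kappa }_n, \widehat{\lambda }_n)\):

```python
# Simulate d = 3 Langevin data, then compute mu_hat = Xbar/|Xbar|,
# kappa_hat solving A(kappa_hat) = |Xbar|, and lambda_hat from Eq. 8.23.
import numpy as np
from scipy.optimize import brentq

def A(k):                                        # A(kappa) = coth(kappa) - 1/kappa
    return 1.0 / np.tanh(k) - 1.0 / k

rng = np.random.default_rng(1)
kappa0, mu0, n, a_scale = 2.0, np.array([0.0, 0.0, 1.0]), 100_000, 1.0

u = rng.uniform(size=n)                          # inverse-CDF sampler for cos(angle)
w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa0)) / kappa0
phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
r = np.sqrt(1.0 - w**2)
X = np.column_stack([r * np.cos(phi), r * np.sin(phi), w])  # mean direction mu0

xbar = X.mean(axis=0)
mu_hat = xbar / np.linalg.norm(xbar)             # mu_hat = Xbar/|Xbar|
kappa_hat = brentq(lambda k: A(k) - np.linalg.norm(xbar), 1e-6, 100.0)
lam_hat = -kappa_hat * A(kappa_hat) / (2.0 * a_scale)       # Eq. 8.23

print(float(mu_hat @ mu0), kappa_hat, lam_hat)   # ~1, ~kappa0, ~-kappa0*A(kappa0)/2
```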

Now we find the asymptotic distribution of \((\widehat{\varvec{\theta }}_n,\widehat{\lambda }_n)\) by applying Theorem 4.1. Checking the conditions: Eq. 4.1 follows from Eqs. 8.22 and 8.23, and Eq. 4.3 follows immediately since \(\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\) is a sum of i.i.d. random vectors with expectation \(\textbf{0}\) and variance matrix \(n\textbf{V}_0\); thus, recalling that \(\textbf{D}_n=\sqrt{n}\textbf{I}_{d+1}\),

$$\begin{aligned} \textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)= \frac{1}{\sqrt{n}} \textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0) \overset{\textrm{D}}{\longrightarrow } \textbf{N}(\textbf{0},\textbf{V}_0)=: \textbf{Z}\in \mathbb {R}^{d+1}. \end{aligned}$$
(8.24)

Since \(\textbf{V}_0\) has rank \(d<d+1\), the normal random vector \(\textbf{Z}\) is concentrated on a \(d\)-dimensional subspace of \(\mathbb {R}^{d+1}\).

For Eq. 4.5, choose \(\textbf{J}_n=\sqrt{n}\textbf{I}_{d+2}\). Using Eqs. 4.2, 8.11 and 8.13, we calculate

$$\begin{aligned} \mathbf{{U}}_n^{{\lambda }}({\varvec{\theta }})= n\begin{bmatrix} a(\kappa ) & -\overline{\textbf{X}}_n^T & 0 \\ -\overline{\textbf{X}}_n& -2{\lambda }a \textbf{I}_{d} & -2{\varvec{\mu }}\\ 0^T & -2{\varvec{\mu }}^T & 0 \end{bmatrix}. \end{aligned}$$

Then the \((d+2)\times (d+2)\) matrix \(\mathbf{{U}}_n^{{\lambda }}({\varvec{\theta }})\) is nonsingular with inverse satisfying

$$\begin{aligned} \textbf{J}_n^{T}(\mathbf{{U}}_n^{\widehat{\lambda }_n}(\widehat{\varvec{\theta }}_n))^{-1}\textbf{J}_n\overset{\textrm{P}}{\longrightarrow } \textbf{W}=:\begin{bmatrix} \textbf{P} & \textbf{Q}\\ \textbf{Q}^T & \textrm{R}\end{bmatrix}, \end{aligned}$$
(8.25)

where \(\textbf{P}\) is a \((d+1)\times (d+1)\) matrix, \(\textbf{Q}\) is a vector in \(\mathbb {R}^{d+1}\), and \(\textrm{R}\) is a scalar, all given by

$$\begin{aligned} \textbf{P}:= \begin{bmatrix} \frac{1}{a(\kappa _0)} & \textbf{0}^T \\ \textbf{0} & \frac{1}{\kappa _0A(\kappa _0)} (\textbf{I}_d-{\varvec{\mu }}_0{\varvec{\mu }}_0^T) \end{bmatrix}, \qquad \textbf{Q}:= -\frac{1}{2}\begin{bmatrix} \frac{A(\kappa _0)}{a(\kappa _0)} \\ {\varvec{\mu }}_0 \\ \end{bmatrix}, \end{aligned}$$
(8.26)

and

$$\begin{aligned} \textrm{R}:= \frac{A(\kappa _0)}{4a(\kappa _0)}\left( A(\kappa _0)- \kappa _0 a(\kappa _0)\right) = \frac{A(\kappa _0)}{4a(\kappa _0)}g(\kappa _0)>0. \end{aligned}$$

The claimed inverse in Eq. 8.25 can be verified by direct multiplication.
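
For instance, assembling the limit of \(n^{-1}\mathbf{{U}}_n^{{\lambda }_0}({\varvec{\theta }}_0)\) from Eq. 8.21 and the blocks above, a short numerical sketch (same illustrative \(d=3\) conventions as before) confirms the inverse:

```python
# Verify by direct multiplication that W = [[P, Q], [Q^T, R]] inverts the
# limit of U_n^{lambda_0}(theta_0)/n (d = 3 Langevin conventions).
import numpy as np

d, kappa = 3, 2.0
A = 1.0 / np.tanh(kappa) - 1.0 / kappa
a = 1.0 / kappa**2 - 1.0 / np.sinh(kappa) ** 2
mu = np.array([0.0, 0.0, 1.0])

U = np.zeros((d + 2, d + 2))                     # limit of U_n/n: Xbar -> A*mu
U[0, 0] = a
U[0, 1:d + 1] = -A * mu
U[1:d + 1, 0] = -A * mu
U[1:d + 1, 1:d + 1] = kappa * A * np.eye(d)      # -2*lambda_0*a = kappa*A(kappa)
U[1:d + 1, -1] = -2.0 * mu
U[-1, 1:d + 1] = -2.0 * mu

P = np.zeros((d + 1, d + 1))                     # Eq. 8.26
P[0, 0] = 1.0 / a
P[1:, 1:] = (np.eye(d) - np.outer(mu, mu)) / (kappa * A)
Q = -0.5 * np.concatenate([[A / a], mu])
R = (A / (4.0 * a)) * (A - kappa * a)            # R = A(kappa) g(kappa) / (4 a(kappa))

W = np.zeros((d + 2, d + 2))
W[:d + 1, :d + 1] = P
W[:d + 1, -1] = Q
W[-1, :d + 1] = Q
W[-1, -1] = R

print(np.allclose(U @ W, np.eye(d + 2)))         # True
```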

Applying Theorem 4.1 and recalling \(\textbf{W}\) in Eq. 8.25 we obtain, as \(n\rightarrow \infty \),

$$\begin{aligned} \sqrt{n} \begin{bmatrix}\widehat{\varvec{\theta }}_n-{\varvec{\theta }}_0\\ \widehat{\lambda }_n-{\lambda }_0\end{bmatrix} \overset{\textrm{D}}{\longrightarrow } \textbf{W}\begin{bmatrix} \textbf{Z}\\ \textbf{0} \end{bmatrix} = \begin{bmatrix} \textbf{P} & \textbf{Q}\\ \textbf{Q}^T & \textrm{R}\end{bmatrix} \begin{bmatrix} \textbf{Z}\\ \textbf{0} \end{bmatrix} = \begin{bmatrix} \textbf{P}\textbf{Z}\\ \textbf{Q}^T\textbf{Z}\end{bmatrix} \overset{\textrm{D}}{=} \textbf{N}\left( \textbf{0}, \begin{bmatrix} \textbf{P}\textbf{V}_0\textbf{P}^T & \textbf{P}\textbf{V}_0\textbf{Q}\\ \textbf{Q}^T\textbf{V}_0\textbf{P}^T & \textbf{Q}^T\textbf{V}_0\textbf{Q}\end{bmatrix} \right) . \end{aligned}$$
(8.27)

From Eqs. 8.18, 8.19 and 8.26 we can calculate

$$\begin{aligned} \textbf{Q}^T\textbf{V}_0\textbf{Q}=\frac{1}{4a(\kappa _0)}\left( A(\kappa _0)+\kappa _0a(\kappa _0)\right) ^2 \end{aligned}$$

(of course \(\textbf{P}^T= \textbf{P}\) here), and

$$\begin{aligned} \textbf{P}\textbf{V}_0\textbf{Q}=-\frac{1}{2} \begin{bmatrix} \frac{A(\kappa _0)}{a(\kappa _0)} +\kappa _0\\ \textbf{0} \end{bmatrix}. \end{aligned}$$
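
Both blocks can be checked numerically in the same \(d=3\) setting (an illustrative sketch, using the von Mises–Fisher covariance as before):

```python
# Check Q^T V0 Q = (A + kappa*a)^2/(4a) and P V0 Q = -[(A/a + kappa)/2, 0, ..., 0]^T.
import numpy as np

d, kappa = 3, 2.0
A = 1.0 / np.tanh(kappa) - 1.0 / kappa
a = 1.0 / kappa**2 - 1.0 / np.sinh(kappa) ** 2
mu = np.array([0.0, 0.0, 1.0])

varX = a * np.outer(mu, mu) + (A / kappa) * (np.eye(d) - np.outer(mu, mu))
B = np.vstack([mu, kappa * np.eye(d)])
V0 = B @ varX @ B.T                              # Eq. 8.19 at theta_0

P = np.zeros((d + 1, d + 1))
P[0, 0] = 1.0 / a
P[1:, 1:] = (np.eye(d) - np.outer(mu, mu)) / (kappa * A)
Q = -0.5 * np.concatenate([[A / a], mu])

print(np.isclose(Q @ V0 @ Q, (A + kappa * a) ** 2 / (4.0 * a)))   # True
print(np.round(P @ V0 @ Q, 12))                  # [-(A/a + kappa)/2, 0, 0, 0]
```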

Next we check the conditions in Theorem 5.1. First, \(\textbf{D}_n^{-1}\textbf{S}_n^{{\lambda }_0}({\varvec{\theta }}_0)\overset{\textrm{D}}{\longrightarrow } \textbf{Z}\) was shown in Eq. 8.24, and Eq. 5.4 follows from Eq. 8.25. Next, to check Eq. 5.5: recall that \(\textbf{J}_n=\sqrt{n}\textbf{I}\). The matrix \(\textbf{J}_n^{-1}\big (\textbf{U}_n^{{\lambda }_0}({\varvec{\theta }})-\textbf{U}_n^{{\lambda }_0}({\varvec{\theta }}_0)\big )\textbf{J}_n^{-T}\) in this case has upper left diagonal block \(\big (\textbf{F}_n^{{\lambda }_0}({\varvec{\theta }})- \textbf{F}_n^{{\lambda }_0}({\varvec{\theta }}_0)\big )/n\) and lower right diagonal element 0, while the off-diagonal blocks, \(-2({\varvec{\mu }}-{\varvec{\mu }}_0)\) and its transpose, tend to \(\textbf{0}\) uniformly for \({\varvec{\theta }}\in N_n(A)\). Eqs. 8.13 and 8.20 give

$$\begin{aligned} \frac{1}{n} \big (\textbf{F}_n^{{\lambda }_0}({\varvec{\theta }})- \textbf{F}_n^{{\lambda }_0}({\varvec{\theta }}_0)\big ) =\begin{bmatrix} a(\kappa )-a(\kappa _0) & \textbf{0}^T\\ \textbf{0}& \textbf{0} \end{bmatrix}, \end{aligned}$$
(8.28)

and \({\varvec{\theta }}\in N_n(A)\) implies \(|{\varvec{\theta }}-{\varvec{\theta }}_0|\le An^{-1/2}\), so the RHS of Eq. 8.28 tends to 0 in probability as \(n\rightarrow \infty \) uniformly in a neighbourhood of \({\varvec{\theta }}_0\). So Eq. 5.5 holds.

For the limit random variable appearing in Eq. 5.6, we have \(\textbf{Z}\sim \textbf{N}(\textbf{0},\textbf{V}_0)\), where \(\textbf{V}_0\) is as in Eq. 8.19 with \({\varvec{\theta }}={\varvec{\theta }}_0\), so

$$\begin{aligned} \mathbf{{P}}^{T/2}\textbf{Z}\sim \textbf{N}\big (\textbf{0},\mathbf{{P}}^{T/2}\textbf{V}_0\mathbf{{P}}^{1/2}\big ), \end{aligned}$$

where \(\mathbf{{P}}\) is in Eq. 8.26. We calculate

$$\begin{aligned} \mathbf{{P}}^{T/2}\textbf{V}_0\mathbf{{P}}^{1/2}= \begin{bmatrix} 1 & \textbf{0}^T\\ \textbf{0} & \textbf{I}_d-{\varvec{\mu }}_0{\varvec{\mu }}_0^T \end{bmatrix}_{(d+1)\times (d+1)} \end{aligned}$$

and, with \(\textbf{F}_0\) the limit in probability of \(n^{-1}\textbf{F}_n^{{\lambda }_0}({\varvec{\theta }}_0)\) given by Eq. 8.21,

$$\begin{aligned} \mathbf{{P}}^{T/2}\textbf{F}_0= & \begin{bmatrix} \frac{1}{\sqrt{a(\kappa _0)}} & \textbf{0}^T \\ \textbf{0} & \frac{1}{\sqrt{\kappa _0|E_0\textbf{X}|}} (\textbf{I}_d-{\varvec{\mu }}_0{\varvec{\mu }}_0^T) \end{bmatrix} \times \begin{bmatrix} a(\kappa _0) & -A(\kappa _0){\varvec{\mu }}_0^T \\ -A(\kappa _0){\varvec{\mu }}_0 & \kappa _0A(\kappa _0)\textbf{I}_d \end{bmatrix} \\= & \begin{bmatrix} \sqrt{a(\kappa _0)} & \ \frac{-A(\kappa _0){\varvec{\mu }}_0^T}{\sqrt{a(\kappa _0)}} \\ \textbf{0} & \sqrt{\kappa _0A(\kappa _0)} (\textbf{I}_d-{\varvec{\mu }}_0{\varvec{\mu }}_0^T) \end{bmatrix}_{(d+1)\times (d+1)}. \end{aligned}$$
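
Since \(\textbf{I}_d-{\varvec{\mu }}_0{\varvec{\mu }}_0^T\) is idempotent, the square root \(\mathbf{{P}}^{1/2}\) has the diagonal form used above, and both displays can be verified numerically (same illustrative \(d=3\) assumptions as the earlier snippets):

```python
# Check P^{T/2} V0 P^{1/2} = diag(1, I - mu mu^T) and the product P^{T/2} F0.
import numpy as np

d, kappa = 3, 2.0
A = 1.0 / np.tanh(kappa) - 1.0 / kappa
a = 1.0 / kappa**2 - 1.0 / np.sinh(kappa) ** 2
mu = np.array([0.0, 0.0, 1.0])
Pi = np.eye(d) - np.outer(mu, mu)                # projector: Pi @ Pi == Pi

varX = a * np.outer(mu, mu) + (A / kappa) * Pi
B = np.vstack([mu, kappa * np.eye(d)])
V0 = B @ varX @ B.T

Phalf = np.zeros((d + 1, d + 1))                 # P^{1/2} = diag(1/sqrt(a), Pi/sqrt(kappa*A))
Phalf[0, 0] = 1.0 / np.sqrt(a)
Phalf[1:, 1:] = Pi / np.sqrt(kappa * A)

F0 = np.zeros((d + 1, d + 1))                    # limit of F_n^{lambda_0}/n, Eq. 8.21
F0[0, 0] = a
F0[0, 1:] = -A * mu
F0[1:, 0] = -A * mu
F0[1:, 1:] = kappa * A * np.eye(d)

target = np.zeros((d + 1, d + 1))
target[0, 0] = 1.0
target[1:, 1:] = Pi
print(np.allclose(Phalf @ V0 @ Phalf, target))   # whitened covariance: True

PF = np.zeros((d + 1, d + 1))                    # the displayed product
PF[0, 0] = np.sqrt(a)
PF[0, 1:] = -A * mu / np.sqrt(a)
PF[1:, 1:] = np.sqrt(kappa * A) * Pi
print(np.allclose(Phalf @ F0, PF))               # True
```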

With this we can calculate the \( \dot{C}_{\Omega }\) and \(\dot{C}_{\tau }\) appearing in Theorem 5.1 for any desired hypothesis tests. Since we have i.i.d. observations, we get a normal limit in Eq. 8.27 and corresponding \(\chi ^2\) distributions for \(d_n\).
