Appendix A: Proof of Asymptotic Normality of Fiducial Estimators
We start with several assumptions. Assumptions A0–A6 are sufficient for the maximum likelihood estimator to be asymptotically normal and can be found in Lehmann and Casella (1998) as 6.3 (A0)–(A2) and 6.5 (A)–(D). Assumption B2 ensures that the prior to which the Jacobian converges (Hannig 2009) is positive at the true parameter, and B1 is the assumption necessary for the Bayesian posterior to converge to the distribution of the MLE (Ghosh and Ramamoorthi 2003, Theorem 1.4.1).
10.1.1 Assumptions
10.1.1.1 Conditions for Asymptotic Normality of the MLE
- (A0) The distributions \(P_{\boldsymbol{\xi}}\) are distinct.
- (A1) The set \(\left\{x:f(x|\boldsymbol{\xi})>0\right\}\) is independent of the choice of \(\boldsymbol{\xi}\).
- (A2) The data \(\boldsymbol{X}=\{X_{1},\dots,X_{n}\}\) are independent identically distributed (i.i.d.) with probability density \(f(\cdot|\boldsymbol{\xi})\).
- (A3) There exists an open neighborhood \(B(\boldsymbol{\xi}_{0},\delta)\) about the true parameter value \(\boldsymbol{\xi}_{0}\) in which all third partial derivatives \(\left(\partial^{3}/\partial\xi_{i}\partial\xi_{j}\partial\xi_{k}\right)f(\boldsymbol{x}|\boldsymbol{\xi})\) exist.
- (A4) The first and second derivatives of \(L(\boldsymbol{\xi},x)=\log f(x|\boldsymbol{\xi})\) satisfy
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\xi_{j}}L(\boldsymbol{\xi},x)\right]=0\end{aligned}$$
and
$$\begin{aligned} I_{j,k}(\boldsymbol{\xi}) & = E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\xi_{j}}L(\boldsymbol{\xi},x)\cdot\frac{\partial}{\partial\xi_{k}}L(\boldsymbol{\xi},x)\right]\\ & = -E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\xi_{j}\partial\xi_{k}}L(\boldsymbol{\xi},x)\right].\end{aligned}$$
- (A5) The information matrix \(I(\boldsymbol{\xi})\) is positive definite for all \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\).
- (A6) There exist functions \(M_{j,k,l}(\boldsymbol{x})\) such that
$$\begin{aligned} \sup_{\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)}\left|\frac{\partial^{3}}{\partial\xi_{j}\partial\xi_{k}\partial\xi_{l}}L(\boldsymbol{\xi},x)\right|\le M_{j,k,l}(x)\quad\textrm{and}\;\;E_{\boldsymbol{\xi}_{0}}M_{j,k,l}(x)<\infty\end{aligned}$$
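The two expressions for \(I_{j,k}(\boldsymbol{\xi})\) in (A4) can be checked numerically for a toy model. The sketch below is an illustration only, not part of the proof: for the \(N(\theta,1)\) model the score is \(x-\theta\) and the second derivative of the log-likelihood is identically \(-1\), so both identities can be verified by Monte Carlo (the seed and constants are arbitrary).

```python
import random

random.seed(0)
theta0, n = 1.7, 200_000
xs = [random.gauss(theta0, 1.0) for _ in range(n)]

# For f(x|theta) = N(theta, 1): score = d/dtheta log f = (x - theta),
# and d^2/dtheta^2 log f = -1 identically, so I(theta) = 1.
mean_score = sum(x - theta0 for x in xs) / n            # E[score] should be ~ 0
mean_score_sq = sum((x - theta0) ** 2 for x in xs) / n  # E[score^2] should be ~ 1
neg_mean_second = 1.0                                   # -E[d^2 L/dtheta^2] = 1 exactly
```

Here `mean_score` approximates the first identity and `mean_score_sq` matches `neg_mean_second`, the two equivalent forms of the Fisher information.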
10.1.1.2 Conditions for the Bayesian Posterior Distribution to be Close to That of the MLE.
Let \(\pi(\boldsymbol{\xi})=E_{\boldsymbol{\xi}_{0}}J_{0}(X_{0},\boldsymbol{\xi})\) and \(L_{n}(\boldsymbol{\xi})=\sum_{i=1}^{n}L(\boldsymbol{\xi},X_{i})\).
- (B1) For any \(\delta>0\) there exists \(\epsilon>0\) such that
$$\begin{aligned} P_{\boldsymbol{\xi}_{0}}\left\{\sup_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{1}{n}\left(L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right)\le-\epsilon\right\} \to1\end{aligned}$$
- (B2) \(\pi\left(\boldsymbol{\xi}\right)\) is positive at \(\boldsymbol{\xi}_{0}\).
10.1.1.3 Conditions for Showing That the Fiducial Distribution is Close to the Bayesian Posterior
- (C1) For any \(\delta>0\)
$$\begin{aligned} \inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min_{i=1\dots n}L(\boldsymbol{\xi},X_{i})}{\left|L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right|}\stackrel{P_{\boldsymbol{\xi}_{0}}}{\longrightarrow}0\end{aligned}$$
- (C2) Let \(\pi(\boldsymbol{\xi})=E_{\boldsymbol{\xi}_{0}}J_{0}(X_{0},\boldsymbol{\xi})\). The Jacobian function \(J\left(\boldsymbol{X},\boldsymbol{\xi}\right)\stackrel{a.s.}{\to}\pi\left(\boldsymbol{\xi}\right)\) uniformly on compacts in \(\boldsymbol{\xi}\). In the single-variable case, this reduces to the following: \(J\left(\boldsymbol{X},\xi\right)\) is continuous in \(\xi\), \(\pi\left(\xi\right)\) is finite, \(\pi\left(\xi_{0}\right)>0\), and for some \(\delta_{0}\)
$$\begin{aligned} E_{\xi_{0}}\left(\sup_{\xi\in B\left(\xi_{0},\delta_{0}\right)}J_{0}\left(\boldsymbol{X},\xi\right)\right)<\infty.\end{aligned}$$
In the multivariate case, we follow Yeo and Johnson (2001). Let
$$\begin{aligned} J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)=E_{\boldsymbol{\xi}_{0}}\left[J_{0}\left(x_{1},\dots,x_{j},X_{j+1},\dots,X_{k};\boldsymbol{\xi}\right)\right].\end{aligned}$$
- (C2.a) There exist integrable and symmetric functions \(g\left(x_{1},\dots,x_{j}\right)\), \(j=1,\dots,k\), and a compact set \(\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\) such that \(\left|J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)\right|\le g\left(x_{1},\dots,x_{j}\right)\) for all \(\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\).
- (C2.b) There exists a sequence of measurable sets \(S_{M}^{k}\) such that
$$\begin{aligned} P\left(\mathbb{R}^{k}-\cup_{M=1}^{\infty}S_{M}^{k}\right)=0\end{aligned}$$
- (C2.c) For each \(M\) and for all \(j\in\{1,\dots,k\}\), \(J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)\) is equicontinuous in \(\boldsymbol{\xi}\) for \(\{x_{1},\dots,x_{j}\}\in S_{M}^{j}\), where \(S_{M}^{k}=S_{M}^{j}\times S_{M}^{k-j}\).
10.1.2 Proof of Asymptotic Normality of Multivariate Fiducial Estimators
We now prove the asymptotic normality (Theorem 1) for multivariate fiducial estimators.
Proof.
Assume without loss of generality that \(\boldsymbol{\xi}\in\boldsymbol{\Xi}=\mathbb{R}^{p}\). We denote by \(J_{n}\left(\boldsymbol{x}_{n},\boldsymbol{\xi}\right)\) the average of all possible Jacobians over a sample of size \(n\), and set \(\pi\left(\boldsymbol{\xi}\right)=E_{\boldsymbol{\xi}_{0}}J_{0}\left(\boldsymbol{x},\boldsymbol{\xi}\right)\). Assumption C2 and the uniform strong law of large numbers for U-statistics imply that \(J_{n}\left(\boldsymbol{x},\boldsymbol{\xi}\right)\stackrel{a.s.}{\rightarrow}\pi\left(\boldsymbol{\xi}\right)\) uniformly in \(\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\) and that \(\pi\left(\boldsymbol{\xi}\right)\) is continuous. Therefore,
$$\sup_{\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)}\left|J_{n}\left(\boldsymbol{x}_{n},\boldsymbol{\xi}\right)-\pi\left(\boldsymbol{\xi}\right)\right|\to0\;P_{\boldsymbol{\xi}_{0}}\, a.s.$$
The multivariate proof now proceeds in a similar fashion as the univariate case. Let
$$\begin{aligned} \pi^{*}\left(\boldsymbol{s},\boldsymbol{x}\right) &= \frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)f\left(\boldsymbol{x}_{n}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)f\left(\boldsymbol{x}_{n}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\, d\!\boldsymbol{t}}\\ & = \frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)\right]}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\right]\, d\!\boldsymbol{t}}\\ & =\frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\, d\!\boldsymbol{t}}\end{aligned}$$
and, just as in Ghosh and Ramamoorthi (2003), we let \(H=-\frac{1}{n}\frac{\partial^{2}}{\partial\boldsymbol{\xi}\partial\boldsymbol{\xi}^{T}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\) and we notice that \(H\to I\left(\boldsymbol{\xi}_{0}\right)\, a.s.\, P_{\boldsymbol{\xi}_{0}}\). It will be sufficient to prove
$$\begin{aligned}\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\nonumber\\\left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]\right|d\!\boldsymbol{t} \stackrel{P_{\boldsymbol{\xi}_{0}}}{\rightarrow}0\end{aligned}$$
(10.3)
Let \(t_{i}\) represent the \(i\)th component of the vector \(\boldsymbol{t}\). By Taylor's Theorem, we can compute
$$\begin{aligned} L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right) &= L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)+\sum_{i=1}^{p}\left(\frac{t_{i}}{\sqrt{n}}\right)\frac{\partial}{\partial\xi_{i}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\\ &+\frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p}\left(\frac{t_{i}t_{j}}{\left(\sqrt{n}\right)^{2}}\frac{\partial^{2}}{\partial\xi_{i}\partial\xi_{j}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right)\\ &+\frac{1}{6}\sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p}\left(\frac{t_{i}t_{j}t_{k}}{\left(\sqrt{n}\right)^{3}}\frac{\partial^{3}}{\partial\xi_{i}\partial\xi_{j}\partial\xi_{k}}L_{n}\left(\boldsymbol{\xi}'\right)\right)\\ & = L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)-\frac{\boldsymbol{t}^{T}H\boldsymbol{t}}{2}+R_{n}\end{aligned}$$
for some \(\boldsymbol{\xi}^\prime\in\left[\hat{\boldsymbol{\xi}}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right]\); the first-order term vanishes because \(\hat{\boldsymbol{\xi}}_{n}\) maximizes \(L_{n}\). Notice that \(R_{n}=O_{p}\left(\left\Vert \boldsymbol{t}\right\Vert ^{3}/\sqrt{n}\right)\).
Given any \(0<\delta<\delta_{0}\) and \(c>0\), we break \(\mathbb{R}^{p}\) into three regions:
$$\begin{aligned} A_{1} & =\left\{\boldsymbol{t}:\;\left\Vert \boldsymbol{t}\right\Vert <c\log\sqrt{n}\right\}\\ A_{2}&=\left\{\boldsymbol{t}:\;c\log\sqrt{n}\le\left\Vert \boldsymbol{t}\right\Vert <\delta\sqrt{n}\right\}\\ A_{3} & =\left\{\boldsymbol{t}:\;\delta\sqrt{n}\le\left\Vert \boldsymbol{t}\right\Vert \right\}\end{aligned}$$
On \(A_{1}\cup A_{2}\) we compute
$$\begin{aligned} {\int_{A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.}\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[-\frac{1}{2}\boldsymbol{t}'I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}\right]\right|d\!\boldsymbol{t}\\ \le \int_{A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|\\ \cdot{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ +\int_{A_{1}\cup A_{2}}\left|\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[-\frac{1}{2}\boldsymbol{t}'I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}\right]\right|d\!\boldsymbol{t}\end{aligned}$$
Since \(\pi\left(\cdot\right)\) is a proper prior on \(A_{1}\cup A_{2}\), the second term goes to 0 by the Bayesian Bernstein–von Mises theorem. Next we notice that
$$\begin{aligned} {\int_{A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|}\\ \cdot{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ \le \sup_{\boldsymbol{t}\in A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|\\ \cdot\int_{A_{1}\cup A_{2}}{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\end{aligned}$$
Since \(\sqrt{n}\left(\hat{\boldsymbol{\xi}}_{n}-\boldsymbol{\xi}_{0}\right)\stackrel{\mathcal{D}}{\to}N\left(0,I\left(\boldsymbol{\xi}_{0}\right)^{-1}\right)\), then
$$\begin{aligned}P_{\boldsymbol{\xi}_{0}}\left[\left\{\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n};\;\boldsymbol{t}\in A_{1}\cup A_{2}\right\} \subset B\left(\boldsymbol{\xi}_{0},\delta_{0}\right)\right]\to1.\end{aligned}$$
Furthermore, since \(L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)=-\frac{\boldsymbol{t}^{T}H\boldsymbol{t}}{2}+R_{n}\), the integral remains bounded in probability. Since \(\max_{\boldsymbol{t}\in A_{1}\cup A_{2}}\left\Vert \boldsymbol{t}/\sqrt{n}\right\Vert \le\delta\) and \(J_{n}\to\pi\) uniformly, the first term \(\to0\) in probability.
Next, we turn to
$$\begin{aligned} {\int_{A_{3}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.}\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]\right|\, d\!\boldsymbol{t}\\ \le \int_{A_{3}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ +\int_{A_{3}}\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]d\!\boldsymbol{t}\end{aligned}$$
The second integral goes to 0 because \(\min_{A_{3}}\left\Vert \boldsymbol{t}\right\Vert \to\infty\). As for the first integral,
$$\begin{aligned} {\int_{A_{3}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}}\\ =\frac{1}{n}\sum_{i=1}^{n}\int_{A_{3}}J\left(\boldsymbol{x}_{i},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ =\frac{1}{n}\sum_{i=1}^{n}\int_{A_{3}}J\left(\boldsymbol{x}_{i},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)f\left(x_{i}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\\ \cdot{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)-\log f\left(x_{i}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\right]d\!\boldsymbol{t}\end{aligned}$$
Because \(J\left(\cdot\right)\) is a probability measure, so is \(J\left(\cdot\right)f\left(\cdot\right)\). Assumption C1 ensures that the exponent goes to \(-\infty\), and therefore the integral converges to 0 in probability.
Having shown Eq. 10.3, we now follow Ghosh and Ramamoorthi (2003) and let
$$\begin{aligned} C_{n}=\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right|d\!\boldsymbol{t}\end{aligned}$$
then the main result to be proved (Eq. 10.2) becomes
$$\begin{aligned}C_{n}^{-1}\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\nonumber\\\left.-C_{n}\frac{\sqrt{\det I\left(\boldsymbol{\xi}_{0}\right)}}{\sqrt{2\pi}}e^{-\boldsymbol{s}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{s}/2}\right|\, d\boldsymbol{s} & \stackrel{P_{\boldsymbol{\xi}_{0}}}{\to}0\end{aligned}$$
(10.4)
Because
$$\begin{aligned} \int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[-\frac{\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s} & = J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\int_{\mathbb{R}^{p}}{\rm exp}\left[-\frac{\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s}\\ &=J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\frac{\sqrt{2\pi}}{\sqrt{\det\left(H\right)}}\\ &\stackrel{a.s.}{\to} \pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}\end{aligned}$$
and Eq. 10.3 together imply that \(C_{n}\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}\), it is enough to show that the integral in Eq. 10.4 goes to 0 in probability. This integral is bounded by \(I_{1}+I_{2}\), where
$$\begin{aligned} I_{1}& =\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\\ &\left.-J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\right|\, d\boldsymbol{s}\end{aligned}$$
and
$$\begin{aligned} I_{2}=\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]-C_{n}\frac{\sqrt{\det I\left(\boldsymbol{\xi}_{0}\right)}}{\sqrt{2\pi}}e^{-\boldsymbol{s}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{s}/2}\right|\, d\boldsymbol{s}.\end{aligned}$$
Eq. 10.3 shows that \(I_{1}\to0\) in probability, and \(I_{2}\) is
$$\begin{aligned} I_{2}& = \left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)-C_{n}\frac{\sqrt{\det I\left(\boldsymbol{\xi}_{0}\right)}}{\sqrt{2\pi}}\right|\int_{\mathbb{R}^{p}}{\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s}\\ & \stackrel{P}{\to} 0\end{aligned}$$
because \(J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\) and \(C_{n}\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}.\hfill\Box\)
Appendix B: Proof of Assumptions for Free-Knot Splines Using a Truncated Polynomial Basis
We now consider the free-knot spline case. Suppose we are interested in a degree-\(p\) (order \(m=p+1\)) polynomial spline with \(\kappa\) knot points, \(\boldsymbol{t}=\left\{t_{1},\dots,t_{\kappa}\right\} ^{T}\), where \(t_{k}\in(a+\delta,b-\delta)\) and \(\left|t_{i}-t_{j}\right|\ge\delta\) for \(i\ne j\) and some \(\delta>0\). Furthermore, we assume that the data points \(\left\{x_{i},y_{i}\right\}\) are independent, with the distribution of the \(x_{i}\) having positive density on \(\left[a,b\right]\).
Denote the truncated polynomial spline basis functions as
$$\begin{aligned} N(x,\boldsymbol{t}) & = \left\{N_{1}(x,\boldsymbol{t}),\dots,N_{\kappa+m}(x,\boldsymbol{t})\right\} ^{T}\\ & = \left\{1,x,\dots,x^{p},(x-t_{1})_{+}^{p},\dots,(x-t_{\kappa})_{+}^{p}\right\} ^{T}\end{aligned}$$
and let \(y_{i}=N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}+\sigma\epsilon_{i}\) where \(\epsilon_{i}\stackrel{iid}{\sim}N(0,1)\) and thus the density function is
$$\begin{aligned} f(y,\boldsymbol{\xi})=\frac{1}{\sqrt{2\pi\sigma^{2}}}{\rm exp}\left[-\frac{1}{2\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\end{aligned}$$
where \(\boldsymbol{\xi}=\{\boldsymbol{t},\boldsymbol{\alpha},\sigma^{2}\}\) and the log-likelihood function is
$$\begin{aligned} L(\boldsymbol{\xi},y)=-\frac{1}{2}\log2\pi-\frac{1}{2}\log\sigma^{2}-\frac{1}{2\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\end{aligned}$$
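For concreteness, the basis and log-likelihood above can be sketched in code. This is an illustration only: the names `tp_basis` and `log_lik` are ours, and the cubic default \(p=3\), knots, and coefficients are arbitrary choices.

```python
import math

def tp_basis(x, knots, p=3):
    """Truncated polynomial basis N(x,t) = {1, x, ..., x^p, (x-t_1)_+^p, ..., (x-t_k)_+^p}."""
    return [x ** j for j in range(p + 1)] + [max(x - t, 0.0) ** p for t in knots]

def log_lik(y, x, knots, alpha, sigma2, p=3):
    """Log-likelihood of one observation under y = N(x,t)^T alpha + sigma * eps."""
    mu = sum(a * b for a, b in zip(alpha, tp_basis(x, knots, p)))
    return -0.5 * math.log(2 * math.pi) - 0.5 * math.log(sigma2) \
           - (y - mu) ** 2 / (2.0 * sigma2)

# Example evaluation at an arbitrary point with two knots (kappa = 2, m = 4):
knots, alpha = [0.3, 0.7], [1.0, -2.0, 0.5, 0.0, 1.5, -1.5]
x = 0.5
mu = sum(a * b for a, b in zip(alpha, tp_basis(x, knots)))
```

With a zero residual (\(y=\mu\)) and \(\sigma^{2}=1\), the log-likelihood reduces to \(-\frac{1}{2}\log2\pi\), which gives a quick consistency check of the signs above.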
10.2.1 Assumptions A0–A4
Assumptions A0–A2 are satisfied. We now consider assumptions A3 and A4. We note that if \(p\ge4\) then the necessary three continuous derivatives exist, and we now examine the derivatives. Let \(\boldsymbol{\theta}=\{\boldsymbol{t},\boldsymbol{\alpha}\}\) and thus
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\right] & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{2}}2\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ & =-\frac{1}{2\sigma^{2}}2\left(E_{\boldsymbol{\xi}}\left[y\right]-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ & = 0\end{aligned}$$
and
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})\right] & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{2}}+\frac{1}{2\left(\sigma^{2}\right)^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\\ & =-\frac{1}{2\sigma^{2}}+\frac{1}{2\left(\sigma^{2}\right)^{2}}\left(\sigma^{2}\right)\\ & =0.\end{aligned}$$
Next, we consider the information matrix. First, we consider the \(\boldsymbol{\theta}\) terms.
$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},\boldsymbol{y})\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},\boldsymbol{y})\right]} & =E_{\boldsymbol{\xi}}\left[\frac{1}{\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ &=\frac{1}{\sigma^{4}}E_{\boldsymbol{\xi}}\left[\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &=\frac{1}{\sigma^{2}}\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\end{aligned}$$
The j,k partials for the second derivative are
$$\begin{aligned} {\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}L(\boldsymbol{\xi},y)}&=\frac{\partial}{\partial\theta_{j}}\left[-\frac{1}{2\sigma^{2}}2\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ &=\frac{\partial}{\partial\theta_{j}}\left[-\frac{1}{\sigma^{2}}\left(-y\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right)\right]\\ &=-\frac{1}{\sigma^{2}}\left[-y\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\end{aligned}$$
which have expectation
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}L(\boldsymbol{\xi},y)\right] & =-\frac{1}{\sigma^{2}}\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ & =-E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},\boldsymbol{y})\right]\end{aligned}$$
as necessary. Next, we consider
$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]}\\ & =E_{\boldsymbol{\xi}}\left[\frac{1}{\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left[-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\right]\\ & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\frac{1}{2\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{3}\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\\ & =0\end{aligned}$$
which is equal to
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\theta_{j}\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})\right] & =E_{\boldsymbol{\xi}}\left[-\frac{1}{\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\\ & =0.\end{aligned}$$
Finally,
$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]}\\ & =E_{\boldsymbol{\xi}}\left[\left\{-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right\} \left\{-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right\} \right]\\ &=E_{\boldsymbol{\xi}}\left[\frac{1}{4\sigma^{4}}-\frac{2}{4\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}+\frac{1}{4\sigma^{8}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{4}\right]\\ &=\frac{1}{4\sigma^{4}}-\frac{2}{4\sigma^{6}}\sigma^{2}+\frac{1}{4\sigma^{8}}3\sigma^{4}\\ &=\frac{1}{2\sigma^{4}}\end{aligned}$$
which is equal to
$$\begin{aligned} -E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\left(\sigma^{2}\right)^{2}}L(\boldsymbol{\xi},y)\right] = & -E_{\boldsymbol{\xi}}\left[\frac{1}{2}\sigma^{-4}-\sigma^{-6}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\\ = & -\frac{1}{2}\sigma^{-4} + \sigma^{-4}=\frac{1}{2\sigma^{4}}.\end{aligned}$$
Therefore, the interchange of integration and differentiation is justified.
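The \(\sigma^{2}\) identity above, \(E[(\partial L/\partial\sigma^{2})^{2}]=1/(2\sigma^{4})\), lends itself to a quick Monte Carlo check. The sketch below is an illustration only, with an arbitrary fixed mean standing in for the spline fit \(N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\) and arbitrary constants.

```python
import random

random.seed(1)
mu, sigma2, n = 0.4, 2.0, 200_000   # fixed mean stands in for N(x,t)^T alpha

def score_sigma2(y):
    # d/d(sigma^2) of the log-likelihood: -1/(2 sigma^2) + (y - mu)^2 / (2 sigma^4)
    return -0.5 / sigma2 + (y - mu) ** 2 / (2.0 * sigma2 ** 2)

ys = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
mc = sum(score_sigma2(y) ** 2 for y in ys) / n   # Monte Carlo E[(dL/dsigma^2)^2]
exact = 1.0 / (2.0 * sigma2 ** 2)                # = 1/(2 sigma^4)
```

For \(\sigma^{2}=2\) the exact value is \(0.125\), and the Monte Carlo average lands within sampling error of it.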
10.2.2 Assumption A5
To address whether the information matrix is positive definite, we notice that since \(E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]>0\) and \(E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]=0\), we only need to be concerned with the submatrix
$$\begin{aligned} I_{j,k}(\boldsymbol{\theta})&=\sum_{i=1}^{n}E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y_{i})\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},y_{i})\right]\\ &=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}\left(\frac{\partial}{\partial\theta_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right).\end{aligned}$$
where the \(\sigma^{-2}\) term can be ignored because it does not affect the positive definiteness. First, we note
$$\begin{aligned} \frac{\partial}{\partial t_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}&=-p\left(x_{i}-t_{j}\right)_{+}^{p-1}\alpha_{p+j+1}\\ \frac{\partial}{\partial\alpha_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}&=N_{j}(x_{i},\boldsymbol{t}).\end{aligned}$$
If we let
$$\begin{aligned} X=\left[\begin{array}{cccccc} N_{1}\left(x_{1},\boldsymbol{t}\right) & \cdots & N_{m+\kappa}\left(x_{1},\boldsymbol{t}\right) & \frac{\partial}{\partial t_{1}}N(x_{1},\boldsymbol{t})^{T}\boldsymbol{\alpha} & \cdots & \frac{\partial}{\partial t_{\kappa}}N(x_{1},\boldsymbol{t})^{T}\boldsymbol{\alpha}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\ N_{1}\left(x_{n},\boldsymbol{t}\right) & \cdots & N_{m+\kappa}\left(x_{n},\boldsymbol{t}\right) & \frac{\partial}{\partial t_{1}}N(x_{n},\boldsymbol{t})^{T}\boldsymbol{\alpha} & \cdots & \frac{\partial}{\partial t_{\kappa}}N(x_{n},\boldsymbol{t})^{T}\boldsymbol{\alpha}\end{array}\right]\end{aligned}$$
then \(I(\boldsymbol{\theta})=X^{T}X\), which is positive definite if the columns of \(X\) are linearly independent. This is true under the assumptions that \(t_{j}\ne t_{k}\) for \(j\ne k\) and that \(\alpha_{m+j}\ne0\).
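This linear-independence condition can be probed numerically. The sketch below is our construction, not part of the argument: it builds the columns of \(X\) for a cubic spline with two arbitrary knots and nonzero knot coefficients, forms the Gram matrix \(X^{T}X\), and tests positive definiteness with a Cholesky attempt.

```python
def is_pd(a):
    """Cholesky-based check: True iff the symmetric matrix a is positive definite."""
    m = len(a)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:          # a pivot <= 0 means not positive definite
                    return False
                L[i][i] = d ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return True

p, knots = 3, [0.3, 0.7]                    # hypothetical cubic example
alpha = [1.0, -2.0, 0.5, 0.0, 1.5, -1.5]    # alpha_{m+j} != 0 for both knots
xs = [(i + 0.5) / 40 for i in range(40)]    # 40 design points in [0, 1]

def design_row(x):
    basis = [x ** j for j in range(p + 1)] + [max(x - t, 0.0) ** p for t in knots]
    dt = [-p * max(x - t, 0.0) ** (p - 1) * alpha[p + 1 + j]
          for j, t in enumerate(knots)]     # d/dt_j N(x,t)^T alpha columns
    return basis + dt

X = [design_row(x) for x in xs]
ncol = len(X[0])
G = [[sum(row[r] * row[c] for row in X) for c in range(ncol)] for r in range(ncol)]
```

With distinct knots and nonzero knot coefficients `is_pd(G)` succeeds, while a rank-deficient matrix fails the same test.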
10.2.3 Assumption A6
We next consider a bound on the third partial derivatives. We start with the derivatives of the basis functions.
$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial t_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=0\qquad\textrm{if }j\ne k\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial t_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=p(p-1)\left(x-t_{j}\right)_{+}^{p-2}\alpha_{p+j+1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial\alpha_{j}\partial\alpha_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=0\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial\alpha_{p+j+1}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=-p\left(x-t_{j}\right)_{+}^{p-1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{3}}{\partial t_{j}\partial t_{j}\partial t_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=-p(p-1)(p-2)\left(x-t_{j}\right)_{+}^{p-3}\alpha_{p+j+1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{3}}{\partial t_{j}\partial t_{j}\partial\alpha_{p+j+1}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=p(p-1)\left(x-t_{j}\right)_{+}^{p-2}\end{aligned}$$
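The knot derivatives above are easy to sanity-check by finite differences. The sketch below is an illustration only, with arbitrary values of \(x\), \(t_{j}\), \(p\), and \(\alpha_{p+j+1}\): it compares the first two analytic \(t_{j}\)-derivatives of a single truncated-power term with central-difference approximations.

```python
x, t, p, a = 0.8, 0.3, 4, 2.0   # evaluation point, knot, degree, knot coefficient

def f(tj):
    # the single truncated-power term (x - t_j)_+^p * alpha_{p+j+1}
    return max(x - tj, 0.0) ** p * a

d1_exact = -p * max(x - t, 0.0) ** (p - 1) * a           # d/dt_j
d2_exact = p * (p - 1) * max(x - t, 0.0) ** (p - 2) * a  # d^2/dt_j^2

h1 = 1e-5
d1_fd = (f(t + h1) - f(t - h1)) / (2 * h1)               # central first difference
h2 = 1e-4
d2_fd = (f(t + h2) - 2 * f(t) + f(t - h2)) / h2 ** 2     # central second difference
```

Both finite differences agree with the analytic formulas to well within the discretization error, as long as \(x>t_{j}\) so the truncation is inactive at the evaluation point.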
Since \(x\) is an element of a compact set, for \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\) all of the earlier partials are bounded, as is \(N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\). Therefore
$$\begin{aligned} &{\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}L(\boldsymbol{\xi},x)}\\ & \qquad= -\frac{1}{\sigma^{2}}\left[-y\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\qquad+\left(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &\qquad+\left(\frac{\partial^{2}}{\partial\theta_{l}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &\qquad\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left(\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\end{aligned}$$
and
$$\begin{aligned} &{\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\sigma^{2}}L(\boldsymbol{\xi},x)}\\ & \qquad=\frac{1}{\sigma^{4}}\left[-y\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\qquad\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\end{aligned}$$
and
$$\begin{aligned} \frac{\partial^{3}}{\partial\theta_{j}\partial\sigma^{2}\partial\sigma^{2}}L(\boldsymbol{\xi},y)=-\frac{2}{\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\end{aligned}$$
and
$$\begin{aligned} \frac{\partial^{3}}{\partial\sigma^{2}\partial\sigma^{2}\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})=-\frac{1}{\sigma^{6}}+\frac{3}{\sigma^{8}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\end{aligned}$$
are also bounded for \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\) since \(\sigma_{0}^{2}>0\) by assumption. The expectations of these bounds also clearly exist.
10.2.4 Lemmas
To show that the remaining assumptions are satisfied, we first examine the behavior of
$$\begin{aligned} g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})=N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}.\end{aligned}$$
Notice that for \(x_{i}\) chosen on a uniform grid over \([a,b]\),
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2} \to\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx.\end{aligned}$$
Furthermore, we notice that \(g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x\right)\) is itself a spline, since the sum (or difference) of two splines is again a spline. Consider the degree p case of \(g\left(x|\boldsymbol{\alpha},t\right)+g\left(x|\boldsymbol{\alpha}^{*},t^{*}\right)\) where \(t<t^{*}\). The sum is a spline with knot points \(\left\{t,t^{*}\right\}\) whose first p + 1 coefficients are \(\boldsymbol{\alpha}+\boldsymbol{\alpha}^{*}\) (the polynomial parts added coefficient-wise) and whose last two coefficients are \(\left\{\alpha_{p+1},\alpha_{p+1}^{*}\right\}\).
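This closure property can be checked numerically. The following is a minimal sketch (the helper `spline_eval` and all coefficient values are illustrative assumptions, not from the text) confirming that the sum of two degree-1 truncated power basis splines with knots t and t* is the spline with knots {t, t*} whose polynomial coefficients add coefficient-wise:

```python
import numpy as np

def spline_eval(x, alpha, knots, p):
    """Evaluate a degree-p truncated power basis spline:
    g(x) = sum_i alpha_i x^i + sum_j alpha_{p+1+j} (x - t_j)_+^p."""
    val = sum(alpha[i] * x**i for i in range(p + 1))
    for j, t in enumerate(knots):
        val += alpha[p + 1 + j] * np.maximum(x - t, 0.0)**p
    return val

# Degree-1 example on [0, 1]: two single-knot splines
p = 1
a1, t1 = [1.0, 2.0, 3.0], [0.3]   # 1 + 2x + 3(x - 0.3)_+
a2, t2 = [0.5, -1.0, 4.0], [0.7]  # 0.5 - x + 4(x - 0.7)_+

# Their sum: polynomial part added coefficient-wise, truncated-power
# coefficients kept for each knot separately.
a_sum = [1.5, 1.0, 3.0, 4.0]
x = np.linspace(0.0, 1.0, 101)
lhs = spline_eval(x, a1, t1, p) + spline_eval(x, a2, t2, p)
rhs = spline_eval(x, a_sum, [0.3, 0.7], p)
print(np.allclose(lhs, rhs))  # True
```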
At this point, we also notice
$$\begin{aligned} E\left[n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right] &=n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)E\left[\epsilon_{i}\right]\\ &=0\end{aligned}$$
$$\begin{aligned} V\left[n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right] &=n^{-2}V\left[\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right]\\ &=n^{-2}\sum V\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right]\\ &=n^{-2}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)^{2}V\left[\epsilon_{i}\right]\\ &=n^{-2}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)^{2}\\ &\rightarrow 0\end{aligned}$$
and that \(\sum\epsilon_{i}^{2}\sim\chi_{n}^{2}\), so by the SLLN \(n^{-1}\sum\epsilon_{i}^{2}\) converges almost surely to the constant 1. Therefore,
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2} &=\frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)\right]^{2}+\frac{2\sigma_{0}}{n}\sum_{i=1}^{n}\epsilon_{i}g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\frac{\sigma_{0}^{2}}{n}\sum_{i=1}^{n}\epsilon_{i}^{2}\\ &=\frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)\right]^{2}+O_{p}\left(n^{-1/2}\right)+\frac{\sigma_{0}^{2}}{n}\sum_{i=1}^{n}\epsilon_{i}^{2}\\ &\stackrel{a.s.}{\rightarrow}\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx+\sigma_{0}^{2}.\end{aligned}$$
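As a numerical sanity check of this limit, the following sketch (the particular spline, grid size, noise level, and seed are arbitrary assumptions) compares the empirical average of \([g+\sigma_{0}\epsilon]^{2}\) on a uniform grid with \((b-a)^{-1}\int_{a}^{b}g^{2}\,dx+\sigma_{0}^{2}\):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma0, n = 0.0, 1.0, 0.5, 200_000

# g plays the role of g(theta_0, theta, x): here an arbitrary fixed spline
g = lambda x: 1.0 - 2.0 * x + 3.0 * np.maximum(x - 0.4, 0.0)

x = np.linspace(a, b, n)          # uniform design grid
eps = rng.standard_normal(n)      # i.i.d. N(0,1) errors
empirical = np.mean((g(x) + sigma0 * eps)**2)

# Limit: (b-a)^{-1} int_a^b g(x)^2 dx + sigma_0^2, via a fine Riemann sum
xx = np.linspace(a, b, 100_001)
limit = np.mean(g(xx)**2) + sigma0**2
print(empirical, limit)           # the two agree to a few decimal places
```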
Lemma 1.
Given a degree p polynomial \(g(x|\boldsymbol{\alpha})\) on \([a,b]\) with coefficients \(\boldsymbol{\alpha}\), then \(\exists\;\lambda_{n,m},\lambda_{n,M}>0\) such that \(||\boldsymbol{\alpha}||^{2}\lambda_{n,m}^{2}\le\frac{1}{n}\sum_{i=1}^{n}\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}\le||\boldsymbol{\alpha}||^{2}\lambda_{n,M}^{2}\).
Proof.
If \(\boldsymbol{\alpha}=\boldsymbol{0}\), then \(g\left(x|\boldsymbol{\alpha}\right)\equiv0\) and the result is obvious. If \(g\left(x|\boldsymbol{\alpha}\right)\) has at least one non-zero coefficient, it cannot be identically zero on \([a,b]\); since a degree p polynomial has at most p zeros, for n > p we have \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}>0\). We notice that
$$\begin{aligned} \int_{a}^{b}\left[g(x|\boldsymbol{\alpha})\right]^{2}\, dx & =\int_{a}^{b}\left[\sum_{i=0}^{p}\alpha_{i}^{2}x^{2i}+2\sum_{i=0}^{p-1}\sum_{j=i+1}^{p}\alpha_{i}\alpha_{j}x^{i+j}\right]dx\\ & =\left.\sum_{i=0}^{p}\frac{\alpha_{i}^{2}}{2i+1}x^{2i+1}+2\sum_{i=0}^{p-1}\sum_{j=i+1}^{p}\frac{\alpha_{i}\alpha_{j}}{i+j+1}x^{i+j+1}\right|_{x=a}^{b}\\ & =\boldsymbol{\alpha}^{T}\boldsymbol{X}\boldsymbol{\alpha}\end{aligned}$$
where the matrix \(\boldsymbol{X}\) has i,j element \(\left(b^{i+j+1}-a^{i+j+1}\right)/(i+j+1)\) for \(i,j=0,\dots,p\). Since \(\int_{a}^{b}\left[g(x|\boldsymbol{\alpha})\right]^{2}\, dx>0\) for all \(\boldsymbol{\alpha}\neq\boldsymbol{0}\), the matrix \(\boldsymbol{X}\) must be positive definite. Next we notice that
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2} =& \frac{1}{n}\sum_{i=1}^{n}\boldsymbol{\alpha}^{T}\boldsymbol{X}_{i}\boldsymbol{\alpha}\\ = & \boldsymbol{\alpha}^{T}\left(\frac{1}{n}\sum\boldsymbol{X}_{i}\right)\boldsymbol{\alpha}\\ = & \boldsymbol{\alpha}^{T}\boldsymbol{X}_{n}\boldsymbol{\alpha}\end{aligned}$$
and therefore, since the \(x_{i}\) form a uniform grid, \(\boldsymbol{X}_{n}\to\boldsymbol{X}/(b-a)\) as a Riemann sum; denoting the eigenvalues of \(\boldsymbol{X}_{n}\) by \(\boldsymbol{\lambda}_{n}\) and those of the limit by \(\boldsymbol{\lambda}\), we have \(\boldsymbol{\lambda}_{n}\to\boldsymbol{\lambda}\). Letting \(\lambda_{n,m}\) and \(\lambda_{n,M}\) be the minimum and maximum eigenvalues of \(\boldsymbol{X}_{n}\), we conclude \(\lambda_{n,m}^{2}\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\le\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}\le\lambda_{n,M}^{2}\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}.\hfill\Box\)
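The eigenvalue sandwich of Lemma 1 can be verified directly. In this sketch (degree, interval, grid size, and seed are arbitrary assumptions) the eigenvalues of \(\boldsymbol{X}_{n}\) play the role of the lemma's \(\lambda_{n,m}^{2}\) and \(\lambda_{n,M}^{2}\):

```python
import numpy as np

p, a, b, n = 3, 0.0, 1.0, 50
x = np.linspace(a, b, n)

# X_n = (1/n) sum_i v(x_i) v(x_i)^T with v(x) = (1, x, ..., x^p), so that
# (1/n) sum_i g(x_i|alpha)^2 = alpha^T X_n alpha exactly.
V = np.vander(x, p + 1, increasing=True)   # rows are v(x_i)
Xn = V.T @ V / n
lam = np.linalg.eigvalsh(Xn)               # eigenvalues, ascending
lam_min, lam_max = lam[0], lam[-1]

rng = np.random.default_rng(1)
alpha = rng.standard_normal(p + 1)
g2_mean = np.mean((V @ alpha)**2)          # (1/n) sum g(x_i|alpha)^2
norm2 = alpha @ alpha
print(lam_min > 0, lam_min * norm2 <= g2_mean <= lam_max * norm2)
```

Positive definiteness of \(\boldsymbol{X}_{n}\) for n > p follows from the Vandermonde matrix having full column rank at distinct design points.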
The values \(\lambda_{n,m},\lambda_{n,M}\) depend on the interval over which the polynomial is integrated or summed; in particular, if a = b then the integral is zero. In the following lemmas we therefore assume that there is some minimal distance between any two knot points and between a knot point and the boundary values a,b.
Lemma 2.
Given a degree p spline \(g(x|\boldsymbol{\theta})\) with κ knot points on \([a,b]\), let \(\tau=\left(\left|a\right|\vee\left|b\right|\right)^{\kappa}\). Then \(\forall\;\delta>2\tau,\;\exists\;\lambda_{n}>0\) such that if \(\left\Vert \boldsymbol{\theta}\right\Vert>\delta\) then \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}>\left(\delta^{2}+\tau^{2}\right)\lambda_{n}\).
Proof.
Notice that \(||\boldsymbol{\theta}||^{2}>\delta^{2}>4\tau^{2}\) implies \(||\boldsymbol{\alpha}||^{2}>\delta^{2}-\tau^{2}\). First consider the case \(\kappa=1\). If \(\alpha_{0}^{2}+\dots+\alpha_{p}^{2}>\left(\delta^{2}+\tau^{2}\right)/9\), then \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}1_{[a,t]}\left(x_{i}\right)>\lambda_{n}\left(\delta^{2}+\tau^{2}\right)\) for some \(\lambda_{n}>0\) by Lemma 1. If \(\alpha_{0}^{2}+\dots+\alpha_{p}^{2}\le\left(\delta^{2}+\tau^{2}\right)/9\), then \(\alpha_{p+1}^{2}\ge3\left(\delta^{2}+\tau^{2}\right)/4\). Therefore \(\alpha_{p}+\alpha_{p+1}\), the coefficient of the \(x^{p}\) term of the polynomial on \([t_{1},b]\), satisfies
$$\begin{aligned} \left(\alpha_{p}+\alpha_{p+1}\right)^{2} &\ge \left(\left|\alpha_{p+1}\right|-\left|\alpha_{p}\right|\right)^{2}\\ &\ge \left(\sqrt{\tfrac{3}{4}\left(\delta^{2}+\tau^{2}\right)}-\sqrt{\tfrac{1}{9}\left(\delta^{2}+\tau^{2}\right)}\right)^{2}\\ &> \frac{1}{4}\left(\delta^{2}+\tau^{2}\right)\end{aligned}$$
and thus the squared norm of the coefficients of the polynomial on \([t_{1},b]\) must also exceed \(\frac{1}{4}\left(\delta^{2}+\tau^{2}\right)\), so \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}1_{[t,b]}\left(x_{i}\right)>\lambda_{n}\left(\delta^{2}+\tau^{2}\right)\) for some \(\lambda_{n}>0\) by Lemma 1. The proof for multiple knots is similar: one examines all \(\kappa+1\) polynomial sections to find one whose coefficients have squared norm larger than some fixed fraction of \(\left(\delta^{2}+\tau^{2}\right).\hfill\Box\)
Lemma 3.
For all \(\delta>0\), there exists \(\lambda_{n}>0\) such that for all \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\delta)\), \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}>\lambda_{n}\delta\).
Proof.
By the previous lemma, for all \(\Delta>2\tau\) there exists \(\Lambda_{n}>0\) such that for all \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\Delta)\), \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}>\Lambda_{n}\Delta\). We now consider the region
$$\begin{aligned} \mathcal{C}=\textrm{closure}\left[B\left(\boldsymbol{\theta}_{0},\Delta\right)\setminus B\left(\boldsymbol{\theta}_{0},\delta\right)\right].\end{aligned}$$
Assume to the contrary that there exists \(\delta>0\) such that \(\forall\,\lambda_{n}>0\;\exists\,\boldsymbol{\theta}\in\mathcal{C}\) with \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}\le\lambda_{n}\delta\); we seek a contradiction. By this negation there exists a sequence \(\boldsymbol{\theta}_{n}\in\mathcal{C}\) such that \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{n},x_{i})\right)^{2}\to0\). But since \(\mathcal{C}\) is compact, there exists a subsequence \(\boldsymbol{\theta}_{n_{k}}\) converging to some \(\boldsymbol{\theta}_{\infty}\in\mathcal{C}\) with \(\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{\infty},x)\right)^{2}dx=0\). But since \(\boldsymbol{\theta}_{0}\notin\mathcal{C}\) this is a contradiction.\(\hfill\Box\)
Corollary 4.
There exists \(\lambda_{n}>0\) such that for any \(\delta>0\) and \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\delta)\),
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2} \ge \lambda_{n}^{2}\delta^{2}+O_{p}\left(n^{-1/2}\right)+\sigma_{0}^{2}.\end{aligned}$$
We now focus our attention on the ratio of the maximum value of a polynomial and its integral.
Lemma 5.
Given a degree p polynomial \(g\left(x|\boldsymbol{\alpha}\right)\) on \(\left[a,b\right]\), then
$$\begin{aligned} \frac{\max_{i\in\left\{1,\dots,n\right\}}\left[g\left(x_{i}|\boldsymbol{\alpha}\right)\right]^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left[g\left(x_{i}|\boldsymbol{\alpha}\right)\right]^{2}}\le\frac{\lambda_{M}^{2}}{\lambda_{n,m}^{2}}\to\frac{\lambda_{M}^{2}}{\lambda_{m}^{2}}\end{aligned}$$
for some \(\lambda_{M},\lambda_{m}>0\).
Proof.
Since we can write \(\left[g\left(x|\boldsymbol{\alpha}\right)\right]^{2}=\boldsymbol{\alpha}^{T}W_{x}\boldsymbol{\alpha}\) for some nonnegative definite matrix \(W_{x}\) with maximum eigenvalue \(\lambda_{M,x}\), and because the maximum eigenvalue is a continuous function of x, let \(\lambda_{M}=\sup_{x\in[a,b]}\lambda_{M,x}\). Then the maximum of \(\left[g\left(x|\boldsymbol{\alpha}\right)\right]^{2}\) over \(x\in[a,b]\) is less than \(\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\lambda_{M}^{2}\). The denominator is bounded from below by \(\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\lambda_{n,m}^{2}\) by Lemma 1.\(\hfill\Box\)
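The ratio bound of Lemma 5 is also easy to check numerically. In this sketch (degree, interval, and number of trial coefficient vectors are arbitrary assumptions), \(W_{x}=v(x)v(x)^{T}\) is rank one, so its maximum eigenvalue is \(\|v(x)\|^{2}\), and the minimum eigenvalue of \(\boldsymbol{X}_{n}\) plays the role of the lemma's \(\lambda_{n,m}^{2}\):

```python
import numpy as np

p, n = 2, 1000
x = np.linspace(0.0, 1.0, n)
V = np.vander(x, p + 1, increasing=True)     # rows v(x_i) = (1, x_i, ..., x_i^p)
lam_nm = np.linalg.eigvalsh(V.T @ V / n)[0]  # min eigenvalue of X_n
# W_x = v(x) v(x)^T has max eigenvalue ||v(x)||^2; take the sup over the grid
lam_M = (V**2).sum(axis=1).max()

rng = np.random.default_rng(2)
ratios = []
for _ in range(200):                         # the bound holds for every alpha
    alpha = rng.standard_normal(p + 1)
    vals = (V @ alpha)**2
    ratios.append(vals.max() / vals.mean())
worst = max(ratios)
print(worst, lam_M / lam_nm)                 # worst observed ratio vs. the bound
```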
Lemma 6.
Given a degree p spline \(g\left(x|\boldsymbol{\theta}\right)\) on \(\left[a,b\right]\), then
$$\begin{aligned} \frac{\max\left[g\left(x|\boldsymbol{\theta}\right)\right]^{2}}{\int_{a}^{b}\left[g\left(x|\boldsymbol{\theta}\right)\right]^{2}\, dx}\le\frac{\lambda_{M}^{2}}{\lambda_{m}^{2}}\end{aligned}$$
for some \(\lambda_{M},\lambda_{m}>0\).
Proof.
Since a degree p spline is a degree p polynomial on each of the regions defined by the knot points, and because the integral over the whole interval \([a,b]\) is greater than the integral over any single region, we can apply the previous lemma on each section and then choose the largest ratio.\(\hfill\Box\)
Lemma 7.
Given a degree p spline \(g\left(x|\boldsymbol{\theta}\right)\) on \([a,b]\), then
$$\frac{n^{-1/2}\max_{i}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}=O_{p}\left(1\right)$$
(10.5)
uniformly over \(\boldsymbol{\theta}\).
Proof.
Notice
$$\begin{aligned} \frac{n^{-1/2}\max_{i}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}} & \le & \frac{2n^{-1/2}\max_{i}\left[\epsilon_{i}^{2}\sigma_{0}^{2}\right]+2n^{-1/2}\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\\ & = & \frac{2\sigma_{0}^{2}n^{-1/2}\max_{i}\epsilon_{i}^{2}+2n^{-1/2}\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\\ & = & \frac{O_{p}\left(\frac{\log n}{\sqrt{n}}\right)+2n^{-1/2}\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\end{aligned}$$
and since \(n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}\stackrel{P}{\to}\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx+\sigma_{0}^{2}\), and Lemma 6 bounds the ratio of the terms that involve \(\boldsymbol{\theta}\), this ratio is bounded in probability uniformly over \(\boldsymbol{\theta}\).\(\hfill\Box\)
10.2.5 Assumption B1
Returning to assumption B1, we now consider \(\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\) and
$$\begin{aligned} L_{n}\left(\boldsymbol{\xi}\right) & =\sum\log\left\{\frac{1}{\sqrt{2\pi}\sigma}{\rm exp}\left[\frac{-1}{2\sigma^{2}}\left(y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\right\}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]^{2}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}+\sigma_{0}\epsilon_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]^{2}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}\end{aligned}$$
and therefore
$$\begin{aligned} {\frac{1}{n}}&\left(L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right)\\ & =-\log\sigma-\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\log\sigma_{0}+\frac{1}{2n\sigma_{0}^{2}}\sum\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}\\ & =\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\frac{1}{2n\sigma_{0}^{2}}\sum\left[\sigma_{0}\epsilon_{i}\right]^{2}\\ & =\log\frac{\sigma_{0}}{\sigma}-\frac{\left(\lambda_{n}\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0}\right)\right)^{2}}{2\sigma^{2}}-\frac{\sigma_{0}^{2}}{2\sigma^{2}}+\frac{1}{2n}\sum\epsilon_{i}^{2}\end{aligned}$$
where
$$\begin{aligned} \left[\lambda_{n}\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0}\right)\right]^{2} =\frac{1}{n}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\sigma_{0}^{2}\end{aligned}$$
which converges in probability to \(\frac{1}{b-a}\int_{a}^{b}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x\right)\right]^{2}dx\). The function goes to \(-\infty\) as \(\sigma\to0\) and \(\sigma\to\infty\). Taking the derivative
$$\begin{aligned}\frac{d}{d\sigma}\left[\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2\sigma^{2}}\left[\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\right]+\frac{1}{2n}\sum\epsilon_{i}^{2}\right]=-\frac{1}{\sigma}+\frac{1}{\sigma^{3}}\left[\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\right]\end{aligned}$$
and setting it equal to zero yields a single critical point at \(\sigma^{2}=\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\), which results in a maximum of
$$\log\left(\frac{\sigma_{0}}{\sqrt{\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}}}\right)-\frac{1}{2}+\frac{1}{2}n^{-1}\sum\epsilon_{i}^{2}$$
(10.6)
which is bounded away from zero in probability for \(\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\).
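The location of this profile maximum is easy to confirm numerically. In this sketch the particular values of \(\sigma_{0}\) and \(\lambda_{n}^{2}\) are arbitrary assumptions, and the \(\epsilon\) term (which does not depend on σ) is dropped:

```python
import numpy as np

sigma0, lam2 = 1.3, 0.7                 # illustrative sigma_0 and lambda_n^2
# sigma-dependent part of (1/n)(L_n(xi) - L_n(xi_0))
h = lambda s: np.log(sigma0 / s) - (lam2 + sigma0**2) / (2 * s**2)

s = np.linspace(0.05, 10.0, 200_000)
s_star = s[np.argmax(h(s))]
print(s_star**2, lam2 + sigma0**2)      # maximizer satisfies sigma^2 = lambda_n^2 + sigma_0^2
```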
10.2.6 Assumption C1
Assumption C1 is
$$\begin{aligned}\inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min_{i=1\dots n}L(\boldsymbol{\xi},X_{i})}{\left|L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right|}\stackrel{P_{\boldsymbol{\xi}_{0}}}{\longrightarrow}0\end{aligned}$$
First notice
$$\begin{aligned} L(\boldsymbol{\xi},Y_{i})&=-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(Y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\\ & =-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(\epsilon_{i}\sigma_{0}+N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\\ & =-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}\end{aligned}$$
and we consider \(\mathcal{C}=\left\{\boldsymbol{\xi}:\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\right\}\). Define
$$\begin{aligned} f_{n}\left(\boldsymbol{\xi}\right)&=& \frac{\min\; L\left(\boldsymbol{\xi},Y_{i}\right)}{\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\\ & = & \frac{-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{n\cdot\frac{1}{n}\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\end{aligned}$$
and notice that the denominator is bounded away from 0 by (10.6).
$$\begin{aligned} f_{n}\left(\boldsymbol{\xi}\right)&= \frac{-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-n\cdot\frac{1}{n}\left(L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right)}\\ & =\frac{\frac{1}{\sqrt{n}}\left[-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}\right]}{-\sqrt{n}\cdot\frac{1}{n}\left[n\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\frac{1}{2}\sum\epsilon_{i}^{2}\right]}\\ & =\frac{1}{\sqrt{n}}\cdot\frac{-\frac{1}{2\sqrt{n}}\log\left(2\pi\right)-\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\\ & =\frac{1}{\sqrt{n}}\left[\frac{-\frac{1}{2\sqrt{n}}\log\left(2\pi\right)}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\right.\\ &\qquad \left.+\frac{-\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\right]\end{aligned}$$
We consider the infima of the two terms inside the brackets separately.
For the first term, since the denominator is bounded in probability away from 0 uniformly in \(\boldsymbol{\theta}\) and the numerator goes to zero, the infimum of the first term goes to 0 in probability.
The second term is uniformly bounded over \(\boldsymbol{\theta}\) by Lemma 7. Notice that the numerator is
$$\begin{aligned} -&\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}\\ &\ge -\frac{1}{\sqrt{n}}\log\sigma-\frac{\max\left[\epsilon_{i}\sigma_{0}\right]^{2}}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\\ & = -\frac{1}{\sqrt{n}}\log\sigma-\frac{\sigma_{0}^{2}\, O_{p}\left(\log\, n\right)}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\\ & \ge \frac{-\log n}{\sqrt{n}}\,\log\sigma-\frac{\sigma_{0}^{2}\, O_{p}\left(\log\, n\right)}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\end{aligned}$$
and all three terms of the numerator converge to 0 for every σ. Therefore, for \(\sigma\in\left[0,d\right]\) for some large d, the infimum converges to 0. For \(\sigma>d\), the \(\log\sigma\) terms dominate and the infimum occurs at \(\sigma=d\) which also converges to 0. Therefore
$$\begin{aligned}\inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min L\left(\boldsymbol{\xi},Y_{i}\right)}{\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\stackrel{P}{\to}0.\end{aligned}$$
10.2.7 Assumption C2
Finally we turn our attention to the Jacobian. Recall that the Jacobian is
$$\begin{aligned}J_{0}\left(\boldsymbol{y}_{0},\boldsymbol{\xi}\right)=\left|\frac{1}{\sigma^{2}}p^{\kappa}\det\left[\begin{array}{ccc} \boldsymbol{B}_{\boldsymbol{\alpha}} & \boldsymbol{B}_{\boldsymbol{t}} & \boldsymbol{B}_{\sigma^{2}}\end{array}\right]\right|\end{aligned}$$
where
$$\begin{aligned}\boldsymbol{B}_{\boldsymbol{\alpha}}=\left[\!\!\!\begin{array}{ccccccc} 1 & x_{(1)} & \dots & x_{(1)}^{p} & (x_{(1)}-t_{1})_{+}^{p} & \dots & (x_{(1)}-t_{\kappa})_{+}^{p}\\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{(l)} & \dots & x_{(l)}^{p} & (x_{(l)}-t_{1})_{+}^{p} & \dots & (x_{(l)}-t_{\kappa})_{+}^{p}\end{array}\!\!\!\right],\end{aligned}$$
$$\begin{aligned}\boldsymbol{B}_{\boldsymbol{t}}=\left[\!\!\!\begin{array}{ccc} \alpha_{1+p+1}\left(x_{(1)}-t_{1}\right)_{+}^{p-1}I\left(x_{(1)}-t_{1}\right) & \dots & \alpha_{1+p+\kappa}\left(x_{(1)}-t_{\kappa}\right)_{+}^{p-1}I\left(x_{(1)}-t_{\kappa}\right)\\ \vdots & \ddots & \vdots\\ \alpha_{1+p+1}\left(x_{(l)}-t_{1}\right)_{+}^{p-1}I\left(x_{(l)}-t_{1}\right) & \dots & \alpha_{1+p+\kappa}\left(x_{(l)}-t_{\kappa}\right)_{+}^{p-1}I\left(x_{(l)}-t_{\kappa}\right)\end{array}\!\!\!\right],\end{aligned}$$
and
$$\begin{aligned}\boldsymbol{B}_{\sigma^{2}}=\left[\!\!\!\begin{array}{c} -\frac{1}{2}\left(y_{(1)}-g(x_{(1)}|\boldsymbol{\theta})\right)\\ \vdots\\ -\frac{1}{2}\left(y_{(l)}-g(x_{(l)}|\boldsymbol{\theta})\right)\end{array}\!\!\!\right].\end{aligned}$$
Following the notation of Yeo and Johnson, we suppress parentheses and 0 subscripts. We consider \(\boldsymbol{\xi}\) in the compact set \(\bar{B}(\boldsymbol{\xi}_{0},\delta)\). We notice that for \(\delta<\sigma^{-2}\), \(J(\boldsymbol{y};\boldsymbol{\xi})\le\delta^{\kappa+1}p^{\kappa}g(\boldsymbol{y})\) for some \(g(\boldsymbol{y})\), because \(\boldsymbol{B_{\alpha}}\) and \(\boldsymbol{B_{t}}\) are functions of the bounded quantities \(\boldsymbol{x},\boldsymbol{t}\).
We let \(S_{M}^{l}\) be the cube in \(\mathbb{R}^{l}\) of radius M.
Finally, we notice that \(J_{j}(y_{1},\dots,y_{j};\boldsymbol{\xi})=E\left[J\left(y_{1},\dots,y_{j},Y_{j+1},\dots,Y_{l};\boldsymbol{\xi}\right)\right]\) is a polynomial in \(\boldsymbol{\theta}\) scaled by \(\sigma^{2}\), which is equicontinuous on compacts of \(\boldsymbol{\xi}\) where σ is bounded away from 0.
Appendix C: Full Simulation Results