Appendix A: Proof of Asymptotic Normality of Fiducial Estimators
We start with several assumptions. Assumptions A0–A6 are sufficient for the maximum likelihood estimator to be asymptotically normal and can be found in Lehmann and Casella (1998) as 6.3 (A0)–(A2) and 6.5 (A)–(D). Assumption B2 ensures that the prior to which the Jacobian converges (Hannig 2009) is positive at the true parameter, and B1 is the assumption necessary for the Bayesian posterior to converge to the distribution of the MLE (Ghosh and Ramamoorthi 2003, Theorem 1.4.1).
10.1.1 Assumptions
10.1.1.1 Conditions for Asymptotic Normality of the MLE
- (A0) The distributions \(P_{\boldsymbol{\xi}}\) are distinct.
- (A1) The set \(\left\{x:f(x|\boldsymbol{\xi})>0\right\}\) is independent of the choice of \(\boldsymbol{\xi}\).
- (A2) The data \(\boldsymbol{X}=\{X_{1},\dots,X_{n}\}\) are independent identically distributed (i.i.d.) with probability density \(f(\cdot|\boldsymbol{\xi})\).
- (A3) There exists an open neighborhood \(B(\boldsymbol{\xi}_{0},\delta)\) about the true parameter value \(\boldsymbol{\xi}_{0}\) in which all third partial derivatives \(\left(\partial^{3}/\partial\xi_{i}\partial\xi_{j}\partial\xi_{k}\right)f(\boldsymbol{x}|\boldsymbol{\xi})\) exist.
- (A4) The first and second derivatives of \(L(\boldsymbol{\xi},x)=\log f(x|\boldsymbol{\xi})\) satisfy
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\xi_{j}}L(\boldsymbol{\xi},x)\right]=0\end{aligned}$$
and
$$\begin{aligned} I_{j,k}(\boldsymbol{\xi}) & = E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\xi_{j}}L(\boldsymbol{\xi},x)\cdot\frac{\partial}{\partial\xi_{k}}L(\boldsymbol{\xi},x)\right]\\ & = -E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\xi_{j}\partial\xi_{k}}L(\boldsymbol{\xi},x)\right].\end{aligned}$$
- (A5) The information matrix \(I(\boldsymbol{\xi})\) is positive definite for all \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\).
- (A6) There exist functions \(M_{j,k,l}(\boldsymbol{x})\) such that
$$\begin{aligned} \sup_{\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)}\left|\frac{\partial^{3}}{\partial\xi_{j}\partial\xi_{k}\partial\xi_{l}}L(\boldsymbol{\xi},x)\right|\le M_{j,k,l}(x)\quad\textrm{and}\;\;E_{\boldsymbol{\xi}_{0}}M_{j,k,l}(x)<\infty\end{aligned}$$
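The two expressions for \(I_{j,k}(\boldsymbol{\xi})\) in (A4) can be checked numerically for a toy model. The sketch below is an illustration only, not part of the proof: for the \(N(\theta,1)\) model the score is \(x-\theta\) and the second derivative of the log-likelihood is identically \(-1\), so both identities can be verified by Monte Carlo (the seed and constants are arbitrary).

```python
import random

random.seed(0)
theta0, n = 1.7, 200_000
xs = [random.gauss(theta0, 1.0) for _ in range(n)]

# For f(x|theta) = N(theta, 1): score = d/dtheta log f = (x - theta),
# and d^2/dtheta^2 log f = -1 identically, so I(theta) = 1.
mean_score = sum(x - theta0 for x in xs) / n            # E[score] should be ~ 0
mean_score_sq = sum((x - theta0) ** 2 for x in xs) / n  # E[score^2] should be ~ 1
neg_mean_second = 1.0                                   # -E[d^2 L/dtheta^2] = 1 exactly
```

Here `mean_score` approximates the first identity and `mean_score_sq` matches `neg_mean_second`, the two equivalent forms of the Fisher information.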
10.1.1.2 Conditions for the Bayesian Posterior Distribution to be Close to That of the MLE.
Let \(\pi(\boldsymbol{\xi})=E_{\boldsymbol{\xi}_{0}}J_{0}(X_{0},\boldsymbol{\xi})\) and \(L_{n}(\boldsymbol{\xi})=\sum_{i=1}^{n}L(\boldsymbol{\xi},X_{i})\).
- (B1) For any \(\delta>0\) there exists \(\epsilon>0\) such that
$$\begin{aligned} P_{\boldsymbol{\xi}_{0}}\left\{\sup_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{1}{n}\left(L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right)\le-\epsilon\right\} \to1\end{aligned}$$
- (B2) \(\pi\left(\boldsymbol{\xi}\right)\) is positive at \(\boldsymbol{\xi}_{0}\).
10.1.1.3 Conditions for Showing That the Fiducial Distribution is Close to the Bayesian Posterior
- (C1) For any \(\delta>0\)
$$\begin{aligned} \inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min_{i=1\dots n}L(\boldsymbol{\xi},X_{i})}{\left|L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right|}\stackrel{P_{\boldsymbol{\xi}_{0}}}{\longrightarrow}0\end{aligned}$$
- (C2) Let \(\pi(\boldsymbol{\xi})=E_{\boldsymbol{\xi}_{0}}J_{0}(X_{0},\boldsymbol{\xi})\). The Jacobian function \(J\left(\boldsymbol{X},\boldsymbol{\xi}\right)\stackrel{a.s.}{\to}\pi\left(\boldsymbol{\xi}\right)\) uniformly on compacts in \(\boldsymbol{\xi}\). In the single-variable case, this reduces to the following: \(J\left(\boldsymbol{X},\xi\right)\) is continuous in \(\xi\), \(\pi\left(\xi\right)\) is finite, \(\pi\left(\xi_{0}\right)>0\), and for some \(\delta_{0}\)
$$\begin{aligned} E_{\xi_{0}}\left(\sup_{\xi\in B\left(\xi_{0},\delta_{0}\right)}J_{0}\left(\boldsymbol{X},\xi\right)\right)<\infty.\end{aligned}$$
In the multivariate case, we follow Yeo and Johnson (2001). Let
$$\begin{aligned} J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)=E_{\boldsymbol{\xi}_{0}}\left[J_{0}\left(x_{1},\dots,x_{j},X_{j+1},\dots,X_{k};\boldsymbol{\xi}\right)\right].\end{aligned}$$
- (C2.a) There exist integrable and symmetric functions \(g\left(x_{1},\dots,x_{j}\right)\), \(j=1,\dots,k\), and a compact set \(\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\) such that \(\left|J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)\right|\le g\left(x_{1},\dots,x_{j}\right)\) for all \(\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\).
- (C2.b) There exists a sequence of measurable sets \(S_{M}^{k}\) such that
$$\begin{aligned} P\left(\mathbb{R}^{k}-\cup_{M=1}^{\infty}S_{M}^{k}\right)=0\end{aligned}$$
- (C2.c) For each \(M\) and for all \(j\in\{1,\dots,k\}\), \(J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)\) is equicontinuous in \(\boldsymbol{\xi}\) for \(\{x_{1},\dots,x_{j}\}\in S_{M}^{j}\), where \(S_{M}^{k}=S_{M}^{j}\times S_{M}^{k-j}\).
10.1.2 Proof of Asymptotic Normality of Multivariate Fiducial Estimators
We now prove the asymptotic normality (Theorem 1) for multivariate fiducial estimators.
Proof.
Assume without loss of generality that \(\boldsymbol{\xi}\in\boldsymbol{\Xi}=\mathbb{R}^{p}\). We denote by \(J_{n}\left(\boldsymbol{x}_{n},\boldsymbol{\xi}\right)\) the average of all possible Jacobians over a sample of size \(n\), and set \(\pi\left(\boldsymbol{\xi}\right)=E_{\boldsymbol{\xi}_{0}}J_{0}\left(\boldsymbol{x},\boldsymbol{\xi}\right)\). Assumption C2 and the uniform strong law of large numbers for U-statistics imply that \(J_{n}\left(\boldsymbol{x},\boldsymbol{\xi}\right)\stackrel{a.s.}{\rightarrow}\pi\left(\boldsymbol{\xi}\right)\) uniformly in \(\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\) and that \(\pi\left(\boldsymbol{\xi}\right)\) is continuous. Therefore,
$$\sup_{\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)}\left|J_{n}\left(\boldsymbol{x}_{n},\boldsymbol{\xi}\right)-\pi\left(\boldsymbol{\xi}\right)\right|\to0\;P_{\boldsymbol{\xi}_{0}}\, a.s.$$
The multivariate proof now proceeds in a similar fashion as the univariate case. Let
$$\begin{aligned} \pi^{*}\left(\boldsymbol{s},\boldsymbol{x}\right) &= \frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)f\left(\boldsymbol{x}_{n}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)f\left(\boldsymbol{x}_{n}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\, d\!\boldsymbol{t}}\\ & = \frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)\right]}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\right]\, d\!\boldsymbol{t}}\\ & =\frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\, d\!\boldsymbol{t}}\end{aligned}$$
and, just as in Ghosh and Ramamoorthi (2003), we let \(H=-\frac{1}{n}\frac{\partial^{2}}{\partial\boldsymbol{\xi}\partial\boldsymbol{\xi}^{T}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\) and we notice that \(H\to I\left(\boldsymbol{\xi}_{0}\right)\, a.s.\, P_{\boldsymbol{\xi}_{0}}\). It will be sufficient to prove
$$\begin{aligned}\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\nonumber\\\left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]\right|d\!\boldsymbol{t} \stackrel{P_{\boldsymbol{\xi}_{0}}}{\rightarrow}0\end{aligned}$$
(10.3)
Let \(t_{i}\) represent the \(i\)th component of the vector \(\boldsymbol{t}\). By Taylor's Theorem, we can compute
$$\begin{aligned} L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right) &= L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)+\sum_{i=1}^{p}\left(\frac{t_{i}}{\sqrt{n}}\right)\frac{\partial}{\partial\xi_{i}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\\ &+\frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p}\left(\frac{t_{i}t_{j}}{\left(\sqrt{n}\right)^{2}}\frac{\partial^{2}}{\partial\xi_{i}\partial\xi_{j}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right)\\ &+\frac{1}{6}\sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p}\left(\frac{t_{i}t_{j}t_{k}}{\left(\sqrt{n}\right)^{3}}\frac{\partial^{3}}{\partial\xi_{i}\partial\xi_{j}\partial\xi_{k}}L_{n}\left(\boldsymbol{\xi}'\right)\right)\\ & = L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)-\frac{\boldsymbol{t}^{T}H\boldsymbol{t}}{2}+R_{n}\end{aligned}$$
for some \(\boldsymbol{\xi}^\prime\in\left[\hat{\boldsymbol{\xi}}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right]\); the first-order term vanishes because \(\hat{\boldsymbol{\xi}}_{n}\) maximizes \(L_{n}\). Notice that \(R_{n}=O_{p}\left(\left\Vert \boldsymbol{t}\right\Vert ^{3}/\sqrt{n}\right)\).
Given any \(0<\delta<\delta_{0}\) and \(c>0\), we break \(\mathbb{R}^{p}\) into three regions:
$$\begin{aligned} A_{1} & =\left\{\boldsymbol{t}:\;\left\Vert \boldsymbol{t}\right\Vert <c\log\sqrt{n}\right\}\\ A_{2}&=\left\{\boldsymbol{t}:\;c\log\sqrt{n}\le\left\Vert \boldsymbol{t}\right\Vert <\delta\sqrt{n}\right\}\\ A_{3} & =\left\{\boldsymbol{t}:\;\delta\sqrt{n}\le\left\Vert \boldsymbol{t}\right\Vert \right\}\end{aligned}$$
On \(A_{1}\cup A_{2}\) we compute
$$\begin{aligned} {\int_{A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.}\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[-\frac{1}{2}\boldsymbol{t}'I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}\right]\right|d\!\boldsymbol{t}\\ \le \int_{A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|\\ \cdot{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ +\int_{A_{1}\cup A_{2}}\left|\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[-\frac{1}{2}\boldsymbol{t}'I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}\right]\right|d\!\boldsymbol{t}\end{aligned}$$
Since \(\pi\left(\cdot\right)\) is a proper prior on \(A_{1}\cup A_{2}\), the second term goes to 0 by the Bayesian Bernstein–von Mises theorem. Next we notice that
$$\begin{aligned} {\int_{A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|}\\ \cdot{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ \le \sup_{\boldsymbol{t}\in A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|\\ \cdot\int_{A_{1}\cup A_{2}}{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\end{aligned}$$
Since \(\sqrt{n}\left(\hat{\boldsymbol{\xi}}_{n}-\boldsymbol{\xi}_{0}\right)\stackrel{\mathcal{D}}{\to}N\left(0,I\left(\boldsymbol{\xi}_{0}\right)^{-1}\right)\), then
$$\begin{aligned}P_{\boldsymbol{\xi}_{0}}\left[\left\{\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n};\;\boldsymbol{t}\in A_{1}\cup A_{2}\right\} \subset B\left(\boldsymbol{\xi}_{0},\delta_{0}\right)\right]\to1.\end{aligned}$$
Furthermore, since \(L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)=-\frac{\boldsymbol{t}^{T}H\boldsymbol{t}}{2}+R_{n}\), the integral remains bounded in probability. Since \(\max_{\boldsymbol{t}\in A_{1}\cup A_{2}}\left\Vert \boldsymbol{t}/\sqrt{n}\right\Vert \le\delta\) and \(J_{n}\to\pi\) uniformly, the first term \(\to0\) in probability.
Next, we turn to
$$\begin{aligned} {\int_{A_{3}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.}\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]\right|\, d\!\boldsymbol{t}\\ \le \int_{A_{3}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ +\int_{A_{3}}\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]d\!\boldsymbol{t}\end{aligned}$$
The second integral goes to 0 because \(\min_{A_{3}}\left\Vert \boldsymbol{t}\right\Vert \to\infty\). As for the first integral,
$$\begin{aligned} {\int_{A_{3}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}}\\ =\frac{1}{n}\sum_{i=1}^{n}\int_{A_{3}}J\left(\boldsymbol{x}_{i},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ =\frac{1}{n}\sum_{i=1}^{n}\int_{A_{3}}J\left(\boldsymbol{x}_{i},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)f\left(x_{i}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\\ \cdot{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)-\log f\left(x_{i}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\right]d\!\boldsymbol{t}\end{aligned}$$
Because \(J\left(\cdot\right)\) is a probability measure, so is \(J\left(\cdot\right)f\left(\cdot\right)\). Assumption C1 ensures that the exponent goes to \(-\infty\), and therefore the integral converges to 0 in probability.
Having shown Eq. 10.3, we now follow Ghosh and Ramamoorthi (2003) and let
$$\begin{aligned} C_{n}=\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right|d\!\boldsymbol{t}\end{aligned}$$
then the main result to be proved (Eq. 10.2) becomes
$$\begin{aligned}C_{n}^{-1}\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\nonumber\\\left.-C_{n}\frac{\sqrt{\det I\left(\boldsymbol{\xi}_{0}\right)}}{\sqrt{2\pi}}e^{-\boldsymbol{s}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{s}/2}\right|\, d\boldsymbol{s} & \stackrel{P_{\boldsymbol{\xi}_{0}}}{\to}0\end{aligned}$$
(10.4)
Because
$$\begin{aligned} \int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[-\frac{\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s} & = J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\int_{\mathbb{R}^{p}}{\rm exp}\left[-\frac{\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s}\\ &=J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\frac{\sqrt{2\pi}}{\sqrt{\det\left(H\right)}}\\ &\stackrel{a.s.}{\to} \pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}\end{aligned}$$
and Eq. 10.3 together imply that \(C_{n}\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}\), it is enough to show that the integral in Eq. 10.4 goes to 0 in probability. This integral is bounded by \(I_{1}+I_{2}\), where
$$\begin{aligned} I_{1}& =\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\\ &\left.-J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\right|\, d\boldsymbol{s}\end{aligned}$$
and
$$\begin{aligned} I_{2}=\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]-C_{n}\frac{\sqrt{\det I\left(\boldsymbol{\xi}_{0}\right)}}{\sqrt{2\pi}}e^{-\boldsymbol{s}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{s}/2}\right|\, d\boldsymbol{s}.\end{aligned}$$
Eq. 10.3 shows that \(I_{1}\to0\) in probability, and \(I_{2}\) is
$$\begin{aligned} I_{2}& = \left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)-C_{n}\frac{\sqrt{\det I\left(\boldsymbol{\xi}_{0}\right)}}{\sqrt{2\pi}}\right|\int_{\mathbb{R}^{p}}{\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s}\\ & \stackrel{P}{\to} 0\end{aligned}$$
because \(J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\) and \(C_{n}\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}.\hfill\Box\)
Appendix B: Proof of Assumptions for Free-Knot Splines Using a Truncated Polynomial Basis
We now consider the free-knot spline case. Suppose we are interested in a degree-\(p\) (order \(m=p+1\)) polynomial spline with \(\kappa\) knot points, \(\boldsymbol{t}=\left\{t_{1},\dots,t_{\kappa}\right\} ^{T}\), where \(t_{k}\in(a+\delta,b-\delta)\) and \(\left|t_{i}-t_{j}\right|\ge\delta\) for \(i\ne j\) and some \(\delta>0\). Furthermore, we assume that the data points \(\left\{x_{i},y_{i}\right\}\) are independent, with the distribution of the \(x_{i}\) having positive density on \(\left[a,b\right]\).
Denote the truncated polynomial spline basis functions as
$$\begin{aligned} N(x,\boldsymbol{t}) & = \left\{N_{1}(x,\boldsymbol{t}),\dots,N_{\kappa+m}(x,\boldsymbol{t})\right\} ^{T}\\ & = \left\{1,x,\dots,x^{p},(x-t_{1})_{+}^{p},\dots,(x-t_{\kappa})_{+}^{p}\right\} ^{T}\end{aligned}$$
and let \(y_{i}=N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}+\sigma\epsilon_{i}\) where \(\epsilon_{i}\stackrel{iid}{\sim}N(0,1)\) and thus the density function is
$$\begin{aligned} f(y,\boldsymbol{\xi})=\frac{1}{\sqrt{2\pi\sigma^{2}}}{\rm exp}\left[-\frac{1}{2\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\end{aligned}$$
where \(\boldsymbol{\xi}=\{\boldsymbol{t},\boldsymbol{\alpha},\sigma^{2}\}\) and the log-likelihood function is
$$\begin{aligned} L(\boldsymbol{\xi},y)=-\frac{1}{2}\log2\pi-\frac{1}{2}\log\sigma^{2}-\frac{1}{2\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\end{aligned}$$
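For concreteness, the basis and log-likelihood above can be sketched in code. This is an illustration only: the names `tp_basis` and `log_lik` are ours, and the cubic default \(p=3\), knots, and coefficients are arbitrary choices.

```python
import math

def tp_basis(x, knots, p=3):
    """Truncated polynomial basis N(x,t) = {1, x, ..., x^p, (x-t_1)_+^p, ..., (x-t_k)_+^p}."""
    return [x ** j for j in range(p + 1)] + [max(x - t, 0.0) ** p for t in knots]

def log_lik(y, x, knots, alpha, sigma2, p=3):
    """Log-likelihood of one observation under y = N(x,t)^T alpha + sigma * eps."""
    mu = sum(a * b for a, b in zip(alpha, tp_basis(x, knots, p)))
    return -0.5 * math.log(2 * math.pi) - 0.5 * math.log(sigma2) \
           - (y - mu) ** 2 / (2.0 * sigma2)

# Example evaluation at an arbitrary point with two knots (kappa = 2, m = 4):
knots, alpha = [0.3, 0.7], [1.0, -2.0, 0.5, 0.0, 1.5, -1.5]
x = 0.5
mu = sum(a * b for a, b in zip(alpha, tp_basis(x, knots)))
```

With a zero residual (\(y=\mu\)) and \(\sigma^{2}=1\), the log-likelihood reduces to \(-\frac{1}{2}\log2\pi\), which gives a quick consistency check of the signs above.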
10.2.1 Assumptions A0–A4
Assumptions A0–A2 are satisfied. We now consider assumptions A3 and A4. We note that if \(p\ge4\) then the necessary three continuous derivatives exist, and we now examine the derivatives. Let \(\boldsymbol{\theta}=\{\boldsymbol{t},\boldsymbol{\alpha}\}\) and thus
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\right] & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{2}}2\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ & =-\frac{1}{2\sigma^{2}}2\left(E_{\boldsymbol{\xi}}\left[y\right]-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ & = 0\end{aligned}$$
and
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})\right] & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{2}}+\frac{1}{2\left(\sigma^{2}\right)^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\\ & =-\frac{1}{2\sigma^{2}}+\frac{1}{2\left(\sigma^{2}\right)^{2}}\left(\sigma^{2}\right)\\ & =0.\end{aligned}$$
Next, we consider the information matrix. First, we consider the \(\boldsymbol{\theta}\) terms.
$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},\boldsymbol{y})\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},\boldsymbol{y})\right]} & =E_{\boldsymbol{\xi}}\left[\frac{1}{\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ &=\frac{1}{\sigma^{4}}E_{\boldsymbol{\xi}}\left[\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &=\frac{1}{\sigma^{2}}\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\end{aligned}$$
The j,k partials for the second derivative are
$$\begin{aligned} {\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}L(\boldsymbol{\xi},y)}&=\frac{\partial}{\partial\theta_{j}}\left[-\frac{1}{2\sigma^{2}}2\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ &=\frac{\partial}{\partial\theta_{j}}\left[-\frac{1}{\sigma^{2}}\left(-y\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right)\right]\\ &=-\frac{1}{\sigma^{2}}\left[-y\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\end{aligned}$$
which have expectation
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}L(\boldsymbol{\xi},y)\right] & =-\frac{1}{\sigma^{2}}\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ & =-E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},\boldsymbol{y})\right]\end{aligned}$$
as necessary. Next, we consider
$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]}\\ & =E_{\boldsymbol{\xi}}\left[\frac{1}{\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left[-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\right]\\ & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\frac{1}{2\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{3}\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\\ & =0\end{aligned}$$
which is equal to
$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\theta_{j}\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})\right] & =E_{\boldsymbol{\xi}}\left[-\frac{1}{\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\\ & =0.\end{aligned}$$
Finally,
$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]}\\ & =E_{\boldsymbol{\xi}}\left[\left\{-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right\} \left\{-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right\} \right]\\ &=E_{\boldsymbol{\xi}}\left[\frac{1}{4\sigma^{4}}-\frac{2}{4\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}+\frac{1}{4\sigma^{8}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{4}\right]\\ &=\frac{1}{4\sigma^{4}}-\frac{2}{4\sigma^{6}}\sigma^{2}+\frac{1}{4\sigma^{8}}3\sigma^{4}\\ &=\frac{1}{2\sigma^{4}}\end{aligned}$$
which is equal to
$$\begin{aligned} -E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\left(\sigma^{2}\right)^{2}}L(\boldsymbol{\xi},y)\right] = & -E_{\boldsymbol{\xi}}\left[\frac{1}{2}\sigma^{-4}-\sigma^{-6}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\\ = & -\frac{1}{2}\sigma^{-4} + \sigma^{-4}=\frac{1}{2\sigma^{4}}.\end{aligned}$$
Therefore, the interchange of integration and differentiation is justified.
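The \(\sigma^{2}\) identity above, \(E[(\partial L/\partial\sigma^{2})^{2}]=1/(2\sigma^{4})\), lends itself to a quick Monte Carlo check. The sketch below is an illustration only, with an arbitrary fixed mean standing in for the spline fit \(N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\) and arbitrary constants.

```python
import random

random.seed(1)
mu, sigma2, n = 0.4, 2.0, 200_000   # fixed mean stands in for N(x,t)^T alpha

def score_sigma2(y):
    # d/d(sigma^2) of the log-likelihood: -1/(2 sigma^2) + (y - mu)^2 / (2 sigma^4)
    return -0.5 / sigma2 + (y - mu) ** 2 / (2.0 * sigma2 ** 2)

ys = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
mc = sum(score_sigma2(y) ** 2 for y in ys) / n   # Monte Carlo E[(dL/dsigma^2)^2]
exact = 1.0 / (2.0 * sigma2 ** 2)                # = 1/(2 sigma^4)
```

For \(\sigma^{2}=2\) the exact value is \(0.125\), and the Monte Carlo average lands within sampling error of it.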
10.2.2 Assumption A5
To address whether the information matrix is positive definite, we notice that since \(E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]>0\) and \(E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]=0\), we only need to be concerned with the submatrix
$$\begin{aligned} I_{j,k}(\boldsymbol{\theta})&=\sum_{i=1}^{n}E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y_{i})\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},y_{i})\right]\\ &=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}\left(\frac{\partial}{\partial\theta_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right).\end{aligned}$$
where the \(\sigma^{-2}\) term can be ignored because it does not affect the positive definiteness. First, we note
$$\begin{aligned} \frac{\partial}{\partial t_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}&=-p\left(x_{i}-t_{j}\right)_{+}^{p-1}\alpha_{p+j+1}\\ \frac{\partial}{\partial\alpha_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}&=N_{j}(x_{i},\boldsymbol{t}).\end{aligned}$$
If we let
$$\begin{aligned} X=\left[\begin{array}{cccccc} N_{1}\left(x_{1},\boldsymbol{t}\right) & \cdots & N_{m+\kappa}\left(x_{1},\boldsymbol{t}\right) & \frac{\partial}{\partial t_{1}}N(x_{1},\boldsymbol{t})^{T}\boldsymbol{\alpha} & \cdots & \frac{\partial}{\partial t_{\kappa}}N(x_{1},\boldsymbol{t})^{T}\boldsymbol{\alpha}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\ N_{1}\left(x_{n},\boldsymbol{t}\right) & \cdots & N_{m+\kappa}\left(x_{n},\boldsymbol{t}\right) & \frac{\partial}{\partial t_{1}}N(x_{n},\boldsymbol{t})^{T}\boldsymbol{\alpha} & \cdots & \frac{\partial}{\partial t_{\kappa}}N(x_{n},\boldsymbol{t})^{T}\boldsymbol{\alpha}\end{array}\right]\end{aligned}$$
then \(I(\boldsymbol{\theta})=X^{T}X\), which is positive definite if the columns of \(X\) are linearly independent. This is true under the assumptions that \(t_{j}\ne t_{k}\) for \(j\ne k\) and that \(\alpha_{m+j}\ne0\).
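This linear-independence condition can be probed numerically. The sketch below is our construction, not part of the argument: it builds the columns of \(X\) for a cubic spline with two arbitrary knots and nonzero knot coefficients, forms the Gram matrix \(X^{T}X\), and tests positive definiteness with a Cholesky attempt.

```python
def is_pd(a):
    """Cholesky-based check: True iff the symmetric matrix a is positive definite."""
    m = len(a)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:          # a pivot <= 0 means not positive definite
                    return False
                L[i][i] = d ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return True

p, knots = 3, [0.3, 0.7]                    # hypothetical cubic example
alpha = [1.0, -2.0, 0.5, 0.0, 1.5, -1.5]    # alpha_{m+j} != 0 for both knots
xs = [(i + 0.5) / 40 for i in range(40)]    # 40 design points in [0, 1]

def design_row(x):
    basis = [x ** j for j in range(p + 1)] + [max(x - t, 0.0) ** p for t in knots]
    dt = [-p * max(x - t, 0.0) ** (p - 1) * alpha[p + 1 + j]
          for j, t in enumerate(knots)]     # d/dt_j N(x,t)^T alpha columns
    return basis + dt

X = [design_row(x) for x in xs]
ncol = len(X[0])
G = [[sum(row[r] * row[c] for row in X) for c in range(ncol)] for r in range(ncol)]
```

With distinct knots and nonzero knot coefficients `is_pd(G)` succeeds, while a rank-deficient matrix fails the same test.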
10.2.3 Assumption A6
We next consider a bound on the third partial derivatives. We start with the derivatives of the basis functions.
$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial t_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=0\qquad\textrm{if }j\ne k\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial t_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=p(p-1)\left(x-t_{j}\right)_{+}^{p-2}\alpha_{p+j+1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial\alpha_{j}\partial\alpha_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=0\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial\alpha_{p+j+1}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=-p\left(x-t_{j}\right)_{+}^{p-1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{3}}{\partial t_{j}\partial t_{j}\partial t_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=-p(p-1)(p-2)\left(x-t_{j}\right)_{+}^{p-3}\alpha_{p+j+1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{3}}{\partial t_{j}\partial t_{j}\partial\alpha_{p+j+1}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=p(p-1)\left(x-t_{j}\right)_{+}^{p-2}\end{aligned}$$
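The knot derivatives above are easy to sanity-check by finite differences. The sketch below is an illustration only, with arbitrary values of \(x\), \(t_{j}\), \(p\), and \(\alpha_{p+j+1}\): it compares the first two analytic \(t_{j}\)-derivatives of a single truncated-power term with central-difference approximations.

```python
x, t, p, a = 0.8, 0.3, 4, 2.0   # evaluation point, knot, degree, knot coefficient

def f(tj):
    # the single truncated-power term (x - t_j)_+^p * alpha_{p+j+1}
    return max(x - tj, 0.0) ** p * a

d1_exact = -p * max(x - t, 0.0) ** (p - 1) * a           # d/dt_j
d2_exact = p * (p - 1) * max(x - t, 0.0) ** (p - 2) * a  # d^2/dt_j^2

h1 = 1e-5
d1_fd = (f(t + h1) - f(t - h1)) / (2 * h1)               # central first difference
h2 = 1e-4
d2_fd = (f(t + h2) - 2 * f(t) + f(t - h2)) / h2 ** 2     # central second difference
```

Both finite differences agree with the analytic formulas to well within the discretization error, as long as \(x>t_{j}\) so the truncation is inactive at the evaluation point.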
Since \(x\) is an element of a compact set, for \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\) all of the earlier partials are bounded, as is \(N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\). Therefore
$$\begin{aligned} &{\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}L(\boldsymbol{\xi},x)}\\ & \qquad= -\frac{1}{\sigma^{2}}\left[-y\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\qquad+\left(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &\qquad+\left(\frac{\partial^{2}}{\partial\theta_{l}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &\qquad\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left(\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\end{aligned}$$
and
$$\begin{aligned} &{\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\sigma^{2}}L(\boldsymbol{\xi},x)}\\ & \qquad=\frac{1}{\sigma^{4}}\left[-y\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\qquad\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\end{aligned}$$
and
$$\begin{aligned} \frac{\partial^{3}}{\partial\theta_{j}\partial\sigma^{2}\partial\sigma^{2}}L(\boldsymbol{\xi},y)=-\frac{2}{\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\end{aligned}$$
and
$$\begin{aligned} \frac{\partial^{3}}{\partial\sigma^{2}\partial\sigma^{2}\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})=-\frac{1}{\sigma^{6}}+\frac{3}{\sigma^{8}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\end{aligned}$$
are also bounded for \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\) since \(\sigma_{0}^{2}>0\) by assumption. The expectations of these bounds also clearly exist.
10.2.4 Lemmas
To show that the remaining assumptions are satisfied, we first examine the behavior of
$$\begin{aligned} g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})=N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}.\end{aligned}$$
Notice that for \(x_{i}\) chosen on a uniform grid over \([a,b]\),
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2} \to\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx.\end{aligned}$$
Furthermore, we notice that \(g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x\right)\) is itself a spline, since the sum (or difference) of two splines is again a spline. Consider the degree p case of \(g\left(x|\boldsymbol{\alpha},t\right)+g\left(x|\boldsymbol{\alpha}^{*},t^{*}\right)\) where \(t<t^{*}\). The sum is a spline with knot points \(\left\{t,t^{*}\right\}\) whose first p + 1 coefficients are \(\boldsymbol{\alpha}+\boldsymbol{\alpha}^{*}\) (the polynomial parts added coefficient-wise) and whose last two coefficients are \(\left\{\alpha_{p+1},\alpha_{p+1}^{*}\right\}\).
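This closure property can be checked numerically. The following is a minimal sketch (the helper `spline_eval` and all coefficient values are illustrative assumptions, not from the text) confirming that the sum of two degree-1 truncated power basis splines with knots t and t* is the spline with knots {t, t*} whose polynomial coefficients add coefficient-wise:

```python
import numpy as np

def spline_eval(x, alpha, knots, p):
    """Evaluate a degree-p truncated power basis spline:
    g(x) = sum_i alpha_i x^i + sum_j alpha_{p+1+j} (x - t_j)_+^p."""
    val = sum(alpha[i] * x**i for i in range(p + 1))
    for j, t in enumerate(knots):
        val += alpha[p + 1 + j] * np.maximum(x - t, 0.0)**p
    return val

# Degree-1 example on [0, 1]: two single-knot splines
p = 1
a1, t1 = [1.0, 2.0, 3.0], [0.3]   # 1 + 2x + 3(x - 0.3)_+
a2, t2 = [0.5, -1.0, 4.0], [0.7]  # 0.5 - x + 4(x - 0.7)_+

# Their sum: polynomial part added coefficient-wise, truncated-power
# coefficients kept for each knot separately.
a_sum = [1.5, 1.0, 3.0, 4.0]
x = np.linspace(0.0, 1.0, 101)
lhs = spline_eval(x, a1, t1, p) + spline_eval(x, a2, t2, p)
rhs = spline_eval(x, a_sum, [0.3, 0.7], p)
print(np.allclose(lhs, rhs))  # True
```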
At this point, we also notice
$$\begin{aligned} E\left[n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right] &=n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)E\left[\epsilon_{i}\right]\\ &=0\end{aligned}$$
$$\begin{aligned} V\left[n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right] &=n^{-2}V\left[\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right]\\ &=n^{-2}\sum V\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right]\\ &=n^{-2}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)^{2}V\left[\epsilon_{i}\right]\\ &=n^{-2}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)^{2}\\ &\rightarrow 0\end{aligned}$$
and that \(\sum\epsilon_{i}^{2}\sim\chi_{n}^{2}\), so by the SLLN \(n^{-1}\sum\epsilon_{i}^{2}\) converges almost surely to the constant 1. Therefore,
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2} &=\frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)\right]^{2}+\frac{2\sigma_{0}}{n}\sum_{i=1}^{n}\epsilon_{i}g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\frac{\sigma_{0}^{2}}{n}\sum_{i=1}^{n}\epsilon_{i}^{2}\\ &=\frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)\right]^{2}+O_{p}\left(n^{-1/2}\right)+\frac{\sigma_{0}^{2}}{n}\sum_{i=1}^{n}\epsilon_{i}^{2}\\ &\stackrel{a.s.}{\rightarrow}\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx+\sigma_{0}^{2}.\end{aligned}$$
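As a numerical sanity check of this limit, the following sketch (the particular spline, grid size, noise level, and seed are arbitrary assumptions) compares the empirical average of \([g+\sigma_{0}\epsilon]^{2}\) on a uniform grid with \((b-a)^{-1}\int_{a}^{b}g^{2}\,dx+\sigma_{0}^{2}\):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma0, n = 0.0, 1.0, 0.5, 200_000

# g plays the role of g(theta_0, theta, x): here an arbitrary fixed spline
g = lambda x: 1.0 - 2.0 * x + 3.0 * np.maximum(x - 0.4, 0.0)

x = np.linspace(a, b, n)          # uniform design grid
eps = rng.standard_normal(n)      # i.i.d. N(0,1) errors
empirical = np.mean((g(x) + sigma0 * eps)**2)

# Limit: (b-a)^{-1} int_a^b g(x)^2 dx + sigma_0^2, via a fine Riemann sum
xx = np.linspace(a, b, 100_001)
limit = np.mean(g(xx)**2) + sigma0**2
print(empirical, limit)           # the two agree to a few decimal places
```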
Lemma 1.
Given a degree p polynomial \(g(x|\boldsymbol{\alpha})\) on \([a,b]\) with coefficients \(\boldsymbol{\alpha}\), then \(\exists\;\lambda_{n,m},\lambda_{n,M}>0\) such that \(||\boldsymbol{\alpha}||^{2}\lambda_{n,m}^{2}\le\frac{1}{n}\sum_{i=1}^{n}\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}\le||\boldsymbol{\alpha}||^{2}\lambda_{n,M}^{2}\).
Proof.
If \(\boldsymbol{\alpha}=\boldsymbol{0}\), then \(g\left(x|\boldsymbol{\alpha}\right)\equiv0\) and the result is obvious. If \(g\left(x|\boldsymbol{\alpha}\right)\) has at least one non-zero coefficient, it cannot be identically zero on \([a,b]\); since a degree p polynomial has at most p zeros, for n > p we have \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}>0\). We notice that
$$\begin{aligned} \int_{a}^{b}\left[g(x|\boldsymbol{\alpha})\right]^{2}\, dx & =\int_{a}^{b}\left[\sum_{i=0}^{p}\alpha_{i}^{2}x^{2i}+2\sum_{i=0}^{p-1}\sum_{j=i+1}^{p}\alpha_{i}\alpha_{j}x^{i+j}\right]dx\\ & =\left.\sum_{i=0}^{p}\frac{\alpha_{i}^{2}}{2i+1}x^{2i+1}+2\sum_{i=0}^{p-1}\sum_{j=i+1}^{p}\frac{\alpha_{i}\alpha_{j}}{i+j+1}x^{i+j+1}\right|_{x=a}^{b}\\ & =\boldsymbol{\alpha}^{T}\boldsymbol{X}\boldsymbol{\alpha}\end{aligned}$$
where the matrix \(\boldsymbol{X}\) has i,j element \(\left(b^{i+j+1}-a^{i+j+1}\right)/(i+j+1)\) for \(i,j=0,\dots,p\). Since \(\int_{a}^{b}\left[g(x|\boldsymbol{\alpha})\right]^{2}\, dx>0\) for all \(\boldsymbol{\alpha}\neq\boldsymbol{0}\), the matrix \(\boldsymbol{X}\) must be positive definite. Next we notice that
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2} =& \frac{1}{n}\sum_{i=1}^{n}\boldsymbol{\alpha}^{T}\boldsymbol{X}_{i}\boldsymbol{\alpha}\\ = & \boldsymbol{\alpha}^{T}\left(\frac{1}{n}\sum\boldsymbol{X}_{i}\right)\boldsymbol{\alpha}\\ = & \boldsymbol{\alpha}^{T}\boldsymbol{X}_{n}\boldsymbol{\alpha}\end{aligned}$$
and therefore, since the \(x_{i}\) form a uniform grid, \(\boldsymbol{X}_{n}\to\boldsymbol{X}/(b-a)\) as a Riemann sum; denoting the eigenvalues of \(\boldsymbol{X}_{n}\) by \(\boldsymbol{\lambda}_{n}\) and those of the limit by \(\boldsymbol{\lambda}\), we have \(\boldsymbol{\lambda}_{n}\to\boldsymbol{\lambda}\). Letting \(\lambda_{n,m}\) and \(\lambda_{n,M}\) be the minimum and maximum eigenvalues of \(\boldsymbol{X}_{n}\), we conclude \(\lambda_{n,m}^{2}\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\le\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}\le\lambda_{n,M}^{2}\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}.\hfill\Box\)
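The eigenvalue sandwich of Lemma 1 can be verified directly. In this sketch (degree, interval, grid size, and seed are arbitrary assumptions) the eigenvalues of \(\boldsymbol{X}_{n}\) play the role of the lemma's \(\lambda_{n,m}^{2}\) and \(\lambda_{n,M}^{2}\):

```python
import numpy as np

p, a, b, n = 3, 0.0, 1.0, 50
x = np.linspace(a, b, n)

# X_n = (1/n) sum_i v(x_i) v(x_i)^T with v(x) = (1, x, ..., x^p), so that
# (1/n) sum_i g(x_i|alpha)^2 = alpha^T X_n alpha exactly.
V = np.vander(x, p + 1, increasing=True)   # rows are v(x_i)
Xn = V.T @ V / n
lam = np.linalg.eigvalsh(Xn)               # eigenvalues, ascending
lam_min, lam_max = lam[0], lam[-1]

rng = np.random.default_rng(1)
alpha = rng.standard_normal(p + 1)
g2_mean = np.mean((V @ alpha)**2)          # (1/n) sum g(x_i|alpha)^2
norm2 = alpha @ alpha
print(lam_min > 0, lam_min * norm2 <= g2_mean <= lam_max * norm2)
```

Positive definiteness of \(\boldsymbol{X}_{n}\) for n > p follows from the Vandermonde matrix having full column rank at distinct design points.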
The values \(\lambda_{n,m},\lambda_{n,M}\) depend on the interval over which the polynomial is integrated or summed; in particular, if a = b then the integral is zero. In the following lemmas we therefore assume that there is some minimal distance between any two knot points and between a knot point and the boundary values a,b.
Lemma 2.
Given a degree p spline \(g(x|\boldsymbol{\theta})\) with κ knot points on \([a,b]\), let \(\tau=\left(\left|a\right|\vee\left|b\right|\right)^{\kappa}\). Then \(\forall\;\delta>2\tau,\;\exists\;\lambda_{n}>0\) such that if \(\left\Vert \boldsymbol{\theta}\right\Vert>\delta\) then \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}>\left(\delta^{2}+\tau^{2}\right)\lambda_{n}\).
Proof.
Notice that \(||\boldsymbol{\theta}||^{2}>\delta^{2}>4\tau^{2}\) implies \(||\boldsymbol{\alpha}||^{2}>\delta^{2}-\tau^{2}\). First consider the case \(\kappa=1\). If \(\alpha_{0}^{2}+\dots+\alpha_{p}^{2}>\left(\delta^{2}+\tau^{2}\right)/9\), then \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}1_{[a,t]}\left(x_{i}\right)>\lambda_{n}\left(\delta^{2}+\tau^{2}\right)\) for some \(\lambda_{n}>0\) by Lemma 1. If \(\alpha_{0}^{2}+\dots+\alpha_{p}^{2}\le\left(\delta^{2}+\tau^{2}\right)/9\), then \(\alpha_{p+1}^{2}\ge3\left(\delta^{2}+\tau^{2}\right)/4\). Therefore \(\alpha_{p}+\alpha_{p+1}\), the coefficient of the \(x^{p}\) term of the polynomial on \([t_{1},b]\), satisfies
$$\begin{aligned} \left(\alpha_{p}+\alpha_{p+1}\right)^{2} &\ge \left(\left|\alpha_{p+1}\right|-\left|\alpha_{p}\right|\right)^{2}\\ &\ge \left(\sqrt{\tfrac{3}{4}\left(\delta^{2}+\tau^{2}\right)}-\sqrt{\tfrac{1}{9}\left(\delta^{2}+\tau^{2}\right)}\right)^{2}\\ &> \frac{1}{4}\left(\delta^{2}+\tau^{2}\right)\end{aligned}$$
and thus the squared norm of the coefficients of the polynomial on \([t_{1},b]\) must also exceed \(\frac{1}{4}\left(\delta^{2}+\tau^{2}\right)\), so \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}1_{[t,b]}\left(x_{i}\right)>\lambda_{n}\left(\delta^{2}+\tau^{2}\right)\) for some \(\lambda_{n}>0\) by Lemma 1. The proof for multiple knots is similar: one examines all \(\kappa+1\) polynomial sections to find one whose coefficients have squared norm larger than some fixed fraction of \(\left(\delta^{2}+\tau^{2}\right).\hfill\Box\)
Lemma 3.
For all \(\delta>0\), there exists \(\lambda_{n}>0\) such that for all \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\delta)\), \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}>\lambda_{n}\delta\).
Proof.
By the previous lemma, for all \(\Delta>2\tau\) there exists \(\Lambda_{n}>0\) such that for all \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\Delta)\), \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}>\Lambda_{n}\Delta\). We now consider the region
$$\begin{aligned} \mathcal{C}=\textrm{closure}\left[B\left(\boldsymbol{\theta}_{0},\Delta\right)\setminus B\left(\boldsymbol{\theta}_{0},\delta\right)\right].\end{aligned}$$
Assume to the contrary that there exists \(\delta>0\) such that \(\forall\,\lambda_{n}>0\;\exists\,\boldsymbol{\theta}\in\mathcal{C}\) with \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}\le\lambda_{n}\delta\); we seek a contradiction. By this negation there exists a sequence \(\boldsymbol{\theta}_{n}\in\mathcal{C}\) such that \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{n},x_{i})\right)^{2}\to0\). But since \(\mathcal{C}\) is compact, there exists a subsequence \(\boldsymbol{\theta}_{n_{k}}\) converging to some \(\boldsymbol{\theta}_{\infty}\in\mathcal{C}\) with \(\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{\infty},x)\right)^{2}dx=0\). But since \(\boldsymbol{\theta}_{0}\notin\mathcal{C}\) this is a contradiction.\(\hfill\Box\)
Corollary 4.
There exists \(\lambda_{n}>0\) such that for any \(\delta>0\) and \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\delta)\),
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2} \ge \lambda_{n}^{2}\delta^{2}+O_{p}\left(n^{-1/2}\right)+\sigma_{0}^{2}.\end{aligned}$$
We now focus our attention on the ratio of the maximum value of a polynomial and its integral.
Lemma 5.
Given a degree p polynomial \(g\left(x|\boldsymbol{\alpha}\right)\) on \(\left[a,b\right]\), then
$$\begin{aligned} \frac{\max_{i\in\left\{1,\dots,n\right\}}\left[g\left(x_{i}|\boldsymbol{\alpha}\right)\right]^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left[g\left(x_{i}|\boldsymbol{\alpha}\right)\right]^{2}}\le\frac{\lambda_{M}^{2}}{\lambda_{n,m}^{2}}\to\frac{\lambda_{M}^{2}}{\lambda_{m}^{2}}\end{aligned}$$
for some \(\lambda_{M},\lambda_{m}>0\).
Proof.
Since we can write \(\left[g\left(x|\boldsymbol{\alpha}\right)\right]^{2}=\boldsymbol{\alpha}^{T}W_{x}\boldsymbol{\alpha}\) for some nonnegative definite matrix \(W_{x}\) with maximum eigenvalue \(\lambda_{M,x}\), and because the maximum eigenvalue is a continuous function of x, let \(\lambda_{M}=\sup_{x\in[a,b]}\lambda_{M,x}\). Then the maximum of \(\left[g\left(x|\boldsymbol{\alpha}\right)\right]^{2}\) over \(x\in[a,b]\) is less than \(\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\lambda_{M}^{2}\). The denominator is bounded from below by \(\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\lambda_{n,m}^{2}\) by Lemma 1.\(\hfill\Box\)
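The ratio bound of Lemma 5 is also easy to check numerically. In this sketch (degree, interval, and number of trial coefficient vectors are arbitrary assumptions), \(W_{x}=v(x)v(x)^{T}\) is rank one, so its maximum eigenvalue is \(\|v(x)\|^{2}\), and the minimum eigenvalue of \(\boldsymbol{X}_{n}\) plays the role of the lemma's \(\lambda_{n,m}^{2}\):

```python
import numpy as np

p, n = 2, 1000
x = np.linspace(0.0, 1.0, n)
V = np.vander(x, p + 1, increasing=True)     # rows v(x_i) = (1, x_i, ..., x_i^p)
lam_nm = np.linalg.eigvalsh(V.T @ V / n)[0]  # min eigenvalue of X_n
# W_x = v(x) v(x)^T has max eigenvalue ||v(x)||^2; take the sup over the grid
lam_M = (V**2).sum(axis=1).max()

rng = np.random.default_rng(2)
ratios = []
for _ in range(200):                         # the bound holds for every alpha
    alpha = rng.standard_normal(p + 1)
    vals = (V @ alpha)**2
    ratios.append(vals.max() / vals.mean())
worst = max(ratios)
print(worst, lam_M / lam_nm)                 # worst observed ratio vs. the bound
```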
Lemma 6.
Given a degree p spline \(g\left(x|\boldsymbol{\theta}\right)\) on \(\left[a,b\right]\), then
$$\begin{aligned} \frac{\max\left[g\left(x|\boldsymbol{\theta}\right)\right]^{2}}{\int_{a}^{b}\left[g\left(x|\boldsymbol{\theta}\right)\right]^{2}\, dx}\le\frac{\lambda_{M}^{2}}{\lambda_{m}^{2}}\end{aligned}$$
for some \(\lambda_{M},\lambda_{m}>0\).
Proof.
Since a degree p spline is a degree p polynomial on each of the regions defined by the knot points, and because the integral over the whole interval \([a,b]\) is greater than the integral over any single region, we can apply the previous lemma on each section and then choose the largest ratio.\(\hfill\Box\)
Lemma 7.
Given a degree p spline \(g\left(x|\boldsymbol{\theta}\right)\) on \([a,b]\), then
$$\frac{n^{-1/2}\max_{i}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}=O_{p}\left(1\right)$$
(10.5)
uniformly over \(\boldsymbol{\theta}\).
Proof.
Notice
$$\begin{aligned} \frac{n^{-1/2}\max_{i}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}} & \le & \frac{2n^{-1/2}\max_{i}\left[\epsilon_{i}^{2}\sigma_{0}^{2}\right]+2n^{-1/2}\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\\ & = & \frac{2\sigma_{0}^{2}n^{-1/2}\max_{i}\epsilon_{i}^{2}+2n^{-1/2}\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\\ & = & \frac{O_{p}\left(\frac{\log n}{\sqrt{n}}\right)+2n^{-1/2}\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\end{aligned}$$
and since \(n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}\stackrel{P}{\to}\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx+\sigma_{0}^{2}\), and Lemma 6 bounds the ratio of the terms that involve \(\boldsymbol{\theta}\), this ratio is bounded in probability uniformly over \(\boldsymbol{\theta}\).\(\hfill\Box\)
10.2.5 Assumption B1
Returning to assumption B1, we now consider \(\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\) and
$$\begin{aligned} L_{n}\left(\boldsymbol{\xi}\right) & =\sum\log\left\{\frac{1}{\sqrt{2\pi}\sigma}{\rm exp}\left[\frac{-1}{2\sigma^{2}}\left(y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\right\}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]^{2}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}+\sigma_{0}\epsilon_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]^{2}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}\end{aligned}$$
and therefore
$$\begin{aligned} {\frac{1}{n}}&\left(L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right)\\ & =-\log\sigma-\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\log\sigma_{0}+\frac{1}{2n\sigma_{0}^{2}}\sum\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}\\ & =\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\frac{1}{2n\sigma_{0}^{2}}\sum\left[\sigma_{0}\epsilon_{i}\right]^{2}\\ & =\log\frac{\sigma_{0}}{\sigma}-\frac{\left(\lambda_{n}\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0}\right)\right)^{2}}{2\sigma^{2}}-\frac{\sigma_{0}^{2}}{2\sigma^{2}}+\frac{1}{2n}\sum\epsilon_{i}^{2}\end{aligned}$$
where
$$\begin{aligned} \left[\lambda_{n}\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0}\right)\right]^{2} =\frac{1}{n}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\sigma_{0}^{2}\end{aligned}$$
which converges in probability to \(\frac{1}{b-a}\int_{a}^{b}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x\right)\right]^{2}dx\). The function goes to \(-\infty\) as \(\sigma\to0\) and \(\sigma\to\infty\). Taking the derivative
$$\begin{aligned}\frac{d}{d\sigma}\left[\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2\sigma^{2}}\left[\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\right]+\frac{1}{2n}\sum\epsilon_{i}^{2}\right]=-\frac{1}{\sigma}+\frac{1}{\sigma^{3}}\left[\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\right]\end{aligned}$$
and setting it equal to zero yields a single critical point at \(\sigma^{2}=\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\), which results in a maximum of
$$\log\left(\frac{\sigma_{0}}{\sqrt{\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}}}\right)-\frac{1}{2}+\frac{1}{2}n^{-1}\sum\epsilon_{i}^{2}$$
(10.6)
which is bounded away from zero in probability for \(\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\).
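The location of this profile maximum is easy to confirm numerically. In this sketch the particular values of \(\sigma_{0}\) and \(\lambda_{n}^{2}\) are arbitrary assumptions, and the \(\epsilon\) term (which does not depend on σ) is dropped:

```python
import numpy as np

sigma0, lam2 = 1.3, 0.7                 # illustrative sigma_0 and lambda_n^2
# sigma-dependent part of (1/n)(L_n(xi) - L_n(xi_0))
h = lambda s: np.log(sigma0 / s) - (lam2 + sigma0**2) / (2 * s**2)

s = np.linspace(0.05, 10.0, 200_000)
s_star = s[np.argmax(h(s))]
print(s_star**2, lam2 + sigma0**2)      # maximizer satisfies sigma^2 = lambda_n^2 + sigma_0^2
```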
10.2.6 Assumption C1
Assumption C1 is
$$\begin{aligned}\inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min_{i=1\dots n}L(\boldsymbol{\xi},X_{i})}{\left|L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right|}\stackrel{P_{\boldsymbol{\xi}_{0}}}{\longrightarrow}0\end{aligned}$$
First notice
$$\begin{aligned} L(\boldsymbol{\xi},Y_{i})&=-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(Y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\\ & =-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(\epsilon_{i}\sigma_{0}+N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\\ & =-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}\end{aligned}$$
and we consider \(\mathcal{C}=\left\{\boldsymbol{\xi}:\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\right\}\). Define
$$\begin{aligned} f_{n}\left(\boldsymbol{\xi}\right)&=& \frac{\min\; L\left(\boldsymbol{\xi},Y_{i}\right)}{\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\\ & = & \frac{-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{n\cdot\frac{1}{n}\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\end{aligned}$$
and notice that the denominator is bounded away from 0 by (10.6).
$$\begin{aligned} f_{n}\left(\boldsymbol{\xi}\right)&= \frac{-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-n\cdot\frac{1}{n}\left(L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right)}\\ & =\frac{\frac{1}{\sqrt{n}}\left[-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}\right]}{-\sqrt{n}\cdot\frac{1}{n}\left[n\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\frac{1}{2}\sum\epsilon_{i}^{2}\right]}\\ & =\frac{1}{\sqrt{n}}\cdot\frac{-\frac{1}{2\sqrt{n}}\log\left(2\pi\right)-\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\\ & =\frac{1}{\sqrt{n}}\left[\frac{-\frac{1}{2\sqrt{n}}\log\left(2\pi\right)}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\right.\\ &\qquad \left.+\frac{-\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\right]\end{aligned}$$
We consider the infima of the two terms inside the brackets separately.
For the first term, since the denominator is bounded in probability away from 0 uniformly in \(\boldsymbol{\theta}\) and the numerator goes to zero, the infimum of the first term goes to 0 in probability.
The second term is uniformly bounded over \(\boldsymbol{\theta}\) by Lemma 7. Notice that the numerator is
$$\begin{aligned} -&\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}\\ &\ge -\frac{1}{\sqrt{n}}\log\sigma-\frac{\max\left[\epsilon_{i}\sigma_{0}\right]^{2}}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\\ & = -\frac{1}{\sqrt{n}}\log\sigma-\frac{\sigma_{0}^{2}\, O_{p}\left(\log\, n\right)}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\\ & \ge \frac{-\log n}{\sqrt{n}}\,\log\sigma-\frac{\sigma_{0}^{2}\, O_{p}\left(\log\, n\right)}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\end{aligned}$$
and all three terms of the numerator converge to 0 for every σ. Therefore, for \(\sigma\in\left[0,d\right]\) for some large d, the infimum converges to 0. For \(\sigma>d\), the \(\log\sigma\) terms dominate and the infimum occurs at \(\sigma=d\) which also converges to 0. Therefore
$$\begin{aligned}\inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min L\left(\boldsymbol{\xi},Y_{i}\right)}{\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\stackrel{P}{\to}0.\end{aligned}$$
10.2.7 Assumption C2
Finally we turn our attention to the Jacobian. Recall that the Jacobian is
$$\begin{aligned}J_{0}\left(\boldsymbol{y}_{0},\boldsymbol{\xi}\right)=\left|\frac{1}{\sigma^{2}}p^{\kappa}\det\left[\begin{array}{ccc} \boldsymbol{B}_{\boldsymbol{\alpha}} & \boldsymbol{B}_{\boldsymbol{t}} & \boldsymbol{B}_{\sigma^{2}}\end{array}\right]\right|\end{aligned}$$
where
$$\begin{aligned}\boldsymbol{B}_{\boldsymbol{\alpha}}=\left[\!\!\!\begin{array}{ccccccc} 1 & x_{(1)} & \dots & x_{(1)}^{p} & (x_{(1)}-t_{1})_{+}^{p} & \dots & (x_{(1)}-t_{\kappa})_{+}^{p}\\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{(l)} & \dots & x_{(l)}^{p} & (x_{(l)}-t_{1})_{+}^{p} & \dots & (x_{(l)}-t_{\kappa})_{+}^{p}\end{array}\!\!\!\right],\end{aligned}$$
$$\begin{aligned}\boldsymbol{B}_{\boldsymbol{t}}=\left[\!\!\!\begin{array}{ccc} \alpha_{1+p+1}\left(x_{(1)}-t_{1}\right)_{+}^{p-1}I\left(x_{(1)}-t_{1}\right) & \dots & \alpha_{1+p+\kappa}\left(x_{(1)}-t_{\kappa}\right)_{+}^{p-1}I\left(x_{(1)}-t_{\kappa}\right)\\ \vdots & \ddots & \vdots\\ \alpha_{1+p+1}\left(x_{(l)}-t_{1}\right)_{+}^{p-1}I\left(x_{(l)}-t_{1}\right) & \dots & \alpha_{1+p+\kappa}\left(x_{(l)}-t_{\kappa}\right)_{+}^{p-1}I\left(x_{(l)}-t_{\kappa}\right)\end{array}\!\!\!\right],\end{aligned}$$
and
$$\begin{aligned}\boldsymbol{B}_{\sigma^{2}}=\left[\!\!\!\begin{array}{c} -\frac{1}{2}\left(y_{(1)}-g(x_{(1)}|\boldsymbol{\theta})\right)\\ \vdots\\ -\frac{1}{2}\left(y_{(l)}-g(x_{(l)}|\boldsymbol{\theta})\right)\end{array}\!\!\!\right].\end{aligned}$$
Following the notation of Yeo and Johnson, we suppress parentheses and 0 subscripts. We consider \(\boldsymbol{\xi}\) in the compact set \(\bar{B}(\boldsymbol{\xi}_{0},\delta)\). We notice that for \(\delta<\sigma^{-2}\), \(J(\boldsymbol{y};\boldsymbol{\xi})\le\delta^{\kappa+1}p^{\kappa}g(\boldsymbol{y})\) for some \(g(\boldsymbol{y})\), because \(\boldsymbol{B_{\alpha}}\) and \(\boldsymbol{B_{t}}\) are functions of the bounded quantities \(\boldsymbol{x},\boldsymbol{t}\).
We let \(S_{M}^{l}\) be the cube in \(\mathbb{R}^{l}\) of radius M.
Finally, we notice that \(J_{j}(y_{1},\dots,y_{j};\boldsymbol{\xi})=E\left[J\left(y_{1},\dots,y_{j},Y_{j+1},\dots,Y_{l};\boldsymbol{\xi}\right)\right]\) is a polynomial in \(\boldsymbol{\theta}\) scaled by \(\sigma^{2}\), which is equicontinuous on compacts of \(\boldsymbol{\xi}\) where σ is bounded away from 0.
Appendix C: Full Simulation Results