
Fiducial Theory for Free-Knot Splines

Conference paper

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 68)

Abstract

We construct the fiducial model for free-knot splines and derive sufficient conditions under which a multivariate fiducial estimator is asymptotically consistent. We show that splines of degree four and higher satisfy these conditions, and we conduct a simulation study to evaluate the quality of the fiducial estimates relative to the competing Bayesian solution. The fiducial confidence intervals achieve the desired confidence level while tending to be shorter than the corresponding Bayesian credible intervals based on the reference prior. AMS 2000 subject classifications: Primary 62F99, 62G08; secondary 62P10.

Jan Hannig’s research was supported in part by the National Science Foundation under Grant Nos. 1007543 and 1016441.


References

  • Cisewski J, Hannig J (2012) Generalized fiducial inference for normal linear mixed models. Ann Stat 40:2102–2127

  • DiMatteo I, Genovese CR, Kass RE (2001) Bayesian curve-fitting with free-knot splines. Biometrika 88:1055–1071

  • Lidong E, Hannig J, Iyer H (2008) Fiducial intervals for variance components in an unbalanced two-component normal mixed linear model. J Am Stat Assoc 103:854–865

  • Fisher RA (1930) Inverse probability. Proc Camb Philos Soc 26:528–535

  • Ghosh JK, Ramamoorthi RV (2003) Bayesian nonparametrics. Springer, New York

  • Hannig J (2009) On generalized fiducial inference. Stat Sinica 19:491–544

  • Hannig J (2013) Generalized fiducial inference via discretization. Stat Sinica 23:489–514


  • Hannig J, Iyer H, Patterson P (2006) Fiducial generalized confidence intervals. J Am Stat Assoc 101:254–269. doi:10.1198/016214505000000736

  • Hannig J, Lee TCM (2009) Generalized fiducial inference for wavelet regression. Biometrika 96:847–860. doi:10.1093/biomet/asp050

  • Lehmann EL, Casella G (1998) Theory of point estimation. Springer, New York

  • Muggeo VMR (2003) Estimating regression models with unknown break-points. Stat Med 22:3055–3071


  • Muggeo VMR (2008) Segmented: an R package to fit regression models with broken-line relationships. R News 8(1):20–25

  • Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, Cambridge


  • Sonderegger DL, Wang H, Clements WH, Noon BR (2009) Using SiZer to detect thresholds in ecological data. Front Ecol Environ 7:190–195. doi:10.1890/070179

  • Toms JD, Lesperance ML (2003) Piecewise regression: a tool for identifying ecological thresholds. Ecology 84:2034–2041


  • van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge

  • Wandler DV, Hannig J (2011) Generalized fiducial confidence intervals for extremes. Extremes 15:67–87. doi:10.1007/s10687-011-0127-9

  • Wandler DV, Hannig J (2012) A fiducial approach to multiple comparisons. J Stat Plan Infer 142:878–895. doi:10.1016/j.jspi.2011.10.011

  • Weerahandi S (1993) Generalized confidence intervals. J Am Stat Assoc 88(423):899–905


  • Yeo IK, Johnson RA (2001) A uniform strong law of large numbers for U-statistics with application to transforming to near symmetry. Stat Probab Lett 51:63–69


Acknowledgement

Dr. Hannig thanks Prof. Hira Koul for his encouragement and help ever since he was a graduate student at Michigan State University. A young researcher cannot ask for a better role model. The authors also thank the two anonymous referees who made several useful suggestions for improving the manuscript.

Author information

Correspondence to Derek L. Sonderegger.

Appendices

Appendix A: Proof of Asymptotic Normality of Fiducial Estimators

We start with several assumptions. Assumptions A0–A6 are sufficient for the maximum likelihood estimator to be asymptotically normal and can be found in Lehmann and Casella (1998) as 6.3 (A0)–(A2) and 6.5 (A)–(D). Assumption B2 ensures that the Jacobian converges to a prior (Hannig 2009), and B1 is the assumption necessary for the Bayesian posterior to converge to that of the MLE (Ghosh and Ramamoorthi 2003, Theorem 1.4.1).

10.1.1 Assumptions

10.1.1.1 Conditions for Asymptotic Normality of the MLE

  1. (A0)

    The distributions \(P_{\boldsymbol{\xi}}\) are distinct.

  2. (A1)

    The set \(\left\{x:f(x|\boldsymbol{\xi})>0\right\}\) is independent of the choice of \(\boldsymbol{\xi}\).

  3. (A2)

    The data \(\boldsymbol{X}=\{X_{1},\dots,X_{n}\}\) are independent identically distributed (i.i.d.) with probability density \(f(\cdot|\boldsymbol{\xi})\).

  4. (A3)

    There exists an open neighborhood of the true parameter value \(\boldsymbol{\xi}_{0}\), denoted by \(B(\boldsymbol{\xi}_{0},\delta)\), in which all third partial derivatives \(\left(\partial^{3}/\partial\xi_{i}\partial\xi_{j}\partial\xi_{k}\right)f(\boldsymbol{x}|\boldsymbol{\xi})\) exist.

  5. (A4)

    The first and second derivatives of \(L(\boldsymbol{\xi},x)=\log f(x|\boldsymbol{\xi})\) satisfy

    $$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\xi_{j}}L(\boldsymbol{\xi},x)\right]=0\end{aligned}$$

    and

    $$\begin{aligned} I_{j,k}(\boldsymbol{\xi}) & = E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\xi_{j}}L(\boldsymbol{\xi},x)\cdot\frac{\partial}{\partial\xi_{k}}L(\boldsymbol{\xi},x)\right]\\ & = -E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\xi_{j}\partial\xi_{k}}L(\boldsymbol{\xi},x)\right].\end{aligned}$$
  6. (A5)

    The information matrix \(I(\boldsymbol{\xi})\) is positive definite for all \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\)

  7. (A6)

    There exist functions \(M_{j,k,l}(x)\) such that

    $$\begin{aligned} \sup_{\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)}\left|\frac{\partial^{3}}{\partial\xi_{j}\partial\xi_{k}\partial\xi_{l}}L(\boldsymbol{\xi},x)\right|\le M_{j,k,l}(x)\quad\textrm{and}\;\;E_{\boldsymbol{\xi}_{0}}M_{j,k,l}(x)<\infty\end{aligned}$$

10.1.1.2 Conditions for the Bayesian Posterior Distribution to Be Close to That of the MLE

Let \(\pi(\boldsymbol{\xi})=E_{\boldsymbol{\xi}_{0}}J_{0}(X_{0},\boldsymbol{\xi})\) and \(L_{n}(\boldsymbol{\xi})=\sum L(\boldsymbol{\xi},X_{i})\).

  1. (B1)

    For any \(\delta>0\) there exists \(\epsilon>0\) such that

    $$\begin{aligned} P_{\boldsymbol{\xi}_{0}}\left\{\sup_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{1}{n}\left(L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right)\le-\epsilon\right\} \to1\end{aligned}$$
  2. (B2)

    \(\pi\left(\boldsymbol{\xi}\right)\) is positive at \(\boldsymbol{\xi}_{0}\)

10.1.1.3 Conditions for Showing That the Fiducial Distribution is Close to the Bayesian Posterior

  1. (C1)

    For any \(\delta>0\)

    $$\begin{aligned} \inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min_{i=1\dots n}L(\boldsymbol{\xi},X_{i})}{\left|L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right|}\stackrel{P_{\boldsymbol{\xi}_{0}}}{\longrightarrow}0\end{aligned}$$
  2. (C2)

    Let \(\pi(\boldsymbol{\xi})=E_{\boldsymbol{\xi}_{0}}J_{0}(X_{0},\boldsymbol{\xi})\). The Jacobian function \(J\left(\boldsymbol{X},\boldsymbol{\xi}\right)\stackrel{a.s.}{\to}\pi\left(\boldsymbol{\xi}\right)\) uniformly on compacts in \(\boldsymbol{\xi}\). In the single-variable case, this reduces to the following: \(J\left(\boldsymbol{X},\xi\right)\) is continuous in \(\xi\), \(\pi\left(\xi\right)\) is finite, \(\pi\left(\xi_{0}\right)>0\), and for some \(\delta_{0}>0\)

    $$\begin{aligned} E_{\xi_{0}}\left(\sup_{\xi\in B\left(\xi_{0},\delta_{0}\right)}J_{0}\left(\boldsymbol{X},\xi\right)\right)<\infty.\end{aligned}$$

    In the multivariate case, we follow Yeo and Johnson (2001). Let

    $$\begin{aligned} J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)=E_{\boldsymbol{\xi}_{0}}\left[J_{0}\left(x_{1},\dots,x_{j},X_{j+1},\dots,X_{k};\boldsymbol{\xi}\right)\right].\end{aligned}$$
  1. (C2.a)

    There exist an integrable and symmetric function \(g\left(x_{1},\dots,x_{j}\right)\) and a compact set \(\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\) such that for \(\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\), \(\left|J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)\right|\le g\left(x_{1},\dots,x_{j}\right)\) for \(j=1,\dots,k\).

  2. (C2.b)

    There exists a sequence of measurable sets \(S_{M}^{k}\) such that

    $$\begin{aligned} P\left(\mathbb{R}^{k}-\cup_{M=1}^{\infty}S_{M}^{k}\right)=0\end{aligned}$$
  3. (C2.c)

    For each M and for all \(j\in\{1,\dots,k\}\), \(J_{j}\left(x_{1},\dots,x_{j};\boldsymbol{\xi}\right)\) is equicontinuous in \(\boldsymbol{\xi}\) for \(\{x_{1},\dots,x_{j}\}\in S_{M}^{j}\), where \(S_{M}^{k}=S_{M}^{j}\times S_{M}^{k-j}\).

10.1.2 Proof of Asymptotic Normality of Multivariate Fiducial Estimators

We now prove the asymptotic normality (Theorem 1) for multivariate fiducial estimators.

Proof.

Assume without loss of generality that \(\boldsymbol{\xi}\in\boldsymbol{\Xi}=\mathbb{R}^{p}\). We denote by \(J_{n}\left(\boldsymbol{x}_{n},\boldsymbol{\xi}\right)\) the average of all possible Jacobians over a sample of size n and set \(\pi\left(\boldsymbol{\xi}\right)=E_{\boldsymbol{\xi}_{0}}J_{0}\left(\boldsymbol{x},\boldsymbol{\xi}\right)\). Assumption C2 and the uniform strong law of large numbers for U-statistics imply that \(J_{n}\left(\boldsymbol{x},\boldsymbol{\xi}\right)\stackrel{a.s.}{\rightarrow}\pi\left(\boldsymbol{\xi}\right)\) uniformly in \(\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)\) and that \(\pi\left(\boldsymbol{\xi}\right)\) is continuous. Therefore,

$$\sup_{\boldsymbol{\xi}\in\bar{B}\left(\boldsymbol{\xi}_{0},\delta\right)}\left|J_{n}\left(x_{n},\boldsymbol{\xi}\right)-\pi\left(\boldsymbol{\xi}\right)\right|\to0\;P_{\boldsymbol{\xi}_{0}}\, a.s.$$

The multivariate proof now proceeds in a fashion similar to the univariate case. Let

$$\begin{aligned} \pi^{*}\left(\boldsymbol{s},\boldsymbol{x}\right) &= \frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)f\left(\boldsymbol{x}_{n}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)f\left(\boldsymbol{x}_{n}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\, d\!\boldsymbol{t}}\\ & = \frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)\right]}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\right]\, d\!\boldsymbol{t}}\\ & =\frac{J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]}{\int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\, d\!\boldsymbol{t}}\end{aligned}$$

and just as in Ghosh and Ramamoorthi (2003), we let \(H=-\frac{1}{n}\frac{\partial^{2}}{\partial\boldsymbol{\xi}\,\partial\boldsymbol{\xi}^{T}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\) and we notice that \(H\to I\left(\boldsymbol{\xi}_{0}\right)\, a.s.\, P_{\boldsymbol{\xi}_{0}}\). It will be sufficient to prove

$$\begin{aligned}\int_{\mathbb{R}^{p}}\left|J_{n}\left({x}_{n},\hat{\xi}_{n}+\frac{t}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\xi}_{n}+\frac{t}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\nonumber\\\left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]\right|d\!\boldsymbol{t} \stackrel{P_{\boldsymbol{\xi}_{0}}}{\rightarrow}0\end{aligned}$$
(10.3)

Let \(t_{i}\) represent the ith component of the vector \(\boldsymbol{t}\). By Taylor’s theorem, we can compute

$$\begin{aligned} L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right) &= L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)+\sum_{i=1}^{p}\left(\frac{t_{i}}{\sqrt{n}}\right)\frac{\partial}{\partial\xi_{i}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\\ &+\frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p}\left(\frac{t_{i}t_{j}}{\left(\sqrt{n}\right)^{2}}\frac{\partial^{2}}{\partial\xi_{i}\partial\xi_{j}}L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right)\\ &+\frac{1}{6}\sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p}\left(\frac{t_{i}t_{j}t_{k}}{\left(\sqrt{n}\right)^{3}}\frac{\partial^{3}}{\partial\xi_{i}\partial\xi_{j}\partial\xi_{k}}L_{n}\left(\boldsymbol{\xi}'\right)\right)\\ & = L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)-\frac{\boldsymbol{t}^{T}H\boldsymbol{t}}{2}+R_{n}\end{aligned}$$

for some \(\boldsymbol{\xi}^\prime\in\left[\hat{\boldsymbol{\xi}}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right]\); the first-order term vanishes because \(\hat{\boldsymbol{\xi}}_{n}\) maximizes \(L_{n}\). Notice that \(R_{n}=O_{p}\left(\left\Vert \boldsymbol{t}\right\Vert ^{3}/\sqrt{n}\right)\).

Given any \(0<\delta<\delta_{0}\) and \(c>0\), we break \(\mathbb{R}^{p}\) into three regions:

$$\begin{aligned} A_{1} & =\left\{\boldsymbol{t}:\;\left\Vert \boldsymbol{t}\right\Vert <c\log\sqrt{n}\right\}\\ A_{2}&=\left\{\boldsymbol{t}:c\log\sqrt{n}<\left\Vert \boldsymbol{t}\right\Vert <\delta\sqrt{n}\right\}\\ A_{3} & =\left\{\boldsymbol{t}:\;\delta\sqrt{n}<\left\Vert \boldsymbol{t}\right\Vert \right\}\end{aligned}$$

On \(A_{1}\cup A_{2}\) we compute

$$\begin{aligned} {\int_{A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.}\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[-\frac{1}{2}\boldsymbol{t}'I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}\right]\right|d\!\boldsymbol{t}\\ \le \int_{A_{1}\cup A_{2}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|\\ \cdot{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ +\int_{A_{1}\cup A_{2}}\left|\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[-\frac{1}{2}\boldsymbol{t}'I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}\right]\right|d\!\boldsymbol{t}\end{aligned}$$

Since \(\pi\left(\boldsymbol{\cdot}\right)\) is a proper prior on \(A_{1}\cup A_{2}\), the second term goes to 0 by the Bayesian Bernstein–von Mises theorem. Next we notice that

$$\begin{aligned} {\int_{A_{1}\cup A_{2}}\left|J_{n}\left(x,\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|}\\ \cdot{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ \le \sup_{\boldsymbol{t}\in A_{1}\cup A_{2}}\left|J_{n}\left(x,\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-\pi\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)\right|\\ \cdot\int_{A_{1}\cup A_{2}}{\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\end{aligned}$$

Since \(\sqrt{n}\left(\hat{\boldsymbol{\xi}}_{n}-\boldsymbol{\xi}_{0}\right)\stackrel{\mathcal{D}}{\to}N\left(0,I\left(\boldsymbol{\xi}_{0}\right)^{-1}\right)\), then

$$\begin{aligned}P_{\boldsymbol{\xi}_{0}}\left[\left\{\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n};\;\boldsymbol{t}\in A_{1}\cup A_{2}\right\} \subset B\left(\boldsymbol{\xi}_{0},\delta_{0}\right)\right]\to1.\end{aligned}$$

Furthermore, since \(L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\boldsymbol{t}/\sqrt{n}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)=-\frac{\boldsymbol{t}^{T}H\boldsymbol{t}}{2}+R_{n}\), the integral is bounded in probability. Since \(\max_{\mathbf{t}\in A_{1}\cup A_{2}}\left\Vert \boldsymbol{t}/\sqrt{n}\right\Vert \le\delta\) and \(J_{n}\to\pi\) uniformly on compacts, the first term goes to 0 in probability as well.

Next, we turn to

$$\begin{aligned} {\int_{A_{3}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.}\\ \left.-\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]\right|\, d\!\boldsymbol{t}\\ \le \int_{A_{3}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ +\int_{A_{3}}\pi\left(\boldsymbol{\xi}_{0}\right){\rm exp}\left[\frac{-\boldsymbol{t}^{T}I\left(\boldsymbol{\xi}_{0}\right)\boldsymbol{t}}{2}\right]d\!\boldsymbol{t}\end{aligned}$$

The second integral goes to 0 because \(\min_{A_{3}}\left\Vert \boldsymbol{t}\right\Vert \to\infty\). As for the first integral,

$$\begin{aligned} {\int_{A_{3}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}}\\ =\frac{1}{n}\sum_{i=1}^{n}\int_{A_{3}}J\left(\boldsymbol{x}_{i},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]d\!\boldsymbol{t}\\ =\frac{1}{n}\sum_{i=1}^{n}\int_{A_{3}}J\left(\boldsymbol{x}_{i},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)f\left(x_{i}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\\ {\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)-\log f\left(x_{i}|\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)\right]d\!\boldsymbol{t}\end{aligned}$$

Because \(J\left(\cdot\right)\) is a probability measure, so is \(J\left(\cdot\right)f\left(\cdot\right)\). Assumption C1 assures that the exponent goes to \(-\infty\), and therefore the integral converges to 0 in probability.

Having shown Eq. 10.3, we now follow Ghosh and Ramamoorthi (2003) and let

$$\begin{aligned} C_{n}=\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{t}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right|d\!\boldsymbol{t}\end{aligned}$$

then the main result to be proved (Eq. 10.2) becomes

$$\begin{aligned}C_{n}^{-1}\left\{\int_{\mathbb{R}^{p}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\right.\nonumber\\\left.\left.-C_{n}\frac{\sqrt{det\left|I\left(\boldsymbol{\xi}_{0}\right)\right|}}{\sqrt{2\pi}}e^{-\boldsymbol{s}^{T}I\left(\boldsymbol{\xi}_{0}\right)\mathbf{s}/2}\right|\, d\boldsymbol{s}\right\} & \stackrel{P_{\boldsymbol{\xi}_{0}}}{\to} & 0\end{aligned}$$
(10.4)

Because

$$\begin{aligned} \int_{\mathbb{R}^{p}}J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[-\frac{\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s} & = J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\int_{\mathbb{R}^{p}}{\rm exp}\left[-\frac{\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s}\\ &=J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\frac{\sqrt{2\pi}}{\sqrt{\det\left(H\right)}}\\ &\stackrel{a.s.}{\to} \pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}\end{aligned}$$

and Eq. 10.3 imply that \(C_{n}\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}\), it is enough to show that the integral in Eq. 10.4 goes to 0 in probability. This integral is less than \(I_{1}+I_{2}\) where

$$\begin{aligned} I_{1}& =\int_{\mathbb{R}^{P}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right){\rm exp}\left[L_{n}\left(\hat{\boldsymbol{\xi}}_{n}+\frac{\boldsymbol{s}}{\sqrt{n}}\right)-L_{n}\left(\hat{\boldsymbol{\xi}}_{n}\right)\right]\right.\\ &\left.-J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\right|\, d\boldsymbol{s}\end{aligned}$$

and

$$\begin{aligned} I_{2}=\int_{\mathbb{R}^{P}}\left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right){\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]-C_{n}\frac{\sqrt{det\left|I\left(\boldsymbol{\xi}_{0}\right)\right|}}{\sqrt{2\pi}}e^{-\boldsymbol{s}^{T}I\left(\boldsymbol{\xi}_{0}\right)\mathbf{s}/2}\right|\, d\boldsymbol{s}.\end{aligned}$$

Eq. 10.3 shows that \(I_{1}\to0\) in probability, and \(I_{2}\) is

$$\begin{aligned} I_{2}& =& \left|J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)-C_{n}\frac{\sqrt{det\left|I\left(\boldsymbol{\xi}_{0}\right)\right|}}{\sqrt{2\pi}}\right|\int_{\mathbb{R}^{P}}{\rm exp}\left[\frac{-\boldsymbol{s}^{T}H\boldsymbol{s}}{2}\right]\, d\boldsymbol{s}\\ & \stackrel{P}{\to} & 0\end{aligned}$$

because \(J_{n}\left(\boldsymbol{x}_{n},\hat{\boldsymbol{\xi}}_{n}\right)\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\) and \(C_{n}\stackrel{P}{\to}\pi\left(\boldsymbol{\xi}_{0}\right)\sqrt{\frac{2\pi}{\det\left(I\left(\boldsymbol{\xi}_{0}\right)\right)}}.\hfill\Box\)

Appendix B: Proof of Assumptions for Free-Knot Splines Using a Truncated Polynomial Basis

We now consider the free-knot spline case. Suppose we are interested in a degree p (order \(m=p+1\)) polynomial spline with κ knot points, \(\boldsymbol{t}=\left\{t_{1},\dots,t_{\kappa}\right\} ^{T}\), where \(t_{k}\in(a+\delta,b-\delta)\) and \(\left|t_{i}-t_{j}\right|\ge\delta\) for \(i\ne j\) and some \(\delta>0\). Furthermore, we assume that the data points \(\left\{x_{i},y_{i}\right\}\) are independent, with the distribution of the x i having positive density on \(\left[a,b\right]\).

Denote the truncated polynomial spline basis functions as

$$\begin{aligned} N(x,\boldsymbol{t}) & = \left\{N_{1}(x,\boldsymbol{t}),\dots,N_{\kappa+m}(x,\boldsymbol{t})\right\} ^{T}\\ & = \left\{1,x,\dots,x^{p},(x-t_{1})_{+}^{p},\dots,(x-t_{\kappa})_{+}^{p}\right\} ^{T}\end{aligned}$$

and let \(y_{i}=N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}+\sigma\epsilon_{i}\) where \(\epsilon_{i}\stackrel{iid}{\sim}N(0,1)\) and thus the density function is

$$\begin{aligned} f(y,\boldsymbol{\xi})=\frac{1}{\sqrt{2\pi\sigma^{2}}}{\rm exp}\left[-\frac{1}{2\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\end{aligned}$$

where \(\boldsymbol{\xi}=\{\boldsymbol{t},\boldsymbol{\alpha},\sigma^{2}\}\) and the log-likelihood function is

$$\begin{aligned} L(\boldsymbol{\xi},y)=-\frac{1}{2}\log2\pi-\frac{1}{2}\log\sigma^{2}-\frac{1}{2\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\end{aligned}$$
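As a purely illustrative sketch of this setup (not part of the original development), the following Python code builds the truncated power basis \(N(x,\boldsymbol{t})\) and simulates data from \(y_{i}=N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}+\sigma\epsilon_{i}\); the helper name truncated_power_basis and all numerical values are our own choices.

```python
import numpy as np

def truncated_power_basis(x, knots, p):
    """Evaluate N(x, t) = (1, x, ..., x^p, (x - t_1)_+^p, ..., (x - t_kappa)_+^p) row-wise."""
    x = np.asarray(x, dtype=float)
    poly = [x**j for j in range(p + 1)]                    # 1, x, ..., x^p
    trunc = [np.maximum(x - t, 0.0)**p for t in knots]     # (x - t_k)_+^p
    return np.column_stack(poly + trunc)                   # shape (n, m + kappa), m = p + 1

# Illustrative choices: degree p = 4 spline with two knots on [a, b] = [0, 1]
rng = np.random.default_rng(0)
p, knots = 4, [0.3, 0.7]
alpha = np.array([1.0, -2.0, 0.5, 0.0, 1.0, 3.0, -4.0])    # length m + kappa = 7
sigma = 0.2

x = np.linspace(0.0, 1.0, 200)                             # design points on a uniform grid
N = truncated_power_basis(x, knots, p)
y = N @ alpha + sigma * rng.standard_normal(x.size)        # y_i = N(x_i, t)^T alpha + sigma * eps_i
```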

10.2.1 Assumptions A0–A4

Assumptions A0–A2 are satisfied. We now consider Assumptions A3 and A4. We note that if \(p\ge4\) then the necessary three continuous derivatives with respect to the knot parameters exist, and we now examine the derivatives. Let \(\boldsymbol{\theta}=\{\boldsymbol{t},\boldsymbol{\alpha}\}\) and thus

$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\right] & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{2}}2\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ & =-\frac{1}{2\sigma^{2}}2\left(E_{\boldsymbol{\xi}}\left[y\right]-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ & = 0\end{aligned}$$

and

$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})\right] & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{2}}+\frac{1}{2\left(\sigma^{2}\right)^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\\ & =-\frac{1}{2\sigma^{2}}+\frac{1}{2\left(\sigma^{2}\right)^{2}}\left(\sigma^{2}\right)\\ & =0.\end{aligned}$$

Next, we consider the information matrix. First, we consider the \(\boldsymbol{\theta}\) terms.

$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},\boldsymbol{y})\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},\boldsymbol{y})\right]} & =E_{\boldsymbol{\xi}}\left[\frac{1}{\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ &=\frac{1}{\sigma^{4}}E_{\boldsymbol{\xi}}\left[\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &=\frac{1}{\sigma^{2}}\left(\frac{\partial}{\partial\theta_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\end{aligned}$$

The j,k partials for the second derivative are

$$\begin{aligned} {\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}L(\boldsymbol{\xi},y)}&=\frac{\partial}{\partial\theta_{j}}\left[-\frac{1}{2\sigma^{2}}2\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\\ &=\frac{\partial}{\partial\theta_{j}}\left[-\frac{1}{\sigma^{2}}\left(-y_{i}\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right)\right]\\ &=-\frac{1}{\sigma^{2}}\left[-y\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\end{aligned}$$

which have expectation

$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}L(\boldsymbol{\xi},y)\right] & =-\frac{1}{\sigma^{2}}\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ & =-E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},\boldsymbol{y})\right]\end{aligned}$$

as necessary. Next, we consider

$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]}\\ & =E_{\boldsymbol{\xi}}\left[\frac{1}{\sigma^{2}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left[-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\right]\\ & =E_{\boldsymbol{\xi}}\left[-\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\frac{1}{2\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{3}\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\\ & =0\end{aligned}$$

which is equal to

$$\begin{aligned} E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\theta_{j}\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})\right] & =E_{\boldsymbol{\xi}}\left[\frac{2}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\\ & =0.\end{aligned}$$

Finally,

$$\begin{aligned} {E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]}\\ & =E_{\boldsymbol{\xi}}\left[\left\{-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right\} \left\{-\frac{1}{2\sigma^{2}}+\frac{1}{2\sigma^{4}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right\} \right]\\ &=E_{\boldsymbol{\xi}}\left[\frac{1}{4\sigma^{4}}-\frac{2}{4\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}+\frac{1}{4\sigma^{8}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{4}\right]\\ &=\frac{1}{4\sigma_{0}^{4}}-\frac{2}{4\sigma_{0}^{6}}\sigma_{0}^{2}+\frac{1}{4\sigma_{0}^{8}}3\sigma_{0}^{4}\\ &=\frac{2}{4\sigma_{0}^{4}}\end{aligned}$$

which is equal to

$$\begin{aligned} -E_{\boldsymbol{\xi}}\left[\frac{\partial^{2}}{\partial\sigma^{2}\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right] = & -E_{\boldsymbol{\xi}}\left[\frac{1}{2}\sigma^{-4}-\frac{2}{2}\sigma^{-6}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\\ = & -\frac{1}{2}\sigma_0^{-4} + \frac{2}{2}\sigma_0^{-4}\\ = & \frac{1}{2}\sigma_{0}^{-4}.\end{aligned}$$

Therefore, the interchange of integration and differentiation is justified.
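Collecting these expectations, for a single observation at design point x the information matrix is block diagonal in \((\boldsymbol{\theta},\sigma^{2})\); summarizing the computations above,

$$\begin{aligned} I(\boldsymbol{\xi})=\left[\begin{array}{cc} \frac{1}{\sigma^{2}}\left(\frac{\partial}{\partial\boldsymbol{\theta}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\boldsymbol{\theta}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{T} & \boldsymbol{0}\\ \boldsymbol{0}^{T} & \frac{1}{2\sigma^{4}}\end{array}\right],\end{aligned}$$

where \(\frac{\partial}{\partial\boldsymbol{\theta}}\) denotes the gradient with respect to \(\boldsymbol{\theta}=\{\boldsymbol{t},\boldsymbol{\alpha}\}\).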

10.2.2 Assumption A5

To address whether the information matrix is positive definite, we notice that since \(E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]>0\) and \(E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y)\,\frac{\partial}{\partial\sigma^{2}}L(\boldsymbol{\xi},y)\right]=0\), we only need to be concerned with the submatrix

$$\begin{aligned} I_{j,k}(\boldsymbol{\theta})&=\sum_{i=1}^{n}E_{\boldsymbol{\xi}}\left[\frac{\partial}{\partial\theta_{j}}L(\boldsymbol{\xi},y_{i})\,\frac{\partial}{\partial\theta_{k}}L(\boldsymbol{\xi},y_{i})\right]\\ &=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}\left(\frac{\partial}{\partial\theta_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right).\end{aligned}$$

where the \(\sigma^{-2}\) term can be ignored because it does not affect the positive definiteness. First, we note

$$\begin{aligned} \frac{\partial}{\partial t_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}&=-p\left(x_{i}-t_{j}\right)_{+}^{p-1}\alpha_{p+j+1}\\ \frac{\partial}{\partial\alpha_{j}}N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}&=N_{j}(x_{i},\boldsymbol{t}).\end{aligned}$$

If we let

$$\begin{aligned} X=\left[\begin{array}{cccccc} N_{1}\left(x_{1},\boldsymbol{t}\right) & \cdots & N_{m+\kappa}\left(x_{1},\boldsymbol{t}\right) & \frac{\partial}{\partial t_{1}}N(x_{1},\boldsymbol{t})^{T}\boldsymbol{\alpha} & \cdots & \frac{\partial}{\partial t_{\kappa}}N(x_{1},\boldsymbol{t})^{T}\boldsymbol{\alpha}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\ N_{1}\left(x_{n},\boldsymbol{t}\right) & \cdots & N_{m+\kappa}\left(x_{n},\boldsymbol{t}\right) & \frac{\partial}{\partial t_{1}}N(x_{n},\boldsymbol{t})^{T}\boldsymbol{\alpha} & \cdots & \frac{\partial}{\partial t_{\kappa}}N(x_{n},\boldsymbol{t})^{T}\boldsymbol{\alpha}\end{array}\right]\end{aligned}$$

then \(I(\boldsymbol{\theta})=X^{T}X\), which is positive definite if the columns of X are linearly independent. This holds under the assumptions that \(t_{j}\ne t_{k}\) for \(j\ne k\) and that \(\alpha_{m+j}\ne0\) for all j.
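Continuing the illustrative sketch from above (reusing the hypothetical truncated_power_basis helper and the names x, knots, p, and alpha), the positive definiteness of \(I(\boldsymbol{\theta})=X^{T}X\) can be checked numerically when the knots are distinct and the knot coefficients are nonzero:

```python
def design_with_knot_derivatives(x, knots, p, alpha):
    """Columns: N_1(x,t), ..., N_{m+kappa}(x,t), d/dt_1 N(x,t)^T alpha, ..., d/dt_kappa N(x,t)^T alpha."""
    N = truncated_power_basis(x, knots, p)                   # first m + kappa columns
    m = p + 1
    knot_cols = np.column_stack([
        -p * np.maximum(x - t, 0.0)**(p - 1) * alpha[m + j]  # d/dt_j N(x, t)^T alpha
        for j, t in enumerate(knots)
    ])
    return np.hstack([N, knot_cols])

X = design_with_knot_derivatives(x, knots, p, alpha)
eigvals = np.linalg.eigvalsh(X.T @ X)                        # I(theta) = X^T X, ignoring the 1/sigma^2 factor
print("smallest eigenvalue:", eigvals.min())                 # strictly positive, hence positive definite
```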

10.2.3 Assumption A6

We next consider a bound on the third partial derivatives. We start with the derivatives of the basis functions.

$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial t_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=0\qquad\textrm{if }j\ne k\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial t_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=p(p-1)\left(x-t_{j}\right)_{+}^{p-2}\alpha_{p+j+1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial\alpha_{j}\partial\alpha_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=0\end{aligned}$$
$$\begin{aligned} \frac{\partial^{2}}{\partial t_{j}\partial\alpha_{p+j+1}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=-p\left(x-t_{j}\right)_{+}^{p-1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{3}}{\partial t_{j}\partial t_{j}\partial t_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=-p(p-1)(p-2)\left(x-t_{j}\right)_{+}^{p-3}\alpha_{p+j+1}\end{aligned}$$
$$\begin{aligned} \frac{\partial^{3}}{\partial t_{j}\partial t_{j}\partial\alpha_{p+j+1}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}=p(p-1)\left(x-t_{j}\right)_{+}^{p-2}\end{aligned}$$

Since x is an element of a compact set, for \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\) all of the above partial derivatives are bounded, as is \(N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\). Therefore

$$\begin{aligned} &{\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}L(\boldsymbol{\xi},x)}\\ & \qquad= -\frac{1}{\sigma^{2}}\left[-y\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\qquad+\left(\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &\qquad+\left(\frac{\partial^{2}}{\partial\theta_{l}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\\ &\qquad\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\left(\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\theta_{l}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right]\end{aligned}$$

and

$$\begin{aligned} &{\frac{\partial^{3}}{\partial\theta_{j}\partial\theta_{k}\partial\sigma^{2}}L(\boldsymbol{\xi},x)}\\ & \qquad=\frac{1}{\sigma^{4}}\left[-y\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}+\left(\frac{\partial}{\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\right.\\ &\qquad\left.+N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\frac{\partial^{2}}{\partial\theta_{j}\partial\theta_{k}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]\end{aligned}$$

and

$$\begin{aligned} \frac{\partial^{3}}{\partial\theta_{j}\partial\sigma^{2}\partial\sigma^{2}}L(\boldsymbol{\xi},y)=-\frac{2}{\sigma^{6}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\left(-\frac{\partial}{\partial\theta_{j}}N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)\end{aligned}$$

and

$$\begin{aligned} \frac{\partial^{3}}{\partial\sigma^{2}\partial\sigma^{2}\partial\sigma^{2}}L(\boldsymbol{\xi},\boldsymbol{y})=-\frac{1}{\sigma^{6}}+\frac{3}{\sigma^{8}}\left(y-N(x,\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\end{aligned}$$

are also bounded for \(\boldsymbol{\xi}\in B(\boldsymbol{\xi}_{0},\delta)\), since \(\sigma_{0}^{2}>0\) by assumption. The expectations of these bounds also clearly exist.

10.2.4 Lemmas

To show that the remaining assumptions are satisfied, we first examine the behavior of

$$\begin{aligned} g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})=N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}.\end{aligned}$$

Notice that for \(x_{i}\) chosen on a uniform grid over \([a,b]\),

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2} \to\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx.\end{aligned}$$

Furthermore, we notice that \(g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x\right)\) is itself a spline, and the sum of two splines is also a spline. Consider the degree p case of \(g\left(x|\boldsymbol{\alpha},t\right)+g\left(x|\boldsymbol{\alpha}^{*},t^{*}\right)\) where \(t<t^{*}\). The sum is a spline with knot points \(\left\{t,t^{*}\right\}\), whose first p + 1 coefficients are \(\boldsymbol{\alpha}+\boldsymbol{\alpha}^{*}\) and whose last two coefficients are \(\left\{\alpha_{p+1},\alpha_{p+1}^{*}\right\}\).
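For instance, in the linear case \(p=1\) with single knots \(t<t^{*}\), this reads

$$\begin{aligned} \left[\alpha_{0}+\alpha_{1}x+\alpha_{2}\left(x-t\right)_{+}\right]+\left[\alpha_{0}^{*}+\alpha_{1}^{*}x+\alpha_{2}^{*}\left(x-t^{*}\right)_{+}\right]=\left(\alpha_{0}+\alpha_{0}^{*}\right)+\left(\alpha_{1}+\alpha_{1}^{*}\right)x+\alpha_{2}\left(x-t\right)_{+}+\alpha_{2}^{*}\left(x-t^{*}\right)_{+},\end{aligned}$$

a spline with knot points \(\{t,t^{*}\}\) whose polynomial coefficients add and whose knot coefficients are carried over unchanged.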

At this point, we also notice

$$\begin{aligned} E\left[n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right] &=n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)E\left[\epsilon_{i}\right]\\ &=0\end{aligned}$$
$$\begin{aligned} V\left[n^{-1}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right] &=n^{-2}V\left[\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right]\\ &=n^{-2}\sum V\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\epsilon_{i}\right]\\ &=n^{-2}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)^{2}V\left[\epsilon_{i}\right]\\ &=n^{-2}\sum g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)^{2}\\ &\rightarrow 0\end{aligned}$$

and that \(\sum\epsilon_{i}^{2}\sim\chi_{n}^{2}\) and thus \(n^{-1}\sum\epsilon_{i}^{2}\) converges in probability to the constant 1. Therefore, by the SLLN,

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2} &=\frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)\right]^{2}+\frac{2\sigma_{0}}{n}\sum_{i=1}^{n}\epsilon_{i}g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\frac{\sigma_{0}^{2}}{n}\sum_{i=1}^{n}\epsilon_{i}^{2}\\ &=\frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)\right]^{2}+O_{p}\left(n^{-1/2}\right)+\frac{\sigma_{0}^{2}}{n}\sum_{i=1}^{n}\epsilon_{i}^{2}\\ &\stackrel{a.s.}{\rightarrow}\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx+\sigma_{0}^{2}.\end{aligned}$$
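The limit above is easy to check by simulation; a rough sketch continuing the earlier illustrative code (the parameter values are again arbitrary, and the helper truncated_power_basis, p, and rng are reused from the first sketch):

```python
# g(theta_0, theta, x) = N(x, t_0)^T alpha_0 - N(x, t)^T alpha for two illustrative parameter values
t0, a0 = [0.3, 0.7], np.array([1.0, -2.0, 0.5, 0.0, 1.0, 3.0, -4.0])
t1, a1 = [0.4, 0.6], np.array([0.5, -1.0, 0.0, 0.2, 0.8, 2.0, -3.0])
sigma0 = 0.2

xs = np.linspace(0.0, 1.0, 5000)                   # uniform grid over [a, b] = [0, 1]
g = truncated_power_basis(xs, t0, p) @ a0 - truncated_power_basis(xs, t1, p) @ a1
eps = rng.standard_normal(xs.size)

lhs = np.mean((g + sigma0 * eps)**2)               # (1/n) sum [g(theta_0, theta, x_i) + sigma_0 eps_i]^2
rhs = np.mean(g**2) + sigma0**2                    # approximates (1/(b-a)) int g^2 dx + sigma_0^2
print(lhs, rhs)                                    # close for large n
```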

Lemma 1.

Given a degree p polynomial \(g(x|\boldsymbol{\alpha})\) on \([a,b]\) with coefficients \(\boldsymbol{\alpha}\), there exist \(\lambda_{n,m},\lambda_{n,M}>0\) such that \(||\boldsymbol{\alpha}||^{2}\lambda_{n,m}^{2}\le\frac{1}{n}\sum_{i=1}^{n}\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}\le||\boldsymbol{\alpha}||^{2}\lambda_{n,M}^{2}\).

Proof.

If \(\boldsymbol{\alpha}=\boldsymbol{0}\), then \(g\left(x|\boldsymbol{\alpha}\right)=0\) and the result is obvious. If \(g\left(x|\boldsymbol{\alpha}\right)\) has at least one non-zero coefficient, then it cannot be identically zero on \([a,b]\), and hence for n > p we have \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}>0\), since the polynomial has at most p zeros. We notice that

$$\begin{aligned} \int_{a}^{b}\left[g(x|\boldsymbol{\alpha})\right]^{2}\, dx & =\int_{a}^{b}\left[\sum_{i=0}^{p}\alpha_{i}^{2}x^{2i}+2\sum_{i=0}^{p-1}\sum_{j=i+1}^{p}\alpha_{i}\alpha_{j}x^{i+j}\right]dx\\ & =\left.\sum_{i=0}^{p}\frac{\alpha_{i}^{2}}{i+1}x^{2i+1}+2\sum_{i=0}^{p-1}\sum_{j=i+1}^{p}\frac{\alpha_{i}\alpha_{j}}{i+j+1}x^{i+j+1}\right|_{x=a}^{b}\\ & =\boldsymbol{\alpha}^{T}X\boldsymbol{\alpha}\end{aligned}$$

where the matrix \(\boldsymbol{X}\) has \((i,j)\) element \(\left(b^{i+j+1}-a^{i+j+1}\right)/(i+j+1)\) for \(i,j=0,\dots,p\). Since \(\int_{a}^{b}\left[g(x|\boldsymbol{\alpha})\right]^{2}\, dx>0\) for all \(\boldsymbol{\alpha}\ne\boldsymbol{0}\), the matrix X must be positive definite. Next we notice that

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2} =& \frac{1}{n}\sum_{i=1}^{n}\boldsymbol{\alpha}^{T}\boldsymbol{X}_{i}\boldsymbol{\alpha}\\ = & \boldsymbol{\alpha}^{T}\left(\frac{1}{n}\sum\boldsymbol{X}_{i}\right)\boldsymbol{\alpha}\\ = & \boldsymbol{\alpha}^{T}\boldsymbol{X}_{n}\boldsymbol{\alpha}\end{aligned}$$

and therefore \(\boldsymbol{X}_{n}\to\boldsymbol{X}\); denoting the eigenvalues of \(\boldsymbol{X}_{n}\) by \(\boldsymbol{\lambda}_{n}\) and those of \(\boldsymbol{X}\) by \(\boldsymbol{\lambda}\), we have \(\boldsymbol{\lambda}_{n}\to\boldsymbol{\lambda}\).

Letting \(\lambda_{n,m}\) and \(\lambda_{n,M}\) be the minimum and maximum eigenvalues of \(\boldsymbol{X}_{n}\), we then have \(\lambda_{n,m}^{2}\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\le\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\alpha})\right]^{2}\le\lambda_{n,M}^{2}\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}.\hfill\Box\)
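The identity \(\int_{a}^{b}\left[g(x|\boldsymbol{\alpha})\right]^{2}dx=\boldsymbol{\alpha}^{T}X\boldsymbol{\alpha}\) used in this proof can be verified numerically for a particular polynomial. Below is a small, self-contained sketch with illustrative values, using the 0-based indexing above so that X has \((i,j)\) entry \(\left(b^{i+j+1}-a^{i+j+1}\right)/(i+j+1)\):

```python
import numpy as np
from numpy.polynomial import polynomial as P

a, b, deg = 0.2, 1.5, 3                          # illustrative interval and degree
alpha = np.array([1.0, -0.5, 2.0, 0.3])          # coefficients alpha_0, ..., alpha_p (lowest degree first)

# Gram matrix with (i, j) entry (b^{i+j+1} - a^{i+j+1}) / (i + j + 1), i, j = 0, ..., p
i, j = np.meshgrid(np.arange(deg + 1), np.arange(deg + 1), indexing="ij")
X = (b**(i + j + 1) - a**(i + j + 1)) / (i + j + 1)
quad_form = alpha @ X @ alpha                    # alpha^T X alpha

c = P.polymul(alpha, alpha)                      # coefficients of g(x|alpha)^2
k = np.arange(c.size)
integral = np.sum(c * (b**(k + 1) - a**(k + 1)) / (k + 1))   # exact integral of g^2 over [a, b]
print(quad_form, integral)                       # agree up to floating-point error
```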

Note that the values \(\lambda_{n,m},\lambda_{n,M}\) depend on the interval over which the polynomial is integrated or summed; in particular, if a = b the integral is zero. In the following lemmas, we assume that there is some minimal distance between any two knot points and between a knot point and the boundary values a, b.

Lemma 2.

Given a degree p spline \(g(x|\boldsymbol{\theta})\) with κ knot points on \([a,b]\) , let \(\tau=\left(\left|a\right|\vee\left|b\right|\right)^{\kappa}\) . Then \(\forall\;\delta>2\tau,\;\exists\;\lambda_{n}>0\) such that if \(\left\Vert \boldsymbol{\theta}\right\Vert>\delta\) then \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}>\left(\delta^{2}+\tau^{2}\right)\lambda_{n}\) .

Proof.

Notice that \(||\boldsymbol{\theta}||^{2}>\delta^{2}>4\tau^{2}\) implies \(||\boldsymbol{\alpha}||^{2}>\delta^{2}-\tau^{2}\). First we consider the case \(\kappa=1\). If \(\alpha_{0}^{2}+\dots+\alpha_{p}^{2}>\left(\delta^{2}+\tau^{2}\right)/9\), then \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}1_{[a,t]}\left(x_{i}\right)>\lambda_{n}\left(\delta^{2}+\tau^{2}\right)\) for some \(\lambda_{n}>0\). If \(\alpha_{0}^{2}+\dots+\alpha_{p}^{2}\le\left(\delta^{2}+\tau^{2}\right)/9\), then \(\alpha_{p+1}^{2}\ge3\left(\delta^{2}+\tau^{2}\right)/4\). Therefore \(\alpha_{p}+\alpha_{p+1}\), the coefficient of the \(x^{p}\) term of the polynomial on \([t_{1},b]\), satisfies

$$\begin{aligned} \left\Vert \alpha_{p}+\alpha_{p+1}\right\Vert ^{2} &> \left\Vert \alpha_{p+1}\right\Vert ^{2}-\left\Vert \alpha_{p}\right\Vert ^{2}\\ &> \frac{3\left(\delta^{2}+\tau^{2}\right)}{4}-\frac{\left(\delta^{2}+\tau^{2}\right)}{4}\\ &> \frac{1}{2}\left(\delta^{2}+\tau^{2}\right)\end{aligned}$$

and thus the squared norm of the coefficients of the polynomial on \([t_{1},b]\) must also be greater than \(\frac{1}{2}\left(\delta^{2}+\tau^{2}\right)\), so \(\frac{1}{n}\sum\left[g(x_{i}|\boldsymbol{\theta})\right]^{2}1_{[t,b]}\left(x_{i}\right)>\lambda_{n}\left(\delta^{2}+\tau^{2}\right)\) for some \(\lambda_{n}>0\). The proof for multiple knots is similar, examining all \(\kappa+1\) polynomial sections for one whose coefficients have squared norm larger than some fraction of \(\left(\delta^{2}+\tau^{2}\right).\hfill\Box\)

Lemma 3.

For all \(\delta>0\), there exists \(\lambda_{n}>0\) such that for all \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\delta)\), \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}>\lambda_{n}\delta\).

Proof.

By the previous lemma, for all \(\Delta>2\tau\) there exists \(\Lambda_{n}>0\) such that for all \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\Delta)\), \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}>\Lambda_{n}\Delta\). We now consider the region

$$\begin{aligned} \mathcal{C}=\textrm{closure}\left[B\left(\boldsymbol{\theta}_{0},\Delta\right)\setminus B\left(\boldsymbol{\theta}_{0},\delta\right)\right]\end{aligned}$$

Assume to the contrary that there exists \(\delta>0\) such that \(\forall\,\lambda_{n}>0\) there exists \(\boldsymbol{\theta}\in\mathcal{C}\) with \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}\le\lambda_{n}\delta\); we will seek a contradiction. By the negation, there exists a sequence \(\boldsymbol{\theta}_{n}\in\mathcal{C}\) such that \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{n},x_{i})\right)^{2}\to0\). But since \(\mathcal{C}\) is compact, there exists a subsequence \(\boldsymbol{\theta}_{n_{k}}\) that converges to some \(\boldsymbol{\theta}_{\infty}\in\mathcal{C}\) with \(\frac{1}{n}\sum\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{\infty},x_{i})\right)^{2}=0\). But since \(\boldsymbol{\theta}_{0}\notin\mathcal{C}\), this is a contradiction.\(\hfill\Box\)

Corollary 4.

There exists \(\lambda_{n}>0\) such that for any \(\delta>0\) and \(\boldsymbol{\theta}\notin B(\boldsymbol{\theta}_{0},\delta)\)

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2} \ge \lambda_{n}^{2}\delta^{2}+O_{p}\left(n^{-1/2}\right)+\sigma_{0}^{2}.\end{aligned}$$

We now focus our attention on the ratio of the maximum value of a polynomial and its integral.

Lemma 5.

Given a degree p polynomial \(g\left(x|\boldsymbol{\alpha}\right)\) on \(\left[a,b\right]\) , then

$$\begin{aligned} \frac{\max_{i\in\left\{1,\dots,n\right\}}\left[g\left(x_{i}|\boldsymbol{\alpha}\right)\right]^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left[g\left(x_{i}|\boldsymbol{\alpha}\right)\right]^{2}}\le\frac{\lambda_{M}^{2}}{\lambda_{n,m}^{2}}\to\frac{\lambda_{M}^{2}}{\lambda_{m}^{2}}\end{aligned}$$

for some \(\lambda_{M},\lambda_{m}>0\).

Proof.

Since we can write \(\left[g\left(x|\boldsymbol{\alpha}\right)\right]^{2}=\boldsymbol{\alpha}^{T}W_{x}\boldsymbol{\alpha}\) for some nonnegative definite matrix \(W_{x}\) with maximum eigenvalue \(\lambda_{M,x}\), and because the maximum eigenvalue is a continuous function of x, let \(\lambda_{M}=\sup_{x}\lambda_{M,x}\). Then the maximum of \(\left[g\left(x|\boldsymbol{\alpha}\right)\right]^{2}\) over \(x\in[a,b]\) is at most \(\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\lambda_{M}^{2}\). The denominator is bounded from below by \(\left\Vert \boldsymbol{\alpha}\right\Vert ^{2}\lambda_{n,m}^{2}.\hfill\Box\)

Lemma 6.

Given a degree p spline \(g\left(x|\boldsymbol{\theta}\right)\) on \(\left[a,b\right]\) , then

$$\begin{aligned} \frac{\max\left[g\left(x|\boldsymbol{\theta}\right)\right]^{2}}{\int_{a}^{b}\left[g\left(x|\boldsymbol{\theta}\right)\right]^{2}\, dx}\le\frac{\lambda_{M}^{2}}{\lambda_{m}^{2}}\end{aligned}$$

for some \(\lambda_{M},\lambda_{m}>0\).

Proof.

Since a degree p spline is a degree p polynomial on each of the regions defined by the knot-points, and because the integral over the whole interval \([a,b]\) is greater than the integral over any of the regions defined by the knot-points, we can use the previous lemma on each section and then choose the largest ratio.\(\hfill\Box\)

Lemma 7.

Given a degree p spline \(g\left(x|\boldsymbol{\theta}\right)\) on \([a,b]\) then

$$\frac{n^{-1/2}\max_{i}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}=O_{p}\left(1\right)$$
(10.5)

uniformly over \(\boldsymbol{\theta}\).

Proof.

Notice

$$\begin{aligned} \frac{n^{-1/2}\max_{i}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}} & \le & \frac{2n^{-1/2}\max_{i}\left[\epsilon_{i}^{2}\sigma_{0}^{2}\right]+2n^{-1/2}\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\\ & \le & \frac{2\sigma_{0}^{2}n^{-1/2}\max_{i}\epsilon_{i}^{2}+\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\\ & = & \frac{O_{p}\left(\frac{\log n}{\sqrt{n}}\right)+\max_{i}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}{n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}}\end{aligned}$$

and since \(n^{-1}\sum_{i=1}^{n}\left[\epsilon_{i}\sigma_{0}+g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)\right]^{2}\stackrel{P}{\to}\frac{1}{b-a}\int_{a}^{b}\left(g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x)\right)^{2}\, dx+\sigma_{0}^{2}\), and Lemma 6 bounds the ratio of the terms that involve \(\boldsymbol{\theta}\), this ratio is bounded in probability uniformly over \(\boldsymbol{\theta}\).\(\hfill\Box\)

10.2.5 Assumption B1

Returning to assumption B1, we now consider \(\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\) and

$$\begin{aligned} L_{n}\left(\boldsymbol{\xi}\right) & =\sum\log\left\{\frac{1}{\sqrt{2\pi}\sigma}{\rm exp}\left[\frac{-1}{2\sigma^{2}}\left(y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\right]\right\}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]^{2}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}+\sigma_{0}\epsilon_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right]^{2}\\ & =-\frac{n}{2}\log\left(2\pi\right)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}\end{aligned}$$

and therefore

$$\begin{aligned} {\frac{1}{n}}&\left(L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right)\\ & =-\log\sigma-\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\log\sigma_{0}+\frac{1}{2n\sigma_{0}^{2}}\sum\left[g\left(\boldsymbol{\theta}_{0},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}\\ & =\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\frac{1}{2n\sigma_{0}^{2}}\sum\left[\sigma_{0}\epsilon_{i}\right]^{2}\\ & =\log\frac{\sigma_{0}}{\sigma}-\frac{\left(\lambda_{n}\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0}\right)\right)^{2}}{2\sigma^{2}}-\frac{\sigma_{0}^{2}}{2\sigma^{2}}+\frac{1}{2n}\sum\left[\epsilon_{i}\right]^{2}\end{aligned}$$

where

$$\begin{aligned} \left[\lambda_{n}\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0}\right)\right]^{2} =\frac{1}{n}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\sigma_{0}^{2}\end{aligned}$$

which converges in probability to \(\frac{1}{b-a}\int_{a}^{b}\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x\right)\right]^{2}dx\). As a function of \(\sigma\), the expression goes to \(-\infty\) as \(\sigma\to0\) and as \(\sigma\to\infty\). Taking the derivative

$$\begin{aligned}\frac{d}{d\sigma}\left[\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2\sigma^{2}}\left[\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\right]+\frac{1}{2n}\sum\epsilon_{i}^{2}\right]=-\frac{1}{\sigma}+\frac{1}{\sigma^{3}}\left[\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\right]\end{aligned}$$

and setting it equal to zero yields a single critical point at \(\sigma^{2}=\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\), which results in a maximum of

$$\log\left(\frac{\sigma_{0}}{\sqrt{\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}}}\right)-\frac{1}{2}+\frac{1}{2}n^{-1}\sum\epsilon_{i}^{2}$$
(10.6)

which is negative and bounded away from zero in probability for \(\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\), establishing B1.
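For completeness, the value in Eq. 10.6 is obtained by substituting the maximizer \(\sigma^{2}=\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\) back into the profiled expression:

$$\begin{aligned} \log\frac{\sigma_{0}}{\sqrt{\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}}}-\frac{\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}}{2\left[\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}\right]}+\frac{1}{2n}\sum\epsilon_{i}^{2}=\log\left(\frac{\sigma_{0}}{\sqrt{\left(\lambda_{n}\right)^{2}+\sigma_{0}^{2}}}\right)-\frac{1}{2}+\frac{1}{2}n^{-1}\sum\epsilon_{i}^{2}.\end{aligned}$$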

10.2.6 Assumption C1

Assumption C1 is

$$\begin{aligned}\inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min_{i=1\dots n}L(\boldsymbol{\xi},X_{i})}{\left|L_{n}(\boldsymbol{\xi})-L_{n}(\boldsymbol{\xi}_{0})\right|}\stackrel{P_{\boldsymbol{\xi}_{0}}}{\longrightarrow}0\end{aligned}$$

First notice

$$\begin{aligned} L(\boldsymbol{\xi},Y_{i})&=-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(Y_{i}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\\ & =-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(\epsilon_{i}\sigma_{0}+N(x_{i},\boldsymbol{t}_{0})^{T}\boldsymbol{\alpha}_{0}-N(x_{i},\boldsymbol{t})^{T}\boldsymbol{\alpha}\right)^{2}\\ & =-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\left(\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right)^{2}\end{aligned}$$

and we consider \(\mathcal{C}=\left\{\boldsymbol{\xi}:\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)\right\}\). Define

$$\begin{aligned} f_{n}\left(\boldsymbol{\xi}\right)&=& \frac{\min\; L\left(\boldsymbol{\xi},Y_{i}\right)}{\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\\ & = & \frac{-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{n\cdot\frac{1}{n}\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\end{aligned}$$

and notice that the denominator is bounded away from 0 by Eq. 10.6.

$$\begin{aligned} f_{n}\left(\boldsymbol{\xi}\right)&= \frac{-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-n\cdot\frac{1}{n}\left(L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right)}\\ & =\frac{\frac{1}{\sqrt{n}}\left[-\frac{1}{2}\log\left(2\pi\right)-\log\sigma-\frac{1}{2\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}\right]}{-\sqrt{n}\cdot\frac{1}{n}\left[n\log\frac{\sigma_{0}}{\sigma}-\frac{1}{2\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}+\frac{1}{2}\sum\epsilon_{i}^{2}\right]}\\ & =\frac{1}{\sqrt{n}}\cdot\frac{-\frac{1}{2\sqrt{n}}\log\left(2\pi\right)-\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\\ & =\frac{1}{\sqrt{n}}\left[\frac{-\frac{1}{2\sqrt{n}}\log\left(2\pi\right)}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\right.\\ &\qquad \left.+\frac{-\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{-\log\frac{\sigma_{0}}{\sigma}+\frac{1}{2n\sigma^{2}}\sum\left[g\left(\boldsymbol{\theta},\boldsymbol{\theta}_{0},x_{i}\right)+\sigma_{0}\epsilon_{i}\right]^{2}-\frac{1}{2n}\sum\epsilon_{i}^{2}}\right]\end{aligned}$$

We consider the infima of the two terms inside the brackets separately.

For the first term, the denominator is bounded away from 0 in probability uniformly in \(\boldsymbol{\theta}\), while the numerator \(-\frac{1}{2\sqrt{n}}\log\left(2\pi\right)\) is deterministic and converges to zero, so the infimum of the first term converges to 0 in probability.

The second term is uniformly bounded over \(\boldsymbol{\theta}\) by Lemma 9. Using \((a+b)^{2}\le 2a^{2}+2b^{2}\) and the fact that the maximum of \(n\) squared standard normal errors is \(O_{p}(\log n)\), the numerator satisfies

$$\begin{aligned} -&\frac{1}{\sqrt{n}}\log\sigma-\frac{1}{2\sqrt{n}\sigma^{2}}\max\left[\epsilon_{i}\sigma_{0}+g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}\\ &\ge -\frac{1}{\sqrt{n}}\log\sigma-\frac{\max\left[\epsilon_{i}\sigma_{0}\right]^{2}}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\\ & = -\frac{1}{\sqrt{n}}\log\sigma-\frac{\sigma_{0}^{2}\, O_{p}\left(\log\, n\right)}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\\ & \ge \frac{-\log n}{\sqrt{n}}\,\log\sigma-\frac{\sigma_{0}^{2}\, O_{p}\left(\log\, n\right)}{\sqrt{n}\sigma^{2}}-\frac{\max\left[g(\boldsymbol{\theta}_{0},\boldsymbol{\theta},x_{i})\right]^{2}}{\sqrt{n}\sigma^{2}}\end{aligned}$$

and each of the three terms in this lower bound converges to 0 for every fixed σ. Therefore, for \(\sigma\in\left[0,d\right]\) with \(d\) large, the infimum converges to 0. For \(\sigma>d\), the \(\log\sigma\) terms dominate and the infimum over \(\sigma\) occurs at \(\sigma=d\), which also converges to 0. Therefore

$$\begin{aligned}\inf_{\boldsymbol{\xi}\notin B(\boldsymbol{\xi}_{0},\delta)}\frac{\min_{i}L\left(\boldsymbol{\xi},Y_{i}\right)}{\left|L_{n}\left(\boldsymbol{\xi}\right)-L_{n}\left(\boldsymbol{\xi}_{0}\right)\right|}\stackrel{P}{\to}0.\end{aligned}$$

10.2.7 Assumptions C2

Finally we turn our attention to the Jacobian. Recall that the Jacobian is

$$\begin{aligned}J_{0}\left(\boldsymbol{y}_{0},\boldsymbol{\xi}\right)=\left|\frac{1}{\sigma^{2}}p^{\kappa}\det\left[\begin{array}{ccc} \boldsymbol{B}_{\boldsymbol{\alpha}} & \boldsymbol{B}_{\boldsymbol{t}} & \boldsymbol{B}_{\sigma^{2}}\end{array}\right]\right|\end{aligned}$$

where

$$\begin{aligned}\boldsymbol{B}_{\boldsymbol{\alpha}}=\left[\!\!\!\begin{array}{ccccccc} 1 & x_{(1)} & \dots & x_{(1)}^{p} & (x_{(1)}-t_{1})_{+}^{p} & \dots & (x_{(1)}-t_{\kappa})_{+}^{p}\\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{(l)} & \dots & x_{(l)}^{p} & (x_{(l)}-t_{1})_{+}^{p} & \dots & (x_{(l)}-t_{\kappa})_{+}^{p}\end{array}\!\!\!\right],\end{aligned}$$
$$\begin{aligned}\boldsymbol{B}_{\boldsymbol{t}}=\left[\!\!\!\begin{array}{ccc} \alpha_{1+p+1}\left(x_{(1)}-t_{1}\right)_{+}^{p-1}I\left(x_{(1)}-t_{1}\right) & \dots & \alpha_{1+p+\kappa}\left(x_{(1)}-t_{\kappa}\right)_{+}^{p-1}I\left(x_{(1)}-t_{\kappa}\right)\\ \vdots & \ddots & \vdots\\ \alpha_{1+p+1}\left(x_{(l)}-t_{1}\right)_{+}^{p-1}I\left(x_{(l)}-t_{1}\right) & \dots & \alpha_{1+p+\kappa}\left(x_{(l)}-t_{\kappa}\right)_{+}^{p-1}I\left(x_{(l)}-t_{\kappa}\right)\end{array}\!\!\!\right],\end{aligned}$$

and

$$\begin{aligned}\boldsymbol{B}_{\sigma^{2}}=\left[\!\!\!\begin{array}{c} -\frac{1}{2}\left(y_{(1)}-g(x_{(1)}|\boldsymbol{\theta})\right)\\ \vdots\\ -\frac{1}{2}\left(y_{(l)}-g(x_{(l)}|\boldsymbol{\theta})\right)\end{array}\!\!\!\right].\end{aligned}$$
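As a concreteness check, here is a minimal sketch (under our own naming assumptions, not the authors' code) that assembles \(\boldsymbol{B}_{\boldsymbol{\alpha}}\), \(\boldsymbol{B}_{\boldsymbol{t}}\), and \(\boldsymbol{B}_{\sigma^{2}}\) from \(l\) selected observations and evaluates \(J_{0}\) as the scaled absolute determinant above. It assumes the stacked matrix is square, i.e. \(l=(p+2)+2\kappa\), and reads the indicator \(I(x-t_{j})\) as \(I(x>t_{j})\).

```python
import numpy as np

def jacobian_J0(y, x, alpha, knots, sigma2, p=4):
    """Sketch of J_0(y_0, xi) = | p^kappa / sigma^2 * det[ B_alpha  B_t  B_sigma2 ] |.

    y, x   : the l selected observations (l = p + 2 + 2*kappa assumed)
    alpha  : spline coefficients, length 1 + p + kappa
    knots  : knot locations t_1, ..., t_kappa
    """
    y, x, alpha, knots = map(np.asarray, (y, x, alpha, knots))
    kappa = len(knots)
    assert len(x) == p + 2 + 2 * kappa, "need as many observations as parameters"

    # B_alpha: truncated power basis evaluated at the selected x's
    B_alpha = np.column_stack([x**j for j in range(p + 1)] +
                              [np.maximum(x - t, 0.0)**p for t in knots])

    # B_t: one column per knot, alpha_{1+p+j} (x - t_j)_+^{p-1} I(x > t_j)
    B_t = np.column_stack([alpha[1 + p + j] *
                           np.maximum(x - knots[j], 0.0)**(p - 1) * (x > knots[j])
                           for j in range(kappa)])

    # B_sigma2: -(y - g(x | theta)) / 2, with g the spline mean
    B_s2 = (-0.5 * (y - B_alpha @ alpha)).reshape(-1, 1)

    M = np.hstack([B_alpha, B_t, B_s2])   # square when l = p + 2 + 2*kappa
    return abs(p**kappa / sigma2 * np.linalg.det(M))
```

For instance, with \(p=4\) and \(\kappa=1\) knot, one would pass \(l=8\) observations.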

Following the notation of Yeo and Johnson, we suppress parentheses and 0 subscripts. We consider \(\boldsymbol{\xi}\) in the compact set \(\bar{B}(\boldsymbol{\xi}_{0},\delta)\). We notice that for \(\delta<\sigma^{-2}\), \(J(\boldsymbol{y};\boldsymbol{\xi})\le\delta^{\kappa+1}p^{\kappa}g(\boldsymbol{y})\) for some \(g(\boldsymbol{y})\), because \(\boldsymbol{B_{\alpha}}\) and \(\boldsymbol{B_{t}}\) are functions of \(\boldsymbol{x}\) and \(\boldsymbol{t}\), which are bounded.

We let \(S_{M}^{l}\) be the ball in \(\mathbb{R}^{l}\) of radius \(M\).

Finally, we notice that \(J_{j}(y_{1},\dots,y_{j};\boldsymbol{\xi})=E\left[J\left(y_{1},\dots,y_{j},Y_{j+1},\dots,Y_{l};\boldsymbol{\xi}\right)\right]\) is a polynomial in \(\boldsymbol{\theta}\) scaled by \(\sigma^{2}\), which is equicontinuous on compacts of \(\boldsymbol{\xi}\) where σ is bounded away from 0.

Appendix C: Full Simulation Results

Fig. 10.4 Coverage rates for the single knot scenario. The color (red, blue) represents the method (fiducial, Bayesian)

Fig. 10.5 Coverage rates for the three knot “Simple” scenario. The color (red, blue) represents the method (fiducial, Bayesian). The topmost panel is the coverage of knot one in the \(\sigma=0.1\), \(n=40\) simulation

Fig. 10.6 Coverage rates for the three knot “Clustered” scenario. The color (red, blue) represents the method (fiducial, Bayesian). The topmost panel is the coverage of knot one in the \(\sigma=0.1\), \(n=40\) simulation

Fig. 10.7 Coverage rates for the three knot “Subtle” scenario. The color (red, blue) represents the method (fiducial, Bayesian). The topmost panel is the coverage of knot one in the \(\sigma=0.1\), \(n=40\) simulation

Fig. 10.8 Confidence interval lengths for the single knot scenario. The color (red, blue) represents the method (fiducial, Bayesian)

Fig. 10.9 Confidence interval lengths for the three knot “Simple” scenario. The color (red, blue) represents the method (fiducial, Bayesian). The topmost panel is the coverage of knot one in the \(\sigma=0.1\), \(n=40\) simulation

Fig. 10.10 Confidence interval lengths for the three knot “Clustered” scenario. The color (red, blue) represents the method (fiducial, Bayesian). The topmost panel is the coverage of knot one in the \(\sigma=0.1\), \(n=40\) simulation

Fig. 10.11 Confidence interval lengths for the three knot “Subtle” scenario. The color (red, blue) represents the method (fiducial, Bayesian). The topmost panel is the coverage of knot one in the \(\sigma=0.1\), \(n=40\) simulation

Copyright information

© 2014 Springer International Publishing Switzerland

Cite this paper

Sonderegger, D., Hannig, J. (2014). Fiducial Theory for Free-Knot Splines. In: Lahiri, S., Schick, A., SenGupta, A., Sriram, T. (eds) Contemporary Developments in Statistical Theory. Springer Proceedings in Mathematics & Statistics, vol 68. Springer, Cham. https://doi.org/10.1007/978-3-319-02651-0_10
