Semiparametric Bernstein–von Mises theorem and bias, illustrated with Gaussian process priors

Sankhya A

Abstract

A semiparametric model is considered where the functional of interest is a shift parameter between two curves. A surprising example is provided where two at first sight indistinguishable Gaussian priors lead to quite different behaviours of the posterior distribution of the functional of interest. This phenomenon also illustrates that a condition introduced in Castillo (2012) of the approximation of the least favourable direction by the Gaussian prior is almost necessary for the Bernstein–von Mises theorem to hold.

References

  • Bickel, P.J. and Kleijn, B.J.K. (2012). The semiparametric Bernstein–von Mises theorem. Ann. Statist., 40, 206–237.

  • Boucheron, S. and Gassiat, E. (2009). A Bernstein–von Mises theorem for discrete probability distributions. Electron. J. Stat., 3, 114–148.

  • Castillo, I. (2008). Lower bounds for posterior rates with Gaussian process priors. Electron. J. Stat., 2, 1281–1299.

  • Castillo, I. (2012). A semiparametric Bernstein–von Mises theorem for Gaussian process priors. Probab. Theory Related Fields, 152, 53–99.

  • Castillo, I. and Cator, E. (2011). Semiparametric shift estimation based on the cumulated periodogram for non-regular functions. Electron. J. Stat., 5, 102–126.

  • Cox, D.D. (1993). An analysis of Bayesian inference for nonparametric regression. Ann. Statist., 21, 903–923.

  • Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14, 1–67. With a discussion and a rejoinder by the authors.

  • Freedman, D. (1999). On the Bernstein–von Mises theorem with infinite-dimensional parameters. Ann. Statist., 27, 1119–1140.

  • Gamboa, F., Loubes, J.-M. and Maza, E. (2007). Semi-parametric estimation of shifts. Electron. J. Stat., 1, 616–640.

  • Ghosal, S. (2000). Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity. J. Multivariate Anal., 74, 49–68.

  • Ghosal, S., Ghosh, J.K. and van der Vaart, A.W. (2000). Convergence rates of posterior distributions. Ann. Statist., 28, 500–531.

  • Ghosal, S. and van der Vaart, A.W. (2007). Convergence rates of posterior distributions for non-i.i.d. observations. Ann. Statist., 35, 192–223.

  • Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in \(L^r\)-metrics, 1 ≤ r ≤ ∞. Ann. Statist., 39, 2883–2911.

  • Ibragimov, I.A. and Hasminskiĭ, R.Z. (1981). Statistical estimation. Applications of Mathematics, volume 16. Springer, New York.

  • Johnstone, I. (2010). High dimensional Bernstein–von Mises: simple examples. In Festschrift for Lawrence D. Brown. Inst. Math. Stat. Collect., 6, 87–98.

  • Kim, Y. (2006). The Bernstein–von Mises theorem for the proportional hazard model. Ann. Statist., 34, 1678–1700.

  • McNeney, B. and Wellner, J.A. (2000). Application of convolution theorems in semiparametric models with non-i.i.d. data. J. Statist. Plann. Inference, 91, 441–480.

  • Rousseau, J. and Rivoirard, V. (2012). Bernstein–von Mises theorem for linear functionals of the density. Ann. Statist., 40, 1489–1523.

  • Shen, X. (2002). Asymptotic normality of semiparametric and nonparametric posterior distributions. J. Amer. Statist. Assoc., 97, 222–235.

  • Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Ann. Statist., 29, 687–714.

  • van der Vaart, A. (2002). The statistical work of Lucien Le Cam. Ann. Statist., 30, 631–682.

  • van der Vaart, A.W. (1998). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics, volume 3. Cambridge University Press, Cambridge.

  • van der Vaart, A.W. and van Zanten, H. (2008). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist., 36, 1435–1463.

  • van der Vaart, A.W. and van Zanten, H. (2008). Reproducing kernel Hilbert spaces of Gaussian priors. Inst. Math. Stat. Collect., 3, 200–222.

  • van der Vaart, A.W. and Wellner, J.A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer, New York.

  • Wu, Y. and Ghosal, S. (2008). Posterior consistency for some semi-parametric problems. Sankhyā, 70, 267–313.

Acknowledgement.

The author acknowledges the hospitality of the Statistics Department of the Vrije Universiteit Amsterdam and TU Eindhoven/Eurandom for a 2-week stay during the preparation of this work. The author would also like to thank Dominique Picard for an insightful comment. This work was partly supported by ANR Grant “Banhdits” ANR-2010-BLAN-0113-03.

Author information

Corresponding author

Correspondence to Ismaël Castillo.

A Appendix: Checking conditions (C1)-(N1)

Let us check that the priors \(\Pi^{\alpha}\) and \(\Pi^{\alpha,*}\) satisfy conditions (C1) and (N1) for some rate \({\varepsilon}_n\), in some domain of values of the regularity parameters \(({\alpha},{\beta})\). The arguments are very similar to those used in Castillo (2012) for translation parameter estimation; see Castillo (2012), Eq. (9). In fact, we obtain here the same set of parameters \(({\alpha},{\beta})\) for which (C1) and (N1) are satisfied; see Fig. 1 in Castillo (2012).

First, we check the concentration condition (C1) following the approach in Ghosal and van der Vaart (2007). The first step is to show concentration in terms of a distance for which tests with exponentially decreasing error probabilities exist. Given the true parameter \(({\theta}_0,f_0)\) and another parameter \(({\theta}_1,f_1)\), let us set

$$\begin{array}{rll} \phi_n &=& {\bf 1}\bigg\{\int_{0}^{1} (f_1-f_0)(t) dY(t)\\ &&\quad+ \int_{0}^{1} \{f_1(t-{\theta}_1)-f_0(t-{\theta}_0)\}dZ(t) > \|f_1\|^2 - \|f_0\|^2\bigg\}. \end{array}$$

Simple calculations analogous to Lemma 5 in Ghosal and van der Vaart (2007) show that this test makes it possible to test the true \(({\theta}_0,f_0)\) against a ball, with appropriate exponential decrease of the error probabilities; see Castillo (2012), Eq. (4), or Ghosal and van der Vaart (2007), Eq. (2.2). The corresponding testing distance \(d_T\) is given by

$$ d_T(({\theta}_1,f_1),({\theta}_2,f_2))^2 = \|f_1-f_2\|^2 + \|f_1(\cdot-{\theta}_1)-f_2(\cdot-{\theta}_2)\|^2. $$

One then relates \(d_T^2\) to the squared distance \(\|f_1-f_2\|_2^2+({\theta}_1-{\theta}_2)^2\). This is easily done by adapting Lemma 4 in Castillo (2012) to the case of not necessarily symmetric f. Once those distances are related, the entropy and prior mass conditions are verified exactly as in Castillo (2012), Section 4.1.1, thus leading to (C1). One also verifies that the rate \({\varepsilon}_n\) can be taken proportional to \(n^{-({\alpha}\wedge{\beta})/(2{\alpha}+1)}\).

Now, we check (N1). The term \(R_n({\theta}_0,f+({\theta}-{\theta}_0)\gamma)\) is 0, so one focuses on \(R_n({\theta},f)\). We first introduce a sieve \({\mathcal{F}}_n\) to which it is possible to restrict the supremum in condition (N1). Let us introduce the Hilbert space of functions

$$ {\mathbb{B}}^p = \Bigg\{f=\sum\limits_{k\geq 1}f_k{\varepsilon}_k(\cdot),\quad \sum\limits_{k\geq 1} k^{2p}f_k^2 < {+\infty} \Bigg\},\qquad p\ge 1, $$

equipped with the norm \(\|f\|^2 _{2,p} = \sum _{k\geq 1} k^{2p} f_k^2\). The idea is to use Borell’s inequality in the form of van der Vaart and van Zanten (2008), Theorem 5.1. This result tells us precisely that, with overwhelming probability, the Gaussian prior (either \(\Pi^{\alpha}\) or \(\Pi^{\alpha,*}\)) draws functions g which can be written

$$ g={\varepsilon}_n v_0 + \sqrt{n}{\varepsilon}_n w_0, \quad \text{with}\ v_0\in{\mathbb{B}}_1^0,\ w_0\in {\mathbb{H}}_1^{{\alpha}}, \tag{3.2} $$

but also, for \(1\le p<{\alpha}\) and some rate \({\alpha}_n\to 0\) to be specified,

$$ g={\alpha}_n v+ \sqrt{n}{\alpha}_n w, \quad \text{with}\ v\in{\mathbb{B}}_1^p,\ w\in {\mathbb{H}}_1^{{\alpha}}, \tag{3.3} $$

where \({\mathbb{H}}_1^{{\alpha}}\) denotes the unit ball of the RKHS of the prior (we use the same notation \({\mathbb{H}}_1^{{\alpha}}\) for \(\Pi^{\alpha}\) and \(\Pi^{\alpha,*}\), though the corresponding spaces differ slightly) and \({\mathbb{B}}_1^p\) the unit ball of the space \({\mathbb{B}}^p\). As in Castillo (2012), one can then define a sieve \({\mathcal{F}}_n\) as the intersection of the sets of functions defined by (3.2) and (3.3). Under some conditions on \({\alpha}_n\), Borell’s inequality implies that the complement \({\mathcal{F}}\setminus {\mathcal{F}}_n\) has probability less than \(\exp(-n{\varepsilon}_n^2)\); see Castillo (2012), Lemma 13. Thus, it is possible to restrict the study of the posterior (and of (N1)) to \({\mathcal{F}}_n\).

We first deal with the deterministic terms \(R_{n,3}\) and \(R_{n,4}\). To control \(R_{n,4}\), it is enough to bound from above separately \(\int (a_f(t-{\theta})-a_f(t-{\theta}_0))^2 dt\) and \(\int D_n(t,h)^2 dt\). The latter can be bounded as in Castillo (2012), Lemma 5 (adapting the proof slightly to accommodate not necessarily symmetric functions f), leading to a bound in \(o(1+h^2)\). The former is bounded using the decomposition (3.3) in the form \(f = {\alpha}_n v + w_n\), with \(\|w_n\|_{{\mathbb{H}}^{\alpha}}^2\le n{\alpha}_n^2\),

$$\begin{array}{rll} \int_0^1 (a_f(t-{\theta})-a_f(t-{\theta}_0))^2 dt &{\lesssim}& n{\alpha}_n^2 \int_0^1 (v(t-{\theta})-v(t-{\theta}_0))^2dt \\ &&\quad +\, n\int_0^1( (w_n-f_0)(t-{\theta})-(w_n-f_0)(t-{\theta}_0))^2 dt. \end{array}$$

The bounds on the respective variances have been derived in Castillo (2012), see the bounds to (22)–(23). The first term is \(O({\alpha}_n^2 h^2)\) and the second is \(O((1+h^2){\alpha}_n^2 n^{2/(1+2{\alpha})})\). Thus, both are \(o(1+h^2)\) provided that \({\alpha}_n=o(n^{-1/(1+2{\alpha})})\).

To bound \(R_{n,3}\), we expand the product and again bound each term separately. One resulting term is \(\int (a_f-hf_0')(t-{\theta}_0) D_n(t,h)dt\), which, similarly to Lemma 6 in Castillo (2012), is \(o(1+h^2)\) as soon as \({\varepsilon}_n=o(n^{-1+{\beta}/2})\). Another term is \(h\int f_0'(t-{\theta}_0)(a_f(t-{\theta})-a_f(t-{\theta}_0))dt\). Using the Cauchy–Schwarz inequality, we can reuse the bound of the previous display. The last term to bound is \(\int a_f(t-{\theta}_0)(a_f(t-{\theta})-a_f(t-{\theta}_0))dt\). First, we notice that, for any 1-periodic w in \(L^2[0,1]\) with Fourier coefficients \(w_k\), expanding the function on the Fourier basis gives

$$ \int_0^1w(t-{\theta}_0)(w(t-{\theta}_0)-w(t-{\theta}))dt = \sum\limits_{k\ge 1}\sin^2(\pi k({\theta}-{\theta}_0))(w_{2k}^2+w_{2k+1}^2). $$

Applying this to the function \(a_f=\sqrt{n}(f-f_0)\) and using the inequality \(\sin^2(x)\le x^2\) enables us to bound the quantity at stake by a constant times \(h^2\sum_{k\ge 1} k^2 (f_{0,k}-f_k)^2\). We split this sum along indices \(k\le k(n)\) and \(k>k(n)\), with \(k(n)=\lfloor n^{1/(1+2{\alpha})} \rfloor\). The sum up to \(k(n)\) leads to the bound \(h^2 k(n)^2\|f-f_0\|^2\le h^2k(n)^2{\varepsilon}_n^2\). Due to the expressions of \(k(n)\) and \({\varepsilon}_n\), this is \(o(h^2)\) when \({\alpha}\wedge{\beta}>1\). The sum for \(k>k(n)\) is bounded by noticing that \(\sum_{k>k(n)} k^2 f_{0,k}^2=o(1)\) since \({\beta}>1\), and by using the decomposition (3.3) as follows:

$$ \sum\limits_{k>k(n)} k^2 f_k^2 {\lesssim} {\alpha}_n^2\sum\limits_{k>k(n)}k^2 v_k^2 + n{\alpha}_n^2 \sum\limits_{k>k(n)} k^2 w_k^2. $$

Since \(v\in{\mathbb{B}}_1^p\) with \(p>1\), the first term is \(o({\alpha}_n^2)\). Since \(w\in{\mathbb{H}}^{{\alpha}}_1\), we have that

$$ \sum\limits_{k>k(n)} k^2 w_k^2 \le k(n)^{1-2{\alpha}}\sum\limits_{k>k(n)}k^{1+2{\alpha}}w_k^2=o(k(n)^{1-2{\alpha}}). $$

By definition of \(k(n)\), we conclude that the term at stake is \(o(h^2)\) if \(n^{2/(1+2{\alpha})}{\alpha}_n^2=o(1)\). The stochastic terms \(R_{n,1}\) and \(R_{n,2}\) are exactly the same (up to the symmetry assumption on f, which does not change the proofs) as in Castillo (2012), see Eq. (16)–(18), so we can borrow the proofs.
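The exponent bookkeeping behind the two parts of the split sum can be sketched with exact rational arithmetic. The following is a minimal illustration under simplifying assumptions that are ours, not the paper's: constants and the integer part in \(k(n)\) are ignored, \({\varepsilon}_n\) is taken to be exactly \(n^{-({\alpha}\wedge{\beta})/(2{\alpha}+1)}\), and \({\alpha}_n\) is taken of the polynomial form \(n^{-c}\) with a hypothetical exponent `c`. The function names are likewise hypothetical.

```python
from fractions import Fraction

def low_frequency_exponent(alpha, beta):
    """Exponent e such that k(n)^2 * eps_n^2 = n^e,
    with k(n) = n^(1/(1+2*alpha)) and eps_n = n^(-min(alpha, beta)/(2*alpha+1))."""
    a, b = Fraction(alpha), Fraction(beta)
    return (2 - 2 * min(a, b)) / (1 + 2 * a)

def high_frequency_exponent(alpha, c):
    """Exponent e such that n * alpha_n^2 * k(n)^(1-2*alpha) = n^e,
    with alpha_n = n^(-c) (assumed polynomial rate) and k(n) as above."""
    a, c = Fraction(alpha), Fraction(c)
    return 1 - 2 * c + (1 - 2 * a) / (1 + 2 * a)

# A negative exponent means the corresponding quantity tends to 0.
for alpha, beta, c in [(2, 3, Fraction(1, 4)), (3, 2, Fraction(1, 6))]:
    print(alpha, beta,
          low_frequency_exponent(alpha, beta),
          high_frequency_exponent(alpha, c))
```

The first exponent is negative whenever \({\alpha}\wedge{\beta}>1\). The second simplifies to \(2/(1+2{\alpha})-2c\), so it is negative exactly when \(c>1/(1+2{\alpha})\), which matches the requirement \(n^{2/(1+2{\alpha})}{\alpha}_n^2=o(1)\) for rates of this polynomial form.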

The conditions imposed on \({\varepsilon}_n\) and \({\alpha}_n\) above are the same as in Castillo (2012), Section 4.1.3, where it is checked that they are satisfied as soon as \({\alpha} > 1 + \sqrt{3}/2\), \({\beta}>3/2\) and, if \({\beta}<2\wedge{\alpha}\), also \({\alpha}<(3{\beta}-2)/(4-2{\beta})\). This is the zone depicted in Fig. 1 in Castillo (2012). It includes in particular the rectangle \({\alpha} > 1 + \sqrt{3}/2\), \({\beta}\ge 2\), where (N1) is therefore satisfied.
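For readers who want to test particular regularity pairs, the zone just described can be encoded directly. The sketch below is a literal transcription of the three inequalities stated above; the function name `in_admissible_zone` is hypothetical, and the code does not verify the underlying conditions themselves.

```python
import math

def in_admissible_zone(alpha, beta):
    """Return True if (alpha, beta) satisfies the inequalities stated in the text:
    alpha > 1 + sqrt(3)/2, beta > 3/2 and, when beta < min(2, alpha),
    also alpha < (3*beta - 2)/(4 - 2*beta)."""
    if alpha <= 1 + math.sqrt(3) / 2:
        return False
    if beta <= 1.5:
        return False
    if beta < min(2.0, alpha):
        # Here beta < 2, so the denominator 4 - 2*beta is strictly positive.
        return alpha < (3 * beta - 2) / (4 - 2 * beta)
    return True

# A few sample pairs (arbitrary values chosen for illustration).
for alpha, beta in [(2.0, 2.5), (1.5, 2.5), (3.0, 1.6), (4.0, 1.6)]:
    print(alpha, beta, in_admissible_zone(alpha, beta))
```

For \({\beta}\ge 2\) the third inequality is never triggered, which is why the rectangle \({\alpha} > 1 + \sqrt{3}/2\), \({\beta}\ge 2\) lies entirely inside the zone.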

Cite this article

Castillo, I. Semiparametric Bernstein–von Mises theorem and bias, illustrated with Gaussian process priors. Sankhya A 74, 194–221 (2012). https://doi.org/10.1007/s13171-012-0008-6

  • DOI: https://doi.org/10.1007/s13171-012-0008-6
