Semiparametric Bernstein–von Mises theorem and bias, illustrated with Gaussian process priors

Castillo, Ismaël

doi:10.1007/s13171-012-0008-6

Published: 01 December 2012

Volume 74, pages 194–221, (2012)
Sankhya A

Abstract

A semiparametric model is considered where the functional of interest is a shift parameter between two curves. A surprising example is provided where two at first sight indistinguishable Gaussian priors lead to quite different behaviours of the posterior distribution of the functional of interest. This phenomenon also illustrates that a condition introduced in Castillo (2012) of the approximation of the least favourable direction by the Gaussian prior is almost necessary for the Bernstein–von Mises theorem to hold.

References

Bickel, P.J. and Kleijn, B.J.K. (2012). The semiparametric Bernstein–von Mises theorem. Ann. Statist., 40, 206–237.
Boucheron, S. and Gassiat, E. (2009). A Bernstein–von Mises theorem for discrete probability distributions. Electron. J. Stat., 3, 114–148.
Castillo, I. (2008). Lower bounds for posterior rates with Gaussian process priors. Electron. J. Stat., 2, 1281–1299.
Castillo, I. (2012). A semiparametric Bernstein–von Mises theorem for Gaussian process priors. Probab. Theory Related Fields, 152, 53–99.
Castillo, I. and Cator, E. (2011). Semiparametric shift estimation based on the cumulated periodogram for non-regular functions. Electron. J. Stat., 5, 102–126.
Cox, D.D. (1993). An analysis of Bayesian inference for nonparametric regression. Ann. Statist., 21, 903–923.
Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14, 1–67. With a discussion and a rejoinder by the authors.
Freedman, D. (1999). On the Bernstein–von Mises theorem with infinite-dimensional parameters. Ann. Statist., 27, 1119–1140.
Gamboa, F., Loubes, J.-M. and Maza, E. (2007). Semi-parametric estimation of shifts. Electron. J. Stat., 1, 616–640.
Ghosal, S. (2000). Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity. J. Multivariate Anal., 74, 49–68.
Ghosal, S., Ghosh, J.K. and van der Vaart, A.W. (2000). Convergence rates of posterior distributions. Ann. Statist., 28, 500–531.
Ghosal, S. and van der Vaart, A.W. (2007). Convergence rates of posterior distributions for non-i.i.d. observations. Ann. Statist., 35, 192–223.
Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in L^r-metrics, 1 ≤ r ≤ ∞. Ann. Statist., 39, 2883–2911.
Ibragimov, I.A. and Has'minskiĭ, R.Z. (1981). Statistical estimation. In Applications of Mathematics (volume 16). Springer, New York.
Johnstone, I. (2010). High dimensional Bernstein–von Mises: simple examples. In Festschrift for Lawrence D. Brown. Inst. Math. Stat. Collect. (volume 6, pages 87–98).
Kim, Y. (2006). The Bernstein–von Mises theorem for the proportional hazard model. Ann. Statist., 34, 1678–1700.
McNeney, B. and Wellner, J.A. (2000). Application of convolution theorems in semiparametric models with non-i.i.d. data. J. Statist. Plann. Inference, 91, 441–480.
Rousseau, J. and Rivoirard, V. (2012). Bernstein–von Mises theorem for linear functionals of the density. Ann. Statist., 40, 1489–1523.
Shen, X. (2002). Asymptotic normality of semiparametric and nonparametric posterior distributions. J. Amer. Statist. Assoc., 97, 222–235.
Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Ann. Statist., 29, 687–714.
van der Vaart, A. (2002). The statistical work of Lucien Le Cam. Ann. Statist., 30, 631–682.
van der Vaart, A.W. (1998). Asymptotic statistics. In Cambridge Series in Statistical and Probabilistic Mathematics (volume 3). Cambridge University Press, Cambridge.
van der Vaart, A.W. and van Zanten, H. (2008). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist., 36, 1435–1463.
van der Vaart, A.W. and van Zanten, H. (2008). Reproducing kernel Hilbert spaces of Gaussian priors. Inst. Math. Stat. Collect., 3, 200–222.
van der Vaart, A.W. and Wellner, J.A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer, New York.
Wu, Y. and Ghosal, S. (2008). Posterior consistency for some semi-parametric problems. Sankhyā, 70, 267–313.

Acknowledgement.

The author acknowledges the hospitality of the Statistics Department of the Vrije Universiteit Amsterdam and TU Eindhoven/Eurandom for a 2-week stay during the preparation of this work. The author would also like to thank Dominique Picard for an insightful comment. This work was partly supported by ANR Grant “Banhdits” ANR-2010-BLAN-0113-03.

Author information

Authors and Affiliations

CNRS, Laboratoire Probabilités et Modèles Aléatoires, (LPMA) 175, rue du Chevaleret, 75013, Paris, France
Ismaël Castillo

Corresponding author

Correspondence to Ismaël Castillo.

A Appendix: Checking conditions (C₁)-(N₁)

Let us check that the priors $\Pi^{\alpha}$ and $\Pi^{\alpha,*}$ satisfy conditions (C₁) and (N₁) for some rate ${\varepsilon}_n$, in some domain of values of the regularity parameters (α, β). The arguments closely follow those used in Castillo (2012) for the estimation of the translation parameter; see Castillo (2012), Eq. (9). In fact, we obtain here the same set of parameters (α, β) for which (C₁) and (N₁) are satisfied; see Fig. 1 in Castillo (2012).

First, we check the concentration condition (C₁) following the approach in Ghosal and van der Vaart (2007). The first step is to show concentration in terms of a distance for which tests with exponentially decreasing error probabilities exist. Given the true parameter $({\theta}_0,f_0)$ and another parameter $({\theta}_1,f_1)$, let us set

$$\begin{array}{rll} \phi_n &=& {\bf 1}\bigg\{\int_{0}^{1} (f_1-f_0)(t) dY(t)\\ &&\quad+ \int_{0}^{1} \{f_1(t-{\theta}_1)-f_0(t-{\theta}_0)\}dZ(t) > \|f_1\|^2 - \|f_0\|^2\bigg\}. \end{array}$$

Simple calculations analogous to Lemma 5 in Ghosal and van der Vaart (2007) show that this test separates the true $({\theta}_0,f_0)$ from a ball with the appropriate exponential decrease of the error probabilities; see Castillo (2012), Eq. (4), or Ghosal and van der Vaart (2007), Eq. (2.2). The corresponding testing distance $d_T$ is given by

$$ d_T(({\theta}_1,f_1),({\theta}_2,f_2))^2 = \|f_1-f_2\|^2 + \|f_1(\cdot-{\theta}_1)-f_2(\cdot-{\theta}_2)\|^2. $$

One then relates $d_T^2$ to the squared distance $\|f_1-f_2\|_2^2+({\theta}_1-{\theta}_2)^2$. This is easily done by adapting Lemma 4 in Castillo (2012) to the case of not necessarily symmetric $f$. Once those distances are related, the entropy and prior mass conditions are verified exactly as in Castillo (2012), Section 4.1.1, which leads to (C₁). One also verifies that the rate ${\varepsilon}_n$ can be taken proportional to $n^{-({\alpha}\wedge{\beta})/(2{\alpha}+1)}$.
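
As a quick numerical illustration of the rate just stated, the following minimal sketch evaluates $n^{-({\alpha}\wedge{\beta})/(2{\alpha}+1)}$ with the proportionality constant set to 1. The function name `contraction_rate` and the choice of constant are assumptions made for illustration only; they do not come from the paper.

```python
def contraction_rate(n, alpha, beta):
    """Evaluate n^{-(alpha ^ beta)/(2*alpha + 1)}, where ^ denotes the minimum.

    Hypothetical helper: the proportionality constant is taken equal to 1.
    """
    if n <= 0 or alpha <= 0 or beta <= 0:
        raise ValueError("n, alpha and beta must be positive")
    exponent = min(alpha, beta) / (2.0 * alpha + 1.0)
    return n ** (-exponent)


if __name__ == "__main__":
    # The rate improves with the sample size and with the regularity beta,
    # until beta reaches alpha.
    for n in (10**3, 10**5):
        print(n, contraction_rate(n, alpha=2.0, beta=2.0))
```

With α fixed, the rate stops improving once β exceeds α, since only the minimum of the two regularities enters the exponent.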

Now, we check (N₁). The term $R_n({\theta}_0,f+({\theta}-{\theta}_0)\gamma)$ is 0, so one focuses on $R_n({\theta},f)$. We first introduce a sieve ${\mathcal{F}}_n$ to which the supremum in condition (N₁) can be restricted. Let us introduce the Hilbert space of functions

$$ {\mathbb{B}}^p = \Bigg\{f=\sum\limits_{k\geq 1}f_k{\varepsilon}_k(\cdot),\quad \sum\limits_{k\geq 1} k^{2p}f_k^2 < {+\infty} \Bigg\},\qquad p\ge 1, $$

equipped with the norm $\|f\|^2 _{2,p} = \sum _{k\geq 1} k^{2p} f_k^2$. The idea is to use Borell’s inequality in the form of van der Vaart and van Zanten (2008), Theorem 5.1. This result tells us precisely that, with overwhelming probability, the Gaussian prior (either $\Pi^{\alpha}$ or $\Pi^{\alpha,*}$) draws functions $g$ which can be written

$$ g={\varepsilon}_n v_0 + \sqrt{n}{\varepsilon}_n w_0, \quad \text{with}\ v_0\in{\mathbb{B}}_1^0,\ w_0\in {\mathbb{H}}_1^{{\alpha}}, \tag{3.2} $$

but also, for $1\le p<{\alpha}$ and some rate ${\alpha}_n\to 0$ to be specified,

$$ g={\alpha}_n v+ \sqrt{n}{\alpha}_n w, \quad \text{with}\ v\in{\mathbb{B}}_1^p,\ w\in {\mathbb{H}}_1^{{\alpha}}, \tag{3.3} $$

where ${\mathbb{H}}_1^{{\alpha}}$ denotes the unit ball of the RKHS of the prior (we use the same notation ${\mathbb{H}}_1^{{\alpha}}$ for $\Pi^{\alpha}$ and $\Pi^{\alpha,*}$, though the corresponding spaces differ slightly) and ${\mathbb{B}}_1^p$ the unit ball of the space ${\mathbb{B}}^p$. As in Castillo (2012), one can then define a sieve ${\mathcal{F}}_n$ as the intersection of the sets of functions defined by (3.2) and (3.3). Under some conditions on ${\alpha}_n$, Borell’s inequality implies that the complement ${\mathcal{F}}\setminus {\mathcal{F}}_n$ has probability less than $\exp(-n{\varepsilon}_n^2)$; see Castillo (2012), Lemma 13. Thus, it is possible to restrict the study of the posterior (and of (N₁)) to ${\mathcal{F}}_n$.

We first deal with the deterministic terms $R_{n,3}$ and $R_{n,4}$. To control $R_{n,4}$, it is enough to bound from above separately $\int (a_f(t-{\theta})-a_f(t-{\theta}_0))^2 dt$ and $\int D_n(t,h)^2 dt$. This last term can be bounded as in Castillo (2012), Lemma 5 (adapting the proof slightly to accommodate not necessarily symmetric functions $f$), leading to a bound in $o(1+h^2)$. The first term is bounded using the decomposition (3.3) in the form $f={\alpha}_n v+w_n$, with $\|w_n\|_{{\mathbb{H}}^{\alpha}}^2\le n{\alpha}_n^2$,

$$\begin{aligned} \int_0^1 (a_f(t-{\theta})-a_f(t-{\theta}_0))^2 dt \;&{\lesssim}\; n{\alpha}_n^2 \int_0^1 (v(t-{\theta})-v(t-{\theta}_0))^2dt \\ &\quad +\, n\int_0^1( (w_n-f_0)(t-{\theta})-(w_n-f_0)(t-{\theta}_0))^2 dt. \end{aligned}$$

The bounds on the respective variances have been derived in Castillo (2012); see the bounds for (22)–(23). The first term is $O({\alpha}_n^2 h^2)$ and the second is $O((1+h^2){\alpha}_n^2 n^{2/(1+2{\alpha})})$. Thus, both are $o(1+h^2)$ provided that ${\alpha}_n=o(n^{-1/(1+2{\alpha})})$.

To bound $R_{n,3}$, we expand the product and again bound each term separately. One resulting term is $\int (a_f-hf_0')(t-{\theta}_0) D_n(t,h)dt$ which, as in Lemma 6 in Castillo (2012), is $o(1+h^2)$ as soon as ${\varepsilon}_n=o(n^{-1+{\beta}/2})$. Another term is $h\int f_0'(t-{\theta}_0)(a_f(t-{\theta})-a_f(t-{\theta}_0))dt$. By the Cauchy–Schwarz inequality, it can be controlled using the bound of the previous display. The last term to bound is $\int a_f(t-{\theta}_0)(a_f(t-{\theta})-a_f(t-{\theta}_0))dt$. First, we notice that, for any 1-periodic $w$ in $L^2[0,1]$ with Fourier coefficients $w_k$, expanding the function on the Fourier basis gives

$$ \int_0^1w(t-{\theta}_0)(w(t-{\theta}_0)-w(t-{\theta}))dt = \sum\limits_{k\ge 1}\sin^2(\pi k({\theta}-{\theta}_0))(w_{2k}^2+w_{2k+1}^2). $$
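
The identity in the previous display can be checked numerically on a trigonometric polynomial. The sketch below is an illustration under a stated assumption: the coefficients $w_{2k}$ and $w_{2k+1}$ are taken to multiply the unnormalised functions $\cos(2\pi k\cdot)$ and $\sin(2\pi k\cdot)$, which is the normalisation under which the display holds with constant 1. With the orthonormal versions $\sqrt{2}\cos$ and $\sqrt{2}\sin$, the right-hand side would pick up a factor 2, which does not affect the bounds that follow. All function names are hypothetical.

```python
import math


def trig_poly(coeffs, t):
    """w(t) = sum_k a_k cos(2 pi k t) + b_k sin(2 pi k t), coeffs = [(a_1, b_1), ...].

    Assumption: unnormalised cosine/sine basis, no constant term.
    """
    return sum(a * math.cos(2 * math.pi * k * t) + b * math.sin(2 * math.pi * k * t)
               for k, (a, b) in enumerate(coeffs, start=1))


def lhs_integral(coeffs, theta0, theta, grid=512):
    """Integral over [0, 1] of w(t - theta0) * (w(t - theta0) - w(t - theta)).

    A uniform Riemann sum is exact (up to rounding) here because the integrand
    is a trigonometric polynomial of degree at most 2 * len(coeffs) < grid.
    """
    total = 0.0
    for j in range(grid):
        t = j / grid
        w0 = trig_poly(coeffs, t - theta0)
        w1 = trig_poly(coeffs, t - theta)
        total += w0 * (w0 - w1)
    return total / grid


def rhs_series(coeffs, theta0, theta):
    """sum_k sin^2(pi k (theta - theta0)) * (a_k^2 + b_k^2)."""
    return sum(math.sin(math.pi * k * (theta - theta0)) ** 2 * (a * a + b * b)
               for k, (a, b) in enumerate(coeffs, start=1))


if __name__ == "__main__":
    coeffs = [(1.0, -0.5), (0.3, 0.7), (0.0, 0.2)]
    print(lhs_integral(coeffs, 0.2, 0.23), rhs_series(coeffs, 0.2, 0.23))
```

Since $\sin^2(x)\le x^2$, the right-hand side is in turn at most $\pi^2({\theta}-{\theta}_0)^2\sum_k k^2(a_k^2+b_k^2)$, which is the form used in the next step of the argument.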

Applying this to the function $a_f=\sqrt{n}(f-f_0)$ and using the inequality $|\sin(x)|\le |x|$ enables us to bound the quantity at stake by a constant times $h^2\sum_{k\ge 1} k^2 (f_{0,k}-f_k)^2$. We split this sum along indices $k\le k(n)$ and $k>k(n)$, with $k(n)=\lfloor n^{1/(1+2{\alpha})} \rfloor$. The sum up to $k(n)$ leads to the bound $h^2 k(n)^2\|f-f_0\|^2\le h^2k(n)^2{\varepsilon}_n^2$. Due to the expressions of $k(n)$ and ${\varepsilon}_n$, this is $o(h^2)$ when ${\alpha}\wedge{\beta}\ge 1$. The sum for $k>k(n)$ is bounded by noticing that $\sum_{k>k(n)} k^2 f_{0,k}^2=o(1)$ since ${\beta}>1$, and by using the decomposition (3.3) as follows:

$$ \sum\limits_{k>k(n)} k^2 f_k^2 \le {\alpha}_n^2\sum\limits_{k>k(n)}k^2 v_k^2 + n{\alpha}_n^2 \sum\limits_{k>k(n)} k^2 w_k^2. $$

Since $v\in{\mathbb{B}}_1^p$ with $p>1$, the first term is $o({\alpha}_n^2)$. Since $w\in{\mathbb{H}}^{{\alpha}}_1$, we have that

$$ \sum\limits_{k>k(n)} k^2 w_k^2 \le k(n)^{1-2{\alpha}}\sum\limits_{k>k(n)}k^{1+2{\alpha}}w_k^2=o(k(n)^{1-2{\alpha}}). $$

By definition of $k(n)$, we conclude that the term at stake is $o(h^2)$ if $n^{2/(1+2{\alpha})}{\alpha}_n^2=o(1)$. The stochastic terms $R_{n,1}$ and $R_{n,2}$ are exactly the same (up to the symmetry assumption on $f$, which does not change the proofs) as in Castillo (2012), see Eq. (16)–(18), so we can borrow the proofs.
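
The tail inequality $\sum_{k>K} k^2 w_k^2 \le K^{1-2{\alpha}}\sum_{k>K}k^{1+2{\alpha}}w_k^2$ used above only relies on $k^{1-2{\alpha}}\le K^{1-2{\alpha}}$ for $k>K$ when ${\alpha}>1/2$. The following minimal sketch checks it on a finite, arbitrarily chosen coefficient sequence; the helper `tail_sums` and the test sequence are assumptions made for illustration.

```python
def tail_sums(w, K, alpha):
    """Return both sides of the tail inequality for coefficients w_1, ..., w_N.

    w[k - 1] stores w_k. The left side is sum_{k > K} k^2 w_k^2 and the right
    side is K^{1 - 2 alpha} * sum_{k > K} k^{1 + 2 alpha} w_k^2.
    """
    if alpha <= 0.5:
        raise ValueError("the inequality is stated here for alpha > 1/2")
    indices = range(K + 1, len(w) + 1)
    lhs = sum(k ** 2 * w[k - 1] ** 2 for k in indices)
    weighted = sum(k ** (1 + 2 * alpha) * w[k - 1] ** 2 for k in indices)
    return lhs, K ** (1 - 2 * alpha) * weighted


if __name__ == "__main__":
    # Hypothetical coefficients with alternating signs and polynomial decay.
    w = [(-1) ** k * k ** -2.5 for k in range(1, 201)]
    print(tail_sums(w, K=10, alpha=1.5))
```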

The conditions on ${\varepsilon}_n$ and ${\alpha}_n$ found above are the same as in Castillo (2012), Section 4.1.3, where it is checked that they are satisfied as soon as ${\alpha} > 1 + \sqrt{3}/2$, ${\beta} > 3/2$, and, if ${\beta} < 2 \wedge {\alpha}$, also ${\alpha} < (3{\beta} - 2)/(4 - 2{\beta})$. This is the zone depicted in Fig. 1 in Castillo (2012). It includes in particular the rectangle ${\alpha} > 1 + \sqrt{3}/2$, ${\beta} \ge 2$, where (N₁) is therefore satisfied.
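
The zone of parameters described above can be written as a simple predicate. This is a minimal sketch that transcribes the three stated inequalities; the function name `in_admissible_zone` is hypothetical and the code makes no claim beyond those inequalities.

```python
import math


def in_admissible_zone(alpha, beta):
    """True if (alpha, beta) satisfies the three conditions stated above:

    alpha > 1 + sqrt(3)/2, beta > 3/2 and, when beta < min(2, alpha),
    also alpha < (3*beta - 2) / (4 - 2*beta).
    """
    if alpha <= 1.0 + math.sqrt(3.0) / 2.0:
        return False
    if beta <= 1.5:
        return False
    if beta < min(2.0, alpha):
        # Here beta < 2, so the denominator 4 - 2*beta is strictly positive.
        return alpha < (3.0 * beta - 2.0) / (4.0 - 2.0 * beta)
    return True


if __name__ == "__main__":
    for point in [(2.0, 2.0), (3.0, 1.6), (5.0, 1.6), (1.5, 3.0)]:
        print(point, in_admissible_zone(*point))
```

Any point with ${\alpha} > 1 + \sqrt{3}/2$ and ${\beta}\ge 2$ skips the third condition, which reproduces the rectangle mentioned in the text.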

About this article

Cite this article

Castillo, I. Semiparametric Bernstein–von Mises theorem and bias, illustrated with Gaussian process priors. Sankhya A 74, 194–221 (2012). https://doi.org/10.1007/s13171-012-0008-6

Received: 02 March 2011
Revised: 09 September 2011
Published: 01 December 2012
Issue Date: August 2012
DOI: https://doi.org/10.1007/s13171-012-0008-6

Keywords and phrases.

Bayesian nonparametrics, Bernstein–von Mises theorem, semiparametric models, Gaussian process priors.

AMS (2000) subject classification.

Primary 62G05; Secondary 62G20, 62F25.
