Abstract
A semiparametric model is considered where the functional of interest is a shift parameter between two curves. A surprising example is provided where two at first sight indistinguishable Gaussian priors lead to quite different behaviours of the posterior distribution of the functional of interest. This phenomenon also illustrates that a condition introduced in Castillo (2012) of the approximation of the least favourable direction by the Gaussian prior is almost necessary for the Bernstein–von Mises theorem to hold.
References
Bickel, P.J. and Kleijn, B.J.K. (2012). The semiparametric Bernstein–von Mises theorem. Ann. Statist., 40, 206–237.
Boucheron, S. and Gassiat, E. (2009). A Bernstein–von Mises theorem for discrete probability distributions. Electron. J. Stat., 3, 114–148.
Castillo, I. (2008). Lower bounds for posterior rates with Gaussian process priors. Electron. J. Stat., 2, 1281–1299.
Castillo, I. (2012). A semiparametric Bernstein–von Mises theorem for Gaussian process priors. Probab. Theory Related Fields, 152, 53–99.
Castillo, I. and Cator, E. (2011). Semiparametric shift estimation based on the cumulated periodogram for non-regular functions. Electron. J. Stat., 5, 102–126.
Cox, D.D. (1993). An analysis of Bayesian inference for nonparametric regression. Ann. Statist., 21, 903–923.
Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist., 14, 1–67. With a discussion and a rejoinder by the authors.
Freedman, D. (1999). On the Bernstein–von Mises theorem with infinite-dimensional parameters. Ann. Statist., 27, 1119–1140.
Gamboa, F., Loubes, J.-M. and Maza, E. (2007). Semi-parametric estimation of shifts. Electron. J. Stat., 1, 616–640.
Ghosal, S. (2000). Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity. J. Multivariate Anal., 74, 49–68.
Ghosal, S., Ghosh, J.K. and van der Vaart, A.W. (2000). Convergence rates of posterior distributions. Ann. Statist., 28, 500–531.
Ghosal, S. and van der Vaart, A.W. (2007). Convergence rates of posterior distributions for non-i.i.d. observations. Ann. Statist., 35, 192–223.
Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in \(L^r\)-metrics, 1 ≤ r ≤ ∞. Ann. Statist., 39, 2883–2911.
Ibragimov, I.A. and Has′minskiĭ, R.Z. (1981). Statistical estimation. In Applications of Mathematics, (volume 16). Springer, New York.
Johnstone, I. (2010). High dimensional Bernstein–von Mises: simple examples. In Festschrift for Lawrence D. Brown. Inst. Math. Stat. Collect., (volume 6, pages 87–98).
Kim, Y. (2006). The Bernstein–von Mises theorem for the proportional hazard model. Ann. Statist., 34, 1678–1700.
McNeney, B. and Wellner, J.A. (2000). Application of convolution theorems in semiparametric models with non-i.i.d. data. J. Statist. Plann. Inference, 91, 441–480.
Rousseau, J. and Rivoirard, V. (2012). Bernstein–von Mises theorem for linear functionals of the density. Ann. Statist., 40, 1489–1523.
Shen, X. (2002). Asymptotic normality of semiparametric and nonparametric posterior distributions. J. Amer. Statist. Assoc., 97, 222–235.
Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Ann. Statist., 29, 687–714.
van der Vaart, A. (2002). The statistical work of Lucien Le Cam. Ann. Statist., 30, 631–682.
van der Vaart, A.W. (1998). Asymptotic statistics. In Cambridge Series in Statistical and Probabilistic Mathematics (volume 3). Cambridge University Press, Cambridge.
van der Vaart, A.W. and van Zanten, H. (2008). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist., 36, 1435–1463.
van der Vaart, A.W. and van Zanten, H. (2008). Reproducing kernel Hilbert spaces of Gaussian priors. Inst. Math. Stat. Collect., 3, 200–222.
van der Vaart, A.W. and Wellner, J.A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer, New York.
Wu, Y. and Ghosal, S. (2008). Posterior consistency for some semi-parametric problems. Sankhyā, 70, 267–313.
Acknowledgement.
The author acknowledges the hospitality of the Statistics Department of the Vrije Universiteit Amsterdam and TU Eindhoven/Eurandom for a 2-week stay during the preparation of this work. The author would also like to thank Dominique Picard for an insightful comment. This work was partly supported by ANR Grant “Banhdits” ANR-2010-BLAN-0113-03.
A Appendix: Checking conditions (C1)-(N1)
Let us check that the priors \(\Pi^{{\alpha}}\) and \(\Pi^{{\alpha},*}\) satisfy conditions (C1) and (N1) for some rate \({\varepsilon}_n\), in some domain of values of the regularity parameters \(({\alpha},{\beta})\). The arguments are very similar to those used in Castillo (2012) for the estimation of a translation parameter, see Castillo (2012), Eq. (9). In fact, we obtain here the same set of parameters \(({\alpha},{\beta})\) for which (C1) and (N1) are satisfied, see Fig. 1 in Castillo (2012).
First, we check the concentration condition (C1), following the approach in Ghosal and van der Vaart (2007). The first step is to establish concentration in terms of a distance for which tests with exponentially decreasing error probabilities exist. Given the true parameter \(({\theta}_0,f_0)\) and another parameter \(({\theta}_1,f_1)\), let us set
Simple calculations analogous to Lemma 5 in Ghosal and van der Vaart (2007) show that this test makes it possible to test the true \(({\theta}_0,f_0)\) against a ball with appropriate exponential decrease of the error probabilities, see Castillo (2012), Eq. (4), or Ghosal and van der Vaart (2007), Eq. (2.2). The corresponding testing distance \(d_T\) is given by
One then relates \(d_T^2\) to the squared distance \(\|f_1-f_2\|_2^2+({\theta}_1-{\theta}_2)^2\). This is easily done by adapting Lemma 4 in Castillo (2012) to the case of not necessarily symmetric f. Once those distances are related, the entropy and prior mass conditions are verified exactly as in Castillo (2012), Section 4.1.1, thus leading to (C1). One also verifies that the rate \({\varepsilon}_n\) can be taken proportional to \(n^{-({\alpha}\wedge{\beta})/(2{\alpha}+1)}\).
Now, we check (N1). The term \(R_n({\theta}_0,f+({\theta}-{\theta}_0){\gamma})\) is 0, so one focuses on \(R_n({\theta},f)\). We first introduce a sieve \({\mathcal{F}}_n\) on which it is possible to restrict the supremum in condition (N1). Let us introduce the Hilbert space of functions
equipped with the norm \(\|f\|^2 _{2,p} = \sum _{k\geq 1} k^{2p} f_k^2\). The idea is to use Borell’s inequality in the form of van der Vaart and van Zanten (2008), Theorem 5.1. This result tells us precisely that, with overwhelming probability, the Gaussian prior (either \(\Pi^{{\alpha}}\) or \(\Pi^{{\alpha},*}\)) draws functions g which can be written
but also, for \(1\le p<{\alpha}\) and some rate \({\alpha}_n\to 0\) to be specified,
where \({\mathbb{H}}_1^{{\alpha}}\) denotes the unit ball of the RKHS of the prior (we use the same notation \({\mathbb{H}}_1^{{\alpha}}\) for \(\Pi^{{\alpha}}\) and \(\Pi^{{\alpha},*}\) though the corresponding spaces differ slightly) and \({\mathbb{B}}_1^p\) the unit ball of the space \({\mathbb{B}}^p\). As in Castillo (2012), one can then define a sieve \({\mathcal{F}}_n\) as the intersection of the sets of functions defined by (3.2) and (3.3). Under some conditions on \({\alpha}_n\), Borell’s inequality implies that the complement \({\mathcal{F}}\setminus {\mathcal{F}}_n\) has probability less than \(\exp(-n{\varepsilon}_n^2)\), see Castillo (2012), Lemma 13. Thus, it is possible to restrict the study of the posterior (and of (N1)) to \({\mathcal{F}}_n\).
We first deal with the deterministic terms \(R_{n,3}\) and \(R_{n,4}\). To control \(R_{n,4}\), it is enough to bound from above separately \(\int (a_f(t-{\theta})-a_f(t-{\theta}_0))^2 dt\) and \(\int D_n(t,h)^2 dt\). This last term can be bounded as in Castillo (2012), Lemma 5 (adapting the proof slightly to accommodate not necessarily symmetric functions f), leading to a bound of order \(o(1+h^2)\). The first term is bounded using the decomposition (9) in the form \(f={\alpha}_n v+w_n\), with \(\|w_n\|_{{\mathbb{H}}_1^{\alpha}}^2\le n{\alpha}_n^2\),
The bounds on the respective variances have been derived in Castillo (2012), see the bounds to (22)–(23). The first term is \(O({\alpha}_n^2 h^2)\) and the second is \(O((1+h^2){\alpha}_n^2 n^{2/(1+2{\alpha})})\). Thus, both are \(o(1+h^2)\) provided that \({\alpha}_n=o(n^{-1/(1+2{\alpha})})\).
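This last step can be spelled out as follows. It is only a sketch of the arithmetic, with the constants implicit in the \(O(\cdot)\) bounds left unspecified. Writing the condition on \({\alpha}_n\) as \({\alpha}_n n^{1/(1+2{\alpha})}\to 0\), one has
\[
{\alpha}_n^2 h^2 \le {\alpha}_n^2 (1+h^2), \qquad (1+h^2)\,{\alpha}_n^2\, n^{2/(1+2{\alpha})} = (1+h^2)\bigl({\alpha}_n n^{1/(1+2{\alpha})}\bigr)^2 ,
\]
and both right-hand sides are \(o(1+h^2)\), uniformly in h, since \({\alpha}_n\to 0\) and \({\alpha}_n n^{1/(1+2{\alpha})}\to 0\).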
To bound \(R_{n,3}\), we expand the product and again bound each term separately. One resulting term is \(\int (a_f-hf_0')(t-{\theta}_0) D_n(t,h)dt\) which, as in Lemma 6 in Castillo (2012), is \(o(1+h^2)\) as soon as \({\varepsilon}_n=o(n^{-1+{\beta}/2})\). Another term is \(h\int f_0'(t-{\theta}_0)(a_f(t-{\theta})-a_f(t-{\theta}_0))dt\). Using the Cauchy–Schwarz inequality, we can reuse the bound of the previous display. The last term to bound is \(\int a_f(t-{\theta}_0)(a_f(t-{\theta})-a_f(t-{\theta}_0))dt\). First, we notice that, for any 1-periodic w in \(L^2[0,1]\) with Fourier coefficients \(w_k\), expanding the function on the Fourier basis,
Applying this to the function \(a_f=\sqrt{n}(f-f_0)\) and using the inequality \(|\sin(x)|\le |x|\) enables us to bound the quantity at stake by a constant times \(h^2\sum_{k\ge 1} k^2 (f_{0,k}-f_k)^2\). We split this sum over indices \(k\le k(n)\) and \(k>k(n)\), with \(k(n)=\lfloor n^{1/(1+2{\alpha})} \rfloor\). The sum up to k(n) leads to the bound \(h^2 k(n)^2\|f-f_0\|^2\le h^2k(n)^2{\varepsilon}_n^2\). Due to the expressions of k(n) and \({\varepsilon}_n\), this is \(o(h^2)\) when \({\alpha}\wedge{\beta}>1\). The sum for \(k>k(n)\) is bounded by noticing that \(\sum_{k>k(n)} k^2 f_{0,k}^2=o(1)\) since \({\beta}>1\), and by using the decomposition (9) as follows:
Since \(v\in{\mathbb{B}}_1^p\) with \(p>1\), the first term is \(o({\alpha}_n^2)\). Since \(w\in{\mathbb{H}}^{{\alpha}}_1\), we have that
By definition of k(n), we conclude that the term at stake is \(o(h^2)\) if \(n^{2/(1+2{\alpha})}{\alpha}_n^2=o(1)\). The stochastic terms \(R_{n,1}\) and \(R_{n,2}\) are exactly the same (up to the symmetry assumption on f, which does not change the proofs) as in Castillo (2012), see Eq. (16)–(18), so we can borrow the proofs.
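For the reader's convenience, here is a sketch of the Fourier computation behind the bound on the last term of \(R_{n,3}\). It assumes complex Fourier coefficients \(c_k=\int_0^1 w(t)e^{-2i\pi kt}dt\) (a normalisation that may differ from that of the coefficients \(w_k\) used in the text by constants) and the local parametrisation \(h=\sqrt{n}({\theta}-{\theta}_0)\). Both are assumptions made for this illustration. For a real-valued, 1-periodic w in \(L^2[0,1]\) and \({\delta}={\theta}-{\theta}_0\), Parseval's identity gives
\[
\int_0^1 w(t-{\theta}_0)\bigl(w(t-{\theta})-w(t-{\theta}_0)\bigr)dt
= 2\sum_{k\ge 1}|c_k|^2\bigl(\cos(2\pi k{\delta})-1\bigr)
= -4\sum_{k\ge 1}|c_k|^2\sin^2(\pi k{\delta}),
\]
so that, using \(|\sin(x)|\le |x|\),
\[
\Bigl|\int_0^1 w(t-{\theta}_0)\bigl(w(t-{\theta})-w(t-{\theta}_0)\bigr)dt\Bigr|
\le 4\pi^2{\delta}^2\sum_{k\ge 1}k^2|c_k|^2 .
\]
For \(w=a_f=\sqrt{n}(f-f_0)\), the coefficients \(c_k\) are \(\sqrt{n}\) times those of \(f-f_0\) and \(n{\delta}^2=h^2\), which yields a bound by a constant times \(h^2\sum_{k\ge 1}k^2(f_{0,k}-f_k)^2\). For the low-frequency part of this sum, taking \({\varepsilon}_n\) proportional to \(n^{-({\alpha}\wedge{\beta})/(2{\alpha}+1)}\) and \(k(n)\le n^{1/(1+2{\alpha})}\),
\[
k(n)^2{\varepsilon}_n^2 \lesssim n^{2/(1+2{\alpha})}\, n^{-2({\alpha}\wedge{\beta})/(1+2{\alpha})} = n^{2(1-{\alpha}\wedge{\beta})/(1+2{\alpha})},
\]
which tends to zero when \({\alpha}\wedge{\beta}>1\).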
The conditions imposed on \({\varepsilon}_n\) and \({\alpha}_n\) above are the same as in Castillo (2012), Section 4.1.3, where it is checked that they are satisfied as soon as \({\alpha} > 1 + \sqrt{3}/2\), \({\beta} > 3/2\) and, if \({\beta} < 2 \wedge {\alpha}\), also \({\alpha} < (3{\beta}-2)/(4-2{\beta})\). This is the zone depicted in Fig. 1 in Castillo (2012). It includes in particular the rectangle \({\alpha} > 1 + \sqrt{3}/2\), \({\beta}\ge 2\), where (N1) is therefore satisfied.
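As an illustration of these constraints (the numerical values below are chosen only as examples and do not come from the original text), recall that \(1+\sqrt{3}/2\approx 1.87\). For \({\beta}=8/5\), which satisfies \(3/2<{\beta}<2\), the upper bound on \({\alpha}\) is
\[
\frac{3{\beta}-2}{4-2{\beta}}\bigg|_{{\beta}=8/5} = \frac{14/5}{4/5} = \frac{7}{2},
\]
so that \(({\alpha},{\beta})=(3,8/5)\) lies in the zone, whereas \(({\alpha},{\beta})=(4,8/5)\) does not. The bound \((3{\beta}-2)/(4-2{\beta})\) equals 5/2 at \({\beta}=3/2\) and tends to \(+\infty\) as \({\beta}\uparrow 2\), in agreement with the absence of an upper constraint on \({\alpha}\) when \({\beta}\ge 2\).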
Castillo, I. Semiparametric Bernstein–von Mises theorem and bias, illustrated with Gaussian process priors. Sankhya A 74, 194–221 (2012). https://doi.org/10.1007/s13171-012-0008-6
DOI: https://doi.org/10.1007/s13171-012-0008-6