The variation of the posterior variance and Bayesian sample size determination

We consider Bayesian sample size determination using a criterion that utilizes the first two moments of the posterior variance. We study how the resulting sample size depends on the chosen prior and explore the success rate for bounding the posterior variance below a prescribed limit under the true sampling distribution. Compared with sample size determination based on the average posterior variance alone, the proposed criterion leads to an increase in sample size and to significantly improved success rates. Generic asymptotic properties are proven, such as an asymptotic expression for the sample size and a sort of phase transition. Our study is illustrated with two real-world datasets with Poisson and normally distributed data. Based on our results, some recommendations are given.


Introduction
Sample size determination (SSD) is the attempt to estimate the data size that is needed in order to meet a certain criterion (Desu 2012). This task is usually performed at a planning stage, before any data are actually measured or recorded, so that especially in the context of high financial or temporal expense a careful SSD becomes indispensable. In the design, say, of animal experiments or clinical trials, SSD can even have an ethical dimension (Charan and Kantharia 2013; Dell et al. 2002). In this article we study a Bayesian method for SSD that limits the expected fluctuations of the uncertainty of the result. By ''uncertainty'' we will here mean (the square root of) the posterior variance. For $n$ data points $x_n = (x_1, \ldots, x_n)$ drawn from a sampling distribution $p(x_n \mid \theta)$ with parameter $\theta$, the posterior distribution is defined by

$$p(\theta \mid x_n) \propto p(\theta) \cdot p(x_n \mid \theta), \qquad (1)$$

where $p(\theta)$ denotes the prior for the parameter $\theta$. The posterior variance is then given as

$$u_n^2 := \mathrm{Var}_{\theta \sim p(\theta \mid x_n)}(\theta). \qquad (2)$$

In practice, a scientist performing an experiment might desire to state her/his result with an according uncertainty, say $\hat\theta \pm u_n$, with $u_n$ being the square root of $u_n^2$ as defined in (2) and with $\hat\theta$ being the posterior mean. In order for this result to be precise enough the scientist might desire to fulfill a condition such as

$$u_n < \varepsilon \quad \text{or, equivalently,} \quad u_n^2 < \varepsilon^2 \qquad (3)$$

for some small, positive $\varepsilon$ that is chosen a priori. As the posterior distribution depends on the data $x_n$, so does $u_n^2$. Choosing an appropriate sample size $n$ so that (3) is guaranteed before $x_n$ is known is only possible for a few restricted scenarios, for instance Bernoulli distributed samples (Pham-Gia and Turkkan 1992; Joseph and Bélisle 2019). A more generally applicable criterion is to require instead of (3)

$$\mathbb{E}_{x_n \sim m(x_n)}[u_n^2] < \varepsilon^2, \qquad (4)$$

where $m(x_n) = \int p(x_n \mid \theta)\, p(\theta)\, d\theta$ denotes the prior predictive. This is known as the average posterior variance criterion (APVC) in the literature (Wang and Gelfand 2002; Pham-Gia and Turkkan 1992; De Santis 2007).
For many standard cases explicit expressions for $u_n^2$ can be derived. The usage of the prior predictive $m(x_n)$ is quite natural as it describes what is known about the data $x_n$ given our prior knowledge. We will denote the smallest $n$ such that the APVC (4) is satisfied throughout this article by $\tilde n_\varepsilon$. In the literature many alternative criteria can be found that replace $u_n^2$ by some other, data dependent random variable $T(x_n)$, compare for instance (Adcock 1997; M'lan et al. 2008; De Santis 2006; Wang and Gelfand 2002; Rubin and Stern 1998). While we will stick in this article to the choice $T(x_n) = u_n^2$, many of the ideas presented here can, in principle, be translated to such approaches. The APVC has a rather obvious drawback: it only guarantees (3) to hold on average. Consequently, one expects that $u_{\tilde n_\varepsilon}$ is larger than $\varepsilon$ for certain data samples $x_{\tilde n_\varepsilon}$, cf. also Fig. 3a below. To get a better grasp on the variability of the uncertainty one can make more extensive usage of the prior predictive $m(x_n)$ (De Santis 2006, 2007; Brutti et al. 2008; Gubbiotti and De Santis 2011; Sambucini 2008; Sahu and Smith 2006). This article aims at studying the behavior of such criteria and thereby at giving some guidance or, at least, a deeper understanding for an SSD based on $m(x_n)$. In order to do so we will use an extension of the APVC (4) which we will call the variation of the posterior variance criterion (VPVC) and which takes the form

$$\bar u_n^2 + k \cdot \Delta u_n^2 < \varepsilon^2, \qquad (5)$$

where $k$ is a parameter to be chosen and where

$$\bar u_n^2 := \mathbb{E}_{x_n \sim m(x_n)}[u_n^2], \qquad \Delta u_n^2 := \big(\mathrm{Var}_{x_n \sim m(x_n)}(u_n^2)\big)^{1/2}. \qquad (6)$$

Throughout this article we will denote the smallest $n$ such that (5) is satisfied by $n_\varepsilon$. We will see in Sect. 2 below that taking (5) into account leads to substantially different sample sizes than the sole consideration of the APVC (4) and to a better compliance with (3). We will often take $k = 2$ in this article (loosely motivated by the normal distribution) but will show that for $\varepsilon \to 0$ there is an optimal $k^*$. We will give some ideas on how to guess $k^*$ in practice.
Moreover, we will provide an asymptotic formula for the sample size $n_\varepsilon$ in the small $\varepsilon$ regime. As the VPVC (5) only uses the first two moments it is of course not as exhaustive in its description as a criterion that considers the full law of $u_n^2$ under $m(x_n)$; compare for instance (De Santis 2006, 2007; Brutti et al. 2008; Gubbiotti and De Santis 2011; Sambucini 2008), though for different criteria than the one considered here. In Sect. 2.1 below we will compare our criterion with a more extensive usage of $m(x_n)$ and will find that in many cases (5) is a good approximation. Taking this approximation is, on the other hand, quite convenient for our purposes: it provides us with explicit expressions, spares us numerical issues and allows for a rather concise discussion of asymptotic properties in Sect. 3. We will discuss our method for two common cases, namely Poisson and normally distributed data, and illustrate our discussion with real world datasets.
To the best of our knowledge, there is no work in the literature that considers a criterion of the exact same shape as (5). In (Pham-Gia and Turkkan 1992) Pham-Gia and Turkkan consider an object such as $\Delta u_n^2$ from (6) for a binomial distribution but apply it in a different manner. Our approach is inspired by the quite common idea of using $m(x_n)$ for studying $u_n^2$. We believe that the discussion in this paper deepens the understanding of SSD methods built on $m(x_n)$ in general. The paper is organized as follows: Sect. 2 discusses the application of the VPVC to Poisson and normally distributed data. We will compare our results to the simpler APVC method (4), visualize the effect of using a prior that is inconsistent with the underlying parameter and debate how a conservative SSD could be performed. For this purpose we will use actual datasets, namely the daily number of accidents in Leeds in 2018 (Leeds City Council 2019) and the mercury concentration in the blood of alligators in South Carolina and Florida (Lawson and Jodice 2019; Nilsen et al. 2019). In Sect. 3 we will look at the behavior of the VPVC for $\varepsilon \to 0$. The results from that section, especially Theorem 3.5, indicate that SSD methods based on $m(x_n)$ exhibit some sort of phase transition in this limit. We will show that a $k^*$ exists such that for $k > k^*$ an SSD based on the VPVC (5) will ensure that (3) will asymptotically be true with probability 1 under the true sampling distribution. As $k^*$ will depend on the (usually unknown) true value of $\theta$, we will discuss a method to get an upper bound based on the prior knowledge. Moreover, in Lemma 3.4 we will provide a generic, asymptotic formula for the sample size $n_\varepsilon$ predicted by the VPVC.

SSD based on the variation of the posterior variance
In this section we study the effects of using the VPVC as proposed in (5) and discuss its dependency on the prior knowledge and the true parameter. The discussion is first carried out for the case of a single parameter and Poisson distributed data. In Sect. 2.2 we then consider an example that involves nuisance parameters and normally distributed data.

Single parameter: Poisson distributed data
The histogram in Fig. 1 shows the daily number of traffic accidents that happened in Leeds (UK), taken from Leeds City Council (2019). The number of accidents is an important metric when judging the effect of traffic planning (Hadayeghi et al. 2010). While evaluating a whole year will give good precision for the expected daily number of accidents, this might be an intolerable time scale for judging the effect of traffic measures such as a revised speed limit. Planning an evaluation at too early a stage, on the other hand, might turn out to be too vague and therefore useless.
What is a suitable number of days before we can give a decent estimate? To formalize this question let us suppose that the data for $n$ days follow the product of $n$ Poisson distributions with (unknown) parameter $\theta$:

$$p(x_n \mid \theta) = \prod_{i=1}^n \mathrm{Pois}(x_i \mid \theta), \qquad (7)$$

where $x_n = (x_1, \ldots, x_n)$ and each $x_i$ for $i = 1, \ldots, n$ should be read as the number of accidents on a day. We want to find $n$ such that the uncertainty $u_n$ about $\theta$ is less than some $\varepsilon$, for which we take $\varepsilon = 0.3$ in this section. To specify our prior knowledge about $\theta$ we will use a gamma distribution $p(\theta) = \mathrm{Gamma}(\theta \mid a, b)$ with shape $a$ and rate $b$. As uncertainty $u_n$ for our result we take the square root of the posterior variance (2). The inequality $u_n^2 < \varepsilon^2$ has no solution for $n$ that holds for any $x_n$. However, we can average over the prior predictive $m(x_n)$ and choose the smallest $n \ge 1$ such that

$$\mathbb{E}_{x_n \sim m(x_n)}[u_n^2] < \varepsilon^2. \qquad (8)$$

We referred to this as the APVC in the introduction and denoted the corresponding $n$ by $\tilde n_\varepsilon$. The result of such an SSD for $\varepsilon = 0.3$ is shown in Fig. 2a: for various values of the prior mean $\mathbb{E}_{\theta \sim p(\theta)}[\theta] = \frac{a}{b}$ and its standard deviation $(\mathrm{Var}_{\theta \sim p(\theta)}(\theta))^{1/2} = \frac{a^{1/2}}{b}$ we plotted the sample size $\tilde n_\varepsilon$ predicted by (8). Naturally, as the mean of $p(\theta)$ increases the sample size $\tilde n_\varepsilon$ increases as well, since $\theta$ is linked to the variance of the Poisson distributed data. The standard deviation of $p(\theta)$ seems to have only a minor influence on the sample size, with one exception: below a certain threshold the prior variance pushes the posterior variance into the right ballpark, so that even a minimal sample size of $\tilde n_\varepsilon = 1$ is enough to fulfill (8). Figure 3a illustrates for which prior choices the SSD based on the APVC is successful, based on the full-year accident dataset depicted in Fig. 1: for each prior mean and standard deviation and the corresponding sample sizes $\tilde n_\varepsilon$ from Fig. 2a, 100 random samples of size $\tilde n_\varepsilon$ were drawn from the Leeds accident data and the corresponding $u_{\tilde n_\varepsilon}$ were computed.
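For the conjugate Poisson-gamma setup the expectation in (8) has a closed form: the posterior is $\mathrm{Gamma}(a + \sum_i x_i, b + n)$, so $u_n^2 = (a + \sum_i x_i)/(b+n)^2$, and since $\sum_i x_i$ has mean $na/b$ under the prior predictive, the averaged posterior variance is $a/(b(b+n))$. The following sketch of the resulting APVC sample size search is ours, with illustrative prior values (prior mean 4, prior standard deviation 2):

```python
import math

def apvc_poisson_gamma(a: float, b: float, eps: float) -> int:
    """Smallest n >= 1 with E_m[u_n^2] = a / (b*(b+n)) < eps^2 (the APVC (8))."""
    # Closed form: a/(b*(b+n)) < eps^2  <=>  n > a/(b*eps^2) - b (strictly).
    return max(1, math.floor(a / (b * eps ** 2) - b) + 1)

# Prior with mean a/b = 4 and standard deviation sqrt(a)/b = 2, i.e. a = 4, b = 1:
print(apvc_poisson_gamma(a=4.0, b=1.0, eps=0.3))
```

The same value is obtained by directly scanning $n = 1, 2, \ldots$ and checking $a/(b(b+n)) < \varepsilon^2$; the closed form just avoids the loop.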
Figure 3a shows the fraction of $u_{\tilde n_\varepsilon}$ that is actually below $\varepsilon$. For comparison, the ''true'' value (the average of the full data shown in Fig. 1) is depicted by the dashed line. The result is rather disappointing: only beyond the true $\theta$ of 5.46 does the quota rise above 60%. The reason is quite apparent: according to the APVC, $u_{\tilde n_\varepsilon}^2$ is expected to be small enough only on average.
As we pointed out in the introduction, we therefore use here a refined criterion, which we called the VPVC and which takes the form

$$\bar u_n^2 + k \cdot \Delta u_n^2 < \varepsilon^2, \qquad (9)$$

where in the present Poisson-gamma setup $\bar u_n^2 = \frac{a}{b(n+b)}$ and $\Delta u_n^2 = \frac{\sqrt{na}}{b\,(n+b)^{3/2}}$, and where we choose $k = 2$. The result of the VPVC is depicted in Fig. 2b. There are two clear differences compared to Fig. 2a: first, the sample sizes $n_\varepsilon$ are higher than the numbers $\tilde n_\varepsilon$ we obtained from the APVC, which was expected as we added an additional positive term $k\,\Delta u_n^2$ to the left hand side of the criterion. Second, and perhaps more importantly, the result of the SSD substantially increases once we increase the standard deviation of the prior. This allows us to make our sample choice more conservative for a given prior mean by increasing its standard deviation, i.e. our prior uncertainty about $\theta$, which is quite natural. The most conservative SSD is thus located in the upper right corner of the plot. This phenomenon for parameters that describe the variance of the data will appear again in the next subsection.
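For the Poisson-gamma model the two moments entering the VPVC have closed forms, $\bar u_n^2 = a/(b(n+b))$ and $\Delta u_n^2 = \sqrt{na}/(b(n+b)^{3/2})$ (the latter follows from the negative binomial law of $\sum_i x_i$ under the prior predictive), so $n_\varepsilon$ can be found by a simple search. A minimal sketch, with the same illustrative prior values as before; setting $k = 0$ recovers the plain APVC size:

```python
import math

def vpvc_poisson_gamma(a: float, b: float, eps: float, k: float = 2.0) -> int:
    """Smallest n with u2bar + k * du2 < eps^2 for the Poisson-gamma model,
    where u2bar = a/(b*(n+b)) and du2 = sqrt(n*a)/(b*(n+b)**1.5)."""
    n = 1
    while True:
        u2bar = a / (b * (n + b))
        du2 = math.sqrt(n * a) / (b * (n + b) ** 1.5)
        if u2bar + k * du2 < eps ** 2:
            return n
        n += 1

# Prior mean 4, sd 2 (a = 4, b = 1): the VPVC size is roughly twice the APVC size.
print(vpvc_poisson_gamma(a=4.0, b=1.0, eps=0.3, k=2.0))
print(vpvc_poisson_gamma(a=4.0, b=1.0, eps=0.3, k=0.0))
```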
The success of the refined criterion is depicted in Fig. 3b. The percentage of $u_{n_\varepsilon}$ below $\varepsilon$ now reaches much higher values, at many positions beyond 95%. This success is also more robust against deviations from the true value. Even for a mean of the prior $p(\theta)$ which is well below the true $\theta$, a high enough standard deviation of $p(\theta)$ will allow to reach a quota of 95%. Figure 3b therefore splits into three areas. As in Fig. 3a there is a bottom part where the posterior variance is dominated by the small prior uncertainty, so that $u_{n_\varepsilon} < \varepsilon$ is easily satisfied even for the minimal sample size of $n_\varepsilon = 1$. As the standard deviation of $p(\theta)$ gets larger, a too small value of the prior mean will result in a low quota: the prior was chosen too optimistically. Increasing either the mean of $p(\theta)$ or its standard deviation will however lead to a more conservative SSD and to a better compliance with $u_{n_\varepsilon} < \varepsilon$. The highest percentage can therefore be found in the upper right corner of Fig. 3b and in the bottom area. We already pointed out in the introduction of this article that enhancing the mean by the variance as in the VPVC (5) is, in a certain sense, only an approximation, and an exhaustive treatment would use the full distribution $m(x_n)$. In fact many methods in the literature (De Santis 2006, 2007; Brutti et al. 2008; Gubbiotti and De Santis 2011; Sambucini 2008; Wang and Gelfand 2002; Chen et al. 2011; Psioda and Ibrahim 2019) use a criterion that is formulated as a probability when $x_n$ is drawn from $m(x_n)$. The cited works use different criteria that are not directly comparable to the VPVC presented here. To allow for some comparison let us forge (5) into a probabilistic shape:

$$P_{x_n \sim m(x_n)}\big( u_n^2 < \varepsilon^2 \big) \ge p, \qquad (10)$$

where $p$ can be chosen between 0 and 1. We will denote the smallest $n$ for which (10) is satisfied as $n^P_\varepsilon$. In a way (10) resembles the length probability criterion (LPC) from the literature, compare for example (De Santis and Pacifico 2004; De Santis 2007; Joseph and Belisle 1997).
For most cases $n^P_\varepsilon$ can only be estimated numerically via sampling, and even for rather easy cases, such as the one from the next subsection, we observed that finding a stable $n^P_\varepsilon$ is rather elaborate. For the Poisson setup a computation shows that $\sum_{i=1}^n x_i = (n+b)^2 u_n^2 - a$ follows a negative binomial distribution with $a$ failures and success probability $\frac{n}{n+b}$, so that we can in fact precisely compute $n^P_\varepsilon$ in this case. Figure 9 in the appendix shows the relative deviation between $n_\varepsilon$ and $n^P_\varepsilon$ for various choices of $p$ in (10), together with the corresponding success rates $u_{n^P_\varepsilon} < \varepsilon$. For values of $p$ between 0.95 and 0.97 we observe a good accordance between the VPVC and (10): the relative deviation between $n_\varepsilon$ and $n^P_\varepsilon$ is below 5% for almost all priors, with the exception of extremely skew choices (in the upper left corner) where a strong dependency of $n^P_\varepsilon$ on $p$ can be observed. A value of $p = 0.98$, which would arise for a normal distribution with $k = 2$ and thus ignore any skewness, produces substantially larger sample sizes for a range of prior choices.
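Since $\sum_i x_i$ is negative binomial under the prior predictive, the probability in (10) reduces to evaluating a negative binomial CDF at the largest integer strictly below $\varepsilon^2 (n+b)^2 - a$. A self-contained sketch of this exact computation (stdlib only; the helper names are ours and the CDF is implemented via the standard pmf recursion rather than a library call):

```python
import math

def negbin_cdf(kmax: int, r: float, q: float) -> float:
    """P(K <= kmax) for K ~ negative binomial counting successes (prob q)
    before the r-th failure: pmf(k) = C(k+r-1, k) * (1-q)**r * q**k."""
    if kmax < 0:
        return 0.0
    pmf = (1.0 - q) ** r                   # pmf(0)
    total = pmf
    for j in range(kmax):                  # pmf(j+1) = pmf(j) * (j+r)/(j+1) * q
        pmf *= (j + r) / (j + 1) * q
        total += pmf
    return total

def lpc_poisson_gamma(a: float, b: float, eps: float, p: float) -> int:
    """Smallest n with P_m(u_n^2 < eps^2) >= p, Eq. (10), using the exact
    negative binomial law of sum(x_i) under the prior predictive."""
    n = 1
    while True:
        # u_n^2 < eps^2  <=>  sum(x_i) <= ceil(eps^2*(n+b)**2 - a) - 1 (strict bound)
        kmax = math.ceil(eps ** 2 * (n + b) ** 2 - a) - 1
        if negbin_cdf(kmax, r=a, q=n / (n + b)) >= p:
            return n
        n += 1
```

For $p$ around 0.95 and the priors of this section, the resulting $n^P_\varepsilon$ lands close to the VPVC size, in line with the deviations reported in Fig. 9.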

Nuisance parameters: normally distributed data
We want to study the minimal number of alligators that is needed in order to specify the Hg concentration up to a pre-specified precision $\varepsilon$ and will, in a similar manner as in Sect. 2.1, use the data collection from (Lawson and Jodice 2019; Nilsen et al. 2019) to validate the success of this sample size planning.
It is common to model the concentration of pollutants via a log-normal distribution (Ott 1990): for $n$ measurements of mercury concentration $c_{\mathrm{Hg},i}$ we set $x_i = \ln(c_{\mathrm{Hg},i} / (\mathrm{mg/kg}))$ and assume that $p(x_n \mid \mu, \sigma^2) = \prod_{i=1}^n \mathcal{N}(x_i \mid \mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$. Both parameters $\mu$ and $\sigma^2$ are unknown, with $\mu$ as the parameter of interest and $\sigma^2$ as nuisance parameter. We take the normal-inverse-gamma prior (Fink 1997)

$$p(\mu, \sigma^2) = \mathcal{N}(\mu \mid \mu_0, \lambda\sigma^2) \cdot \mathrm{IG}(\sigma^2 \mid a, b),$$

where $\mathrm{IG}$ denotes the inverse-gamma distribution and where $a > 2$, $\lambda, b > 0$ and $\mu_0$ are hyperparameters. The squared uncertainty $u_n^2$ for $\mu$ is given by the variance of the marginal posterior $p(\mu \mid x_n)$, an explicit expression in terms of $n_\lambda = n + \lambda^{-1}$ and $\bar x = \frac{1}{n} \sum_{i=1}^n x_i$. Let $n_\varepsilon$ again be the smallest $n$ such that

$$\bar u_n^2 + k \cdot \Delta u_n^2 < \varepsilon^2,$$

where $\bar u_n^2 = \mathbb{E}_{x_n \sim m(x_n)}[u_n^2] = \frac{b}{n_\lambda (a - 1)}$, and where we choose once more $k = 2$. The sample sizes $n_\varepsilon$ for various choices of the prior $p(\mu, \sigma^2)$ and $\varepsilon = 0.1$ are depicted in Fig. 5.
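When no convenient closed form for $\Delta u_n^2$ is at hand, the two moments entering the VPVC can be estimated by sampling from the prior predictive: draw $(\mu, \sigma^2)$ from the prior, simulate data, and evaluate the posterior variance of $\mu$. A stdlib-only sketch for the normal-inverse-gamma model; the function names and hyperparameter values are ours, and the posterior-variance formula is the standard conjugate update with $\kappa_0 = \lambda^{-1}$:

```python
import math
import random

def posterior_var_mu(xs, mu0, lam, a, b):
    """Variance of the marginal posterior p(mu | x_n) under the NIG prior
    N(mu | mu0, lam * sigma^2) * IG(sigma^2 | a, b); kappa0 = 1/lam."""
    n = len(xs)
    kappa0 = 1.0 / lam
    xbar = sum(xs) / n
    kn = kappa0 + n                       # = n_lambda = n + 1/lam
    an = a + n / 2.0
    bn = (b + 0.5 * sum((x - xbar) ** 2 for x in xs)
          + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kn))
    return bn / (kn * (an - 1.0))

def vpvc_moments_mc(n, mu0, lam, a, b, draws=20000, seed=1):
    """Monte Carlo estimate of (u2bar, du2) by sampling the prior predictive."""
    rng = random.Random(seed)
    u2 = []
    for _ in range(draws):
        sigma2 = 1.0 / rng.gammavariate(a, 1.0 / b)    # sigma^2 ~ IG(a, b)
        mu = rng.gauss(mu0, math.sqrt(lam * sigma2))   # mu | sigma^2
        xs = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
        u2.append(posterior_var_mu(xs, mu0, lam, a, b))
    u2bar = sum(u2) / draws
    du2 = math.sqrt(sum((u - u2bar) ** 2 for u in u2) / (draws - 1))
    return u2bar, du2
```

The estimated $\bar u_n^2$ can be sanity-checked against the exact value $b/(n_\lambda(a-1))$; scanning $n$ until the estimated $\bar u_n^2 + k\,\Delta u_n^2$ drops below $\varepsilon^2$ then reproduces $n_\varepsilon$ up to Monte Carlo error.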
To visualize the impact of the prior knowledge we varied one of the marginals $p(\mu)$ and $p(\sigma^2)$ while keeping the other one fixed. In Fig. 5a we varied the marginal prior $p(\mu)$, while fixing the marginal $p(\sigma^2)$ to have a mean of 0.15 and a standard deviation of 0.10; for comparison, this $p(\sigma^2)$ is marked by the black cross in Fig. 5b. For Fig. 5b we fixed $p(\mu)$ to have a mean of $-1.4$ and a standard deviation of 0.6 (marked by the black cross in Fig. 5a) and varied $p(\sigma^2)$. In particular, the two crosses in Fig. 5a, b both mark positions with an equal sample size of $n = 33$. The APVC criterion from (4) and the average coverage criterion from (Adcock 1988; Joseph et al. 1995) both yield a sample size of 15 for this prior.
Note that the effect of the marginal $p(\mu)$ is rather minimal once $p(\sigma^2)$ is kept fixed. In fact, the VPVC turns out to be independent of the hyperparameter $\mu_0$; compare the vertical symmetry in Fig. 5a. Varying the standard deviation of $p(\mu)$ while keeping $p(\sigma^2)$ fixed will only affect the hyperparameter $\lambda$, which has only a minor influence on the SSD result. The only exception to this is the bottom area of Fig. 5a, where the small standard deviation of $p(\mu)$ forces the posterior variance to be small and thus predicts a small sample size of 1, which is similar to a phenomenon we observed in Sect. 2.1. Figure 5b shows that the variance parameter $\sigma^2$ has a similar influence on the sample size as the Poisson parameter in Sect. 2.1: increasing either the mean or the standard deviation of $p(\sigma^2)$ will increase the sample size, so that the most conservative experimental design is located in the upper right corner. This behavior is again to be expected since, when increasing the mean of $p(\sigma^2)$, one expects data that have larger variability and contain less information. In contrast to Figs. 2b and 5a there is no distinct bottom area of minimal sample sizes.
As in Sect. 2.1 we can use the full dataset, the one depicted in Fig. 4, to judge the success of our sample size planning. Figure 6 displays, for the same priors as in Fig. 5, the proportion of $u_{n_\varepsilon}$ below $\varepsilon$ for $u_{n_\varepsilon}$ computed from random samples of size $n_\varepsilon$, drawn from the full dataset. The left plot, Fig. 6a, reveals that the choice of the marginal $p(\mu)$ mostly has an impact on the success of the SSD if its mean is far off from the true parameter. Moreover, a very precise prior knowledge about $\mu$ will result in a high proportion of $u_{n_\varepsilon} < \varepsilon$, which is similar to the observations we made in Sect. 2.1. The behavior of the percentage for which $u_{n_\varepsilon} < \varepsilon$ when varying $p(\sigma^2)$ is similar to what we have observed for the Poisson parameter in Sect. 2.1. For small values of the mean of $p(\sigma^2)$ the quota drops, which can be cured by increasing the standard deviation of $p(\sigma^2)$. The highest percentage can be found in the upper right corner. For the prior marked by the black crosses in Fig. 6, the same as the one marked in Fig. 5, the quota is around 95%, while the sample size predicted by the APVC and the average coverage criterion only yields a percentage of around 35%. The bottom area of Fig. 6b, where the posterior variance is heavily influenced by the prior, is not as distinct as in Fig. 3b; this role seems to be played in this context by $\mu$, compare Fig. 6a.
Let us shortly summarize the most important observations we have made concerning SSD based on the prior predictive $m(x_n)$ via the VPVC (5). There are two strategies to have a good chance of achieving $u_n < \varepsilon$:

1. A precise prior knowledge about the parameter of interest. This will result in small sample sizes.
2. A conservative estimate of the parameter that determines the variation of the data. This will result in larger sample sizes. We have observed two different ways to achieve this:
   • Make a conservative (that is, rather large) guess for the mean of the prior marginal w.r.t. this parameter.
   • Choose a large enough standard deviation for this marginal prior. In order for this to work it is not sufficient to base the SSD on the average $\bar u_n^2$ only; higher moments as in the VPVC have to be taken into account.
In the Poisson example from Sect. 2.1 both strategies 1 and 2 concern the same parameter h. For the case of normally distributed data we have observed that strategy 1 concerns the parameter of interest l while to follow strategy 2 the parameter r 2 , that describes the variance of the data, was central.

Behavior for small e
We have seen in Sect. 2 that the determination of the sample size via the VPVC can be successful, provided the prior knowledge is either precise enough or sufficiently conservative. Being ''conservative'' is of course a rather vague quality, but in this section we want to answer what happens when $\varepsilon$ from (5) becomes smaller. Will it become easier or harder to be conservative? In Theorem 3.5 we show that as $\varepsilon \to 0$ the VPVC tends to satisfy (3) perfectly, provided $k$ is bigger than some threshold $k^*$. In other words, for such a $k$ the VPVC has a tendency to become conservative. If $k$ is smaller than $k^*$, on the other hand, we will show that, asymptotically, the uncertainty will almost surely be above the expression used for the VPVC. This phase transition with respect to $k$, compare Fig. 8 in the appendix, gives some insight on SSD based on the prior predictive in the small $\varepsilon$ regime. We will give some ideas how to get an upper bound on $k^*$. Another result of this section is Lemma 3.4, where we show an asymptotic formula for the sample size $n_\varepsilon$ determined by the VPVC.
First let us fix our setup. Similarly as in Sect. 2 we will assume that there is a univariate parameter of interest $\theta_0$ and a, possibly empty or multivariate, nuisance parameter $\theta'$, so that the total parameter that determines our sampling distribution $p(x_n \mid \theta) = p(x_1, \ldots, x_n \mid \theta)$ is given by

$$\theta = (\theta_0, \theta'). \qquad (11)$$

We will denote the Fisher information matrix of the single sample distribution $p(x_1 \mid \theta)$ by $I_\theta$ and its first component (the one linked to $\theta_0$) by $I_{\theta_0}$. Note that since $\theta_0$ is univariate, $I_{\theta_0}$ is a non-negative number. The knowledge about $\theta$ is described by the prior $p(\theta)$ with marginals $p(\theta_0)$ and $p(\theta')$. We further fix a value $\theta_{\mathrm{true}} = (\theta_{\mathrm{true},0}, \theta'_{\mathrm{true}})$ in the support of $p(\theta)$ that we treat as the true parameter. In practice $\theta_{\mathrm{true}}$ will of course be unknown. Given some data $x_n = (x_1, \ldots, x_n)$ we introduce as above the squared uncertainty $u_n^2 = \mathrm{Var}_{\theta_0 \sim p(\theta_0 \mid x_n)}(\theta_0)$ as well as $\bar u_n^2 = \mathbb{E}_{x_n \sim m(x_n)}[u_n^2]$ and $\Delta u_n^2 = (\mathrm{Var}_{x_n \sim m(x_n)}(u_n^2))^{1/2}$ with $m(x_n) = \int p(x_n \mid \theta)\, p(\theta)\, d\theta$, and formulate the VPVC for $\varepsilon > 0$ as

$$\bar u_n^2 + k \cdot \Delta u_n^2 < \varepsilon^2. \qquad (12)$$

The smallest $n$ satisfying (12) will again be called $n_\varepsilon$. Among our assumptions (Assumptions 3.1), point 2 requires the two $L^2$ convergences

$$\lim_{n \to \infty} \mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}\big[\, \big| n\, u_n^2 - I_{\theta_{\mathrm{true},0}}^{-1} \big|^2 \,\big] = 0, \qquad \lim_{n \to \infty} \mathbb{E}_{\theta \sim p(\theta)}\, \mathbb{E}_{x_n \sim p(x_n \mid \theta)}\big[\, \big| n\, u_n^2 - I_{\theta_0}^{-1} \big|^2 \,\big] = 0. \qquad (13)$$

Of these assumptions, point 2 is probably the one that needs the most explanation. The Bernstein-von-Mises theorem (Van der Vaart 2000) indicates that under relatively mild conditions, and conditional on $\theta$, the posterior distribution is asymptotically normal with variance $\frac{1}{n} I_{\theta_0}^{-1}$, which motivates (13). When considering variances, as in this paper, it is then quite natural to require convergence in $L^2$, while the Bernstein-von-Mises theorem only ensures convergence in probability. While standard methods, such as Vitali's convergence theorem, could be applied to provide the first convergence in (13), this seems more involved for the second limit as it is of a rather non-standard form. Examining this point in more detail would stray far from the scope of this paper and could be a subject of future research. However, we found that both identities in (13) were rather easy to check for many standard cases, such as the ones from Sect. 2 or a Bernoulli distribution with a beta prior.
Under Assumptions 3.1 the posterior variance is, conditional on $\theta$, asymptotically proportional to $I_{\theta_0}^{-1}$. The following quantities therefore describe the variation of the (asymptotic) posterior variance given the prior knowledge $p(\theta)$ and given the true parameter, respectively:

$$\gamma_n := \frac{\Delta u_n^2}{\bar u_n^2}, \qquad \gamma_{\mathrm{true},n} := \frac{\big(\mathrm{Var}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}(u_n^2)\big)^{1/2}}{\mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2]}. \qquad (14)$$

Let us make these considerations more precise.

Lemma 3.2 Under Assumptions 3.1 we have, as $n \to \infty$,

$$\gamma_n \to \gamma := \frac{\big(\mathrm{Var}_{\theta \sim p(\theta)}(I_{\theta_0}^{-1})\big)^{1/2}}{\mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]} \qquad \text{and} \qquad \gamma_{\mathrm{true},n} \to 0.$$

Remark 3.3 The limit of $\gamma_{\mathrm{true},n}$ is a special case of the one for $\gamma_n$, obtained by taking the delta distribution $\delta_{\theta_{\mathrm{true}}}$, centered at $\theta_{\mathrm{true}}$, as a prior.
Proof Let us first consider the limit of $\gamma_n$. Note that the second condition of point 2 in Assumptions 3.1 states that $X_n := n\, u_n^2$ converges to $I_{\theta_0}^{-1}$ in $L^2$ with respect to the law $p(\theta, x_n) = p(\theta) \cdot p(x_n \mid \theta)$. Since $X_n$ does not depend on $\theta$, $I_{\theta_0}$ does not depend on $x_n$, and the marginals of $p(\theta, x_n)$ are $p(\theta)$ and $m(x_n)$, we conclude that

$$\mathrm{Var}_{x_n \sim m(x_n)}(X_n) = \mathrm{Var}_{(\theta, x_n) \sim p(\theta, x_n)}(X_n) \to \mathrm{Var}_{(\theta, x_n) \sim p(\theta, x_n)}(I_{\theta_0}^{-1}) = \mathrm{Var}_{\theta \sim p(\theta)}(I_{\theta_0}^{-1}),$$

and similarly $\mathbb{E}_{x_n \sim m(x_n)}[X_n] \to \mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]$. From this we obtain the claimed limit of $\gamma_n$. For the second part of the claim we use that by the first condition in point 2 of Assumptions 3.1 we have $L^2$ convergence of $X_n$ conditional on $\theta_{\mathrm{true}}$, from which we obtain indeed

$$\gamma_{\mathrm{true},n} = \frac{\big(\mathrm{Var}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}(X_n)\big)^{1/2}}{\mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[X_n]} \to \frac{\big(\mathrm{Var}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}(I_{\theta_{\mathrm{true},0}}^{-1})\big)^{1/2}}{\mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[I_{\theta_{\mathrm{true},0}}^{-1}]} = 0,$$

where we used on the right hand side that $I_{\theta_{\mathrm{true},0}}$ does not depend on $x_n$. $\square$

We see from Lemma 3.2 that the variation of $u_n^2$ under the marginal $m(x_n)$ behaves in a different manner than under the true parameter $\theta_{\mathrm{true}}$. The object $\Delta u_n^2 = (\mathrm{Var}_{x_n \sim m(x_n)}(u_n^2))^{1/2}$ decays at the same order as $\bar u_n^2 = \mathbb{E}_{x_n \sim m(x_n)}[u_n^2]$, whereas, being conditional on a parameter, the standard deviation $(\mathrm{Var}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}(u_n^2))^{1/2}$ decays faster than $\mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2]$. The distributions $m(x_n)$ and $p(x_n \mid \theta_{\mathrm{true}})$ thus result in quite different asymptotic behavior for the moments of the posterior variance. We will see in Theorem 3.5 below that these different asymptotics provoke a phase transition with respect to $k$ in the limit $\varepsilon \to 0$.
As a first consequence of the observations from Lemma 3.2 we will now see that the variance term $k\, \Delta u_n^2$ in the VPVC strongly affects the sample size even as $\varepsilon \to 0$.
Lemma 3.4 (VPVC sample size for small $\varepsilon$) Provided Assumptions 3.1 hold, we have

$$\lim_{\varepsilon \to 0} \varepsilon^2\, n_\varepsilon = (1 + k\gamma)\, \mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}], \qquad \text{that is,} \qquad n_\varepsilon \sim (1 + k\gamma)\, \mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]\, \varepsilon^{-2}.$$

Proof We have already used in the first part of the proof of Lemma 3.2 that $n \bar u_n^2 \to \mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]$, so that applying in addition the convergence $\gamma_n \to \gamma$ from Lemma 3.2 we arrive at

$$s_n := n\, (\bar u_n^2 + k\, \Delta u_n^2) = n \bar u_n^2\, (1 + k \gamma_n) \;\longrightarrow\; (1 + k\gamma)\, \mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}] =: s_\infty \quad (n \to \infty).$$

Next, observe that since $\bar u_n^2 > 0$ by point 3 of Assumptions 3.1 and $n \bar u_n^2$ converges to a positive limit, we can find a constant $c > 0$ such that for any $n$ we have $\frac{c}{n} < \bar u_n^2 + k \Delta u_n^2$, from which we get $n_\varepsilon > c \cdot \varepsilon^{-2}$ and in particular $\lim_{\varepsilon \to 0} n_\varepsilon = \infty$. Let us rewrite the claim of the lemma as $\lim_{\varepsilon \to 0} n_\varepsilon\, \varepsilon^2 / s_\infty = 1$. Suppose, for a contradiction, that along a sequence $\varepsilon \to 0$ we had $n_\varepsilon \ge (1 + \delta)\, s_\infty\, \varepsilon^{-2}$ for some $\delta > 0$. Define $m_\varepsilon = \lfloor s_\infty \varepsilon^{-2} \rfloor$, the largest integer less than or equal to $s_\infty \varepsilon^{-2}$, and note that for $\varepsilon$ small enough there is an integer $n'_\varepsilon$ such that the following nested inequality is true:

$$m_\varepsilon < (1 + \delta/2)\, s_\infty\, \varepsilon^{-2} < n'_\varepsilon < n_\varepsilon. \qquad (20)$$

Choose further $\varepsilon$ small enough such that for any $n > m_\varepsilon$ we have $s_n / s_\infty < 1 + \delta/2$. But then the following inequality holds:

$$\bar u_{n'_\varepsilon}^2 + k\, \Delta u_{n'_\varepsilon}^2 = \frac{s_{n'_\varepsilon}}{n'_\varepsilon} < \frac{(1 + \delta/2)\, s_\infty}{(1 + \delta/2)\, s_\infty\, \varepsilon^{-2}} = \varepsilon^2.$$

By (20) we have thus found a smaller integer $1 \le n'_\varepsilon < n_\varepsilon$ that satisfies the VPVC (12), which contradicts the choice of $n_\varepsilon$. An analogous argument excludes $n_\varepsilon \le (1 - \delta)\, s_\infty\, \varepsilon^{-2}$ along a sequence $\varepsilon \to 0$, since then $s_{n_\varepsilon}/n_\varepsilon > \varepsilon^2$ for $\varepsilon$ small enough, contradicting (12). $\square$

For the setup considered in Sect. 2.2 we sketched the convergence of the object from Lemma 3.4 for various priors in Fig. 7a.
The formula from Lemma 3.4 allows for a few interesting observations. First, we see that asymptotically the prior $p(\theta)$ has an impact on the sample size through $\mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]$ (the expectation of the rescaled Bernstein-von-Mises limit of the posterior variance) and through the coefficient of variation $\gamma$. The higher the latter, the more impact the choice of $k$ will have on the sample size. Moreover, comparing the sample sizes $n_\varepsilon$ from the VPVC and $\tilde n_\varepsilon$ from the APVC ($k = 0$), we see that taking the variance into account will increase the sample size by a factor of $\frac{n_\varepsilon}{\tilde n_\varepsilon} \simeq 1 + k \cdot \gamma$. Finally, note that we do not need any explicit expression for $u_n^2$ when using the asymptotic expression for $n_\varepsilon$. Given the Fisher information $I_{\theta_0}$ of the sampling distribution and any prior $p(\theta)$, we can directly compute the sample size for small $\varepsilon$ without any need of computing the posterior distribution.
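For the Poisson-gamma setup of Sect. 2.1 the asymptotic formula of Lemma 3.4 is easy to evaluate: the single-sample Fisher information is $I_\theta = 1/\theta$, so $\mathbb{E}_\theta[I_\theta^{-1}] = a/b$ and the coefficient of variation of $I_\theta^{-1} = \theta$ under the gamma prior is $1/\sqrt{a}$. A sketch comparing the asymptotic prediction with an exact search over the closed-form criterion (the function names and prior values are ours, for illustration):

```python
import math

def vpvc_exact(a: float, b: float, eps: float, k: float = 2.0) -> int:
    """Exact VPVC sample size for the Poisson-gamma model via direct search."""
    n = 1
    while a / (b * (n + b)) + k * math.sqrt(n * a) / (b * (n + b) ** 1.5) >= eps ** 2:
        n += 1
    return n

def vpvc_asymptotic(a: float, b: float, eps: float, k: float = 2.0) -> float:
    """Lemma 3.4: n_eps ~ (1 + k*gamma) * E[theta] * eps^-2,
    with E[theta] = a/b and gamma = 1/sqrt(a)."""
    return (1.0 + k / math.sqrt(a)) * (a / b) / eps ** 2

n_exact = vpvc_exact(4.0, 1.0, 0.3)      # = 88
n_asym = vpvc_asymptotic(4.0, 1.0, 0.3)  # = 2 * 4 / 0.09 ≈ 88.9
print(n_exact, n_asym)
```

Already at $\varepsilon = 0.3$ the asymptotic value deviates from the exact search by about 1%, and no posterior computation is needed for it.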
We will now turn to the main result of this section. In Sect. 2 we evaluated the ''success'' of the SSD by checking how often $u_{n_\varepsilon} < \varepsilon$ holds on the actual dataset, recall for instance Fig. 3b. In the following we want to do this for $\varepsilon \to 0$ in a purely generic setup by assuming that the data $x_n$ follow the distribution $p(x_n \mid \theta_{\mathrm{true}})$ with the true parameter $\theta_{\mathrm{true}}$. We have already observed in Lemma 3.2 that conditioning on a parameter like $\theta_{\mathrm{true}}$ leads to a different asymptotic behavior than considering the marginal $m(x_n)$. The following theorem shows the consequence of this disparity for the success of the SSD.
Theorem 3.5 (VPVC becomes conservative for $k > k^*$ and small $\varepsilon$) Assume that Assumptions 3.1 hold and define

$$k^* := \frac{1}{\gamma} \max\!\left( \frac{I_{\theta_{\mathrm{true},0}}^{-1}}{\mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]} - 1,\; 0 \right). \qquad (21)$$

We then have for any $k > k^*$

$$\lim_{n \to \infty} P_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}\big( u_n^2 > \bar u_n^2 + k \Delta u_n^2 \big) = 0. \qquad (22)$$

In particular, we have for any SSD built on the VPVC (12) with such a $k$

$$\lim_{\varepsilon \to 0} P_{x_{n_\varepsilon} \sim p(x_{n_\varepsilon} \mid \theta_{\mathrm{true}})}\big( u_{n_\varepsilon}^2 \ge \varepsilon^2 \big) = 0. \qquad (23)$$

Moreover, $k^*$ is optimal in the sense that for any $0 < k < k^*$ we have

$$\lim_{n \to \infty} P_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}\big( u_n^2 < \bar u_n^2 + k \Delta u_n^2 \big) = 0. \qquad (24)$$

Proof Set $q := \frac{I_{\theta_{\mathrm{true},0}}^{-1}}{\mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]} - 1$, so that $k^*$ can be written as $k^* = \max(q, 0)/\gamma$ with $\gamma$ as in Lemma 3.2. Similarly as in the proof of Lemma 3.2 we see that

$$\lim_{n \to \infty} \frac{\mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2]}{\bar u_n^2} = \frac{I_{\theta_{\mathrm{true},0}}^{-1}}{\mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]} = q + 1. \qquad (25)$$

Let us first show (22). We can rewrite the inequality inside the probability as

$$u_n^2 - \mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2] > \bar u_n^2 - \mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2] + k \Delta u_n^2.$$

The expression on the right hand side eventually becomes positive when $n$ is large enough. Indeed, using point 3 of Assumptions 3.1 and the object $\gamma_n$ from Lemma 3.2, we can reshape it as

$$\bar u_n^2 \cdot \left( 1 - \frac{\mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2]}{\bar u_n^2} + k \gamma_n \right) =: \bar u_n^2 \cdot h_n, \qquad (26)$$

where $h_n$ satisfies $\lim_{n \to \infty} h_n = -q + k \cdot \gamma > 0$ due to Lemma 3.2, (25) and the choice $k > k^* = \max(q, 0)/\gamma$. For $n$ large enough such that the right hand side of (26) is positive, we can apply Chebyshev's inequality, which yields

$$\limsup_{n \to \infty} P_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}\big( u_n^2 > \bar u_n^2 + k \Delta u_n^2 \big) \le \limsup_{n \to \infty} \frac{\mathrm{Var}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}(u_n^2)}{\big( \bar u_n^2 - \mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2] + k \Delta u_n^2 \big)^2} = 0,$$

where we used once more (25) and that we know from Lemma 3.2 that $\gamma_{\mathrm{true},n}$ converges to 0. From (22) we immediately obtain (23) by the definition of $n_\varepsilon$ and the fact that $n_\varepsilon \to \infty$, due to Lemma 3.4.

Fig. 7b Samples of $u_n^2$ (in red) for different sample sizes, using the data and setup from Sect. 2.1 and a prior $p(\theta)$ with mean and standard deviation $4.0 \pm 2.0$. The black lines mark $\bar u_n^2 + k \Delta u_n^2$ for $k = k^* = 0.73$ (computed from the full dataset) and $k = k^* \pm 0.30$.
To show (24) we reshape the expression inside the probability in (24) into

$$\mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2] - u_n^2 > \mathbb{E}_{x_n \sim p(x_n \mid \theta_{\mathrm{true}})}[u_n^2] - \big( \bar u_n^2 + k \Delta u_n^2 \big).$$

The rest of the argument then follows along the lines of the first part of the proof. $\square$

Theorem 3.5 reveals a sort of phase transition. For a $k$ smaller than the threshold $k^*$ the squared uncertainty $u_n^2$ will asymptotically lie above $\bar u_n^2 + k \Delta u_n^2$, while for $k > k^*$ we will have $u_n^2 < \bar u_n^2 + k \Delta u_n^2$ with probability close to 1 for large $n$ (or small $\varepsilon$). This effect is shown in Fig. 7b for the Leeds accident data from Sect. 2.1 and a fixed prior.
The behavior predicted by Theorem 3.5 might be surprising at first glance: recall that $\bar u_n^2$ is defined as an average and that $\Delta u_n^2$ is the according standard deviation. Naively one might expect that the range of $k$ times the standard deviation around the mean would always cover some percentage strictly between 0 and 1. The reason behind the observed phenomenon is that we use one distribution, $m(x_n)$, to perform the SSD and another one, $p(x_n \mid \theta_{\mathrm{true}})$, to evaluate its success. The different asymptotics of these two distributions then give rise to the erratic behavior around $k = k^*$. The effect for a fixed $k$ and various priors is depicted in Fig. 8 in the appendix: as $\varepsilon$ gets small we get two sharply separated regions for the hyperparameters, where one region has a quota of $u_{n_\varepsilon}^2 < \varepsilon^2$ around 0% while the other one approaches 100%. We expect that this phenomenon is not unique to the VPVC. For instance, it seems reasonable that in many cases a credible interval for $u_n^2$ under $m(x_n)$ is, at least loosely, related to a certain $k$, so that similar effects are likely to arise for frameworks based on other approaches using $m(x_n)$. Investigating this point further might be an interesting subject for future research.
We have seen in Theorem 3.5 that for small $\varepsilon$ the success of the SSD crucially depends on having $k > k^*$. Now, $k^*$ depends on the true parameter $\theta_{\mathrm{true}}$ and is thus unknown in practice. For the prior used for Fig. 7b we have a $k^*$ of around 0.21 and therefore much smaller than the $k = 2$ used in the analysis of Sect. 2.1. In general one might choose an area $\Theta$ in the domain of $p(\theta)$ and use

$$k_{\mathrm{up.b.}} := \frac{1}{\gamma} \sup_{\theta \in \Theta} \max\!\left( \frac{I_{\theta_0}^{-1}}{\mathbb{E}_{\theta \sim p(\theta)}[I_{\theta_0}^{-1}]} - 1,\; 0 \right)$$

as an upper bound for $k^*$, which is valid whenever $\theta_{\mathrm{true}} \in \Theta$. Taking a single standard deviation around the prior mean for $p(\theta)$ and $p(\sigma^2)$, respectively, yields such a region $\Theta$ for the priors from Sect. 2.
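For the Poisson-gamma setup $I_\theta^{-1} = \theta$, so the threshold of Theorem 3.5 has the simple closed form $k^* = \max(\theta_{\mathrm{true}}\, b/a - 1,\, 0)\, \sqrt{a}$. The sketch below (function name ours) reproduces the value $k^* = 0.73$ quoted in the caption of Fig. 7b, using the prior with mean 4.0 and standard deviation 2.0 (i.e. $a = 4$, $b = 1$) and the true $\theta = 5.46$ from the Leeds data:

```python
import math

def kstar_poisson_gamma(a: float, b: float, theta_true: float) -> float:
    """k* from Theorem 3.5 for the Poisson-gamma model:
    I_theta^{-1} = theta, E[I^{-1}] = a/b, gamma = 1/sqrt(a)."""
    q = theta_true * b / a - 1.0          # relative excess of the true limit
    gamma = 1.0 / math.sqrt(a)            # coefficient of variation of the prior
    return max(q, 0.0) / gamma

print(kstar_poisson_gamma(4.0, 1.0, 5.46))   # ≈ 0.73
```

Note that $k^* = 0$ whenever the prior mean $a/b$ is at least the true $\theta$: a conservative prior mean alone already makes any $k > 0$ asymptotically sufficient.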

Conclusions
We discussed a Bayesian criterion for sample size determination, based on the prior predictive, which we called the variation of the posterior variance criterion (VPVC). Compared with the classical average posterior variance criterion (APVC), this criterion leads to better compliance with the objective of restraining the uncertainty by some $\varepsilon$, while still providing explicit expressions, in contrast to a full treatment of the law of the uncertainty under the prior predictive. In particular this allows us to treat the asymptotic behavior for small $\varepsilon$ in a generic manner and thus to enhance the understanding of sample size methods based on the prior predictive.
Using two different datasets we discussed the dependency of the sample size determination and of its success on the chosen prior and the true parameter, and deduced concepts on how to choose the prior so as to lead the sample size determination more likely to a success. In particular, we observed that a part of these strategies cannot be applied to the APVC but only to methods such as the VPVC that take higher moments with respect to the prior predictive into account.
Finally, we gave some results concerning the behavior of the VPVC for $\varepsilon \to 0$. We proved an explicit formula for the predicted sample size in this regime and showed that there is an exact limit for the portion of the variance w.r.t. the prior predictive that can be used in order to guarantee the success of the sample size determination in this limit.

Fig. 8 Percentage of $u_{n_\varepsilon} < \varepsilon$ for the setup and data from Sect. 2.2 for various $\varepsilon$ (in decreasing order), $k = 2$, various marginals $p(\sigma^2)$ and the same fixed marginal $p(\mu)$ as in Fig. 6b. The ''true'' value of $\sigma^2$ (computed from the dataset) is depicted by the dashed lines. We used a smaller resolution compared to Fig. 6b for numerical reasons.

Fig. 9 Left column: relative deviation $(n_\varepsilon - n^P_\varepsilon)/n^P_\varepsilon$ between the sample sizes $n_\varepsilon$ from the VPVC and $n^P_\varepsilon$ predicted by (10), in steps of 5%, for the setup from Sect. 2.1 and various choices of $p$ in (10). Right column: percentage of $u_{n^P_\varepsilon} < \varepsilon$ for the data from Fig. 1.