Approximate Bayesian computation approach on the maximal offspring and parameters in controlled branching processes

Our purpose is to estimate the posterior distribution of the parameters of interest for controlled branching processes (CBPs) without prior knowledge of the maximum number of offspring that an individual can give birth to and without explicit likelihood calculations. We consider that only the population sizes at each generation and at least the number of progenitors of the last generation are observed, but the number of offspring produced by any individual at any generation is unknown. The proposed approach is twofold. Firstly, to estimate the maximum progeny per individual we make use of an approximate Bayesian computation (ABC) algorithm for model choice and based on sequential importance sampling with the raw data. Secondly, given such an estimate and taking advantage of the simulated values of the previous stage, we approximate the posterior distribution of the main parameters of a CBP by applying the rejection ABC algorithm with an appropriate summary statistic and a post-processing adjustment. The accuracy of the proposed method is illustrated by means of simulated examples developed with the statistical software R. Moreover, we apply the methodology to two real datasets describing populations with logistic growth. To this end, different population growth models based on CBPs are proposed for the first time.


Introduction
We focus our attention on inferential issues related to controlled branching processes. A controlled branching process is a discrete-time stochastic process that models populations developing in the following manner: the population begins with a fixed number of individuals or progenitors; each of them, independently of the others and according to a common probability distribution, gives birth to offspring, and then ceases to participate in subsequent reproduction processes. Thus, each individual lives for one unit of time and is replaced with a random number of offspring. Moreover, since by several reasons of an environmental, social, or other nature the number of progenitors which take part in each generation might be controlled, a random mechanism is introduced in the model to determine the number of offspring with reproductive capacity in each generation. Mathematically, a controlled branching process (CBP) is a process {Z n } n∈N 0 defined recursively as where N 0 = N ∪ {0}, N ∈ N, {X nj : n ∈ N 0 ; j ∈ N} and {φ n (k) : n, k ∈ N 0 } are independent families of non-negative integer valued random variables and the empty sum in (1) is considered to be 0. The random variables X nj , n ∈ N 0 , j ∈ N, are assumed to be independent and identically distributed (i.i.d.) with distribution p = {p j = P(X 01 = j) : j ∈ N 0 } and in terms of population dynamics they represent the number of offspring given by the j-th progenitor of the n-th generation. Moreover, {φ n (k)} k∈N 0 , for n ∈ N 0 , are independent stochastic processes with equal one-dimensional probability distributions. This property means that the control mechanism works in an independent manner in each generation, and once the population size at certain generation n, Z n , is known, the probability distribution of the number of progenitors, denoted by φ n (Z n ), is independent of the generation. Some particular cases collected in this general family of branching processes are the simplest model, the standard Bienaymé-Galton-Watson (BGW) process, by considering φ n (k) = k a.s. for each k, or the branching processes with immigration, by setting φ n (k) = k + Y n , where {Y n } n∈N 0 is a class of i.i.d. random variables, among others. The recent monograph [9] provides an extensive description of its probabilistic theory. The behaviour of the long-time evolution of a CBP is determined by the parameters of the model associated to the offspring and control laws. Briefly, assuming that m = E[X 01 ] and ε(k) = E[φ n (k)], k ∈ N 0 , exist and are finite, and whenever the limit τ = lim k→∞ k −1 ε(k) exists, the threshold parameter of this branching model is τ m. The extinction occurs almost surely in subcritical populations, namely if τ m < 1, and different growth rates on the non-extinction set are obtained depending on whether τ m = 1 (critical population) or τ m > 1 (supercritical population) with additional conditions. In real situations, these parameters are unknown. Until now, the methodologies proposed in the literature for the Bayesian inference on the offspring distribution have focused on the cases where either the support of the reproduction law is finite and known (see [7]) or that the offspring law belongs to some one-dimensional parametric family (see [10]). A first paper in the context of the CBP that faces the problem of an unknown scenario on the offspring distribution could be [8]. The statistical procedures developed in this work did not include the estimation of the posterior distribution of the maximum number of offspring per progenitor given the sample of population sizes at each generation {Z 0 , . . . , Z n }, but this quantity was set 1 + max 1≤k≤n Z k as a primary approach. Within the class of other branching processes, this problem has been only considered in the BGW process. In particular, from a probabilistic viewpoint, the asymptotic behaviour of the number of offspring of the most prolific individual in the n-th generation has been studied as an extreme value problem in [1,18]. From an inferential viewpoint, a particle Markov Chain Monte Carlo method was introduced to estimate the support of the offspring law in [5]. However, the drawback of this approach is that its computational feasibility strongly depends on dealing with BGW processes with low values.
The first aim of this work is to provide a methodology to estimate the maximum progeny that an individual in the population can bear (called maximum offspring capacity per individual) in the general class of CBPs and regardless the magnitude of the observed samples. Having estimated the maximum offspring capacity per individual, we also make inference on the expected values of offspring and control laws in a CBP. To this end, we consider the maximum offspring capacity per individual as a model index and, for the first time, we tackle its estimation by means of a model choice procedure. Thus, we provide an algorithm based on approximate Bayesian computation (ABC) techniques to estimate both the maximum number of offspring that an individual is able to give birth to and the parameters of interest of the model. The ABC methodology in the context of CBPs was already analysed and applied in [10] by assuming that the offspring distribution belongs to a parametric family. This means that the family of offspring distributions is known (for instance, geometric, Poisson or binomial distributions) and the only unknown elements are the parameters that determine them. In this paper we drop this assumption and face the problem of making inference on the parameters of interest in a less informative scenario with respect to the offspring distribution.
For our purpose, let us consider a CBP with an offspring distribution with an unknown support and control laws belonging to some known one-dimensional parametric family with unknown parameter. Let κ = sup{ j ∈ N 0 : p j > 0} the maximum number of offspring per individual, denote p(κ) = {p j (κ) = P(X 01 = j) : j ∈ N 0 } the offspring distribution when the maximum offspring capacity per individual is κ, and let γ be the control parameter, with γ ∈ Γ ⊆ R. We recall that in that case, the distribution of each control variable φ n (k) only depends on k and γ , and E[φ n (k)] = ε(k, γ ). Let us denote m(κ) = κ j=0 j p j (κ) and τ (γ ) = lim k→∞ k −1 ε(k, γ ). We assume that m(κ) < ∞ and τ (γ ) exists for all γ ∈ Γ . Moreover we assume the existence of the inverse of τ (·). Several preliminary simulation studies lead us to the conclusion that to approximate the posterior distributions of the parameters of interest reasonably well by making use of ABC methodology, we have to assume that at least the population sizes at each generation and the number of progenitors in the last generation are observable (see [10]). Hence, let us consider the observed sample Z obs n = {Z obs 0 , . . . , Z obs n , φ n−1 (Z n−1 ) obs }. Briefly, we will proceed as follows: firstly, we draw a sample from an estimate of the posterior distribution of κ, denoted by π(κ| Z obs n ), by considering a model choice algorithm. Secondly, we generate a sample from an estimate of the posterior distribution of ( p( κ n ), γ ), where κ n is a point estimate of κ. Next, from this sample we estimate the posterior distributions of ( p( κ n ), γ ), m( κ n ) and τ (γ ) using kernel density estimation. We denote these posterior distributions by π( p( κ n ), γ | κ n , Z obs n ), π(m | κ n , Z obs n ) and π(τ (γ ) | κ n , Z obs n ), respectively.
The performance of the proposed algorithm is firstly illustrated by two simulated examples. Next, the method is applied on two real datasets that show a logistic growth. To this end, we model the evolution of logistic growth of populations by CBPs, which represents another important novelty of this paper. These populations are characterised by the fact that when their sizes are small enough, they grow with almost no restriction, but when the sizes increase, the limited resources of the environment lead to a control on the population sizes. As a consequence, there exists a maximum population size, usually called carrying capacity in an ecological context, that can be supported by the ecosystem. With the aim of describing mathematically these populations, we introduce CBPs with control distributions given by binomial distributions whose success probabilities mainly depend on the density of the population. We provide several models based on different success probability functions which are inspired in classical deterministic population growth models.
Apart from this introduction, the paper is organized as follows. In Sect. 2 we provide a detailed description of the ABC algorithm for model choice and parameter estimation in the context of CBPs. Section 3 gathers simulation studies to evaluate and illustrate the performance of the proposed ABC approach. In Sect. 4 we present the application of the proposed algorithm to two real datasets from populations that exhibit a logistic growth. Additional information related to the examples are presented in the Appendix. In Sect. 5 we summarise the main contributions of this work.

Methodology
In this section we describe the ABC approach for estimating the posterior distribution of the main parameters of a CBP. ABC algorithms are a group of Monte Carlo algorithms used to find posterior distributions without requiring explicit knowledge of the likelihood function. These are very useful when the likelihood is intractable or too costly to evaluate. The inference is mainly done with samplings from the model, and hence, their versatility in the framework of branching processes (see the monograph [19] for further details).
In this context, the fact that the value of κ is unknown and could be even infinite, increases the complexity of the problem of estimating the parameters of the CBP and requires the use of methodologies for model choice with the aim of estimating the parameter κ. We assume κ ≥ 2 to avoid trivial cases. To implement the ABC methodology, we remark that even if our knowledge on the value of κ is very poor, we usually have some information about an effective upper bound for κ, denoted K max , from the dynamics of the population that we model via the CBP. An example of this situation is the family of K-selected species (see [17]), which includes larger mammals such as elephants, horses, and primates, and whose species are relatively stable populations and produce relatively low numbers of offspring. For practical purposes and without loss of generality, throughout this paper we consider offspring laws with finite support. Thus, κ ∈ {2, 3, . . . , K max }. We can take the parameter κ as a model index. We emphasise that as a consequence, for each value of κ the parameter of interest in the corresponding model is ( p(κ), γ ) = ( p 0 (κ), . . . , p κ (κ), γ ) ∈ Δ κ × R, whose dimension depends on κ, and where Δ κ is the κstandard simplex in R κ .
We recall that our final aim is to estimate the posterior π( p(κ), γ | κ, Z obs n ), with κ a point estimate of κ, and to that end, we propose a two-fold procedure.

First stage: estimation of (Ä | Z obs n )
In the first part, we estimate π(κ | Z obs n ). We apply an ABC algorithm for model choice based on sequential importance sampling, ABC SMC for model choice, introduced in [20] to draw a sample {(κ (1) , p(κ (1) ) (1) , γ (1) ), . . . , (κ (N ) , p(κ (N ) ) (N ) , γ (N ) )} from the joint posterior distribution of (κ, p(κ), γ ) given the observed sample Z obs n , denoted by π(κ, p(κ), γ | Z obs n ). Next, using the information of the marginal sample {κ (1) , . . . , κ (N ) } we are able to estimate the distribution π(κ | Z obs n ). We point out that the output of the ABC SMC for model choice is not analysed as usual in the framework of Bayesian analysis (see [20]). Instead, we use this output to provide an estimation of κ. Precisely, having obtained an estimation of the distribution π(κ | Z obs n ), we propose the closer integer to its posterior mean as the Bayesian point estimator for the parameter κ. We refer to this estimator asκ n . Our choice is justified by the good asymptotic properties that this estimator usually exhibits even in the case of CBPs (see [11]).
We now describe how to implement the ABC SMC algorithm for model choice to draw samples from the posterior distribution π(κ, p(κ), γ | Z obs n ). The algorithm reaches the target distribution through a series of intermediate distributions sampling from appropriate proposal distributions and weighting the samples by importance weights. To that end, we fix a number of T iterations and a decreasing sequence of tolerance levels 1 > · · · > T . In practice, the tolerance levels are selected as quantiles of the distances between the simulated and observed data (see the mathematical arguments for this choice in [2]).
The first iteration consists in running the tolerance-rejection ABC algorithm for model choice. It starts by drawing a value κ from the prior distribution on the models, denoted π(κ). Assuming that we have no other knowledge than the lower and upper bounds of κ, we shall consider a uniform distribution on the points 2, . . . , K max , denoted U {2, . . . , K max }, for the prior model distribution. Using the fact that the reproduction and control laws are independent, we assume that the prior distribution for the model index κ, denoted by π( p(κ), γ | κ), satisfies where π( p(κ) | κ) is the prior distribution of p(κ) given the model index κ and π(γ ) is a suitable prior for γ . Now, bearing in mind that the parameter p(κ) is a probability distribution with support {0, . . . , κ}, we propose a Dirichlet distribution with a (κ + 1)dimensional parameter α κ , denoted D(κ + 1, α κ ), as the distribution π( p(κ) | κ). Let us also write f ( Z n | p(κ), γ ) to refer to the likelihood function given p(κ) and γ , with } is generated by using the previously sampled parameters and accept them if the sample is close enough to the observed sample Z obs n in terms of some distance ρ(·, ·) and the tolerance level. In this stage, we compare directly the raw data without summary statistics. The jumps between the model indexes might lead to quite different dimensions of the prior distributions π( p(κ)|κ), for each κ, and consequently, finding a low-dimensional summary statistic to identify parameters of a large dimension is quite hard (see the discussion in [16]).
It is worth to mention that in order to quantify the disparities between the simulated and the observed data we can use many different functions. However, based on the results of previous studies (see [10]), a good discrepancy measure in the CBP setting should satisfy the non-negative property, the identity of indiscernible and the symmetry, but it should also compare the simulated and observed data in relative terms to avoid any issue due to the magnitude of each coordinate. For these reasons, we propose the following function: and d e is the Euclidean distance. Finally, at the end of this stage, all the outputs (κ (i) , p(κ (i) ) (i) , γ (i) ) are assigned the same weight ω (i) We can now describe the first iteration of the ABC SMC algorithm for model choice on κ as follows: Algorithm: ABC SMC algorithm for model choice on κ Specify a decreasing sequence of tolerance levels 1 > . . .

End for
To run the following iterations, the idea is to draw the parameters from proposal distributions that are closer to the target distributions so that we can reduce the variance of the final sample. For each iteration t, t = 2, . . . , T , we have to specify a joint proposal distribution for each p(κ * ) and γ * , denoted by q t ( p(κ), γ | p(κ * ), γ * ). However, in real applications finding a joint distribution that leads to a good performance of the ABC SMC algorithm represents a challenge.
To that end, it is important to highlight that despite the independence between offspring and control distributions, once the sample is given, their posterior distributions are usually highly correlated, as shown empirically in the second simulated example in Sect. 3 (see Fig. 12, left). Indeed, the outputs (κ , p(κ ), γ ) of each iteration of the algorithm satisfy , and τ m represents the true value of the threshold parameter. Thus, the use of component-wise perturbation proposals might lead to an inappropriate structure of the true posterior. Taking into account the relationship described above, we suggest the following proposal distribution: We set q t ( p(κ) | p(κ * ), γ * ) to be a Dirichlet distribution with mean vector p(κ * ) and variance controlled by a single tuning parameter a > 0, i.e., a Dirichlet distribution of order κ * + 1 and parameter a p(κ * ), D(κ * + 1, a p(κ * )). Given a value p(κ) from q t ( p(κ) | p(κ * ), γ * ), we fix q t (γ | p(κ), p(κ * ), γ * ) as the distribution of the variable τ −1 (U * /m(κ)), where τ −1 (·) is the inverse of the function τ (·), the random variable U * follows a normal distribution with mean τ (γ * )m(κ * ) and some variance σ 2 t , N (τ (γ * )m(κ * ), σ 2 t ), and m(κ) is the offspring mean of the distribution p(κ). Notice that we keep the variability of the proposal distribution fixed when the value p(κ * ) is perturbed, however an adaptive dispersion is chosen to perturb the control parameter γ . In particular, σ 2 t is twice the weighted empirical variance of selected γ 's in the t − 1 iteration (see [6] for further discussion on optimality of proposals for ABC SMC).
For a step-by-step description of the remaining iterations of the algorithm in the first phase, let us )}, for the output of the t stage of the algorithm. Moreover, let us denote P t (κ) the family defined by the elements of P t such that κ ( j) t = κ, j = 1, . . . , N . We also write 1 A to refer to the indicator function of the set A and ω t = (ω (1) t , . . . , ω (N ) t ) to refer to the vector of weights in the iteration t.
Algorithm: Continuation of the ABC SMC algorithm for model choice on κ End for For every k = 2, . . . , K max , normalise the weights.
End for 2.2 Second stage: estimation of ( p( Ä n ), | Ä n , Z obs n ) Having obtained the estimate for κ, denoted κ n , given by the closest integer to the mean of the sample {κ (1) , . . . κ (N ) } drawn in the first stage, we now describe how to draw a sample from the ABC approximation of the distribution π( p( κ n ), γ |κ n , Z obs n ). Besides the approximation of the marginal posterior distribution of the model index, the output of the first stage provides a sample from the ABC estimate of the marginal posterior distributions of parameters, i.e. π( p(κ), γ |κ, Z obs n ), for κ = 2, . . . , K max . Although the ABC methodology for the inference on κ works quite well without the use of summary statistics as pointed out before, its use does improve the output of the ABC algorithm when the aim is to make inference on the parameters once the model index is known (see [10]). Thus, our proposal is to proceed as follows: when the first stage is implemented all the generated parameter values together with their data sets in the last iteration are stored. Consequently, they can be used to run an ABC algorithm to estimate the posterior distribution of the parameters of the model given κ =κ n without having to generate new data. Let us denote as { Z n } the simulated data corresponding to the sample {(κ (1) , p(κ (1) ) (1) , γ (1) ), . . . , (κ (N ) , p(κ (N ) ) (N ) , γ (N ) )}, and let {κ (i 1 ) , . . . , κ (i L ) } be all the elements of the sample {κ (1) , . . . , κ (N ) } such that κ (i l ) =κ n , for l = 1, . . . , L. Next, we use of the simulated marginal values to check the rejection condition in the tolerance-rejection ABC algorithm considering a suitable summary statistic. We use the following summary statistic This statistic results from adding a fourth coordinate to the one in [10]. The properties of the model (see [9]) ensure that in a general setting, as n → ∞, almost surely on {Z n → ∞}, regardless of whether we consider parametric frameworks for the offspring or control distributions. Consequently, the third and the new coordinate enable us to identify each factor of the threshold parameter. Our simulation results show that the four dimensional summary statistic proposed improves the results compared to previous summary statistics. More details about the efficiency of adding a new coordinate to the summary statistic can be found in [14]. Finally, we apply a post-processing method based on a local linear regression on the output sample. The outputs p(κ (i j ) ) (i j ) are (κ n +1)-dimensional vectors whose coordinates sum one, but, after regression, some of them could be negative. Such outputs must be removed from the sample (see [10] for details on both methods).

Simulated examples
Our methodology is illustrated via several simulated examples. First, we show how well the methodology works in situations as described above, where the reproduction law has finite support. More precisely, we fix the value of the threshold parameter τ m and consider different CBPs where we vary the support of the offspring distribution, the mean of the offspring distribution m, and the control parameter γ in such a way that the value of τ m remains constant for all of the cases. Second, we return to the previous simulated study in [10]. Our aim is to estimate the posterior distribution of the parameters of interest without assuming a parametric offspring distribution. The true offspring distribution in this scenario has an infinite support, but we show our methodology is also useful in this context if the main aim is to approximate the posterior distributions of stable parameters, namely, the offspring mean and control parameter.

Example 1
We begin our simulation study focusing on offspring distributions with finite support. We show the suitability of the methodology in this framework by considering reproduction laws with different supports and means, and also various control laws with different parameters, keeping the same threshold parameter.
To that end, we explore four different models/cases of CBPs where the initial number of individuals is Z 0 = 1 and the control variables φ n ( j) follow binomial distributions with parameters ξ( j) and γ , where ξ( j) = j + log( j) , for each j ∈ N, ξ(0) = 0, and x denotes the integer part of a number x. We observe that these control laws are a mixture of a deterministic component and a random one. The introduction of these control functions can be explained in an ecological context, where, first, we allow the introduction of new individuals in the ecosystem as described by the deterministic function ξ(·), and next, the binomial control models situations as the emigration or death of individuals due to their hunt by predators. Here, γ represents the probability that an individual does not participate in the subsequent reproduction process as it is no longer present in the ecosystem. We note that for these CBPs ε( j, γ ) = γ ξ( j) and τ = γ . For our purpose, we vary the value of the control parameter γ across the four cases. For the reproduction law we chose binomial distributions with different sizes, κ, and probabilities of success, , in such a way that the four models satisfy τ m = 2.88, i.e. the CBPs are supercritical. The values of the parameters are gathered in Table 1. We emphasise that our choice of the parameters enables us to compare the results obtained by the methodology proposed when examining different finite supports and types of skewness of the offspring distribution (see Table 2). For each case described above we simulated the first 10 generations of a CBP and we ran the ABC SMC algorithm for model choice with each of the corresponding samples as observed data (see Table 4 in Appendix for details on the samples). For that purpose, we assumed that our only knowledge on the offspring distribution is an upper bound for κ, and the fact that the control laws for a population size j are binomial distributions with parameters ξ( j) and γ , with γ ∈ (0, 1) unknown. To run the ABC SMC algorithm for model choice we fixed T = 3 iterations, an upper bound K max = 15, α κ = (1, . . . , 1), where the prior for γ is a beta distribution with both parameters equal to 1, and the tuning parameter is a = 30. The choice of the value of a was justified by the results of several simulated experiments to avoid that the proposal distribution becomes a Dirac measure at the point where it is perturbed. We simulated pools of 4 · 10 5 , 2· 10 6 , 20· 10 6 of non-extinct CBPs at the corresponding iterations and fixed as the tolerance levels 1 , 2 and 3 the quantiles of orders 0.0125, 0.0025, and 0.00025, respectively, of the sample of the distances between the paths of the simulated and observed processes. As a result, for each sample path observed we obtained a sample of size 5000 of the corresponding posterior distribution of κ. The barplots of these samples are given in Fig. 1  κ = 4, the distribution is concentrated around 4 and the point estimate isκ n = 5 due to the tail of the distribution. In the Case 2, with κ = 10,κ n = 7 while the posterior distribution is right-skewed too. The posterior distribution in the Case 3, with κ = 7, has a similar shape, but with support {6, . . . , 15}, andκ n = 9. Finally, in the Case 4, with κ = 10, the posterior distribution is more symmetric than in the previous cases and the point estimate isκ n = 12. Taking into account the cumulative distribution function associated with each of the offspring distributions (see Table 2), the proposed estimate ofκ n in each case is quite reasonable.
We continued with the second step of our methodology by performing the tolerancerejection algorithm and the post-processing method with the summary statistic to draw samples from distributions that approximate the posteriors π(m | κ n , Z obs n ) and π(γ | κ n , Z obs n ) in each case. The estimates of the joint posterior π(m, γ | κ n , Z obs n ) densities and their marginal posterior distributions for each case are displayed in Figs. 3, 4, 5, 6. In  . Right: Estimate of π(γ | κ n , Z obs n ). Red dashed vertical lines represent the true value of the parameter, grey solid vertical lines are the sample means, and blue dashed-dotted vertical lines correspond to 95% HPD intervals all cases one can observe that the estimated densities obtained are centred around the true values and their spread is relatively small. These results indicate that the method retrieves the parameters of interest reasonably well, which is a key property to predict the evolution of the population.
Besides the four particular examples presented above, in the second part of this subsection, we analyse in more detail the accuracy of the methodology to estimate the posterior distributions for κ when the support of the reproduction law is finite. Specifically, for each of the previous four models, we simulated the first 10 generations of 100 processes starting with one individual for each of the cases (i.e., 100 different observed samples), and we ran the ABC SMC algorithm for model choice algorithm with each of these observed samples. To this aim, the same number of iterations, prior distributions, tuning parameter as above are set, but we considered simulated pools of 16,000, 80,000 and 800,000 of non-extinct CBPs at the corresponding iterations and fixed as the tolerance levels 1 , 2 and 3 the quantiles of orders 0.0125, 0.0025, and 0.00025, respectively, of the sample of the distances between the simulated and the observed processes. As a result, for each of the 100 observed paths we obtained a sample of size 200 drawn from the posterior of κ, and we computed the Bayesian  Table  3. We recall that the Cases 1 and 2 have the same offspring mean and control parameter, but the offspring distribution in Case 2 is concentrated in greater values than in Case 1 (see Table  2). Our results indicate that the algorithm proposed is able to distinguish and to identify both cases reasonably well, as was reported above in the study developed above for each particular case. We also observe that the skewness of the reproduction law has some impact on the shape of the probability distribution of the Bayesian point estimator of κ, κ n . Indeed, the first offspring distribution is left-skewed, and the method tends to overestimate the value of κ, while the second one is right-skewed and the method tends to underestimate it. In particular, in the Case 1, the choices 5 and 6 cover the 86% of the values of the sample, where 5 has a relative frequency of 48%. In the Case 2, the choices 6, 7 and 8 cover the 72%, where 7 has a relative frequency of 30%; notice in this case that the cumulative probabilities for the values 6, 7, and 8, are 0.97, 0.994, and 0.999, respectively. Regarding the Cases 3 and 4, we remark that both of them have different offspring means and control parameters, and the methodology is able to discriminate satisfactorily between both. Precisely, the choices 9 and 10 represent the 78% of the values of the sample in the Case 3 whereas 11 and 12 correspond   Table 3), and consequently, the performance of the method enables us to estimate adequately the support of the offspring distributions. Next, for each observed path we obtained a sample of the ABC approximation of the posterior distributions of π(m|κ n , Z obs n ) and π(γ |κ n , Z obs n ) and we took the means of these samples as Bayesian point estimates of m and γ ,m n,1 , . . . ,m n,100 andγ n,1 , . . . ,γ n,100 . The box-plots of these estimates are given in Figs. 7, 8, 9, and 10 in Cases 1, 2, 3, and 4, respectively. These show that the sample of each posterior distribution is centred around the true value of each parameter and their dispersion is not considerable. Thus, they lead to accurate estimates of the posterior of the parameters.

Example 2
We continue our simulation study with one of the examples in [10]. The considered CBP starts with Z 0 = 1 individual, the offspring distribution is a geometric distribution with  m and γ ,m n,1 , . . . ,m n,100 andγ n,1 , . . . ,γ n,100 , based on the samples of 100 simulated processes. Horizontal red line corresponds to the true value of the parameter   m and γ ,m n,1 , . . . ,m n,100 andγ n,1 , . . . ,γ n,100 , based on the samples of 100 simulated processes. Horizontal red line corresponds to the true value of the parameter parameter q = 0.4 and the control variables φ n ( j) follow a binomial distribution with parameters ξ( j) and γ = 0.75, where the function ξ(·) was introduced in the previous example. The offspring mean and variance are m = 1.5 and σ 2 = 3.75, the control means are ε( j, γ ) = γ ξ( j) = 0.75ξ( j), j ∈ N 0 , τ = γ = 0.75, and the threshold parameter is τ m = 1.125. Thus, taking into account the value of this last parameter, the CBP is supercritical. The simulated path and the observed sample Z obs n of the first 30 generations of such a process are presented in Table 5 in Appendix. Note that the offspring distribution has infinite support.
In Sect. 4.1 of [10] we provided some inferential results obtained by using ABC algorithms under the hypothesis of a parametric offspring distribution. Recall that this latter implies that we assumed that we knew the parametric family of probability distribution to which the offspring distribution belonged, but the value of the parameter was unknown. We now deal with the estimation of the posterior distributions of the stable parameters of the model as the offspring mean and the control parameter in a different framework. To that end, throughout this example, we understand the maximum offspring capacity per individual as a number κ such that the probability that an individual gives birth to more than κ offspring is sufficiently small, i.e., we look for a realistic upper limit for the offspring capacity of the majority of the individuals of the population. Our goal is to estimate the posterior distribution of the maximum offspring capacity per individual with the aim of identifying the stable parameters of the model properly. Thus, we assume that we can propose a reasonable upper bound, K max , of this maximum offspring capacity in view of the knowledge of the population that we are modeling, as discussed in Sect. 2. We make use of the observed sample to estimate the joint posterior distributions of the mean offspring and control parameter by assuming that our only knowledge on the offspring distribution is K max , and the fact that the control laws for a population size j are binomial distributions with parameters ξ( j) and γ , with γ ∈ (0, 1) unknown.
We implemented the ABC SBC algorithm for model choice described in Sect. 2 by setting the same number of iterations, prior distributions, pools of non-extinct simulated processes, tolerance levels and tuning parameter as in Example 1. We therefore obtained a sample of length 5000 at each iteration. The resulting barplot of the sample obtained from the estimate of the posterior distribution π(κ | Z obs n ) is shown in Fig. 11. The closest integer to the sample Fig. 12 Estimates of the posterior distributions via the ABC algorithm with the local linear regression adjustment with κ n = 5. Left: Contour plot of the estimate of the joint density π(m, γ | κ n , Z obs n ) with the curve τ m = 1.125. The red point indicates the true value of the parameters and the grey one represents the sample means. Centre: Estimate of π(m | κ n , Z obs n ). Right: Estimate of π(γ | κ n , Z obs n ). Red dashed vertical lines represent the true value of the parameter, grey solid vertical lines represent the sample means, and blue dashed-dotted vertical lines correspond to 95% HPD intervals mean of the posterior distribution of κ is 5, and then, we propose κ n = 5. We note that the probability that the true offspring distribution, a geometric distribution of parameter 0.4, is less than or equal to 5 is 0.9533. Consequently, the choice of 5 as the maximum number of offspring per individual is appropriate to explain the evolution of our data reasonably well.
Next, we considered the marginal samples corresponding to κ n = 5 and applied the rejection condition in the tolerance-rejection ABC algorithm and a local linear regression adjustment making use of the summary statistic in (3), as described in Sect. 2.2. The results are plotted in Fig. 12. Precisely, we represented the estimated posterior densities of π(m | κ n , Z obs n ) and π(γ | κ n , Z obs n ) and the contour plot of the estimated joint posterior density of π(m, γ | κ n , Z obs n ) together with the curve τ m = 1.125 (recall that in this case τ = γ ). This figure illustrates the correlation between m and γ given the observed sample. The results show that the proposed ABC algorithm estimates of the posterior densities are quite accurate. It is worthy to point out that the implementation of the proposed methodology is computationally simple, and provides a useful approach to make inference on the parameters of interest in a scenario that requires very little information about the true offspring law. This latter is a great advantage versus the previous methodology considered in [10] that assumed the knowledge of the parametric offspring family to which the true offspring law belonged.

Real data examples
In this section our aim is to apply the described methodology to real datasets that represent the logistic growth of populations. These kinds of populations are characterized by an initial approximately exponential growth of the number of individuals till they reach an equilibrium value around which they fluctuate. This equilibrium value, denoted as K e , mainly depends on the maximum population size supported by the environment. We refer to the latter value as the carrying capacity of the population, denoted as K (see [3]). Population-size dependent branching processes (PSDBPs) are often used to model these kind of data (see, for instance, [4], [12], or [15] and references therein). The PSDBP is a modification of a BGW process. Briefly, the assumption of identical offspring distribution for all the individuals in the BGW process is replaced with the assumption of offspring distributions in each generation which depend on the population sizes. In particular, in order to fit logistic growth data the repro-duction laws depend on the current population size, the carrying capacity and on some other parameters. However, the existence of a carrying capacity does not necessarily imply that the reproductive capacity of an individual changes along generations, but rather the probability that an individual successfully becomes a progenitor. Consequently, we propose a CBP to model population logistic growths by considering control laws defined by binomial distributions with a success probability depending on the current population size, z, the carrying capacity, K , and the offspring mean, m. We refer to z/K as density. More precisely, the random variable φ 0 (z) is distributed following a binomial distribution of size z and success probability given by a function s(m, z, K ). We consider that the process begins with a much smaller initial number of individuals Z 0 than K and m > 1. Under this consideration we have E[Z n+1 | Z n = z] = mzs(m, z, K ). Although the probabilistic evolution of the described CBP with binomial control can be represented equivalently as a PSDBP, from a practical view point the structure of a CBP makes easier to interpret the parameters involved.
Different functions s(m, z, K ) can be defined to introduce a density-dependent growth inspired by deterministic models. Given their practical relevance we highlight the following ones and the corresponding deterministic models on which the functions s(m, z, K ) are based: In particular, θ = 1 for the second function yields the Ricker model while β = 1 in the third function gives us the Beverton-Holt model. We notice that, as is reasonable, a high value of density implies a low probability of being progenitor in all the models. With the aim of making inference on the offspring mean and the equilibrium value for logistic growth data we implemented the ABC SBC algorithm for model choice and estimation of the parameters in Sect. 2 by considering the binomial control distributions introduced above, with the control parameter γ = K . We tackled the estimation in two real datasets: yeast data and seal data. We set the same number of iterations, pools of non-extinct simulated processes, tolerance levels and tuning parameter as in the previous simulated examples. The details on the prior distributions are given below for each dataset.

Yeast dataset
The yeast dataset was already studied in [21] (see Figure 1 (a) in this paper) and it collects the yeast cell numbers in a replicate by colony scan-o-matic from 0 and 72 hours of growth at 20 min intervals. These data are plotted in grey in Fig. 13 below. Note the high dimension of the data and that given the nature of the data, the observed sample is only given by the total size of each generation. To perform the algorithm, we set K max = 6 and the prior distribution for γ = K as an uniform distribution on (1 · 10 7 , 1.1 · 10 7 ) interval. We note that a yeast cell might reproduce more than once in 20 minutes and κ therefore represents the maximum number of yeast cells produced by a cell in this period of time.
To choose the best choice of the θ -logistic and Hassell models, we ran the algorithm for a grid of values of the θ and β parameters and selected the corresponding models which . Right: Estimate of π(K e , | κ n , Z obs n ). Grey solid lines are the sample means and blue dashed-dotted vertical lines correspond to 95% HPD intervals provide the best adjustments. We based our decision on R 2 g , the fraction of variance in the growth data explained by the different logistic regression models, which is the adjustment measure considered in [21]. In Fig. 13 we plotted a point estimates of the expected values of each generation size given by the different logistic regression models and provide the fraction of variance explained by each of them. The maximum value of R 2 g is provided by Hassell logistic growth model with β = 0.05, R 2 g = 0.9946. It is worthy to point out that this latter value is similar to the one obtained in the study developed in [21]. For this model, we also estimated the joint posterior distribution of the offspring mean, m, and the equilibrium value, K e , and the corresponding marginal distributions in Fig. 14.

Seal dataset
The seal dataset collects the average annual harbor seal haul-out counts in the coastal estuarine environment of Washington State, USA, from 1975 to 1999. These are provided in Table 6 in the Appendix (see [13] for further details on this dataset) and represented in Fig. 15. It is worthy to point out that these data show missing values and a greater dispersion than yeast data. In this case we use the same value of K max as in the yeast data example and for prior

Concluding remarks
We dealt with the Bayesian estimation of the main parameters of a CBP in a general context. Precisely, we assumed a parametric framework for the control laws and a non-parametric one for the offspring distribution without any knowledge about its support The two main goals in this setting were to estimate the posterior distribution of the maximum number of offspring per individual, κ, and to estimate the posterior distribution of other parameters such as the offspring mean and control parameter based on the Bayes point estimate of κ under the quadratic loss function. To that end, we considered the sample defined by the population sizes in all the generations and the number of progenitors in the last generation.
The methodology that we proposed consists of two steps. In the first one, we used the parameter κ as a model index and applied a SMC ABC algorithm for model choice with the raw data to draw a sample from the estimate of the posterior distribution of κ. From this sample, we also proposed the sample mean as point estimate of the value of κ. In the second step, given this point estimate and the samples obtained in the last iteration of the method in the previous step, we made use of an ABC algorithm together with a local linear regression adjustment to draw samples from the estimates of the posterior distribution of the parameters of interest related to the offspring and control distribution. In this stage, we introduced an appropriate summary statistic to identify the parameters of the model.
Our empirical results support the suitability of the methodology proposed. First, via several simulated examples, we showed that SMC ABC algorithm for model choice with the raw data enables us to obtain a sample of the posterior distribution of κ relatively easily and identify the main parameters of the reproduction and control laws through the second stage of the algorithm with the summary statistic. Indeed, the resulting posterior distributions are centred around the true value of the parameters. Second, turning to the simulation study in [10] we applied the method to estimate the posterior distribution of the offspring mean and control parameter when the support of the offspring distribution is infinite. In this setting, as indicated above, the parameter κ is now interpreted as such a quantity satisfying that probability that an individual has at most κ offspring is large enough, that is a realistic upper bound for the reproduction capacity of the majority of the individuals. Again, the results obtained are quite satisfactory even in this miss-specified model framework.
We also used our methodology to estimate the posterior distribution of the offspring mean and the equilibrium value for two real datasets that present logistic growth of populations. To the best of our knowledge, this was the first time that CBPs were used as models for populations whose evolution is conditioned by the existence of a maximum capacity of the environment in which evolve. We highlight that the methodology is quite flexible and works reasonably well even with missing values, as happens in seal dataset, and with high value data, as happens in both examples-mainly in the yeast one. In both datasets the adjusted models fit the observed data quite well, providing suitable estimates of the parameters of interest.
We finally remark that in situations where the knowledge on the reproduction law is limited, the computational simplicity of the methodology makes it an appropriate way to generate samples of the the estimate of the posterior distributions of the target parameters. This represents a clear progress compared to previous works in this setting such as [5,8,10,11], even when working with high value data. appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. probability distribution of the control variable φ n (k) is a binomial distribution with parameters ξ(k) and γ = 0.75, with ξ(k) = k + log(k) , for each k ∈ N and ξ(0) = 0. Table 6 gathers the average annual harbor seal haul-out counts in the coastal estuarine environment of Washington State from 1975 to 1999. These data were previously provided and analysed in [13] (see Table 1 in the aforementioned paper). Table 6 The average annual harbor seal haul-out counts Year 1975Year 1976Year 1977Year 1978Year 1979Year 1980Year 1981Year 1982Year 1983Year 1984Year 1985Year 1986 Z n 1694 1742 2082 2570 · 2864 4408 5197 4416 4203 6008 4807