1 Introduction

We focus our attention on inferential issues related to controlled branching processes. A controlled branching process is a discrete-time stochastic process that models populations developing in the following manner: the population begins with a fixed number of individuals or progenitors; each of them, independently of the others and according to a common probability distribution, gives birth to offspring, and then ceases to participate in subsequent reproduction processes. Thus, each individual lives for one unit of time and is replaced with a random number of offspring. Moreover, since for environmental, social, or other reasons the number of progenitors that take part in each generation might be controlled, a random mechanism is introduced in the model to determine the number of offspring with reproductive capacity in each generation. Mathematically, a controlled branching process (CBP) is a process \(\{Z_n\}_{n\in \mathbb {N}_0}\) defined recursively as

$$\begin{aligned} Z_0=N,\quad Z_{n+1}=\sum _{j=1}^{\phi _n(Z_{n})}X_{nj},\quad n\in \mathbb {N}_0, \end{aligned}$$
(1)

where \(\mathbb {N}_0=\mathbb {N}\cup \{0\}\), \(N\in \mathbb {N}\), \(\{X_{nj}:\ n\in \mathbb {N}_0;\ j\in \mathbb {N}\}\) and \(\{\phi _n(k):n,k\in \mathbb {N}_0\}\) are independent families of non-negative integer valued random variables and the empty sum in (1) is considered to be 0. The random variables \(X_{nj}\), \(n\in \mathbb {N}_0\), \(j\in \mathbb {N}\), are assumed to be independent and identically distributed (i.i.d.) with distribution \(\varvec{p}=\{p_j=P(X_{01}=j): j\in \mathbb {N}_0\}\) and in terms of population dynamics they represent the number of offspring given by the j-th progenitor of the n-th generation. Moreover, \(\{\phi _n(k)\}_{k\in \mathbb {N}_0}\), for \(n\in \mathbb {N}_0\), are independent stochastic processes with equal one-dimensional probability distributions. This property means that the control mechanism works in an independent manner in each generation, and once the population size at certain generation n, \(Z_n\), is known, the probability distribution of the number of progenitors, denoted by \(\phi _n(Z_{n})\), is independent of the generation. Some particular cases collected in this general family of branching processes are the simplest model, the standard Bienaymé–Galton–Watson (BGW) process, by considering \(\phi _n(k)=k\) a.s. for each k, or the branching processes with immigration, by setting \(\phi _n(k)=k+Y_n\), where \(\{Y_n\}_{n\in \mathbb {N}_0}\) is a class of i.i.d. random variables, among others.
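The recursion (1) can be sketched in code as follows. This is only an illustration under stated assumptions: the function names `simulate_cbp`, `bgw_control` and `immigration_control` are hypothetical, the offspring law is given as a finite list of probabilities, and the immigration law (uniform on \(\{0,1,2\}\)) is chosen arbitrarily for the example.

```python
import random

def simulate_cbp(z0, offspring_pmf, control, n_gen, rng=None):
    """Simulate Z_0, ..., Z_n following recursion (1).

    offspring_pmf[j] plays the role of p_j (finite support assumed here).
    control(k, rng) returns one draw of phi_n(k).
    Returns (sizes, progenitors) with progenitors[n] = phi_n(Z_n).
    """
    rng = rng or random.Random()
    support = list(range(len(offspring_pmf)))
    sizes, progenitors = [z0], []
    for _ in range(n_gen):
        phi = control(sizes[-1], rng)
        progenitors.append(phi)
        # With no progenitors the empty sum is taken to be 0.
        if phi > 0:
            offspring = rng.choices(support, weights=offspring_pmf, k=phi)
        else:
            offspring = []
        sizes.append(sum(offspring))
    return sizes, progenitors

def bgw_control(k, rng):
    """phi_n(k) = k: the standard BGW process."""
    return k

def immigration_control(k, rng):
    """phi_n(k) = k + Y_n, with a hypothetical Y_n uniform on {0, 1, 2}."""
    return k + rng.randint(0, 2)

if __name__ == "__main__":
    rng = random.Random(2024)
    print(simulate_cbp(1, [0.2, 0.3, 0.5], bgw_control, 5, rng))
    print(simulate_cbp(1, [0.2, 0.3, 0.5], immigration_control, 5, rng))
```

Passing the control mechanism as a function keeps the particular cases mentioned above (BGW, immigration) as one-line variations of the same simulator.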

The recent monograph [9] provides an extensive description of its probabilistic theory. The long-time behaviour of a CBP is determined by the parameters of the model associated with the offspring and control laws. Briefly, assuming that \(m=E[X_{01}]\) and \(\varepsilon (k)=E[\phi _{n}(k)]\), \(k\in \mathbb {N}_0\), exist and are finite, and whenever the limit \(\tau =\lim _{k\rightarrow \infty }k^{-1}\varepsilon (k)\) exists, the threshold parameter of this branching model is \(\tau m\). Extinction occurs almost surely in subcritical populations, namely if \(\tau m<1\), and different growth rates on the non-extinction set are obtained depending on whether \(\tau m=1\) (critical population) or \(\tau m>1\) (supercritical population) with additional conditions. In real situations, these parameters are unknown. Until now, the methodologies proposed in the literature for the Bayesian inference on the offspring distribution have focused on the cases where either the support of the reproduction law is finite and known (see [7]) or the offspring law belongs to some one-dimensional parametric family (see [10]). A first paper in the context of the CBP that faces the problem of an unknown scenario on the offspring distribution is [8]. The statistical procedures developed in that work did not include the estimation of the posterior distribution of the maximum number of offspring per progenitor given the sample of population sizes at each generation \(\{Z_0,\ldots , Z_n\}\); instead, this quantity was set to \(1+\max _{1\le k\le n}Z_k\) as a first approach. Within the class of other branching processes, this problem has only been considered in the BGW process. In particular, from a probabilistic viewpoint, the asymptotic behaviour of the number of offspring of the most prolific individual in the n-th generation has been studied as an extreme value problem in [1, 18]. From an inferential viewpoint, a particle Markov Chain Monte Carlo method was introduced to estimate the support of the offspring law in [5]. However, the drawback of this approach is that its computational feasibility strongly depends on dealing with BGW processes with low values.
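As a small illustration of the threshold parameter \(\tau m\) discussed above, the following sketch computes the offspring mean of a finite offspring law and classifies the process. The function names and the numerical tolerance `tol` are assumptions made for the example; the additional conditions required in the critical and supercritical cases are not checked.

```python
def offspring_mean(pmf):
    """m = sum_j j * p_j for an offspring law given as a finite list."""
    return sum(j * p for j, p in enumerate(pmf))

def classify(tau, m, tol=1e-12):
    """Classify a CBP by its threshold parameter tau * m."""
    threshold = tau * m
    if abs(threshold - 1.0) <= tol:
        return "critical"
    return "subcritical" if threshold < 1.0 else "supercritical"

if __name__ == "__main__":
    pmf = [0.25, 0.5, 0.25]
    print(offspring_mean(pmf), classify(0.9, offspring_mean(pmf)))
```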

The first aim of this work is to provide a methodology to estimate the maximum progeny that an individual in the population can bear (called maximum offspring capacity per individual) in the general class of CBPs and regardless of the magnitude of the observed samples. Having estimated the maximum offspring capacity per individual, we also make inference on the expected values of offspring and control laws in a CBP. To this end, we consider the maximum offspring capacity per individual as a model index and, for the first time, we tackle its estimation by means of a model choice procedure. Thus, we provide an algorithm based on approximate Bayesian computation (ABC) techniques to estimate both the maximum number of offspring that an individual is able to give birth to and the parameters of interest of the model. The ABC methodology in the context of CBPs was already analysed and applied in [10] by assuming that the offspring distribution belongs to a parametric family. This means that the family of offspring distributions is known (for instance, geometric, Poisson or binomial distributions) and the only unknown elements are the parameters that determine them. In this paper we drop this assumption and face the problem of making inference on the parameters of interest in a less informative scenario with respect to the offspring distribution.

For our purpose, let us consider a CBP with an offspring distribution with an unknown support and control laws belonging to some known one-dimensional parametric family with unknown parameter. Let \(\kappa =\sup \{j\in \mathbb {N}_0: p_j>0\}\) be the maximum number of offspring per individual, denote by \(\varvec{p}({\kappa })=\{p_j({\kappa })=P(X_{01}=j):\ j\in \mathbb {N}_0\}\) the offspring distribution when the maximum offspring capacity per individual is \(\kappa \), and let \(\gamma \) be the control parameter, with \(\gamma \in \varGamma \subseteq \mathbb {R}\). We recall that in that case, the distribution of each control variable \(\phi _n(k)\) only depends on k and \(\gamma \), and \(E[\phi _n(k)]=\varepsilon (k,\gamma )\). Let us denote \(m({\kappa })=\sum _{j=0}^\kappa j p_j(\kappa )\) and \(\tau (\gamma )=\lim _{k\rightarrow \infty }k^{-1}\varepsilon (k,\gamma )\). We assume that \(m({\kappa })<\infty \) and \(\tau (\gamma )\) exists for all \(\gamma \in \varGamma \). Moreover, we assume the existence of the inverse of \(\tau (\cdot )\). Several preliminary simulation studies lead us to the conclusion that to approximate the posterior distributions of the parameters of interest reasonably well by making use of ABC methodology, we have to assume that at least the population sizes at each generation and the number of progenitors in the last generation are observable (see [10]). Hence, let us consider the observed sample \(\widetilde{\mathcal {Z}}^{obs}_n =\{Z_0^{obs},\ldots ,Z_n^{obs},\phi _{n-1}(Z_{n-1})^{obs}\}\). Briefly, we will proceed as follows: firstly, we draw a sample from an estimate of the posterior distribution of \(\kappa \), denoted by \(\pi (\kappa |\widetilde{\mathcal {Z}}^{obs}_n)\), by considering a model choice algorithm. Secondly, we generate a sample from an estimate of the posterior distribution of \((\varvec{p}(\widetilde{\kappa }_n),\gamma )\), where \(\widetilde{\kappa }_n\) is a point estimate of \(\kappa \). 
Next, from this sample we estimate the posterior distributions of \((\varvec{p}(\widetilde{\kappa }_n),\gamma )\), \(m(\widetilde{\kappa }_n)\) and \(\tau (\gamma )\) using kernel density estimation. We denote these posterior distributions by \(\pi (\varvec{p}(\widetilde{\kappa }_n),\gamma |\widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\), \(\pi (m \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) and \(\pi (\tau (\gamma )\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\), respectively.

The performance of the proposed algorithm is first illustrated by two simulated examples. Next, the method is applied to two real datasets that show a logistic growth. To this end, we model the evolution of logistic growth of populations by CBPs, which represents another important novelty of this paper. These populations are characterised by the fact that when their sizes are small enough, they grow with almost no restriction, but when the sizes increase, the limited resources of the environment lead to a control on the population sizes. As a consequence, there exists a maximum population size, usually called carrying capacity in an ecological context, that can be supported by the ecosystem. With the aim of describing these populations mathematically, we introduce CBPs with control distributions given by binomial distributions whose success probabilities mainly depend on the density of the population. We provide several models based on different success probability functions which are inspired by classical deterministic population growth models.

Apart from this introduction, the paper is organized as follows. In Sect. 2 we provide a detailed description of the ABC algorithm for model choice and parameter estimation in the context of CBPs. Section 3 gathers simulation studies to evaluate and illustrate the performance of the proposed ABC approach. In Sect. 4 we present the application of the proposed algorithm to two real datasets from populations that exhibit a logistic growth. In Sect. 5 we summarise the main contributions of this work. Additional information related to the examples is presented in the Appendix.

2 Methodology

In this section we describe the ABC approach for estimating the posterior distribution of the main parameters of a CBP. ABC algorithms are a group of Monte Carlo algorithms used to approximate posterior distributions without requiring explicit knowledge of the likelihood function. They are very useful when the likelihood is intractable or too costly to evaluate. Inference relies mainly on simulating from the model, which explains their versatility in the framework of branching processes (see the monograph [19] for further details).

In this context, the fact that the value of \(\kappa \) is unknown and could be even infinite, increases the complexity of the problem of estimating the parameters of the CBP and requires the use of methodologies for model choice with the aim of estimating the parameter \(\kappa \). We assume \(\kappa \ge 2\) to avoid trivial cases. To implement the ABC methodology, we remark that even if our knowledge on the value of \(\kappa \) is very poor, we usually have some information about an effective upper bound for \(\kappa \), denoted \(K_{max}\), from the dynamics of the population that we model via the CBP. An example of this situation is the family of K-selected species (see [17]), which includes larger mammals such as elephants, horses, and primates, and whose species are relatively stable populations and produce relatively low numbers of offspring. For practical purposes and without loss of generality, throughout this paper we consider offspring laws with finite support. Thus, \(\kappa \in \{2,3,\dots ,K_{max}\}\). We can take the parameter \(\kappa \) as a model index. We emphasise that as a consequence, for each value of \(\kappa \) the parameter of interest in the corresponding model is

$$\begin{aligned} (\varvec{p}(\kappa ),\gamma )=(p_0(\kappa ),\ldots ,p_\kappa (\kappa ),\gamma )\in \varDelta _{\kappa }\times \mathbb {R}, \end{aligned}$$

whose dimension depends on \(\kappa \), and where \(\varDelta _{\kappa }\) is the standard \(\kappa \)-simplex in \(\mathbb {R}^{\kappa +1}\).

We recall that our final aim is to estimate the posterior \(\pi (\varvec{p}(\kappa ),\gamma \mid \widetilde{\kappa },\widetilde{\mathcal {Z}}_n^{obs})\), with \(\widetilde{\kappa }\) a point estimate of \(\kappa \), and to that end, we propose a two-fold procedure.

2.1 First stage: estimation of \(\pi (\kappa \mid \widetilde{\mathcal {Z}}_n^{obs})\)

In the first part, we estimate \(\pi (\kappa \mid \widetilde{\mathcal {Z}}_n^{obs})\). We apply an ABC algorithm for model choice based on sequential importance sampling, ABC SMC for model choice, introduced in [20] to draw a sample \(\{(\kappa ^{(1)},\varvec{p}(\kappa ^{(1)})^{(1)},\gamma ^{(1)}),\ldots , (\kappa ^{(N)},\varvec{p}(\kappa ^{(N)})^{(N)},\gamma ^{(N)})\}\) from the joint posterior distribution of \((\kappa ,\varvec{p}(\kappa ),\gamma )\) given the observed sample \(\widetilde{\mathcal {Z}}_n^{obs}\), denoted by \(\pi (\kappa ,\varvec{p}(\kappa ),\gamma \mid \widetilde{\mathcal {Z}}_n^{obs})\). Next, using the information of the marginal sample \(\{\kappa ^{(1)},\ldots , \kappa ^{(N)}\}\) we are able to estimate the distribution \(\pi (\kappa \mid \widetilde{\mathcal {Z}}_n^{obs})\). We point out that the output of the ABC SMC for model choice is not analysed as usual in the framework of Bayesian analysis (see [20]). Instead, we use this output to provide an estimate of \(\kappa \). Precisely, having obtained an estimate of the distribution \(\pi (\kappa \mid \widetilde{\mathcal {Z}}_n^{obs})\), we propose the closest integer to its posterior mean as the Bayesian point estimator for the parameter \(\kappa \). We refer to this estimator as \(\tilde{\kappa }_n\). Our choice is justified by the good asymptotic properties that this estimator usually exhibits even in the case of CBPs (see [11]).
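The point estimator described above can be sketched as follows, assuming the marginal sample of model indices is available as a plain list of integers. The names `posterior_of_kappa` and `point_estimate` are hypothetical, and ties at exactly one half are rounded upwards, a convention the text does not specify.

```python
import math
from collections import Counter

def posterior_of_kappa(kappa_sample):
    """Relative frequencies of each model index in the marginal sample."""
    counts = Counter(kappa_sample)
    n = len(kappa_sample)
    return {k: counts[k] / n for k in sorted(counts)}

def point_estimate(kappa_sample):
    """Closest integer to the sample mean (halves rounded up, by assumption)."""
    mean = sum(kappa_sample) / len(kappa_sample)
    return math.floor(mean + 0.5)

if __name__ == "__main__":
    sample = [4] * 8 + [15] * 2  # mass near 4 with a long right tail
    print(posterior_of_kappa(sample), point_estimate(sample))
```

Because the estimator is based on the mean, a right tail in the sample pulls the estimate above the mode.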

We now describe how to implement the ABC SMC algorithm for model choice to draw samples from the posterior distribution \(\pi (\kappa ,\varvec{p}(\kappa ),\gamma \mid \widetilde{\mathcal {Z}}_n^{obs})\). The algorithm reaches the target distribution through a series of intermediate distributions sampling from appropriate proposal distributions and weighting the samples by importance weights. To that end, we fix a number of T iterations and a decreasing sequence of tolerance levels \(\epsilon _1>\cdots > \epsilon _T\). In practice, the tolerance levels are selected as quantiles of the distances between the simulated and observed data (see the mathematical arguments for this choice in [2]).

The first iteration consists in running the tolerance-rejection ABC algorithm for model choice. It starts by drawing a value \(\kappa '\) from the prior distribution on the models, denoted \(\pi (\kappa )\). Assuming that we have no other knowledge than the lower and upper bounds of \(\kappa \), we shall consider a uniform distribution on the points \(2,\ldots , K_{max}\), denoted \(U\{2,\ldots ,K_{max}\}\), for the prior model distribution. Using the fact that the reproduction and control laws are independent, we assume that the prior distribution of the parameters given the model index \(\kappa \), denoted by \(\pi (\varvec{p}(\kappa ),\gamma \mid \kappa )\), satisfies

$$\begin{aligned} \pi (\varvec{p}(\kappa ),\gamma \mid \kappa )=\pi (\varvec{p}(\kappa )\mid \kappa )\pi (\gamma ), \end{aligned}$$

where \(\pi (\varvec{p}(\kappa )\mid \kappa )\) is the prior distribution of \(\varvec{p}(\kappa )\) given the model index \(\kappa \) and \(\pi (\gamma )\) is a suitable prior for \(\gamma \). Now, bearing in mind that the parameter \(\varvec{p}(\kappa )\) is a probability distribution with support \(\{0,\ldots ,\kappa \}\), we propose a Dirichlet distribution with a \((\kappa +1)\)-dimensional parameter \(\varvec{\alpha }_\kappa \), denoted \(D(\kappa +1,\varvec{\alpha }_\kappa )\), as the distribution \(\pi (\varvec{p}(\kappa )\mid \kappa )\). Let us also write \(f(\widetilde{\mathcal {Z}}_n\mid \varvec{p}(\kappa ), \gamma )\) to refer to the likelihood function given \(\varvec{p}(\kappa )\) and \(\gamma \), with \(\widetilde{\mathcal {Z}}_n=\{Z_0,\ldots ,Z_n,\phi _{n-1}(Z_{n-1})\}\). The next steps are the usual ones in tolerance-rejection ABC algorithms. A sample \(\widetilde{\mathcal {Z}}_n^{sim}=\{Z_0^{sim},\ldots ,Z_n^{sim},\phi _{n-1}(Z_{n-1}^{sim})\}\) is generated using the previously sampled parameters, which are accepted if the sample is close enough to the observed sample \(\widetilde{\mathcal {Z}}_n^{obs}\) in terms of some distance \(\rho (\cdot ,\cdot )\) and the tolerance level. In this stage, we compare the raw data directly, without summary statistics. The jumps between the model indexes might lead to quite different dimensions of the prior distributions \(\pi (\varvec{p}(\kappa )|\kappa )\), for each \(\kappa \), and consequently, finding a low-dimensional summary statistic to identify parameters of a large dimension is quite hard (see the discussion in [16]).

It is worth mentioning that many different functions can be used to quantify the disparities between the simulated and the observed data. However, based on the results of previous studies (see [10]), a good discrepancy measure in the CBP setting should satisfy non-negativity, the identity of indiscernibles and symmetry, but it should also compare the simulated and observed data in relative terms to avoid any issue due to the magnitude of each coordinate. For these reasons, we propose the following function:

$$\begin{aligned} \rho ({\mathbf{x}},{\mathbf{y}}) = d_e\left( \frac{{\mathbf{x}}}{{\mathbf{y}}},\frac{{\mathbf{y}}}{{\mathbf{x}}}\right) , \text{ with } {\mathbf{x}}=(x_1,\ldots ,x_L), {\mathbf{y}}=(y_1,\ldots ,y_L)\in \mathbb {R}_+^L, \end{aligned}$$

where \(\frac{\mathbf{x}}{\mathbf{y}}=(\frac{x_1}{y_1},\ldots ,\frac{x_L}{y_L})\), \(\frac{\mathbf{y}}{\mathbf{x}}=(\frac{y_1}{x_1},\ldots ,\frac{y_L}{x_L})\), \(\mathbb {R}_+=(0,\infty )\) and \(d_e\) is the Euclidean distance. Finally, at the end of this stage, all the outputs \((\kappa ^{(i)},\varvec{p}(\kappa ^{(i)})^{(i)},\gamma ^{(i)})\) are assigned the same weight \(\omega _1^{(i)}=1/N\), for \(i=1,\ldots ,N\).
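A minimal sketch of the function \(\rho \) defined above is given next. The name `rho` and the explicit input checks are additions for the example; since \(\rho \) is only defined on \(\mathbb {R}_+^L\), paths containing a zero (extinct populations) are rejected rather than compared.

```python
import math

def rho(x, y):
    """Euclidean distance between the coordinate-wise ratios x/y and y/x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("all coordinates must be strictly positive")
    return math.sqrt(sum((a / b - b / a) ** 2 for a, b in zip(x, y)))

if __name__ == "__main__":
    print(rho([1, 2, 4], [1, 3, 5]))
    # Relative comparison: rescaling both vectors leaves the value unchanged.
    print(rho([10, 20], [20, 10]), rho([1, 2], [2, 1]))
```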

We can now describe the first iteration of the ABC SMC algorithm for model choice on \(\kappa \) as follows:

figure a
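The first iteration described in the text can be sketched roughly as below. This is not a transcription of the algorithm in the figure; it is a simplified illustration under several assumptions: the control law is taken to be binomial with parameters \(k\) and \(\gamma \), the prior on \(\gamma \) is uniform on \((0,1)\), the Dirichlet parameter is a vector of ones, only non-extinct paths are kept, and the tolerance \(\epsilon _1\) is fixed as a quantile of the simulated distances. All function names are hypothetical, and the naive simulation is only practical for small populations.

```python
import math
import random

def dirichlet(alpha, rng):
    """Dirichlet draw obtained by normalising independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def simulate(z0, p, gamma, n, rng):
    """Return [Z_0, ..., Z_n, phi_{n-1}(Z_{n-1})], assuming phi_n(k) ~ Binomial(k, gamma)."""
    sizes, phi = [z0], 0
    for _ in range(n):
        phi = sum(rng.random() < gamma for _ in range(sizes[-1]))
        sizes.append(sum(rng.choices(range(len(p)), weights=p, k=phi)))
    return sizes + [phi]

def rho(x, y):
    """Distance between coordinate-wise ratios, as defined in the text."""
    return math.sqrt(sum((a / b - b / a) ** 2 for a, b in zip(x, y)))

def first_iteration(obs, k_max, pool_size, quantile, rng):
    """Tolerance-rejection ABC for model choice with a quantile-based tolerance."""
    n = len(obs) - 2  # obs = [Z_0, ..., Z_n, phi_{n-1}(Z_{n-1})]
    pool = []
    while len(pool) < pool_size:
        kappa = rng.randint(2, k_max)               # uniform prior on {2, ..., K_max}
        p = dirichlet([1.0] * (kappa + 1), rng)     # Dirichlet prior on the offspring law
        gamma = rng.random()                        # assumed uniform prior on (0, 1)
        sim = simulate(obs[0], p, gamma, n, rng)
        if min(sim) > 0:                            # keep non-extinct paths only
            pool.append((rho(sim, obs), kappa, p, gamma))
    pool.sort(key=lambda item: item[0])
    n_keep = max(1, round(quantile * pool_size))
    accepted = pool[:n_keep]
    eps_1 = accepted[-1][0]                         # largest accepted distance
    weight = 1.0 / n_keep                           # equal weights after iteration 1
    return [(k, p, g, weight) for _, k, p, g in accepted], eps_1

if __name__ == "__main__":
    observed = [2, 3, 4, 6, 3]  # toy data, not taken from the paper
    particles, eps_1 = first_iteration(observed, 4, 200, 0.1, random.Random(1))
    print(len(particles), eps_1)
```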

To run the following iterations, the idea is to draw the parameters from proposal distributions that are closer to the target distributions so that we can reduce the variance of the final sample. For each iteration t, \(t=2,\ldots , T\), we have to specify a joint proposal distribution for each \(\varvec{p}({\kappa }^*)\) and \(\gamma ^*\), denoted by \(q_t(\varvec{p}({\kappa }), \gamma \mid \varvec{p}({\kappa }^*), \gamma ^*)\). However, in real applications finding a joint distribution that leads to a good performance of the ABC SMC algorithm represents a challenge.

To that end, it is important to highlight that despite the independence between offspring and control distributions, once the sample is given, their posterior distributions are usually highly correlated, as shown empirically in the second simulated example in Sect. 3 (see Fig. 12, left). Indeed, the outputs \(({\kappa }', \varvec{p}({\kappa }'), \gamma ')\) of each iteration of the algorithm satisfy \(\tau (\gamma ')m(\kappa ')\approx \tau m\), where, we recall, \(m(\kappa )=\sum _{j=0}^\kappa j p_j(\kappa )\), \(\tau (\gamma )=\lim _{k\rightarrow \infty }k^{-1}\varepsilon (k,\gamma )\), with \(\varepsilon (k,\gamma )=E[\phi _n(k)]\), and \(\tau m\) represents the true value of the threshold parameter. Thus, the use of component-wise perturbation proposals might fail to reproduce the structure of the true posterior. Taking into account the relationship described above, we suggest the following proposal distribution:

$$\begin{aligned} q_t(\varvec{p}({\kappa }), \gamma \mid \varvec{p}({\kappa }^*), \gamma ^*)=q_t(\varvec{p}({\kappa })\mid \varvec{p}({\kappa }^*), \gamma ^*)q_t(\gamma \mid \varvec{p}({\kappa }), \varvec{p}({\kappa }^*), \gamma ^*). \end{aligned}$$
(2)

We set \(q_t(\varvec{p}({\kappa })\mid \varvec{p}({\kappa }^*), \gamma ^*)\) to be a Dirichlet distribution with mean vector \(\varvec{p}(\kappa ^*)\) and variance controlled by a single tuning parameter \(a>0\), i.e., a Dirichlet distribution of order \(\kappa ^*+1\) and parameter \(a \varvec{p}(\kappa ^*)\), \(D(\kappa ^*+1, a \varvec{p}(\kappa ^*))\). Given a value \(\varvec{p}({\kappa })\) from \(q_t(\varvec{p}({\kappa })\mid \varvec{p}({\kappa }^*), \gamma ^*)\), we fix \(q_t(\gamma \mid \varvec{p}({\kappa }), \varvec{p}({\kappa }^*), \gamma ^*)\) as the distribution of the variable \(\tau ^{-1}(U^*/m(\kappa ))\), where \(\tau ^{-1}(\cdot )\) is the inverse of the function \(\tau (\cdot )\), the random variable \(U^*\) follows a normal distribution with mean \(\tau (\gamma ^*)m(\kappa ^*)\) and some variance \(\sigma _t^2\), \(N( \tau (\gamma ^*)m(\kappa ^*), \sigma _t^2)\), and \(m(\kappa )\) is the offspring mean of the distribution \(\varvec{p}({\kappa })\). Notice that we keep the variability of the proposal distribution fixed when the value \(\varvec{p}({\kappa }^*)\) is perturbed; however, an adaptive dispersion is chosen to perturb the control parameter \(\gamma \). In particular, \(\sigma _t^2\) is twice the weighted empirical variance of the selected \(\gamma \)’s in iteration \(t-1\) (see [6] for further discussion on optimality of proposals for ABC SMC).
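The perturbation step defined by (2) can be sketched as follows. The sketch assumes, for illustration only, that \(\tau (\gamma )=\gamma \) (so \(\tau ^{-1}\) is the identity), which is passed in through the hypothetical arguments `tau` and `tau_inv`. A proposed \(\gamma \) outside the parameter space would need separate handling, which is omitted here.

```python
import random

def dirichlet(alpha, rng):
    """Dirichlet draw obtained by normalising independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def mean(p):
    """Offspring mean m = sum_j j * p_j."""
    return sum(j * pj for j, pj in enumerate(p))

def perturb(p_star, gamma_star, a, sigma_t, rng,
            tau=lambda g: g, tau_inv=lambda u: u):
    """Draw (p, gamma) from the proposal in (2), given the particle (p_star, gamma_star).

    p is Dirichlet with parameter a * p_star (all entries of p_star assumed > 0);
    gamma is tau_inv(U / m(p)) with U ~ N(tau(gamma_star) * m(p_star), sigma_t^2).
    """
    p_new = dirichlet([a * pj for pj in p_star], rng)
    u_star = rng.gauss(tau(gamma_star) * mean(p_star), sigma_t)
    gamma_new = tau_inv(u_star / mean(p_new))
    return p_new, gamma_new

if __name__ == "__main__":
    rng = random.Random(7)
    p_new, gamma_new = perturb([0.2, 0.3, 0.5], 0.8, 30.0, 0.05, rng)
    # The product tau(gamma) * m stays close to its value at the original particle.
    print(gamma_new * mean(p_new), 0.8 * mean([0.2, 0.3, 0.5]))
```

Drawing \(\gamma \) conditionally on the new offspring law, rather than perturbing it on its own, is what keeps the product \(\tau (\gamma )m(\kappa )\) near the value carried by the original particle.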

For a step-by-step description of the remaining iterations of the algorithm in the first phase, let us write \(\mathcal {P}_t=\{({\kappa }_t^{(1)},\varvec{p}({\kappa }_t^{(1)})^{(1)},\gamma _t^{(1)}),\ldots , ({\kappa }_t^{(N)},\varvec{p}({\kappa }_t^{(N)})^{(N)},\gamma _t^{(N)})\}\) for the output of the t-th stage of the algorithm. Moreover, let us denote by \(\mathcal {P}_t(\kappa )\) the family defined by the elements of \(\mathcal {P}_t\) such that \({\kappa }_t^{(j)}={\kappa }\), \(j=1,\ldots ,N\). We also write \(\mathbbm {1}_{A}\) to refer to the indicator function of the set A and \(\varvec{\omega }_t=(\omega _t^{(1)},\ldots , \omega _t^{(N)})\) to refer to the vector of weights in the iteration t.

figure b

2.2 Second stage: estimation of \(\pi (\varvec{p}(\widetilde{\kappa }_n),\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\)

Having obtained the estimate for \(\kappa \), denoted \(\widetilde{\kappa }_n\), given by the closest integer to the mean of the sample \(\{\kappa ^{(1)},\ldots ,\kappa ^{(N)}\}\) drawn in the first stage, we now describe how to draw a sample from the ABC approximation of the distribution \(\pi (\varvec{p}({\widetilde{\kappa }_n}),\gamma |\tilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\).

Besides the approximation of the marginal posterior distribution of the model index, the output of the first stage provides a sample from the ABC estimate of the marginal posterior distributions of parameters, i.e. \(\pi (\varvec{p}(\kappa ),\gamma |\kappa ,\widetilde{\mathcal {Z}}_n^{obs})\), for \(\kappa =2,\ldots , K_{max}\). Although the ABC methodology for the inference on \(\kappa \) works quite well without the use of summary statistics as pointed out before, their use does improve the output of the ABC algorithm when the aim is to make inference on the parameters once the model index is known (see [10]). Thus, our proposal is to proceed as follows: when the first stage is implemented, all the generated parameter values together with their data sets in the last iteration are stored. Consequently, they can be used to run an ABC algorithm to estimate the posterior distribution of the parameters of the model given \(\kappa =\tilde{\kappa }_n\) without having to generate new data. Let us denote by \(\{\widetilde{\mathcal {Z}}_n^{(1)},\ldots , \widetilde{\mathcal {Z}}_n^{(N)}\}\) the simulated data corresponding to the sample \(\{(\kappa ^{(1)},\varvec{p}(\kappa ^{(1)})^{(1)},\gamma ^{(1)}),\ldots , (\kappa ^{(N)},\varvec{p}(\kappa ^{(N)})^{(N)},\gamma ^{(N)})\}\), and let \(\{\kappa ^{(i_1)},\ldots , \kappa ^{(i_L)}\}\) be all the elements of the sample \(\{\kappa ^{(1)},\ldots , \kappa ^{(N)}\}\) such that \(\kappa ^{(i_l)}=\tilde{\kappa }_n\), for \(l=1,\ldots ,L\). Next, we make use of the simulated marginal values

$$\begin{aligned} \{(\varvec{p}(\kappa ^{(i_1)})^{(i_1)},\gamma ^{(i_1)},\widetilde{\mathcal {Z}}_n^{(i_1)}),\ldots , (\varvec{p}(\kappa ^{(i_L)})^{(i_L)},\gamma ^{(i_L)},\widetilde{\mathcal {Z}}_n^{(i_L)})\}, \end{aligned}$$

to check the rejection condition in the tolerance-rejection ABC algorithm considering a suitable summary statistic. We use the following summary statistic

$$\begin{aligned} \mathcal {S}(\widetilde{\mathcal {Z}}_n)=\left( \sum _{i=1}^n Z_i,\frac{\sum _{i=1}^n Z_i}{\sum _{i=0}^{n-1} Z_i},\frac{\phi _{n-1}(Z_{n-1})}{Z_{n-1}}, \frac{Z_{n}}{\phi _{n-1}(Z_{n-1})}\right) . \end{aligned}$$
(3)

This statistic results from adding a fourth coordinate to the one in [10]. The properties of the model (see [9]) ensure that in a general setting, as \(n\rightarrow \infty \),

$$\begin{aligned} \frac{\sum _ {i=1}^n Z_i}{\sum _{i=0}^{n-1} Z_i}\rightarrow \tau m,\quad \frac{\phi _{n-1}(Z_{n-1})}{Z_{n-1}}\rightarrow \tau ,\quad \frac{Z_{n}}{\phi _{n-1}(Z_{n-1})}\rightarrow m, \end{aligned}$$

almost surely on \(\{Z_n \rightarrow \infty \}\), regardless of whether we consider parametric frameworks for the offspring or control distributions. Consequently, the third and the new coordinate enable us to identify each factor of the threshold parameter. Our simulation results show that the four dimensional summary statistic proposed improves the results compared to previous summary statistics. More details about the efficiency of adding a new coordinate to the summary statistic can be found in [14].
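The summary statistic (3) can be computed as in the following sketch, which assumes the data are stored as a list \([Z_0,\ldots ,Z_n,\phi _{n-1}(Z_{n-1})]\) with strictly positive entries. The function name and this storage layout are choices made for the example.

```python
def summary_statistic(data):
    """Four-dimensional summary statistic (3) for data = [Z_0, ..., Z_n, phi_{n-1}(Z_{n-1})]."""
    *sizes, phi_last = data
    total_1_to_n = sum(sizes[1:])        # Z_1 + ... + Z_n
    total_0_to_n_minus_1 = sum(sizes[:-1])  # Z_0 + ... + Z_{n-1}
    return (
        total_1_to_n,
        total_1_to_n / total_0_to_n_minus_1,  # tends to tau * m
        phi_last / sizes[-2],                 # tends to tau
        sizes[-1] / phi_last,                 # tends to m
    )

if __name__ == "__main__":
    # Toy path: every individual is a progenitor and has exactly two offspring.
    print(summary_statistic([1, 2, 4, 8, 4]))
```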

Finally, we apply a post-processing method based on a local linear regression on the output sample. The outputs \(\varvec{p}(\kappa ^{(i_j)})^{(i_j)}\) are (\(\tilde{\kappa }_n\)+1)-dimensional vectors whose coordinates sum to one, but, after regression, some of the coordinates could be negative. Such outputs must be removed from the sample (see [10] for details on both methods).

3 Simulated examples

Our methodology is illustrated via several simulated examples. First, we show how well the methodology works in situations as described above, where the reproduction law has finite support. More precisely, we fix the value of the threshold parameter \(\tau m\) and consider different CBPs where we vary the support of the offspring distribution, the mean of the offspring distribution m, and the control parameter \(\gamma \) in such a way that the value of \(\tau m\) remains constant for all of the cases. Second, we return to the previous simulated study in [10]. Our aim is to estimate the posterior distribution of the parameters of interest without assuming a parametric offspring distribution. The true offspring distribution in this scenario has an infinite support, but we show our methodology is also useful in this context if the main aim is to approximate the posterior distributions of stable parameters, namely, the offspring mean and control parameter.

3.1 Example 1

We begin our simulation study focusing on offspring distributions with finite support. We show the suitability of the methodology in this framework by considering reproduction laws with different supports and means, and also various control laws with different parameters, keeping the same threshold parameter.

To that end, we explore four different models/cases of CBPs where the initial number of individuals is \(Z_0=1\) and the control variables \(\phi _n(j)\) follow binomial distributions with parameters \(\xi (j)\) and \(\gamma \), where \(\xi (j)= j+\lfloor \log (j)\rfloor \), for each \(j\in \mathbb {N}\), \(\xi (0)=0\), and \(\lfloor x \rfloor \) denotes the integer part of a number x. We observe that these control laws are a mixture of a deterministic component and a random one. The introduction of these control functions can be explained in an ecological context, where, first, we allow the introduction of new individuals in the ecosystem as described by the deterministic function \(\xi (\cdot )\), and next, the binomial control models situations such as the emigration or death of individuals due to predation. Here, \(1-\gamma \) represents the probability that an individual does not participate in the subsequent reproduction process as it is no longer present in the ecosystem. We note that for these CBPs \(\varepsilon (j,\gamma )=\gamma \xi (j)\) and \(\tau =\gamma \). For our purpose, we vary the value of the control parameter \(\gamma \) across the four cases. For the reproduction law we chose binomial distributions with different sizes, \(\kappa \), and probabilities of success, \(\varrho \), in such a way that the four models satisfy \(\tau m=2.88\), i.e. the CBPs are supercritical. The values of the parameters are gathered in Table 1. We emphasise that our choice of the parameters enables us to compare the results obtained by the methodology proposed when examining different finite supports and types of skewness of the offspring distribution (see Table 2).
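The control mechanism of this example can be sketched as below, assuming that \(\log \) denotes the natural logarithm and that \(\gamma \) is the success probability of the binomial law, consistent with \(\varepsilon (j,\gamma )=\gamma \xi (j)\). The function names are hypothetical and the binomial draw is done naively as a sum of Bernoulli trials.

```python
import math
import random

def xi(j):
    """Deterministic component: xi(0) = 0 and xi(j) = j + floor(log j) for j >= 1."""
    return 0 if j == 0 else j + math.floor(math.log(j))

def control(j, gamma, rng):
    """One draw of phi_n(j) ~ Binomial(xi(j), gamma)."""
    return sum(rng.random() < gamma for _ in range(xi(j)))

def expected_control(j, gamma):
    """epsilon(j, gamma) = gamma * xi(j)."""
    return gamma * xi(j)

if __name__ == "__main__":
    rng = random.Random(3)
    print([xi(j) for j in (0, 1, 3, 10)])
    print(control(10, 0.8, rng))
    # epsilon(j, gamma) / j approaches gamma as j grows, so tau = gamma.
    print(expected_control(10**6, 0.8) / 10**6)
```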

Table 1 Value of the parameters of each CBP
Table 2 Values of the cumulative distribution function associated with each offspring distribution

For each case described above we simulated the first 10 generations of a CBP and we ran the ABC SMC algorithm for model choice with each of the corresponding samples as observed data (see Table 4 in Appendix for details on the samples). For that purpose, we assumed that our only knowledge on the offspring distribution is an upper bound for \(\kappa \), and the fact that the control laws for a population size j are binomial distributions with parameters \(\xi (j)\) and \(\gamma \), with \(\gamma \in (0,1)\) unknown. To run the ABC SMC algorithm for model choice we fixed \(T=3\) iterations, an upper bound \(K_{max}=15\), \(\varvec{\alpha }_\kappa =(1,\ldots , 1)\), a beta prior for \(\gamma \) with both parameters equal to 1, and the tuning parameter \(a=30\). The value of a was chosen on the basis of several simulated experiments, to prevent the proposal distribution from becoming a Dirac measure at the point where it is perturbed. We simulated pools of \(4\cdot 10^5\), \(2\cdot 10^6\) and \(20\cdot 10^6\) non-extinct CBPs at the corresponding iterations and fixed as the tolerance levels \(\epsilon _1\), \(\epsilon _2\) and \(\epsilon _3\) the quantiles of orders 0.0125, 0.0025, and 0.00025, respectively, of the sample of the distances between the paths of the simulated and observed processes. As a result, for each sample path observed we obtained a sample of size 5000 of the corresponding posterior distribution of \(\kappa \). The barplots of these samples are given in Fig. 1 for Cases 1 and 2, and in Fig. 2 for Cases 3 and 4. In Case 1, with \(\kappa =4\), the distribution is concentrated around 4 and the point estimate is \(\tilde{\kappa }_n=5\) due to the tail of the distribution. In Case 2, with \(\kappa =10\), \(\tilde{\kappa }_n=7\) while the posterior distribution is right-skewed too. The posterior distribution in Case 3, with \(\kappa =7\), has a similar shape, but with support \(\{6,\ldots ,15\}\), and \(\tilde{\kappa }_n=9\). Finally, in Case 4, with \(\kappa =10\), the posterior distribution is more symmetric than in the previous cases and the point estimate is \(\tilde{\kappa }_n=12\). Taking into account the cumulative distribution function associated with each of the offspring distributions (see Table 2), the proposed estimate \(\tilde{\kappa }_n\) in each case is quite reasonable.
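As a quick arithmetic check of the settings above, and assuming that each tolerance level retains exactly the stated fraction of its pool, the three pool sizes and quantile orders all lead to the same number of accepted particles, in agreement with the reported sample size of 5000. The variable names are chosen for the example.

```python
# Pool sizes and quantile orders quoted in the text for iterations 1 to 3.
pool_sizes = [4 * 10**5, 2 * 10**6, 20 * 10**6]
quantile_orders = [0.0125, 0.0025, 0.00025]

# Number of simulated paths whose distance falls below each tolerance level.
accepted = [round(n * q) for n, q in zip(pool_sizes, quantile_orders)]

if __name__ == "__main__":
    print(accepted)
```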

Fig. 1

Estimate of the posterior of \(\kappa \) obtained in the first step of the ABC SMC algorithm for model choice in Example 1. Left: Case 1. Right: Case 2

Fig. 2

Estimate of the posterior of \(\kappa \) obtained in the first step of the ABC SMC algorithm for model choice in Example 1. Left: Case 3. Right: Case 4

We continued with the second step of our methodology by performing the tolerance-rejection algorithm and the post-processing method with the summary statistic to draw samples from distributions that approximate the posteriors \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) and \(\pi (\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) in each case. The estimates of the joint posterior densities \(\pi (m,\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) and of their marginal posterior distributions for each case are displayed in Figs. 3, 4, 5 and 6. In all cases the estimated densities are centred around the true values and their spread is relatively small. These results indicate that the method retrieves the parameters of interest reasonably well, which is a key property for predicting the evolution of the population.

Fig. 3

Case 1. Estimates of the posterior distributions via the ABC algorithm with the local linear regression adjustment with \(\widetilde{\kappa }_n=5\). Left: Contour plot of the estimates of the joint density \(\pi (m,\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\), together with the curve \(\tau m = 2.88\). The red point corresponds to the true values of the parameters and the grey point corresponds to the sample means. Centre: Estimate of \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Right: Estimate of \(\pi (\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Red dashed vertical lines represent the true value of the parameter, grey solid vertical lines are the sample means, and blue dashed-dotted vertical lines correspond to 95% HPD intervals

Fig. 4

Case 2. Estimates of the posterior distributions via the ABC algorithm with the local linear regression adjustment with \(\widetilde{\kappa }_n=7\). Left: Contour plot of the estimates of the joint density \(\pi (m,\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\), together with the curve \(\tau m = 2.88\). The red point corresponds to the true values of the parameters and the grey point corresponds to the sample means. Centre: Estimate of \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Right: Estimate of \(\pi (\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Red dashed vertical lines represent the true value of the parameter, grey solid vertical lines are the sample means, and blue dashed-dotted vertical lines correspond to 95% HPD intervals

Fig. 5

Case 3. Estimates of the posterior distributions via the ABC algorithm with the local linear regression adjustment with \(\widetilde{\kappa }_n=9\). Left: Contour plot of the estimates of the joint density \(\pi (m,\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\), together with the curve \(\tau m = 2.88\). The red point corresponds to the true values of the parameters and the grey point corresponds to the sample means. Centre: Estimate of \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Right: Estimate of \(\pi (\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Red dashed vertical lines represent the true value of the parameter, grey solid vertical lines are the sample means, and blue dashed-dotted vertical lines correspond to 95% HPD intervals

Fig. 6

Case 4. Estimates of the posterior distributions via the ABC algorithm with the local linear regression adjustment with \(\widetilde{\kappa }_n=12\). Left: Contour plot of the estimates of the joint density \(\pi (m,\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\), together with the curve \(\tau m = 2.88\). The red point corresponds to the true values of the parameters and the grey point corresponds to the sample means. Centre: Estimate of \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Right: Estimate of \(\pi (\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Red dashed vertical lines represent the true value of the parameter, grey solid vertical lines are the sample means, and blue dashed-dotted vertical lines correspond to 95% HPD intervals

Besides the four particular examples presented above, in the second part of this subsection we analyse in more detail the accuracy of the methodology in estimating the posterior distribution of \(\kappa \) when the support of the reproduction law is finite. Specifically, for each of the previous four cases, we simulated the first 10 generations of 100 processes starting with one individual (i.e., 100 different observed samples), and we ran the ABC SMC algorithm for model choice with each of these observed samples. To this end, we set the same number of iterations, prior distributions and tuning parameter as above, but we considered simulated pools of 16,000, 80,000 and 800,000 non-extinct CBPs at the corresponding iterations and fixed as the tolerance levels \(\epsilon _1\), \(\epsilon _2\) and \(\epsilon _3\) the quantiles of orders 0.0125, 0.0025, and 0.00025, respectively, of the sample of the distances between the simulated and the observed processes. As a result, for each of the 100 observed paths we obtained a sample of size 200 drawn from the posterior of \(\kappa \), and we computed the Bayesian point estimate \(\widetilde{\kappa }_n\). Thus, we got a sample of 100 estimates, \(\tilde{\kappa }_{n,1},\ldots ,\tilde{\kappa }_{n,100}\), for each case. The corresponding relative frequencies of the values of \(\kappa \) are provided in Table 3. We recall that Cases 1 and 2 have the same offspring mean and control parameter, but the offspring distribution in Case 2 is concentrated on greater values than in Case 1 (see Table 2). Our results indicate that the proposed algorithm is able to distinguish and identify both cases reasonably well, as was already observed in the study of each particular case above. We also observe that the skewness of the reproduction law has some impact on the shape of the probability distribution of the Bayesian point estimator of \(\kappa \), \(\widetilde{\kappa }_n\). 
Indeed, the first offspring distribution is left-skewed, and the method tends to overestimate the value of \(\kappa \), while the second one is right-skewed and the method tends to underestimate it. In particular, in Case 1, the choices 5 and 6 cover 86% of the values of the sample, with 5 having a relative frequency of 48%. In Case 2, the choices 6, 7 and 8 cover 72%, with 7 having a relative frequency of 30%; notice that in this case the cumulative probabilities for the values 6, 7, and 8 are 0.97, 0.994, and 0.999, respectively. Regarding Cases 3 and 4, we remark that they have different offspring means and control parameters, and the methodology is able to discriminate satisfactorily between them. Precisely, the choices 9 and 10 represent 78% of the values of the sample in Case 3, whereas 11 and 12 account for 84% in Case 4. It is also important to highlight that the range of selected values of \(\widetilde{\kappa }_n\) differs from case to case (see Table 3), and consequently the method enables us to estimate the support of the offspring distributions adequately.

Table 3 Relative frequencies of the values of \(\kappa \) in the sample \(\tilde{\kappa }_{n,1},\ldots ,\tilde{\kappa }_{n,100}\) for each model

Next, for each observed path we obtained a sample of the ABC approximation of the posterior distributions \(\pi (m|\tilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) and \(\pi (\gamma |\tilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) and we took the means of these samples as Bayesian point estimates of m and \(\gamma \), \(\tilde{m}_{n,1},\ldots ,\tilde{m}_{n,100}\) and \(\tilde{\gamma }_{n,1},\ldots ,\tilde{\gamma }_{n,100}\). The box-plots of these estimates are given in Figs. 7, 8, 9, and 10 for Cases 1, 2, 3, and 4, respectively. These show that the sample of each posterior distribution is centred around the true value of each parameter and that their dispersion is small. Thus, they lead to accurate estimates of the posterior of the parameters.

Fig. 7

Case 1. Box-plots of Bayes point estimates of m and \(\gamma \), \(\tilde{m}_{n,1},\ldots ,\tilde{m}_{n,100}\) and \(\tilde{\gamma }_{n,1},\ldots ,\tilde{\gamma }_{n,100}\), based on the samples of 100 simulated processes. Horizontal red line corresponds to the true value of the parameter

Fig. 8

Case 2. Box-plots of Bayes point estimates of m and \(\gamma \), \(\tilde{m}_{n,1},\ldots ,\tilde{m}_{n,100}\) and \(\tilde{\gamma }_{n,1},\ldots ,\tilde{\gamma }_{n,100}\), based on the samples of 100 simulated processes. Horizontal red line corresponds to the true value of the parameter

Fig. 9

Case 3. Box-plots of Bayes point estimates of m and \(\gamma \), \(\tilde{m}_{n,1},\ldots ,\tilde{m}_{n,100}\) and \(\tilde{\gamma }_{n,1},\ldots ,\tilde{\gamma }_{n,100}\), based on the samples of 100 simulated processes. Horizontal red line corresponds to the true value of the parameter

Fig. 10

Case 4. Box-plots of Bayes point estimates of m and \(\gamma \), \(\tilde{m}_{n,1},\ldots ,\tilde{m}_{n,100}\) and \(\tilde{\gamma }_{n,1},\ldots ,\tilde{\gamma }_{n,100}\), based on the samples of 100 simulated processes. Horizontal red line corresponds to the true value of the parameter

3.2 Example 2

We continue our simulation study with one of the examples in [10]. The CBP considered starts with \(Z_0=1\) individual, the offspring distribution is a geometric distribution with parameter \(q=0.4\), and the control variables \(\phi _n(j)\) follow a binomial distribution with parameters \(\xi (j)\) and \(\gamma =0.75\), where the function \(\xi (\cdot )\) was introduced in the previous example. The offspring mean and variance are \(m=1.5\) and \(\sigma ^2=3.75\), the control means are \(\varepsilon (j,\gamma )=\gamma \xi (j)=0.75 \xi (j)\), \(j\in \mathbb {N}_0\), \(\tau =\gamma =0.75\), and the threshold parameter is \(\tau m=1.125\). Since this last parameter is greater than 1, the CBP is supercritical. The simulated path and the observed sample \(\widetilde{\mathcal {Z}}_n^{obs}\) of the first 30 generations of such a process are presented in Table 5 in the Appendix. Note that the offspring distribution has infinite support.
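The moments and the threshold parameter quoted above can be checked with exact rational arithmetic. The sketch below assumes the parametrisation \(p_k=q(1-q)^k\), \(k\in \mathbb {N}_0\), of the geometric law, which is the one consistent with the stated values \(m=1.5\) and \(\sigma ^2=3.75\); the variable names are illustrative.

```python
from fractions import Fraction

q = Fraction(2, 5)        # geometric parameter q = 0.4
gamma = Fraction(3, 4)    # control parameter gamma = 0.75

# Moments of the geometric law on {0, 1, 2, ...} with P(X = k) = q (1 - q)^k.
m = (1 - q) / q           # offspring mean
sigma2 = (1 - q) / q**2   # offspring variance

tau = gamma               # in this example tau = gamma
threshold = tau * m       # threshold parameter tau * m

print(float(m), float(sigma2), float(threshold))  # 1.5 3.75 1.125
```

Because the threshold parameter 1.125 exceeds 1, the process falls in the supercritical regime, as stated in the text.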

Fig. 11

Estimate of the posterior distribution of \(\kappa \) obtained with the ABC SMC algorithm for model choice in Example 2

In Sect. 4.1 of [10] we provided some inferential results obtained by using ABC algorithms under the hypothesis of a parametric offspring distribution. Recall that this means that the parametric family of probability distributions to which the offspring distribution belonged was assumed to be known, while the value of the parameter was unknown. We now deal with the estimation of the posterior distributions of the stable parameters of the model, such as the offspring mean and the control parameter, in a different framework. To that end, throughout this example, we understand the maximum offspring capacity per individual as a number \(\kappa \) such that the probability that an individual gives birth to more than \(\kappa \) offspring is sufficiently small, i.e., we look for a realistic upper limit for the offspring capacity of the majority of the individuals of the population. Our goal is to estimate the posterior distribution of the maximum offspring capacity per individual with the aim of identifying the stable parameters of the model properly. Thus, we assume that we can propose a reasonable upper bound, \(K_{max}\), for this maximum offspring capacity in view of our knowledge of the population that we are modeling, as discussed in Sect. 2. We make use of the observed sample to estimate the joint posterior distribution of the offspring mean and the control parameter by assuming that our only knowledge of the offspring distribution is \(K_{max}\), together with the fact that the control laws for a population size j are binomial distributions with parameters \(\xi (j)\) and \(\gamma \), with \(\gamma \in (0,1)\) unknown.

We implemented the ABC SMC algorithm for model choice described in Sect. 2 by setting the same number of iterations, prior distributions, pools of non-extinct simulated processes, tolerance levels and tuning parameter as in Example 1. We therefore obtained a sample of size 5000 at each iteration. The resulting barplot of the sample obtained from the estimate of the posterior distribution \(\pi (\kappa \mid \widetilde{\mathcal {Z}}_n^{obs})\) is shown in Fig. 11. The closest integer to the sample mean of the posterior distribution of \(\kappa \) is 5, and hence we propose \(\widetilde{\kappa }_n=5\). We note that, under the true offspring distribution, a geometric distribution with parameter 0.4, the probability that an individual has at most 5 offspring is 0.9533. Consequently, the choice of 5 as the maximum number of offspring per individual is appropriate to explain the evolution of our data reasonably well.
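The probability 0.9533 quoted above follows from the closed form of the geometric distribution function, and the point estimate is the closest integer to a sample mean. The sketch below illustrates both steps; it assumes the parametrisation \(P(X=j)=q(1-q)^j\), \(j\in \mathbb {N}_0\), and the toy posterior sample is hypothetical, not the sample obtained in the paper.

```python
from fractions import Fraction


def geometric_cdf(k, q):
    """P(X <= k) when P(X = j) = q (1 - q)^j for j = 0, 1, 2, ..."""
    return 1 - (1 - q) ** (k + 1)


q = Fraction(2, 5)  # geometric parameter 0.4

# Probability mass covered by each candidate value of kappa.
for k in range(3, 9):
    print(k, round(float(geometric_cdf(k, q)), 4))


def point_estimate(kappa_sample):
    """Closest integer to the sample mean of a posterior sample of kappa."""
    return round(sum(kappa_sample) / len(kappa_sample))


toy_sample = [4, 5, 5, 6, 5, 4, 7, 5]  # hypothetical values for illustration only
kappa_hat = point_estimate(toy_sample)
print(kappa_hat)
```

For \(k=5\) the distribution function equals \(1-0.6^6=0.953344\), which rounds to the value reported in the text.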

Next, we considered the marginal samples corresponding to \(\widetilde{\kappa }_n=5\) and applied the rejection condition in the tolerance-rejection ABC algorithm and a local linear regression adjustment making use of the summary statistic in (3), as described in Sect. 2.2. The results are plotted in Fig. 12. Precisely, we represented the estimated posterior densities \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) and \(\pi (\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) and the contour plot of the estimated joint posterior density \(\pi (m,\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) together with the curve \(\tau m=1.125\) (recall that in this case \(\tau =\gamma \)). This figure illustrates the correlation between m and \(\gamma \) given the observed sample. The results show that the estimates of the posterior densities given by the proposed ABC algorithm are quite accurate. It is worth pointing out that the implementation of the proposed methodology is computationally simple, and that it provides a useful approach to make inference on the parameters of interest in a scenario that requires very little information about the true offspring law. This is a great advantage over the previous methodology considered in [10], which assumed knowledge of the parametric family to which the true offspring law belonged.
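To give an idea of what a local linear regression adjustment does, the following sketch implements the standard weighted least-squares correction with an Epanechnikov kernel on a toy problem. It is a generic illustration under stated assumptions and not the implementation used in this paper: the function name `regression_adjust`, the one-dimensional toy model and the toy summary are hypothetical, and the summary statistic in (3) is not reproduced here.

```python
import numpy as np


def regression_adjust(theta, summaries, s_obs, eps):
    """Tolerance-rejection step followed by a local linear regression adjustment.

    theta: (n, p) simulated parameters; summaries: (n, d) simulated summaries;
    s_obs: (d,) observed summary; eps: tolerance on the Euclidean distance.
    Returns the adjusted and the unadjusted accepted parameters.
    """
    diff = summaries - s_obs
    dist = np.linalg.norm(diff, axis=1)
    keep = dist <= eps                      # rejection condition
    theta, diff, dist = theta[keep], diff[keep], dist[keep]

    w = 1.0 - (dist / eps) ** 2             # Epanechnikov kernel weights
    X = np.column_stack([np.ones(len(diff)), diff])
    sw = np.sqrt(w)[:, None]                # weighted least squares via rescaling
    coef, *_ = np.linalg.lstsq(X * sw, theta * sw, rcond=None)

    # Project accepted parameters onto the observed summary: theta - (s - s_obs) beta.
    adjusted = theta - diff @ coef[1:]
    return adjusted, theta


# Toy model (assumption): summary = 2 * theta + small noise, observed summary = 1.
rng = np.random.default_rng(1)
theta = rng.normal(size=(20_000, 1))
summaries = 2.0 * theta + rng.normal(scale=0.05, size=(20_000, 1))
adjusted, accepted = regression_adjust(theta, summaries, np.array([1.0]), eps=0.5)
print(accepted.std(), adjusted.std(), adjusted.mean())
```

In this toy setting the adjusted sample is much more concentrated than the plain accepted sample, which is the effect the adjustment is designed to achieve when the tolerance is not very small.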

Fig. 12

Estimates of the posterior distributions via the ABC algorithm with the local linear regression adjustment with \(\widetilde{\kappa }_n=5\). Left: Contour plot of the estimate of the joint density \(\pi (m,\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\) with the curve \(\tau m = 1.125\). The red point indicates the true value of the parameters and the grey one represents the sample means. Centre: Estimate of \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Right: Estimate of \(\pi (\gamma \mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Red dashed vertical lines represent the true value of the parameter, grey solid vertical lines represent the sample means, and blue dashed-dotted vertical lines correspond to 95% HPD intervals

4 Real data examples

In this section our aim is to apply the described methodology to real datasets that represent the logistic growth of populations. Populations of this kind are characterized by an initial, approximately exponential growth in the number of individuals until they reach an equilibrium value around which they fluctuate. This equilibrium value, denoted by \(K_e\), mainly depends on the maximum population size supported by the environment. We refer to the latter value as the carrying capacity of the population, denoted by K (see [3]). Population-size dependent branching processes (PSDBPs) are often used to model these kinds of data (see, for instance, [4, 12], or [15] and references therein). The PSDBP is a modification of the BGW process. Briefly, the assumption of an identical offspring distribution for all the individuals in the BGW process is replaced with the assumption that the offspring distributions in each generation depend on the population size. In particular, in order to fit logistic growth data the reproduction laws depend on the current population size, the carrying capacity and some other parameters. However, the existence of a carrying capacity does not necessarily imply that the reproductive capacity of an individual changes along generations, but rather that the probability that an individual successfully becomes a progenitor does. Consequently, we propose a CBP to model logistic population growth by considering control laws defined by binomial distributions with a success probability depending on the current population size, z, the carrying capacity, K, and the offspring mean, m. We refer to z/K as the density. More precisely, the random variable \(\phi _0(z)\) follows a binomial distribution of size z and success probability given by a function \(s(m,z,K)\). We assume that \(m>1\) and that the process begins with an initial number of individuals \(Z_0\) much smaller than K. Under these assumptions we have \(E[Z_{n+1}\mid Z_n=z]= m z s(m,z,K)\). 
Although the probabilistic evolution of the described CBP with binomial control can be represented equivalently as a PSDBP, from a practical viewpoint the structure of a CBP makes it easier to interpret the parameters involved.
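A CBP with density-dependent binomial control of this type can be simulated in a few lines. The sketch below is only an illustration: the offspring law on \(\{0,1,2\}\), the carrying capacity, the initial size and the choice of the success probability \((1+(m-1)z/K)^{-1}\) (the Hassell form listed in this section with \(\beta =1\)) are assumptions made for the example, and the function names are hypothetical.

```python
import numpy as np


def hassell_success(m, z, K, beta=1.0):
    """Success probability s(m, z, K) = (1 + (m - 1) z / K)^(-beta)."""
    return (1.0 + (m - 1.0) * z / K) ** (-beta)


def simulate_cbp(rng, pmf, K, z0, n_gen, s=hassell_success):
    """Simulate Z_0, ..., Z_n for a CBP with phi_n(z) ~ Binomial(z, s(m, z, K))."""
    support = np.arange(len(pmf))
    m = float(support @ pmf)                         # offspring mean
    path = [z0]
    z = z0
    for _ in range(n_gen):
        progenitors = rng.binomial(z, s(m, z, K))    # controlled number of progenitors
        counts = rng.multinomial(progenitors, pmf)   # progenitors per offspring number
        z = int(counts @ support)                    # total offspring = next generation
        path.append(z)
    return np.array(path), m


rng = np.random.default_rng(7)
toy_pmf = np.array([0.2, 0.3, 0.5])                  # hypothetical offspring law, m = 1.3
path, m = simulate_cbp(rng, toy_pmf, K=1000.0, z0=10, n_gen=60)
print(m, path[:10], path[-5:])
```

Drawing the offspring counts jointly through a multinomial vector avoids simulating each individual separately, which matters when the population sizes are large.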

Different functions \(s(m,z,K)\) can be defined to introduce a density-dependent growth inspired by deterministic models. Given their practical relevance we highlight the following ones and the corresponding deterministic models on which the functions \(s(m,z,K)\) are based:

$$\begin{aligned} \begin{array}{rclcl} s^V(m,z,K)&{}=&{}(1-z/K),&{}\qquad &{} \quad \text {Verhulst logistic equation},\\ s^{L,\theta }(m,z,K)&{}=&{}m^{-(z/K)^\theta },&{}\quad \theta>0,&{}\quad \theta \text {-logistic model},\\ s^{H,\beta }(m,z,K)&{}=&{}(1+(m-1)z/K)^{-\beta },&{}\quad \beta >0,&{}\quad \text {Hassell model},\\ s^{G}(m,z,K)&{}=&{}m^{-\log (z+1)/\log (K+1)},&{}\qquad &{} \quad \text {Gompertz model}. \end{array} \end{aligned}$$

In particular, \(\theta =1\) in the second function yields the Ricker model, while \(\beta =1\) in the third function gives the Beverton-Holt model. We notice that, as is reasonable, a high value of the density implies a low probability of becoming a progenitor in all the models. The equilibrium value \(K_e\) can be determined by solving the equation \(E[Z_{n+1}\mid Z_n=z]=z\), that is, \(m\,s(m,z,K)=1\). The respective equilibrium values are \(K_e^V=(1-m^{-1})K\), \(K_e^{L,\theta }=K\), \(K_e^{H, \beta }=K(m^{1/\beta }-1)/(m-1)\), and \(K_e^{G}=K\).
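The equilibrium values listed above can be verified numerically by checking that \(m\,s(m,K_e,K)=1\) for each of the four functions. In the sketch below the values of m, K, \(\theta \) and \(\beta \) are arbitrary choices made for the check, and the function names are illustrative.

```python
import math


def s_verhulst(m, z, K):
    return 1.0 - z / K


def s_theta_logistic(m, z, K, theta):
    return m ** (-((z / K) ** theta))


def s_hassell(m, z, K, beta):
    return (1.0 + (m - 1.0) * z / K) ** (-beta)


def s_gompertz(m, z, K):
    return m ** (-math.log(z + 1.0) / math.log(K + 1.0))


# Arbitrary values used only for the check (assumption).
m, K, theta, beta = 1.5, 1000.0, 2.0, 1.25

# Each entry pairs a success-probability function with its claimed equilibrium value.
models = {
    "Verhulst": (lambda z: s_verhulst(m, z, K), (1.0 - 1.0 / m) * K),
    "theta-logistic": (lambda z: s_theta_logistic(m, z, K, theta), K),
    "Hassell": (lambda z: s_hassell(m, z, K, beta), K * (m ** (1.0 / beta) - 1.0) / (m - 1.0)),
    "Gompertz": (lambda z: s_gompertz(m, z, K), K),
}

# Growth factor m * s(m, K_e, K) at the equilibrium; it should equal 1.
growth = {name: m * s(k_e) for name, (s, k_e) in models.items()}
for name, value in growth.items():
    print(name, models[name][1], value)
```

Up to floating-point rounding, all four growth factors equal 1, confirming that the expected population size is unchanged at the stated equilibrium values.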

With the aim of making inference on the offspring mean and the equilibrium value for logistic growth data we implemented the ABC SMC algorithm for model choice and estimation of the parameters described in Sect. 2 by considering the binomial control distributions introduced above, with the control parameter \(\gamma =K\). We tackled the estimation in two real datasets: yeast data and seal data. We set the same number of iterations, pools of non-extinct simulated processes, tolerance levels and tuning parameter as in the previous simulated examples. The details on the prior distributions are given below for each dataset.

4.1 Yeast dataset

The yeast dataset was already studied in [21] (see Figure 1 (a) in that paper) and it collects the yeast cell numbers in a replicate by colony scan-o-matic from 0 to 72 hours of growth at 20 min intervals. These data are plotted in grey in Fig. 13 below. Note the high dimension of the data and that, given the nature of the data, the observed sample consists only of the total size of each generation. To run the algorithm, we set \(K_{max}=6\) and took as the prior distribution for \(\gamma =K\) a uniform distribution on the interval \((1\cdot 10^7, 1.1\cdot 10^7)\). We note that a yeast cell might reproduce more than once in 20 minutes and \(\kappa \) therefore represents the maximum number of yeast cells produced by a cell in this period of time.

Fig. 13

Fitted logistic (expected values) curves together with the observed values (grey dots). For the \(\theta \)-logistic model \(\theta =0.55\) and for the Hassell model \(\beta =0.05\)

To select the best of the \(\theta \)-logistic and Hassell models, we ran the algorithm for a grid of values of the \(\theta \) and \(\beta \) parameters and selected the corresponding models which provide the best fits. We based our decision on \(R^2_g\), the fraction of variance in the growth data explained by the different logistic regression models, which is the goodness-of-fit measure considered in [21]. In Fig. 13 we plotted point estimates of the expected values of each generation size given by the different logistic regression models and provide the fraction of variance explained by each of them. The maximum value of \(R^2_g\) is attained by the Hassell logistic growth model with \(\beta =0.05\), for which \(R^2_g=0.9946\). It is worth pointing out that this value is similar to the one obtained in the study developed in [21]. For this model, we also estimated the joint posterior distribution of the offspring mean, m, and the equilibrium value, \(K_e\), and the corresponding marginal distributions; these are shown in Fig. 14.
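The selection over a grid can be illustrated with a simplified sketch. Here the fraction of explained variance is taken to be the usual coefficient of determination on the raw scale, which is an assumption since the exact definition of \(R^2_g\) in [21] is not restated here, and the expected curve is approximated by the deterministic recursion \(z_{n+1}=m z_n s(m,z_n,K)\) with fixed m and K instead of re-running the ABC algorithm for every grid value. The data are synthetic and all names are hypothetical.

```python
import numpy as np


def r_squared(observed, fitted):
    """Fraction of variance explained: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    ss_res = np.sum((observed - fitted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def hassell_curve(z0, m, K, beta, n_gen):
    """Deterministic recursion z_{n+1} = m z_n (1 + (m - 1) z_n / K)^(-beta)."""
    z = [float(z0)]
    for _ in range(n_gen):
        s = (1.0 + (m - 1.0) * z[-1] / K) ** (-beta)
        z.append(m * z[-1] * s)
    return np.array(z)


# Synthetic "observed" growth data (assumption): Hassell curve with beta = 1 plus noise.
rng = np.random.default_rng(0)
curve = hassell_curve(z0=10, m=1.3, K=1000.0, beta=1.0, n_gen=40)
toy_obs = curve * rng.normal(1.0, 0.05, size=curve.size)

# Score each candidate beta on the grid and keep the one with the largest R^2.
grid = [0.5, 1.0, 2.0]
scores = {b: r_squared(toy_obs, hassell_curve(10, 1.3, 1000.0, b, 40)) for b in grid}
best_beta = max(scores, key=scores.get)
print(scores, best_beta)
```

In this toy setting the grid value used to generate the data attains the largest fraction of explained variance, which is the criterion described in the text.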

Fig. 14

Estimates of the posterior distributions via the ABC algorithm and the local linear regression adjustment with the Hassell model with \(\beta =0.05\). Left: Contour plot of the estimate of the joint density \(\pi (m,K_e\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). The grey point represents the sample means. Centre: Estimate of \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Right: Estimate of \(\pi (K_e\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Grey solid lines are the sample means and blue dashed-dotted vertical lines correspond to 95% HPD intervals

Fig. 15

Fitted logistic (expected values) curves together with the observed values (grey dots). For the \(\theta \)-logistic model \(\theta =2\) and for the Hassell model \(\beta =1.25\)

4.2 Seal dataset

The seal dataset collects the average annual harbor seal haul-out counts in the coastal estuarine environment of Washington State, USA, from 1975 to 1999. These are provided in Table 6 in the Appendix (see [13] for further details on this dataset) and represented in Fig. 15. It is worth pointing out that these data contain missing values and show a greater dispersion than the yeast data. In this case we use the same value of \(K_{max}\) as in the yeast data example, and as the prior distribution of \(\gamma =K\) we set a uniform distribution on the interval (5000, 10000). Based on the values of \(R_g^2\), the best fit is provided by the \(\theta \)-logistic model with \(\theta =2\). For this model, the estimated joint posterior distribution of the offspring mean, m, and the equilibrium value, \(K_e\), and the corresponding marginal distributions are plotted in Fig. 16.

Fig. 16

Estimates of the posterior distributions via the ABC algorithm and the local linear regression adjustment with the \(\theta \)-logistic model with \(\theta =2\). Left: Contour plot of the estimate of the joint density \(\pi (m,K_e\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). The grey point represents the sample means. Centre: Estimate of \(\pi (m\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Right: Estimate of \(\pi (K_e\mid \widetilde{\kappa }_n,\widetilde{\mathcal {Z}}_n^{obs})\). Grey solid lines are the sample means and blue dashed-dotted vertical lines correspond to 95% HPD intervals

5 Concluding remarks

We dealt with the Bayesian estimation of the main parameters of a CBP in a general context. Precisely, we assumed a parametric framework for the control laws and a non-parametric one for the offspring distribution without any knowledge about its support. The two main goals in this setting were to estimate the posterior distribution of the maximum number of offspring per individual, \(\kappa \), and to estimate the posterior distribution of other parameters such as the offspring mean and control parameter based on the Bayes point estimate of \(\kappa \) under the quadratic loss function. To that end, we considered the sample defined by the population sizes in all the generations and the number of progenitors in the last generation.

The methodology that we proposed consists of two steps. In the first one, we used the parameter \(\kappa \) as a model index and applied an ABC SMC algorithm for model choice with the raw data to draw a sample from the estimate of the posterior distribution of \(\kappa \). From this sample, we also proposed the sample mean as a point estimate of the value of \(\kappa \). In the second step, given this point estimate and the samples obtained in the last iteration of the method in the previous step, we made use of an ABC algorithm together with a local linear regression adjustment to draw samples from the estimates of the posterior distributions of the parameters of interest related to the offspring and control distributions. In this stage, we introduced an appropriate summary statistic to identify the parameters of the model.
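The idea behind the first step can be conveyed with a much simpler stand-in: a single-pass tolerance-rejection ABC in which \(\kappa \) acts as a model index. The sketch below is not the ABC SMC algorithm of Sect. 2. It assumes a uniform prior on \(\kappa \), a Dirichlet\((1,\ldots ,1)\) prior on the offspring probabilities, a beta(1, 1) prior on \(\gamma \), the control function \(\xi (z)=z\), the Euclidean distance between raw paths, and an observed sample made only of the population sizes; all names and numerical settings are hypothetical.

```python
import numpy as np


def simulate_path(rng, pmf, gamma, n_gen, z0=1):
    """CBP path with phi_n(z) ~ Binomial(z, gamma); xi(z) = z is an assumption."""
    support = np.arange(len(pmf))
    path = np.zeros(n_gen + 1)
    path[0] = z0
    z = z0
    for n in range(n_gen):
        progenitors = rng.binomial(z, gamma)
        z = int(rng.multinomial(progenitors, pmf) @ support)
        path[n + 1] = z
    return path


def abc_rejection_kappa(rng, observed, k_max, pool_size, order):
    """Keep the kappa values of the simulated non-extinct paths closest to the data."""
    n_gen = len(observed) - 1
    kappas, dists = [], []
    while len(kappas) < pool_size:
        kappa = int(rng.integers(1, k_max + 1))      # uniform prior on {1, ..., k_max}
        pmf = rng.dirichlet(np.ones(kappa + 1))      # offspring law on {0, ..., kappa}
        gamma = rng.beta(1.0, 1.0)                   # prior of the control parameter
        path = simulate_path(rng, pmf, gamma, n_gen, z0=int(observed[0]))
        if path[-1] == 0:                            # discard extinct paths
            continue
        kappas.append(kappa)
        dists.append(np.linalg.norm(path - observed))
    kappas, dists = np.array(kappas), np.array(dists)
    eps = np.quantile(dists, order)                  # tolerance as a quantile
    return kappas[dists <= eps]


rng = np.random.default_rng(2024)
true_pmf = np.array([0.1, 0.2, 0.3, 0.4])            # toy truth with kappa = 3
observed = simulate_path(rng, true_pmf, 0.8, n_gen=6)
while observed[-1] == 0:                             # use a non-extinct observed path
    observed = simulate_path(rng, true_pmf, 0.8, n_gen=6)

accepted = abc_rejection_kappa(rng, observed, k_max=6, pool_size=2000, order=0.05)
values, counts = np.unique(accepted, return_counts=True)
posterior = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))
kappa_hat = int(np.rint(accepted.mean()))            # closest integer to the sample mean
print(posterior, kappa_hat)
```

The relative frequencies of the accepted values of \(\kappa \) play the role of the barplots shown for the simulated examples, and the rounded sample mean mirrors the point estimate used in Example 2.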

Our empirical results support the suitability of the proposed methodology. First, via several simulated examples, we showed that the ABC SMC algorithm for model choice with the raw data enables us to obtain a sample from the posterior distribution of \(\kappa \) relatively easily, and to identify the main parameters of the reproduction and control laws through the second stage of the algorithm with the summary statistic. Indeed, the resulting posterior distributions are centred around the true values of the parameters. Second, returning to the simulation study in [10], we applied the method to estimate the posterior distribution of the offspring mean and control parameter when the support of the offspring distribution is infinite. In this setting, as indicated above, the parameter \(\kappa \) is interpreted as a quantity such that the probability that an individual has at most \(\kappa \) offspring is large enough, that is, a realistic upper bound for the reproduction capacity of the majority of the individuals. Again, the results obtained are quite satisfactory even in this misspecified model framework.

We also used our methodology to estimate the posterior distribution of the offspring mean and the equilibrium value for two real datasets that exhibit logistic population growth. To the best of our knowledge, this was the first time that CBPs were used as models for populations whose evolution is conditioned by the existence of a maximum capacity of the environment in which they evolve. We highlight that the methodology is quite flexible and works reasonably well even with missing values, as happens in the seal dataset, and with high value data, as happens in both examples, mainly in the yeast one. In both datasets the fitted models describe the observed data quite well, providing suitable estimates of the parameters of interest.

We finally remark that in situations where knowledge of the reproduction law is limited, the computational simplicity of the methodology makes it an appropriate way to generate samples from the estimates of the posterior distributions of the target parameters. This represents clear progress compared to previous works in this setting such as [5, 8, 10, 11], even when working with high value data.