1 Introduction

The simple linear birth and death process, first introduced by Feller (1939), is a widely used Markov model with applications in population growth, epidemiology, genetics and other fields. The basic idea of this process is that the probability of any individual giving birth to a new individual, or of any individual dying, is constant at any moment in time, and that all individuals are independent of each other. Many statistical properties, including the moments, the distribution function, the extinction probability and other quantities of interest, are derived explicitly in the literature; see, for example, Kendall (1949). Statistical inference for simple birth and death processes was then developed by Keiding (1975), where maximum likelihood estimators and related asymptotic results are discussed. Since the distribution function of the simple birth and death process is explicit, the construction of the likelihood function is straightforward. However, it has been pointed out in the literature that the transition probability is cumbersome and numerically unstable when the population grows large over time. A variety of alternative estimation methods have therefore been proposed. For example, Chen and Hyrien (2011) proposed quasi- and pseudo-likelihood estimators, while Crawford et al. (2014) addressed estimation as a missing data problem and applied an EM algorithm. Tavaré (2018) obtained the transition probabilities by numerical inversion of the probability generating function and then applied Bayesian methods to perform estimation. Davison et al. (2021) adopted a saddlepoint approximation method to further improve the accuracy of the transition probabilities.

The bivariate and multivariate birth and death processes were developed in Griffiths (1972, 1973). Griffiths (1972) described the transmission of malaria (the so-called host-vector situation) as a bivariate birth and death process in which there is no direct infection within the same type of population. The author then extended the model to the multivariate case (Griffiths 1973), which can be regarded as an approximation of a general epidemic with several types of infectives. However, due to the intractability of the joint probability generating function, maximum likelihood estimation of the parameters is not implementable. One possible way forward is to use an integer-valued time series to approximate the continuous birth and death process, so that maximum likelihood estimation becomes feasible.

In recent years, there has been a growing interest in modelling integer-valued time series due to the prevalence of count data in different scientific fields such as social science, healthcare, insurance, economics and finance. In particular, for the univariate case, Al-Osh and Alzaid (1987) and McKenzie (1985) were the first to consider an INAR(1) model based on the so-called binomial thinning operator. The idea is to define the operation between coefficients and variables, as well as the innovation terms, in such a way that the values are always integers. One can apply different discrete random variables to describe this operation. For more details, the interested reader may refer to Weiß (2018), Davis et al. (2016), Scotto et al. (2015) and Weiß (2008), among many others.

In this paper, we propose an integer-valued autoregressive model of order one (INAR(1)) to approximate the continuous birth and death process. In this way, the continuous process is approximated by a discrete Markov chain, so that the transition probabilities as well as the likelihood function can be written down explicitly. As the birth and death process in our setting does not include immigration, the innovation term is dropped from the proposed INAR(1) model. Similar to Nelson (1990) and Kirchner (2016), who established the relationship between discrete models and their continuous counterparts, we first need to make sure that our proposed discrete INAR(1) model converges weakly to the birth and death process. We then explore how the proposed model facilitates statistical inference. According to the probability generating function of the simple birth and death process, the death part can be described by a binomial random variable, while the birth part corresponds to a negative binomial. One can then construct a bivariate INAR model based on these random variables to describe the bivariate birth and death process, and even the multivariate one. As the transition probabilities and the likelihood function of the bivariate birth and death process cannot be written down explicitly, our main contribution is that the proposed bivariate INAR(1) model provides a feasible way to estimate the parameters of the bivariate birth and death process by maximum likelihood.

The paper is organized as follows: Sect. 2 reviews the main results on univariate and bivariate birth and death processes with constant rates. Section 3 introduces integer-valued autoregressive models as well as some of their distributional properties. Section 4 constructs discrete semimartingales from the proposed INAR models and proves the weak convergence of the constructed semimartingales to the birth and death processes. A simulation study is carried out in Sect. 5 to illustrate the estimation method via the proposed INAR models and the corresponding properties of the estimators. Some concluding remarks are given in Sect. 6.

2 Univariate and bivariate birth and death processes

In this section, we review the essential elements of simple birth-and-death processes, including moments and other distributional properties. These are well known and extensively discussed in the literature. We then discuss the bivariate case, where analytic expressions for the distribution function are not available.

2.1 Simple univariate birth-and-death process

Suppose that we have a population whose total size evolves as a simple birth and death process \( Z_t \), with constant birth rate \( \lambda \ge 0 \), death rate \( \mu \ge 0 \) and initial population \( Z_0 \in {\mathbb {N}}\). In other words, the probability that any given individual gives birth in a small time interval of length \( \Delta \) is \( \lambda \Delta \), the probability that it dies is \( \mu \Delta \) (both up to \( o(\Delta ) \) terms), and individuals are independent of each other. Let \( P_n (t) = \Pr (Z_t = n) \) be the probability that the total population is n at time t. Then the transition probabilities of the simple birth and death process are characterized by the following ordinary differential equations (ODE)

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{d P_n(t)}{dt} &{}= \lambda (n-1) P_{n-1} (t) + \mu (n+1) P_{n+1}(t) - (\lambda + \mu ) n P_{n}(t), \quad n \ge 1 \\ P_{Z_0}(0) &{}= 1 \end{array}\right. } \end{aligned}$$
(1)

Applying the linear transform \( \sum _{n} \theta ^n \) to both sides and defining \( \varphi (t,\theta ) = \sum _{n} \theta ^n P_{n}(t) \), we obtain a partial differential equation whose solution \( \varphi \) is the probability generating function of \( Z_t \) with initial population \( Z_0 = a \).

$$\begin{aligned} \begin{aligned} \frac{\partial \varphi }{\partial t}&= \lambda \theta ^2 \frac{\partial \varphi }{\partial \theta } + \mu \frac{\partial \varphi }{\partial \theta } -(\lambda + \mu )\theta \frac{\partial \varphi }{\partial \theta }\\&= (\lambda \theta - \mu ) (\theta -1) \frac{\partial \varphi }{\partial \theta } \\ \varphi (0,\theta )&= \theta ^a \end{aligned} \end{aligned}$$
(2)

This linear PDE can be solved explicitly

$$\begin{aligned} \begin{aligned} \varphi (t,\theta )&= \left( 1 - \alpha (t) + \alpha (t) \frac{\beta (t) \theta }{1 - (1-\beta (t))\theta }\right) ^{Z_0} \\ \alpha (t)&= \frac{(\lambda - \mu ) e^{(\lambda - \mu )t}}{\lambda e^{(\lambda - \mu )t} - \mu }, \quad \beta (t) = \frac{\lambda - \mu }{\lambda e^{(\lambda - \mu )t} - \mu } \end{aligned} \end{aligned}$$
(3)

This probability generating function immediately gives the construction of \( Z_t \) given \( Z_0 \), i.e. as a sum of i.i.d zero-modified geometric random variables

$$\begin{aligned} Z_t \sim \sum _{i=1}^{Z_0} B_i(\alpha (t))G_i(\beta (t)), \end{aligned}$$
(4)

where the \( B_i \) are i.i.d Bernoulli random variables with mean \( \alpha (t) \) and the \( G_i \) are i.i.d geometric random variables with mean \( \frac{1}{\beta (t)} \). Furthermore, from the definition of the transition probability, the linear birth and death process is a pure-jump semimartingale with the following characteristic triplet:

$$\begin{aligned} \begin{aligned}&Ch(Z_t) = {\left\{ \begin{array}{ll} B_t = 0 \\ C_t = 0 \\ \nu (Z_t;dt,dx) = dt K(Z_t,dx) = dt (\lambda Z_{t^-} \delta _{1}(dx) + \mu Z_{t^-} \delta _{-1}(dx)) \end{array}\right. } \\&\int _{R} \left( x^2 \wedge 1 \right) K(Z_t,dx) = (\lambda + \mu ) Z_{t^-} < \infty , \quad \text {given that}\, Z_{t^-} \, \text {is finite} \end{aligned} \end{aligned}$$
(5)
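To make the representation in Eq. (4) concrete, the following sketch draws \( Z_t \) exactly by summing \( Z_0 \) independent Bernoulli–geometric products with parameters \( \alpha (t), \beta (t) \) from Eq. (3); the function names are ours and \( \lambda \ne \mu \) is assumed.

```python
import math
import random

def alpha_beta(lam, mu, t):
    """alpha(t) and beta(t) from Eq. (3); assumes lam != mu."""
    e = math.exp((lam - mu) * t)
    denom = lam * e - mu
    return (lam - mu) * e / denom, (lam - mu) / denom

def sample_Z(lam, mu, t, z0, rng):
    """Exact draw of Z_t given Z_0 = z0: a sum of z0 i.i.d.
    Bernoulli(alpha(t)) x Geometric(beta(t)) products, as in Eq. (4)."""
    a, b = alpha_beta(lam, mu, t)
    total = 0
    for _ in range(z0):
        if rng.random() < a:          # Bernoulli part: this line survives
            g = 1                     # geometric part on {1, 2, ...}
            while rng.random() >= b:
                g += 1
            total += g
    return total
```

A quick sanity check is that the Monte Carlo average of sample_Z is close to the exact mean \( Z_0 e^{(\lambda - \mu )t} \).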

With the help of the piecewise-deterministic Markov process theory in Davis (1984), the infinitesimal generator of the simple birth and death process \( Z_t \), acting on a function f(t, Z) within its domain \( \Omega ({\mathcal {A}}) \), is given by

$$\begin{aligned} {\mathcal {A}} f(t, Z) = \frac{\partial f}{\partial t} + \lambda Z (f(t, Z+1) - f(t, Z)) + \mu Z (f(t, Z-1) - f(t, Z)), \end{aligned}$$
(6)

where \( \Omega ({\mathcal {A}}) \) is the domain of the generator \( {\mathcal {A}} \), consisting of functions f(t, Z) that are differentiable with respect to t for all t, Z and satisfy

$$\begin{aligned} \begin{aligned}&\vert f(t, Z+1) - f(t, Z) \vert< \infty \\&\vert f(t, Z-1) - f(t, Z) \vert < \infty . \end{aligned} \end{aligned}$$
(7)

The first and second moments can be derived by applying the infinitesimal generator to the functions \( f(t,Z) = Z \) and \( f(t,Z) = Z^2 \), which gives

$$\begin{aligned} \begin{aligned} {\mathcal {A}} Z&= \lambda Z(Z+1 - Z) + \mu Z (Z -1 -Z) \\ {\mathcal {A}} Z^2&= \lambda Z( (Z +1)^2 - Z^2) + \mu Z ( (Z-1)^2 - Z^2), \end{aligned} \end{aligned}$$
(8)

which leads to two ODEs,

$$\begin{aligned} \begin{aligned} \frac{d {\mathbb {E}}[Z_t]}{dt}&= (\lambda - \mu ){\mathbb {E}}[Z_t] \\ \frac{d {\mathbb {E}}[Z_t^2]}{dt}&= 2(\lambda - \mu ) {\mathbb {E}}[Z_t^2] + (\lambda + \mu ) {\mathbb {E}}[Z_t] \end{aligned} \end{aligned}$$
(9)

Then, we can solve them explicitly

$$\begin{aligned} \begin{aligned} {\mathbb {E}}[Z_t]&= Z_0 e^{(\lambda - \mu )t} \\ {\mathbb {E}}[Z_t^2]&= Z_0^2 e^{2(\lambda - \mu )t} + \frac{Z_0 (\lambda + \mu )}{(\lambda - \mu )} e^{(\lambda - \mu )t} \left( e^{(\lambda - \mu )t} - 1 \right) \\ Var(Z_t)&= \frac{Z_0 (\lambda + \mu )}{(\lambda - \mu )} e^{(\lambda - \mu )t} \left( e^{(\lambda - \mu )t} - 1 \right) \end{aligned} \end{aligned}$$
(10)

According to the analytic expression of the first moment, the expected population decays exponentially when \( \lambda < \mu \), and in this case the population is bound to become extinct.
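The closed-form moments in Eq. (10), together with the generating function (3) evaluated at \( \theta = 0 \), which gives \( \Pr (Z_t = 0) = (1-\alpha (t))^{Z_0} \), are cheap to evaluate numerically. A small sketch (helper names are ours; \( \lambda \ne \mu \) assumed):

```python
import math

def bd_moments(lam, mu, t, z0):
    """Mean and variance of Z_t from Eq. (10); assumes lam != mu."""
    r = math.exp((lam - mu) * t)
    mean = z0 * r
    var = z0 * (lam + mu) / (lam - mu) * r * (r - 1.0)
    return mean, var

def extinction_prob(lam, mu, t, z0):
    """P(Z_t = 0) = phi(t, 0) = (1 - alpha(t))^z0, with alpha(t) from Eq. (3)."""
    e = math.exp((lam - mu) * t)
    alpha = (lam - mu) * e / (lam * e - mu)
    return (1.0 - alpha) ** z0
```

For \( \lambda > \mu \), the extinction probability increases to \( (\mu /\lambda )^{Z_0} \) as \( t \rightarrow \infty \), while for \( \lambda < \mu \) it tends to one.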

2.2 Bivariate birth-and-death process

Suppose there are two populations \(\textbf{M} = (M_1,M_2)^T \) with initial population \( \textbf{M}_0 \in {\mathbb {N}}^2_+ \). The rate at which the population \( M_1 \) increases by one is \( \lambda _{21} M_2 + \lambda _{11} M_1 \), while the corresponding rate for the population \( M_2 \) is \( \lambda _{12} M_1 + \lambda _{22} M_2\). The subscripts of \( \lambda _{ij} \) indicate that the rate is contributed by population i to population j. The death rates of the two populations are \( \mu _1, \mu _2 \) respectively. The two populations are not independent as long as the cross birth rates \(\lambda _{ij} \ne 0,\ i \ne j\). Denote \( P_{mn}(t) = \Pr (M_{1,t}= m, M_{2,t}= n)\). These probabilities satisfy the following ODE

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{d P_{m,n}}{dt} =&{} \left( \lambda _{11} (m-1) + \lambda _{21} n \right) P_{m-1,n} + \mu _1 (m+1) P_{m+1,n} \\ &{}+ \left( \lambda _{12} m + \lambda _{22} (n-1) \right) P_{m,n-1} + \mu _2 (n+1) P_{m,n+1}\\ &{}- \left( (\lambda _{11} + \lambda _{12} + \mu _1) m + (\lambda _{21} + \lambda _{22} + \mu _2) n \right) P_{m,n} \\ P_{\textbf{M}_0}(0) =&{} 1, \quad M_{1,0}, M_{2,0} \in {\mathbb {N}}_+ \end{array}\right. } \end{aligned}$$
(11)

Griffiths (1972) introduced this bivariate birth and death process (with \( \lambda _{11} = \lambda _{22} = 0 \)) to describe the host-vector epidemic situation, where the birth probability of each population depends only on the size of the other population, e.g. the transmission of malaria. To obtain the joint probability generating function \( \Psi (t,\theta ,\phi ) = \sum _m \sum _n \theta ^m \phi ^n P_{mn}(t) \), we apply the linear transform \( \sum _m \sum _n \theta ^m \phi ^n \) to both sides of the ODE. The resulting PDE is

$$\begin{aligned} \begin{aligned} \frac{\partial \Psi }{\partial t}&= \lambda _{11} \theta ^2 \frac{\partial \Psi }{\partial \theta } + \lambda _{21} \theta \phi \frac{\partial \Psi }{\partial \phi } + \mu _1 \frac{\partial \Psi }{\partial \theta } + \lambda _{12} \theta \phi \frac{\partial \Psi }{\partial \theta } + \lambda _{22} \phi ^2 \frac{\partial \Psi }{\partial \phi } + \mu _2 \frac{\partial \Psi }{\partial \phi } \\&\quad - \theta (\lambda _{11} + \lambda _{12} + \mu _1) \frac{\partial \Psi }{\partial \theta } - \phi (\lambda _{21} + \lambda _{22} + \mu _2) \frac{\partial \Psi }{\partial \phi } \\&= (\lambda _{11}\theta ^2 + \lambda _{12} \theta \phi + \mu _1 - \theta (\lambda _{11} + \lambda _{12} + \mu _1)) \frac{\partial \Psi }{\partial \theta } \\&\quad + (\lambda _{22}\phi ^2 + \lambda _{21} \theta \phi + \mu _2 - \phi (\lambda _{21} +\lambda _{22} + \mu _2)) \frac{\partial \Psi }{\partial \phi } \\ \Psi (0,\theta ,\phi )&= \theta ^{M_{1,0}} \phi ^{M_{2,0}} \end{aligned} \end{aligned}$$
(12)

This is a semi-linear PDE, whose subsidiary (characteristic) equations are

$$\begin{aligned} \begin{aligned} \frac{d \Psi }{ 0} = \frac{dt}{1}&= \frac{- d\theta }{\lambda _{11}\theta ^2 + \lambda _{12} \theta \phi + \mu _1 - \theta (\lambda _{11} + \lambda _{12} + \mu _1) } \\&= \frac{-d\phi }{\lambda _{22}\phi ^2 + \lambda _{21} \theta \phi + \mu _2 - \phi (\lambda _{21} +\lambda _{22} + \mu _2)} \end{aligned} \end{aligned}$$
(13)

The first fraction does not mean dividing \( d \Psi \) by 0; rather, combined with the second fraction \( \frac{dt}{1} \), it indicates that \(\Psi \) is constant along the characteristics, following Chapter 8 of Bailey (1991). Matching the third and fourth fractions above, we have

$$\begin{aligned} \begin{aligned} \frac{d\theta }{d\phi } = \frac{\lambda _{11}\theta ^2 + \lambda _{12} \theta \phi + \mu _1 - \theta (\lambda _{11} + \lambda _{12} + \mu _1) }{\lambda _{22}\phi ^2 + \lambda _{21} \theta \phi + \mu _2 - \phi (\lambda _{21} +\lambda _{22} + \mu _2)} \end{aligned} \end{aligned}$$
(14)

There appears to be no way to solve this non-linear ODE, and therefore no explicit solution is available for the PDE. However, it can be shown that the PDE admits a unique solution by the existence-uniqueness theorem for quasilinear first-order equations. With regard to its characteristics, similar to the univariate case, this process is a pure-jump semimartingale with the following characteristic triplet:

$$\begin{aligned} \begin{aligned}&Ch(\textbf{M}_t) = {\left\{ \begin{array}{ll} B_t = 0 \\ C_t = 0 \\ \nu (\textbf{M}_t ; dt,dx) = dt K(\textbf{M}_t,dx) = \\ dt ( \tilde{\varvec{\lambda }}_1\delta _{(1,0)}(dx) + \tilde{\varvec{\lambda }}_2 \delta _{(0,1)}(dx) + \tilde{\varvec{\mu }}_1 \delta _{(-1,0)}(dx) + \tilde{\varvec{\mu }}_2 \delta _{(0,-1)}(dx)) \textbf{M}_{t^-} \end{array}\right. }\\&\int _R \left( x^2 \wedge 1 \right) K(\textbf{M}_t,dx) = (\tilde{\varvec{\lambda }}_1 + \tilde{\varvec{\lambda }}_2 + \tilde{\varvec{\mu }}_1 + \tilde{\varvec{\mu }}_2) \textbf{M}_{t^-} < \infty ,\ \text {given that}\, \textbf{M}_{t^-} \, \text {is finite}, \\&\text {where} \\&\tilde{\varvec{\lambda }}_1 = (\lambda _{11},\lambda _{21}), \quad \tilde{\varvec{\lambda }}_2 = (\lambda _{12},\lambda _{22}),\quad \tilde{\varvec{\mu }}_1 = (\mu _1,0), \quad \tilde{\varvec{\mu }}_2 = (0,\mu _2) \end{aligned} \end{aligned}$$
(15)

The moments of this bivariate process can again be derived by applying the infinitesimal generator.

Proposition 1

The first and second moments of the bivariate birth and death process \( \textbf{M}_t = (M_{1,t}, M_{2,t}) \) defined in (11) are given by

$$\begin{aligned} \begin{aligned} {\mathbb {E}}[M_{1,t}]&= M_{1,0} \left( \frac{\lambda _{12} c }{2\lambda _{12}c + \kappa _1 - \kappa _2} e^{(\lambda _{12}c - \kappa _2 ) t} + \frac{\lambda _{12} c + \kappa _1 - \kappa _2}{2\lambda _{12}c + \kappa _1 - \kappa _2} e^{-(\lambda _{12}c + \kappa _1)t} \right) \\&+ M_{2,0} \frac{\lambda _{21} }{2\lambda _{12}c + \kappa _1 - \kappa _2} \left( e^{(\lambda _{12}c - \kappa _2 ) t} - e^{-(\lambda _{12}c + \kappa _1)t} \right) \\ {\mathbb {E}}[M_{2,t}]&= M_{1,0} \frac{\lambda _{12} }{2\lambda _{12}c + \kappa _1 - \kappa _2} \left( e^{(\lambda _{12}c - \kappa _2 ) t} - e^{-(\lambda _{12}c + \kappa _1)t} \right) \\&+ M_{2,0} \left( \frac{\lambda _{12} c +\kappa _1 - \kappa _2 }{2\lambda _{12}c + \kappa _1 - \kappa _2} e^{(\lambda _{12}c - \kappa _2 ) t} + \frac{\lambda _{12} c}{2\lambda _{12}c + \kappa _1 - \kappa _2} e^{-(\lambda _{12}c + \kappa _1)t} \right) , \end{aligned} \end{aligned}$$
(16)

where

$$\begin{aligned} \kappa _1 = \mu _1 - \lambda _{11}, \quad \kappa _2 = \mu _2 - \lambda _{22}, \quad c = \frac{\kappa _2 - \kappa _1 + \sqrt{(\kappa _1 - \kappa _2)^2 + 4\lambda _{21}\lambda _{12}}}{2\lambda _{12}}. \end{aligned}$$

The second moments \( {\mathbb {E}}[M_{1,t}^2], {\mathbb {E}}[M_{2,t}^2] \) and \( {\mathbb {E}}[M_{1,t} M_{2,t}] \) are determined by the following system of ODEs,

$$\begin{aligned} \begin{aligned} \frac{d}{dt}{\mathbb {E}}[M_{1,t}^2]&= -2\kappa _1 {\mathbb {E}}[M_{1,t}^2] + 2\lambda _{21} {\mathbb {E}}[M_{1,t} M_{2,t}] + \lambda _{21} {\mathbb {E}}[M_{2,t}]+ \mu _1 {\mathbb {E}}[M_{1,t}] \\ \frac{d}{dt}{\mathbb {E}}[M_{2,t}^2]&= -2\kappa _2 {\mathbb {E}}[M_{2,t}^2] + 2\lambda _{12} {\mathbb {E}}[M_{1,t} M_{2,t}] + \lambda _{12} {\mathbb {E}}[M_{1,t}] + \mu _2 {\mathbb {E}}[M_{2,t}] \\ \frac{d}{dt} {\mathbb {E}}[M_{1,t} M_{2,t}]&= -(\kappa _1 + \kappa _2) {\mathbb {E}}[M_{1,t} M_{2,t}] + \lambda _{21} {\mathbb {E}}[M_{2,t}^2] + \lambda _{12} {\mathbb {E}}[M_{1,t}^2] \end{aligned} \end{aligned}$$
(17)

Proof

See Appendix A.1. \(\square \)

Note that to ensure that the bivariate process becomes extinct with probability one, we need the (necessary and sufficient) condition \( (\mu _1 - \lambda _{11})(\mu _2 - \lambda _{22}) > \lambda _{12} \lambda _{21} \), according to Griffiths (1973). Many interesting properties of the process have been investigated by Griffiths (1972, 1973). In general, this bivariate birth and death process is not straightforward to apply in practice because there is no explicit solution to the above PDE, and the second moments have to be evaluated by numerical methods. The discrete integer-valued model proposed in the next section provides a possible solution.
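As an illustration of Proposition 1, the first moments in (16) can be evaluated directly and cross-checked against a numerical integration of the first-moment ODEs \( \frac{d}{dt}{\mathbb {E}}[M_{1,t}] = -\kappa _1 {\mathbb {E}}[M_{1,t}] + \lambda _{21} {\mathbb {E}}[M_{2,t}] \) and \( \frac{d}{dt}{\mathbb {E}}[M_{2,t}] = \lambda _{12} {\mathbb {E}}[M_{1,t}] - \kappa _2 {\mathbb {E}}[M_{2,t}] \), which follow from the infinitesimal generator. The sketch below uses our own helper name and assumes \( \sqrt{(\kappa _1 - \kappa _2)^2 + 4\lambda _{21}\lambda _{12}} > 0 \).

```python
import math

def bivariate_mean(t, m10, m20, lam, mu):
    """First moments (E[M_1,t], E[M_2,t]) from Eq. (16).
    lam is the 2x2 birth-rate matrix ((l11, l12), (l21, l22)); mu = (mu1, mu2)."""
    (l11, l12), (l21, l22) = lam
    k1, k2 = mu[0] - l11, mu[1] - l22
    S = math.sqrt((k1 - k2) ** 2 + 4.0 * l21 * l12)  # equals 2*l12*c + k1 - k2
    a = (k2 - k1 + S) / 2.0                          # equals l12 * c
    u1, u2 = a - k2, -(a + k1)                       # the two growth exponents
    e1, e2 = math.exp(u1 * t), math.exp(u2 * t)
    m1 = m10 * (a * e1 + (a + k1 - k2) * e2) / S + m20 * l21 * (e1 - e2) / S
    m2 = m10 * l12 * (e1 - e2) / S + m20 * ((a + k1 - k2) * e1 + a * e2) / S
    return m1, m2
```

An Euler integration of the first-moment ODEs with a small step reproduces these values, which is a convenient consistency check on (16).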

3 Univariate and bivariate INAR models

In this section, we introduce integer-valued autoregressive models which will serve as discrete approximations of the continuous counterparts discussed in the previous section. The derivation of this approximation will demonstrate how to parameterize the bivariate INAR case.

3.1 Univariate INAR model

The classical integer-valued autoregressive (INAR) model is introduced by defining a so-called binomial thinning operator \( \circ \) such that \( \alpha \circ X \) is the sum of X i.i.d Bernoulli random variables with success probability \( \alpha \), i.e.

$$\begin{aligned} \alpha \circ X = \sum _{i=1}^{X} b_i, \quad b_i \overset{i.i.d}{\sim }\ \text {Bernoulli}(\alpha ) \end{aligned}$$
(18)

A well-known Poisson INAR(1) model \( X_t \) is given by

$$\begin{aligned} X_t = \alpha \circ X_{t-1} + R_t, \end{aligned}$$
(19)

where \( \{R_t\} \) are i.i.d Poisson variables with parameter \( \rho \). The key ingredient of an integer-valued model is the operator \( \circ \); one can choose different discrete random variables to construct different integer-valued models. Motivated by the transition probability of the continuous birth and death process, i.e. the sum of i.i.d zero-modified geometric random variables shown in Eq. (4), an INAR model can provide a good approximation by combining \( \circ \) with a geometric operator as defined below.
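As a short illustration of the thinning mechanism (a sketch with our own function names, not code from any referenced package), the following simulates the Poisson INAR(1) recursion (19):

```python
import math
import random

def thin(alpha, x, rng):
    """Binomial thinning alpha ∘ x: sum of x i.i.d. Bernoulli(alpha) draws."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def poisson_inar1(alpha, rho, x0, n, rng):
    """Simulate X_t = alpha ∘ X_{t-1} + R_t, R_t ~ Poisson(rho), for n steps."""
    path = [x0]
    for _ in range(n):
        u, k, p = rng.random(), 0, math.exp(-rho)  # Poisson draw by CDF inversion
        cdf = p
        while u > cdf:
            k += 1
            p *= rho / k
            cdf += p
        path.append(thin(alpha, path[-1], rng) + k)
    return path
```

A classical property of this chain is that its stationary distribution is Poisson with mean \( \rho /(1-\alpha ) \), which gives a convenient check on the simulator.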

Definition 1

A birth and death INAR(1) model with survival probability \( \alpha \in [0,1] \) and birth probability \( p \in [0,1] \) is defined as

$$\begin{aligned} X_t = p *_1 \alpha \circ X_{t-1}, \end{aligned}$$
(20)

where

  • \( \circ \) is the binomial operator

  • \( *_1 \) is a geometric (reproduction) operator such that \( p*_1 X = \sum _{i=1}^{X} g^{(1)}_i \), with the \( g^{(1)}_i \) being i.i.d geometric random variables with success probability p, whose probability mass function is given by

    $$\begin{aligned} P(g^{(1)}_i = k ) = p(1 - p)^{k-1}, \quad k = 1,2, \dots , \end{aligned}$$
  • \( p*_1 \alpha \circ X = \sum _{i=1}^{\alpha \circ X} g^{(1)}_i\)

Remark The innovation term is dropped as there is no independent immigration process in the birth and death process under investigation.
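A one-step sampler for Definition 1 can be sketched as follows (function name is ours): each of the \( X_{t-1} \) individuals survives with probability \( \alpha \), and every survivor is replaced by a geometric cluster of size at least one.

```python
import random

def bd_inar_step(x, alpha, p, rng):
    """One step of X_t = p *_1 alpha ∘ X_{t-1}: binomial thinning of the x
    individuals, then a Geometric(p) cluster on {1, 2, ...} per survivor."""
    survivors = sum(1 for _ in range(x) if rng.random() < alpha)
    total = 0
    for _ in range(survivors):
        g = 1
        while rng.random() >= p:   # geometric draw on {1, 2, ...}
            g += 1
        total += g
    return total
```

Conditionally on \( X_{t-1} \), the mean of one step is \( (\alpha /p) X_{t-1} \), consistent with Proposition 2 below.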

Proposition 2

The birth and death INAR(1) model has the following statistical properties

  1.

    The probability generating function of \(X_t\) can be iterated backwards so that

    $$\begin{aligned} \begin{aligned} \varphi ^{(I)}(t,\theta )&= {\mathbb {E}}[\theta ^{X_t}] ={\mathbb {E}}\left[ \left( 1 - \alpha + \frac{\alpha p \theta }{1 - (1-p)\theta }\right) ^{X_{t-1}}\right] \\&={\mathbb {E}}\left[ \left( 1 - \alpha _i + \frac{\alpha _i p_i \theta }{1 - (1-p_i)\theta }\right) ^{X_{t-i}}\right] , \quad i = 1, \dots , t \end{aligned} \end{aligned}$$
    (21)

    where

    $$\begin{aligned} \begin{aligned} p_i&= \frac{p^i}{d_{i-1}} \quad \alpha _i = \frac{\alpha ^i}{d_{i-1}} \\ d_i&= p^i \left( 1 + (1-p)\frac{\frac{\alpha }{p} - \left( \frac{\alpha }{p}\right) ^{i+1}}{1 - \frac{\alpha }{p}} \right) \\ \end{aligned} \end{aligned}$$
    (22)

    In other words, the birth and death operator \(p*_1 \alpha \circ \) as a whole is iterable.

    $$\begin{aligned} X_t = p_1 *_1 \alpha _1 \circ X_{t - 1} = p_2 *_1 \alpha _2 \circ X_{t-2} = \dots = p_t *_1 \alpha _t \circ X_0 \end{aligned}$$
    (23)
  2.

    Then the mean, variance and covariance are given by

    $$\begin{aligned} \begin{aligned} {\mathbb {E}}[X_t]&= \frac{\alpha _i}{p_i} {\mathbb {E}}[X_{t-i}] \\ Var(X_{t})&= \left( \frac{\alpha _i (1- p_i)}{p_i^2} + \frac{\alpha _i(1- \alpha _i)}{p_i^2} \right) {\mathbb {E}}[X_{t-i}] + \frac{\alpha _i^2}{p_i^2} Var(X_{t-i}) \\ Cov(X_t, X_{t-i})&= \frac{\alpha _i}{p_i} Var(X_{t-i}) \end{aligned} \end{aligned}$$
    (24)

Proof

See Appendix A.2. \(\square \)

Note that if \( \alpha /p < 1 \), the process \( X_t \) will become extinct eventually. The continuous birth and death process can be approximated by this discrete INAR(1) model by directly matching the probability generating function \(\varphi ^{(I)}\) to \(\varphi \) in Eq. (3), since \( p *_1 \alpha \circ X\) is the sum of X i.i.d zero-modified geometric random variables.
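The iterability in Eqs. (21)–(23) can be verified numerically: composing the one-step zero-modified-geometric probability generating function i times must reproduce the generating function with parameters \( (\alpha _i, p_i) \) from Eq. (22). A small sketch (helper names ours; \( \alpha \ne p \) assumed):

```python
def one_step_pgf(theta, alpha, p):
    """PGF of p *_1 alpha ∘ X for X = 1: a zero-modified geometric, Eq. (21)."""
    return 1.0 - alpha + alpha * p * theta / (1.0 - (1.0 - p) * theta)

def iterated_params(alpha, p, i):
    """(alpha_i, p_i) = (alpha^i, p^i) / d_{i-1} from Eq. (22); assumes alpha != p."""
    r = alpha / p
    d = p ** (i - 1) * (1.0 + (1.0 - p) * (r - r ** i) / (1.0 - r))
    return alpha ** i / d, p ** i / d
```

The i-fold composition of one_step_pgf agrees with one_step_pgf evaluated at the iterated parameters, which is exactly the statement of (23).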

3.2 Bivariate INAR model

Discrete approximation of the univariate birth and death process is relatively simple because the PDE (2) has an explicit solution and hence the distribution is already known. When the dynamics of the two populations are characterized by (11), there is no explicit solution to the corresponding PDE (12). However, from the birth and death INAR(1) model, it is clear that the birth and death probabilities are closely related to binomial and negative binomial random variables. Based on the dynamics (11) and the linear form of the first moments (16), a bivariate INAR(1) model is proposed as follows.

Definition 2

A bivariate birth and death INAR(1) model \( \textbf{Y}_t = (Y_{1,t}, Y_{2,t})^T \) with survival probabilities \( \alpha _1, \alpha _2 \in [0,1] \) and birth probabilities \( \beta _{11}, \beta _{12}, \beta _{21}, \beta _{22} \in [0,1] \) is defined as

$$\begin{aligned} \begin{aligned} Y_{1,t} = \beta _{11} *_1 \alpha _1 \circ Y_{1,t-1} + \beta _{21} *_2 Y_{2,t-1} \\ Y_{2,t} = \beta _{12} *_2 Y_{1,t-1} + \beta _{22} *_1 \alpha _2 \circ Y_{2,t-1}, \end{aligned} \end{aligned}$$
(25)

where

  • \( \circ \) is the binomial operator

  • \( *_2 \) is another geometric (reproduction) operator, different from \( *_1 \), such that \( \beta *_2 X = \sum _{i=1}^{X} g^{(2)}_i \), with the \( g^{(2)}_i \) being i.i.d geometric random variables with success probability \( \beta \) and probability mass function

    $$\begin{aligned} P(g^{(2)}_i = k ) = \beta (1 - \beta )^{k}, \quad k = 0,1,2, \dots , \end{aligned}$$
  • Conditional on \( \textbf{Y}_{t-1} \), the random variables \( \beta _{11} *_1 \alpha _1 \circ Y_{1,t-1}, \ \beta _{21} *_2 Y_{2,t-1}, \ \beta _{12} *_2 Y_{1,t-1} \ \text {and} \ \beta _{22} *_1 \alpha _2 \circ Y_{2,t-1} \) are all independent of each other.

The structure of the bivariate INAR(1) now matches the dynamics of (11), i.e. the birth probability depends on the sizes of both populations while the death probability depends only on the size of its own population. We adopt another geometric random variable \( g^{(2)} \), slightly different from \( g^{(1)} \), because if we used \( g^{(1)} \) then, for example, \( Y_{1,t} \ge Y_{2,t-1} \) for all t, which is not reasonable when \( Y_{1,t-1} < Y_{2,t-1} \).
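A direct sampler for one transition of (25) can be sketched as follows (function names ours). Note that the \( *_2 \) clusters may be empty, which is exactly why \( g^{(2)} \) rather than \( g^{(1)} \) is used for the cross terms.

```python
import random

def geo1(p, rng):
    """Geometric g^(1) on {1, 2, ...} with success probability p."""
    g = 1
    while rng.random() >= p:
        g += 1
    return g

def geo2(b, rng):
    """Geometric g^(2) on {0, 1, 2, ...} with success probability b."""
    return geo1(b, rng) - 1

def bivariate_inar_step(y1, y2, a1, a2, b11, b12, b21, b22, rng):
    """One transition of the bivariate model (25); given (y1, y2), the four
    summands are generated independently."""
    s1 = sum(1 for _ in range(y1) if rng.random() < a1)   # alpha_1 ∘ y1
    s2 = sum(1 for _ in range(y2) if rng.random() < a2)   # alpha_2 ∘ y2
    new1 = sum(geo1(b11, rng) for _ in range(s1)) + sum(geo2(b21, rng) for _ in range(y2))
    new2 = sum(geo2(b12, rng) for _ in range(y1)) + sum(geo1(b22, rng) for _ in range(s2))
    return new1, new2
```

The conditional mean of the first component is \( (\alpha _1/\beta _{11}) Y_{1,t-1} + ((1-\beta _{21})/\beta _{21}) Y_{2,t-1} \), matching Proposition 3 below.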

Proposition 3

The first and second moments of the bivariate INAR(1) defined above are characterized by the following recursive formulas

$$\begin{aligned} \begin{aligned} {\mathbb {E}}[Y_{1,t}]&= \frac{\alpha _1}{\beta _{11}} {\mathbb {E}}[Y_{1,t-1}] + \frac{1-\beta _{21}}{\beta _{21}} {\mathbb {E}}[Y_{2,t-1}] \\ {\mathbb {E}}[Y_{2,t}]&= \frac{1-\beta _{12}}{\beta _{12}} {\mathbb {E}}[Y_{1,t-1}] + \frac{\alpha _2}{\beta _{22}} {\mathbb {E}}[Y_{2,t-1}] \\ Var(Y_{1,t})&= \frac{\alpha _1^2 }{\beta _{11}^2}Var(Y_{1,t-1}) + \frac{\alpha _1(2 - \beta _{11} - \alpha _1)}{\beta _{11}^2} {\mathbb {E}}[Y_{1,t-1}] + \left( \frac{1-\beta _{21}}{\beta _{21}}\right) ^2 Var(Y_{2,t-1}) \\&+ \frac{1-\beta _{21}}{\beta _{21}^2} {\mathbb {E}}[Y_{2,t-1}] + 2\frac{\alpha _1 (1-\beta _{21})}{\beta _{11}\beta _{21}} Cov(Y_{1,t-1},Y_{2,t-1}) \\ Var(Y_{2,t})&= \left( \frac{1-\beta _{12}}{\beta _{12}}\right) ^2 Var(Y_{1,t-1}) + \frac{1-\beta _{12}}{\beta _{12}^2} {\mathbb {E}}[Y_{1,t-1}] + \frac{\alpha _2^2}{\beta _{22}^2} Var(Y_{2,t-1}) \\&+ \frac{\alpha _2 (2-\beta _{22} -\alpha _2) }{\beta ^2_{22}} {\mathbb {E}}[Y_{2,t-1}] + 2\frac{\alpha _2(1-\beta _{12})}{\beta _{12}\beta _{22}} Cov(Y_{1,t-1}, Y_{2,t-1}) \\&Cov(Y_{1,t}, Y_{2,t}) \\&= \left( \frac{\alpha _1 \alpha _2}{\beta _{11}\beta _{22}} + \frac{(1-\beta _{21})(1-\beta _{12})}{\beta _{12} \beta _{21}} \right) Cov(Y_{1,t-1},Y_{2,t-1}) \\&+ \frac{\alpha _1 (1-\beta _{12})}{\beta _{11}\beta _{12}}Var(Y_{1,t-1}) + \frac{\alpha _2 (1-\beta _{21})}{\beta _{21} \beta _{22}} Var(Y_{2,t-1}) \end{aligned} \end{aligned}$$
(26)

Proof

Similar to Proposition 2, the moments can be derived by conditional expectation. The mean and variance of the random variable \( g^{(2)}_i \) with parameter \( \beta \) are \( \frac{1-\beta }{\beta } \) and \( \frac{1-\beta }{\beta ^2} \), respectively. Then the first moment of \( Y_{1,t} \) is

$$\begin{aligned} \begin{aligned} {\mathbb {E}}[Y_{1,t} \vert \textbf{Y}_{t-1}]&= {\mathbb {E}}[\beta _{11} *_1 \alpha _1 \circ Y_{1,t-1} \vert Y_{1,t-1}] + {\mathbb {E}}[\beta _{21} *_2 Y_{2,t-1} \vert Y_{2,t-1}] \\&= \frac{\alpha _1}{\beta _{11}} Y_{1,t-1} + \frac{1-\beta _{21}}{\beta _{21}} Y_{2,t-1} \\ \end{aligned} \end{aligned}$$

The second moments are given by

$$\begin{aligned} \begin{aligned} Var(Y_{1,t} \vert \textbf{Y}_{t-1})&= Var(\beta _{11} *_1 \alpha _1 \circ Y_{1,t-1} \vert Y_{1,t-1}) + Var(\beta _{21} *_2 Y_{2,t-1} \vert Y_{2,t-1}) \\&= \frac{\alpha _1 (2 - \beta _{11} - \alpha _1)}{\beta _{11}^2} Y_{1,t-1} + \frac{1-\beta _{21}}{\beta _{21}^2} Y_{2,t-1} \\ Var(Y_{1,t})&= Var({\mathbb {E}}[ Y_{1,t} \vert \textbf{Y}_{t-1}]) + {\mathbb {E}}[ Var(Y_{1,t} \vert \textbf{Y}_{t-1})], \\ \text {where} \quad Var({\mathbb {E}}[ Y_{1,t} \vert \textbf{Y}_{t-1} ])&= \frac{\alpha _1^2}{\beta _{11}^2} Var(Y_{1,t-1}) + \left( \frac{1-\beta _{21}}{\beta _{21}}\right) ^2 Var(Y_{2,t-1}) \\&+ 2\frac{\alpha _1(1-\beta _{21})}{\beta _{11}\beta _{21}} Cov(Y_{1,t-1},Y_{2,t-1}) \\ Cov(Y_{1,t}, Y_{2,t})&= Cov(\beta _{11} *_1 \alpha _1 \circ Y_{1,t-1}, \beta _{12} *_2 Y_{1,t-1}) \\&+ Cov(\beta _{11} *_1 \alpha _1 \circ Y_{1,t-1}, \beta _{22} *_1 \alpha _2 \circ Y_{2,t-1}) \\&+ Cov(\beta _{21} *_2 Y_{2,t-1}, \beta _{12} *_2 Y_{1,t-1}) + Cov(\beta _{21} *_2 Y_{2,t-1}, \beta _{22} *_1 \alpha _2 \circ Y_{2,t-1}) \\&= \frac{\alpha _1(1-\beta _{12})}{ \beta _{11} \beta _{12}} Var(Y_{1,t-1}) + \frac{\alpha _1 \alpha _2}{\beta _{11}\beta _{22}} Cov(Y_{1,t-1}, Y_{2,t-1}) \\&+ \frac{(1-\beta _{12})(1-\beta _{21})}{\beta _{12}\beta _{21}} Cov(Y_{2,t-1}, Y_{1,t-1}) + \frac{\alpha _2 (1-\beta _{21})}{\beta _{21}\beta _{22}}Var(Y_{2,t-1}) \end{aligned} \end{aligned}$$

The first and second moments of \( Y_{2,t} \) can be derived in a similar way. \(\square \)

Proposition 4

If the eigenvalues \( \eta _1, \eta _2 \) of the following matrix

$$\begin{aligned} A = \begin{bmatrix} \frac{\alpha _1}{\beta _{11}} &{} \frac{1-\beta _{21}}{\beta _{21}} \\ \frac{1-\beta _{12}}{\beta _{12}} &{} \frac{\alpha _2}{\beta _{22}} \end{bmatrix} \end{aligned}$$
(27)

lie strictly inside the unit interval, i.e. \( \eta _1, \eta _2 \in (-1, 1) \), then the bivariate population \( \textbf{Y}_t \) will become extinct eventually.

Proof

The first moment can be expressed in a matrix form

$$\begin{aligned} {\mathbb {E}}[\textbf{Y}_t] = A {\mathbb {E}}[\textbf{Y}_{t-1}] = A^t {\mathbb {E}}[\textbf{Y}_0] \end{aligned}$$
(28)

The tth power of the matrix here denotes the t-fold matrix product. By eigen-decomposition, the power of the matrix can be expressed as

$$\begin{aligned} A^t = Q \mathop {\textrm{diag}}\nolimits (\{\eta _1^t, \eta _2^t\}) Q^{-1}, \end{aligned}$$
(29)

where \( Q = (\nu _1, \nu _2)\) is the eigenvector matrix, with \( \nu _1, \nu _2 \) the eigenvectors corresponding to \( \eta _1,\eta _2 \). Now it is clear that \( {\mathbb {E}}[\textbf{Y}_t] \) decreases to zero as t grows when \( \eta _1, \eta _2 \in (-1,1)\). \(\square \)
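The extinction criterion is easy to check numerically. The sketch below (helper names ours) builds the matrix A of (27), computes its eigenvalues from the characteristic polynomial (the discriminant is non-negative here because the off-diagonal entries of A are non-negative), and iterates the mean recursion (28):

```python
import math

def mean_matrix(a1, a2, b11, b12, b21, b22):
    """Matrix A of Proposition 4, governing E[Y_t] = A E[Y_{t-1}]."""
    return ((a1 / b11, (1 - b21) / b21),
            ((1 - b12) / b12, a2 / b22))

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix via trace and determinant; real case,
    since the discriminant (a - d)^2 + 4bc is non-negative when bc >= 0."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0
```

When both eigenvalues lie strictly inside the unit interval, repeated multiplication by A drives the mean vector to zero, which is the content of the proposition.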

4 Weak convergence to continuous birth and death process

In this section, we construct two continuous-time processes from the INAR models proposed above. Under a suitable parametrization, these processes converge weakly to the aforementioned continuous birth and death processes as the length of the subinterval goes to 0.

4.1 Construction of continuous processes

Since the continuous birth and death processes are clearly semimartingales defined on non-negative state spaces, to apply the limit theorem for locally bounded semimartingales we need to construct 'continuous' processes on a dense subset of \( {\mathbb {R}}_+ \) (we take \( t \in [0,1] \) for convenience) and compute their characteristic triplets from the discrete INAR models. Once everything is set up, we can apply the weak convergence theorem for semimartingales to prove the result. The construction mainly follows Jacod and Shiryaev (2013, Chapter II, Section 3).

Starting with a discrete basis \( {\mathcal {B}} = (\mathbf {\Omega }, \textbf{F},({\mathcal {F}}_n)_{n \in {\mathbb {N}}},\textbf{P}) \), assume that the INAR models \( X_n \) and \( \textbf{Y}_n \) defined above are adapted to this discrete stochastic basis, as are the increment processes

$$\begin{aligned} \begin{aligned}&U_k = X_k - X_{k-1}, \quad U_0 = X_0 \\&\textbf{V}_k = \textbf{Y}_k - \textbf{Y}_{k-1}, \quad \textbf{V}_0 = \textbf{Y}_0, \quad k = 0,1,2, \dots \end{aligned} \end{aligned}$$
(30)

then we can construct 'continuous' processes via a time change.

Definition 3

Given a fixed time interval [0, 1], one can define an equally spaced grid of size n such that each subinterval has length \( \Delta = \frac{1}{n} \). The following processes:

$$\begin{aligned} Z_t^{(n)} = \sum _{k=0}^{\sigma _t} U_k, \quad \textbf{M}_t^{(n)} = \sum _{k=0}^{\sigma _t} \textbf{V}_k, \end{aligned}$$
(31)

where \( \sigma _t = \lfloor tn \rfloor \), are adapted to the continuous-time basis \( \tilde{{\mathcal {B}}} = (\mathbf {\Omega }, \textbf{F}, G = (\mathfrak {g}_t)_{t\ge 0}, \textbf{P}) \). The parameter setting for \( Z_t^{(n)} \) is

$$\begin{aligned} \alpha = \frac{(\lambda - \mu )e^{(\lambda - \mu )\Delta }}{\lambda e^{(\lambda - \mu )\Delta } - \mu }, \quad p = \frac{\lambda - \mu }{\lambda e^{(\lambda - \mu )\Delta } - \mu }. \end{aligned}$$
(32)

The parameter setting for \( M_t^{(n)} \) is

$$\begin{aligned} \begin{aligned} \alpha _1&= \frac{(\lambda _{11} - \mu _1) \omega _1(\Delta ) }{\lambda _{11}\omega _1(\Delta ) - \mu _1 }, \quad \alpha _2 = \frac{(\lambda _{22} - \mu _2) \omega _2(\Delta ) }{\lambda _{22}\omega _2(\Delta ) - \mu _2 } \\ \beta _{11}&= \frac{\lambda _{11} - \mu _1}{\lambda _{11}\omega _1(\Delta ) - \mu _1},\quad \beta _{22} = \frac{\lambda _{22} - \mu _2}{\lambda _{22}\omega _2(\Delta ) - \mu _2}\\ \beta _{21}&= \left( 1 + C_{\beta _1} \left( e^{ u_1 \Delta } - e^{u_2\Delta } \right) \right) ^{-1}, \quad \beta _{12} = \left( 1 + C_{\beta _2} \left( e^{u_1\Delta } - e^{u_2\Delta } \right) \right) ^{-1}, \end{aligned} \end{aligned}$$
(33)

where

$$\begin{aligned} \begin{aligned} \omega _1 (\Delta )&= C_{\alpha } e^{u_1\Delta } + (1 -C_{\alpha }) e^{u_2\Delta }, \quad \omega _2 (\Delta ) = (1-C_{\alpha }) e^{u_1 \Delta } + C_{\alpha } e^{u_2\Delta } \\ C_{\alpha }&= \frac{\lambda _{12} c }{2\lambda _{12}c + \mu _1 - \mu _2}, \ C_{\beta _1} = \frac{\lambda _{21} }{2\lambda _{12}c + \kappa _1 - \kappa _2}, \ C_{\beta _2} = \frac{\lambda _{12} }{2\lambda _{12}c + \kappa _1 - \kappa _2} \\ u_1&= \lambda _{12} c - \kappa _2, \quad u_2 = -(\lambda _{12} c + \kappa _1),\quad \kappa _i = \mu _i - \lambda _{ii}, \ i= 1,2 \\ c&= \frac{\kappa _2 - \kappa _1 + \sqrt{(\kappa _1 - \kappa _2)^2 + 4\lambda _{21}\lambda _{12}}}{2\lambda _{12}}. \end{aligned} \end{aligned}$$

It is straightforward to derive the parameter setting in the univariate case, since we only need to match the parameters via the probability generating functions of \( Z_t^{(n)} \) and \( Z_t \). In the other case, where a closed-form probability generating function for \( \textbf{M}_t \) is not available, we need to find another way to express \(\alpha _i\) and \(\beta _{ij}\) in terms of \(\lambda \) and \(\mu \). A direct approach is to match the first and second order moments. Matching the moment equations (26) to (16), we can find the mapping of \( \beta _{12}, \beta _{21} \) in terms of \( \lambda _{ij}, \ \mu _{i}, \ i,j \in \{1,2\} \). Unfortunately, only the ratio \( \alpha _i /\beta _{ii} \) is identified this way. Nevertheless, the parameter setting in the univariate case shows us how to distribute the ratio \( \alpha /p \) between \( \alpha \) and p, and \( \alpha _i, \beta _{ii}\) can then be set up in a similar way.
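To make this mapping concrete, the univariate parametrization in Eq. (32) can be sketched in a few lines of Python (a sketch under our own naming; the numerical experiments later in the paper use R):

```python
import math

def inar_params(lam, mu, delta):
    """Map the birth/death rates (lambda, mu) and step size delta to the
    INAR parameters (alpha, p) of Eq. (32).  Assumes lam != mu."""
    theta = lam - mu                      # net growth rate lambda - mu
    e = math.exp(theta * delta)
    alpha = theta * e / (lam * e - mu)    # thinning (survival) probability
    p = theta / (lam * e - mu)            # geometric offspring parameter
    return alpha, p

# the ratio alpha / p equals e^{(lambda - mu) delta}, the per-step growth factor
alpha, p = inar_params(lam=1.5, mu=1.0, delta=0.01)
```

Note that the ratio \( \alpha / p = e^{(\lambda - \mu )\Delta } \) is exactly the per-step growth factor used below to recover the rates.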

Proposition 5

With the above parameter setting and any non-negative integer m, the transition probabilities for \( Z_t^{(n)} \) conditional on \( Z_{t-\Delta }^{(n)} = k \) are

$$\begin{aligned} \begin{aligned}&\Pr (Z_t^{(n)} = k + m \vert Z_{t-\Delta }^{(n)} = k) = \left( {\begin{array}{c}k+m-1\\ k-1\end{array}}\right) (\lambda \Delta )^m + o(\Delta ^m) \\&\Pr (Z_t^{(n)} = k - m \vert Z_{t-\Delta }^{(n)} = k) = \left( {\begin{array}{c}k\\ k-m\end{array}}\right) (\mu \Delta )^m + o(\Delta ^m) \end{aligned} \end{aligned}$$
(34)

The above probabilities can be simplified as,

$$\begin{aligned} \begin{aligned}&\Pr \left( Z_t^{(n)} = k + 1 \vert Z_{t-\Delta }^{(n)} = k\right) = \lambda k \Delta + o(\Delta )\\&\Pr \left( Z_t^{(n)} = k - 1 \vert Z_{t-\Delta }^{(n)} = k\right) = \mu k \Delta + o(\Delta ) \\&\Pr \left( \vert Z_t^{(n)} - k \vert \ge 2 \vert Z_{t-\Delta }^{(n)} = k\right) = o(\Delta ) \end{aligned} \end{aligned}$$
(35)

On the other hand, the transition probabilities for \( \textbf{M}_t^{(n)} \) conditional on \( \textbf{M}_{t-\Delta }^{(n)} = \textbf{k} = (k_1, k_2) \) are given by

$$\begin{aligned} \begin{aligned}&\Pr (M_{i,t}^{(n)} = k_i + m \vert \textbf{M}_{t-\Delta }^{(n)} = \textbf{k} ) \\&\quad = \sum _{j = k_i}^{k_i + m} \left( {\begin{array}{c}j-1\\ k_i - 1\end{array}}\right) \left( {\begin{array}{c}k_1 + k_2 + m - j - 1\\ k_{i'} - 1\end{array}}\right) (\lambda _{ii}\Delta )^{j - k_i} (\lambda _{i',i}\Delta )^{k_i+m - j} + o(\Delta ^m) \\&\quad \Pr (M_{i,t}^{(n)} = k_i - m \vert \textbf{M}_{t-\Delta }^{(n)} = \textbf{k} ) = \left( {\begin{array}{c}k_i\\ k_i - m\end{array}}\right) (\mu _i \Delta )^m + o(\Delta ^m), \end{aligned} \end{aligned}$$
(36)

where \(i \in \{1,2\} \) and \( i' = 3 - i \). Due to the conditional independence structure of the bivariate INAR model, the joint transition probabilities for \( \textbf{M}_t^{(n)} \) conditional on \( \textbf{M}_{t-\Delta }^{(n)} \) are

$$\begin{aligned} \begin{aligned}&\Pr ( M_{1,t}^{(n)} = k_1 \pm m_1, M_{2,t}^{(n)} = k_2 \pm m_2 \vert \textbf{M}_{t-\Delta }^{(n)}=\textbf{k}) \\&\quad =\Pr (M_{1,t}^{(n)} = k_1 \pm m_1 \vert \textbf{M}_{t-\Delta }^{(n)} = \textbf{k} ) \Pr (M_{2,t}^{(n)} = k_2 \pm m_2 \vert \textbf{M}_{t-\Delta }^{(n)} = \textbf{k} ) \end{aligned} \end{aligned}$$
(37)

Similarly, the above probabilities can be simplified as

$$\begin{aligned} \begin{aligned}&\Pr (M_{i,t}^{(n)} = k_i + 1 \vert \textbf{M}_{t-\Delta }^{(n)} = \textbf{k} ) = \lambda _{ii}k_i \Delta + \lambda _{i',i}k_{i'}\Delta + o(\Delta ) \\&\Pr (M_{i,t}^{(n)} = k_i - 1 \vert \textbf{M}_{t-\Delta }^{(n)} = \textbf{k} ) = \mu _i k_i \Delta + o(\Delta ) \\&\Pr \left( \vert M_{i,t}^{(n)} - k_i \vert \ge 2 \vert \textbf{M}_{t-\Delta }^{(n)} = \textbf{k} \right) = o(\Delta ) \end{aligned} \end{aligned}$$
(38)

Proof

See Appendix A.3. \(\square \)

The above transition probabilities have exactly the same form as their continuous counterparts when \( m = 1 \). Consequently, the Lévy measures of \( Z^{(n)}_t \) and \( \textbf{M}_t^{(n)} \) have a structure similar to that of their continuous counterparts.

Proposition 6

The continuous processes \( Z_t^{(n)} \) and \( \textbf{M}_t^{(n)} \) defined above are semimartingales with the following characteristic triplets.

$$\begin{aligned} \begin{aligned}&Ch(Z_t^{(n)}) = {\left\{ \begin{array}{ll} &{}B_t = 0 \\ &{}C_t = 0 \\ &{}\nu ([0,t] \times g) = \sum _{k=1}^{\sigma _t} (g(1)\lambda + g(-1)\mu )X_{k-1}\Delta + O(\Delta ) \end{array}\right. } \\&Ch(\textbf{M}_t^{(n)}) = {\left\{ \begin{array}{ll} \textbf{B}_t = 0 \\ \textbf{C}_t = 0 \\ \mathbf {\nu }([0,t] \times g) = &{} \sum _{k=1}^{\sigma _t} \left( g(1,0)\tilde{\varvec{\lambda }}_{1} + g(-1,0)\tilde{\varvec{\mu }}_1\right) \textbf{Y}_{k-1}\Delta \\ &{}+ \left( g(0,1)\tilde{\varvec{\lambda }}_2 + g(0,-1)\tilde{\varvec{\mu }}_2\right) \textbf{Y}_{k-1}\Delta \\ {} &{}+ O(\Delta ), \end{array}\right. } \end{aligned} \end{aligned}$$
(39)

where g is a continuous, non-negative, bounded Borel function vanishing near 0, defined on the state spaces of \( Z_t^{(n)} \) and \( \textbf{M}_t^{(n)} \) respectively, the truncation function is \( h = \vert x \vert \textbf{1}_{\{\vert x \vert < 1 \}} \) and

$$\begin{aligned} \tilde{\varvec{\lambda }}_1 = (\lambda _{11},\lambda _{21}), \quad \tilde{\varvec{\lambda }}_2 = (\lambda _{12},\lambda _{22}),\quad \tilde{\varvec{\mu }}_1 = (\mu _1,0), \quad \tilde{\varvec{\mu }}_2 = (0,\mu _2) \end{aligned}$$

Proof

See Appendix A.4. \(\square \)

Theorem 7

With the definition and the parametrization above, and the initial distribution condition:

$$\begin{aligned} Z_0^{(n)} = Z_0, \quad \textbf{M}_0^{(n)} = \textbf{M}_0, \end{aligned}$$
(40)

the processes \( Z_t^{(n)} \) and \( \textbf{M}_t^{(n)} \) converge weakly to the continuous birth and death processes \( Z_t \) and \( \textbf{M}_t \).

$$\begin{aligned} \begin{aligned}&Z_t^{(n)} \overset{w}{\rightarrow }\ Z_t \\&\textbf{M}_t^{(n)} \overset{w}{\rightarrow }\ \textbf{M}_t, \end{aligned} \end{aligned}$$
(41)

when the size of subinterval \( \Delta \) goes to 0 or equivalently, \( n \rightarrow \infty \).

Proof

Here we simply apply Theorem 3.39 from Jacod and Shiryaev (2013, Chapter IX, Section 3), the limit theorem for semimartingales in the locally bounded case.

  1. The local strong majorization hypothesis: for both \( Z_t \) and \( \textbf{M}_t \), the first two terms of the characteristic triplets are 0 and the stochastic integrals with respect to the Lévy measures are clearly finite on [0, 1].

  2. Local conditions on big jumps: for both \( Z_t \) and \( \textbf{M}_t \), there is no jump with absolute size greater than 1.

  3. Local uniqueness: for every choice of initial distributions for \( Z_0 \) and \( \textbf{M}_0 \), the Lévy measures are uniquely characterized by their (joint) probability distribution functions.

  4. The continuity condition: the characteristic triplets \(B_t(\omega ), C_t(\omega ), \nu (\omega ; dt, dx) \) of \( Z_t \) and \( \textbf{M}_t \) are continuous with respect to \( \omega \).

  5. Weak convergence of the initial distributions: this is assumed in Eq. (40).

  6. Convergence of the characteristic triplets of the discrete processes to those of their continuous counterparts: this can be proved by showing uniform convergence of the Lévy measures. For every \( a >0 \), define a stopping time for the population process:

    $$\begin{aligned} S_a(X) = \inf \left\{ t: \vert X_t \vert> a, \ \text {or} \ \vert X_{t^-} \vert > a \right\} \end{aligned}$$
    (42)

For the univariate case, the stochastic integral \( g * \nu \) for any Borel function g is given by

$$\begin{aligned} \begin{aligned} (g* \nu _{t \wedge S_a})\circ Z^{(n)} =&g *\nu (Z^{(n)};[0,t \wedge S_a(Z^{(n)})],R) \\ =&\int _0^{t \wedge S_a(Z^{(n)})} \int _{R} g(x) (\lambda \delta _1(dx) + \mu \delta _{-1}(dx)) Z^{(n)}_{s^-} ds \\ =&\int _0^{t \wedge S_a(Z^{(n)})} ( g(1)\lambda + g(-1)\mu ) Z^{(n)}_{s^-} ds \\ =&\sum _{k=1}^{\sigma _{t \wedge S_a(Z^{(n)})}} \left( g(1) \lambda + g(-1) \mu \right) Z^{(n)}_{k-1}\Delta \\&+ \left( g(1)\lambda + g(-1) \mu \right) Z^{(n)}_{\sigma _{t \wedge S_a(Z^{(n)})}} \left( t \wedge S_a(Z^{(n)}) - \sigma _{t \wedge S_a(Z^{(n)})} \Delta \right) \\ \end{aligned} \end{aligned}$$
(43)

and the absolute difference of two stochastic integrals is given by,

$$\begin{aligned} \begin{aligned}&\vert g * \nu ^n_{t \wedge S_a } - (g* \nu _{t \wedge S_a})\circ Z^{(n)} \vert \\&\quad = \left| O(\Delta ) + \left( g(1)\lambda + g(-1) \mu \right) Z_{\sigma _{t \wedge S_a(Z^{(n)})}} \left( t \wedge S_a(Z^{(n)}) - \sigma _{t \wedge S_a(Z^{(n)})} \Delta \right) \right| \\&\quad \le O(\Delta ) + \vert g(1)\lambda + g(-1) \mu \vert Z_{\sigma _{t \wedge S_a(Z^{(n)})}} \left( t \wedge S_a(Z^{(n)}) - \sigma _{t \wedge S_a(Z^{(n)})} \Delta \right) \end{aligned} \end{aligned}$$
(44)

All the quantities inside \( \vert \cdot \vert \) are finite, so for every \( \xi >0 \) there exists a natural number N such that for \( n > N \), we have

$$\begin{aligned} \vert g * \nu ^n_{t \wedge S_a } - (g* \nu _{t \wedge S_a})\circ Z^{(n)} \vert < \xi \end{aligned}$$
(45)

and hence \( g * \nu ^n_{t \wedge S_a } \) converges uniformly to \( (g* \nu _{t \wedge S_a})\circ Z^{(n)} \). For the bivariate case, the stochastic integral \( g*\nu \), where \( \nu \) is the Lévy measure of \( \textbf{M} \), for any Borel function g is given by

$$\begin{aligned} \begin{aligned}&(g* \nu _{t \wedge S_a}) \circ \textbf{M}^{(n)} = g* \nu (\textbf{M}^{(n)};[0,t\wedge S_a(\textbf{M}^{(n)})],R)\\ =&\int _0^{t \wedge S_a(\textbf{M}^{(n)})} \int _{R} g(x) (\tilde{\varvec{\lambda }}_1 \delta _{(1,0)}(dx) + \tilde{\varvec{\lambda }}_2\delta _{(0,1)}(dx) \\&+ \tilde{\varvec{\mu }}_1\delta _{(-1,0)} (dx) + \tilde{\varvec{\mu }}_2 \delta _{(0,-1)}(dx))\textbf{M}_{s^-}^{(n)} ds \\ =&\int _0^{t \wedge S_a(\textbf{M}^{(n)})} \left( g(1,0)\tilde{\varvec{\lambda }}_1 + g(0,1)\tilde{\varvec{\lambda }}_2 +g(-1,0)\tilde{\varvec{\mu }}_1 + g(0,-1)\tilde{\varvec{\mu }}_2 \right) \textbf{M}_{s^-}^{(n)} ds \\ =&\sum _{k=1}^{\sigma _{t \wedge S_a(\textbf{M}^{(n)})}} \left( g(1,0)\tilde{\varvec{\lambda }}_1 + g(0,1)\tilde{\varvec{\lambda }}_2 +g(-1,0)\tilde{\varvec{\mu }}_1 + g(0,-1)\tilde{\varvec{\mu }}_2 \right) \textbf{M}_{k-1}^{(n)}\Delta \\ {}&+ \left( g(1,0)\tilde{\varvec{\lambda }}_1 + g(0,1)\tilde{\varvec{\lambda }}_2 +g(-1,0)\tilde{\varvec{\mu }}_1 + g(0,-1)\tilde{\varvec{\mu }}_2 \right) \\ {}&\times \textbf{M}^{(n)}_{\sigma _{t \wedge S_a(\textbf{M}^{(n)})}} \left( t \wedge S_a(\textbf{M}^{(n)}) - \sigma _{t \wedge S_a(\textbf{M}^{(n)})}\Delta \right) \end{aligned} \end{aligned}$$
(46)

Then the absolute difference of two stochastic integrals is given by

$$\begin{aligned} \begin{aligned}&\vert g * \nu _{t \wedge S_a(\textbf{M}^{(n)})} - (g * \nu _{t \wedge S_a}) \circ \textbf{M}^{(n)} \vert \\&\quad \le O(\Delta ) + \left| g(1,0)\tilde{\varvec{\lambda }}_1 + g(0,1)\tilde{\varvec{\lambda }}_2 +g(-1,0)\tilde{\varvec{\mu }}_1 + g(0,-1)\tilde{\varvec{\mu }}_2 \right| \\&\quad \times \textbf{M}^{(n)}_{\sigma _{t \wedge S_a(\textbf{M}^{(n)})}} \left( t \wedge S_a(\textbf{M}^{(n)}) - \sigma _{t \wedge S_a(\textbf{M}^{(n)})}\Delta \right) \end{aligned} \end{aligned}$$
(47)

Hence the uniform convergence holds by a similar argument as in the univariate case. Finally, \( Z_t^{(n)} \) and \( \textbf{M}_t^{(n)} \) converge weakly to \( Z_t \) and \( \textbf{M}_t \) respectively. \(\square \)

5 Simulation study

In this section, we outline the simulation algorithm for the bivariate birth and death process. The estimation method and the properties of the estimators are then investigated in a simulation study.

5.1 Simulation of bivariate birth and death process

The simulation algorithm of the bivariate birth and death process \( \textbf{M}_t \) can be derived straightforwardly from its ODE (11). Given the current population \( \textbf{M}_t \), the waiting time until an event (a birth or death in either population) happens follows an exponential distribution with rate

$$\begin{aligned} \rho _t =(\lambda _{11} + \lambda _{12} + \mu _1) M_{1,t} + (\lambda _{21}+ \lambda _{22} + \mu _2) M_{2,t} \end{aligned}$$

Then the probability that this event will happen in population \( M_{1,t} \) is

$$\begin{aligned} p_1 = \frac{\lambda _{21} M_{2,t} + (\lambda _{11} + \mu _1) M_{1,t}}{\rho _t} \end{aligned}$$
(48)

The probability that this event happens in population \( M_{2,t} \) is simply \( p_2 = 1 - p_1 \). Suppose now that an event happens in population \( M_{1,t} \); the probability that it adds a new individual is

$$\begin{aligned} p_1^b = \frac{\lambda _{11} M_{1,t} + \lambda _{21} M_{2,t}}{\lambda _{21} M_{2,t} + (\lambda _{11} + \mu _1) M_{1,t}}, \end{aligned}$$
(49)

and the probability that an individual dies is \( p_1^d = 1 - p_1^b \). Likewise, if the event happens in the population \( M_{2,t} \), the birth probability would be

$$\begin{aligned} p_2^b = \frac{\lambda _{12} M_{1,t} + \lambda _{22} M_{2,t}}{\lambda _{12} M_{1,t} + (\lambda _{22} + \mu _2) M_{2,t}} \end{aligned}$$
(50)

and the death probability is \( p_2^d = 1 - p_2^b \). Overall, the simulation algorithm is shown in the following Algorithm 1.

figure a
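As an illustrative companion to Algorithm 1, the event-based steps above can be sketched in Python as a Gillespie-style simulation of Eqs. (48)-(50); the function name, parameter values and random seed are ours, not the authors' code:

```python
import random

def simulate_bivariate_bd(m1, m2, rates, T, rng=random.Random(1)):
    """Event-driven simulation of the bivariate birth and death process,
    following the event probabilities in Eqs. (48)-(50).
    rates = (l11, l12, l21, l22, mu1, mu2); returns the path [(t, m1, m2)]."""
    l11, l12, l21, l22, mu1, mu2 = rates
    t, path = 0.0, [(0.0, m1, m2)]
    while t < T and m1 + m2 > 0:
        rho = (l11 + l12 + mu1) * m1 + (l21 + l22 + mu2) * m2  # total event rate
        t += rng.expovariate(rho)                # exponential waiting time
        if t >= T:
            break
        # the event lands in population 1 with probability p1 of Eq. (48)
        p1 = (l21 * m2 + (l11 + mu1) * m1) / rho
        if rng.random() < p1:
            pb = (l11 * m1 + l21 * m2) / (l21 * m2 + (l11 + mu1) * m1)  # Eq. (49)
            m1 += 1 if rng.random() < pb else -1
        else:
            pb = (l12 * m1 + l22 * m2) / (l12 * m1 + (l22 + mu2) * m2)  # Eq. (50)
            m2 += 1 if rng.random() < pb else -1
        path.append((t, m1, m2))
    return path
```

Each iteration draws an exponential waiting time at the total rate \( \rho _t \), then allocates the event first to a population and then to a birth or a death.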

On the other hand, the simulation procedure for the bivariate INAR(1) model is straightforward because the distribution of \( \textbf{Y}_t \) given \( \textbf{Y}_{t-1} \) is determined by the operators \( (\circ , \ *_1, \ *_2) \).

5.2 Statistical inference of univariate and bivariate birth and death process

5.2.1 Quasi-MLE for univariate LBD

In the univariate case, parameter estimation and the asymptotic properties of the estimators are available in Keiding (1975). Suppose we have full information about the sample path, i.e. the exact inter-arrival times of the birth and death events \( \{\tau _i\}_{i= 0,1,2,\dots }\) on the sampling interval [0, T] with \( \tau _0 = 0 \). The maximum likelihood estimators for \( Z_t \) are then

$$\begin{aligned} \hat{\lambda } = \frac{B_T}{X_T}, \quad \hat{\mu } = \frac{D_T}{X_T}, \quad X_T = \sum _{k=1}^{B_T + D_T} \tau _k Z_{\tau _{k-1}} + \left( T - \sum _{i=1}^{B_T + D_T} \tau _i\right) Z_T, \end{aligned}$$
(51)

where \( B_T, D_T \) are the total numbers of birth and death events respectively. The asymptotic properties, for fixed T and a large initial population, are given by

$$\begin{aligned} \lim _{Z_0 \rightarrow \infty } \left( \frac{Z_0(e^{(\lambda - \mu )T}-1)}{\lambda - \mu }\right) ^{\frac{1}{2}} \left( {\begin{array}{c}\hat{\lambda } - \lambda \\ \hat{\mu } - \mu \end{array}}\right) \overset{D}{\rightarrow } \textbf{N}\left( \left( {\begin{array}{c}0\\ 0\end{array}}\right) , \ \begin{pmatrix} \lambda &{} 0 \\ 0 &{} \mu \\ \end{pmatrix}\right) \end{aligned}$$
(52)

In practice, one may not have the exact inter-arrival times of the events. Instead, suppose we have records of the population sampled at a fixed interval \( \Delta \), so that \( Z_0, Z_{\Delta }, Z_{2\Delta }, \dots, Z_{n\Delta }\) are available. To estimate the parameters \( \lambda ,\mu \), one can then numerically maximize the quasi log-likelihood function of the proposed INAR(1) model \( X_k = Z_{k\Delta }, \ k =0,1,\dots ,n \). The log-likelihood function is given by

$$ \begin{aligned} \begin{aligned}&\ell (\alpha ,p) = \sum _{k=1}^n \log \Pr (X_{k-1},X_k) \\&\Pr (X_{k-1}, X_k) \\&\quad ={\left\{ \begin{array}{ll} 1, \quad &{}X_{k-1} = X_k = 0 \\ (1-\alpha )^{X_{k-1}}, \quad &{}X_{k-1}> 0 \ \& \ X_k = 0\\ \sum _{j=1}^{\min \{X_{k-1},X_k\}} f_b(j;X_{k-1},\alpha ) f_{nb}(X_k-j;j,p), \quad &{}X_{k-1}>0 \ \& \ X_k > 0, \end{array}\right. } \end{aligned} \end{aligned}$$
(53)

where \( f_b \) and \( f_{nb} \) are the probability mass functions of binomial and negative binomial random variables:

$$\begin{aligned} f_b(x;n,\alpha ) = \left( {\begin{array}{c}n\\ x\end{array}}\right) \alpha ^x(1-\alpha )^{n-x} \quad f_{nb}(x;n,\beta ) = \left( {\begin{array}{c}n+x-1\\ n-1\end{array}}\right) \beta ^{n}(1-\beta )^x \end{aligned}$$
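For illustration, the quasi log-likelihood of Eq. (53) can be sketched in Python (names are ours; in the study below the analogous computation is maximized with R's 'optim'):

```python
from math import comb, log

def f_b(x, n, a):
    """Binomial pmf."""
    return comb(n, x) * a**x * (1 - a)**(n - x)

def f_nb(x, n, b):
    """Negative binomial pmf (number of failures x before n successes)."""
    return comb(n + x - 1, n - 1) * b**n * (1 - b)**x

def quasi_loglik(path, alpha, p):
    """Quasi log-likelihood of Eq. (53) for observations X_0, ..., X_n."""
    ll = 0.0
    for prev, cur in zip(path, path[1:]):
        if prev == 0 and cur == 0:
            pr = 1.0
        elif cur == 0:
            pr = (1 - alpha)**prev
        else:
            pr = sum(f_b(j, prev, alpha) * f_nb(cur - j, j, p)
                     for j in range(1, min(prev, cur) + 1))
        ll += log(pr)
    return ll
```

The inner sum mirrors Eq. (53): j counts the thinned survivors, and the negative binomial term accounts for their offspring.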
Table 1 Parameter setting for univariate case

The simulation is conducted as follows: we generate 1000 sample paths of \( Z_t \) using the parameter settings in Table 1. Since the sample paths of \( Z_t \) are continuous in time, we set up an equally spaced grid with sampling interval \( \Delta \). The equally spaced observations \( X_k\) are then obtained by recording the population size at each discrete time \((0,\Delta , 2\Delta ,\dots ,n\Delta )\), where \( n = \frac{T}{\Delta } \). The log-likelihood function is maximized with the 'optim' function in R (method 'BFGS'). Finally, we can recover the rate estimates by inverting the parametrization in Eq. (32), so that

$$\begin{aligned} \tilde{\lambda } = \frac{1}{\Delta }\,\frac{\frac{1-\hat{p}}{\hat{p}}\log \frac{\hat{\alpha }}{\hat{p}}}{\frac{\hat{\alpha }}{\hat{p}} - 1}, \quad \tilde{\mu } = \tilde{\lambda } - \frac{1}{\Delta }\log \frac{\hat{\alpha }}{\hat{p}} \end{aligned}$$
(54)
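The back-transformation above amounts to a few lines; a Python sketch (function name ours), assuming \( \hat{\alpha } \ne \hat{p} \), i.e. \( \lambda \ne \mu \):

```python
import math

def recover_rates(alpha_hat, p_hat, delta):
    """Invert the parametrization of Eq. (32): recover (lambda, mu) from
    fitted INAR parameters (alpha_hat, p_hat) and sampling interval delta."""
    r = alpha_hat / p_hat              # equals e^{(lambda - mu) delta}
    theta = math.log(r) / delta        # net growth rate lambda - mu
    lam = theta * (1 - p_hat) / (p_hat * (r - 1))
    return lam, lam - theta            # (lambda_tilde, mu_tilde)
```

Composing this with the forward map of Eq. (32) returns the original rates, which gives a quick sanity check of the parametrization.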

In the following, we first explore how the size of \( \Delta \) affects the properties of the estimators, i.e. bias and mean square error (MSE), and how much more computational time is needed compared to the true MLE method. Four sampling interval sizes \( \Delta \in \{0.1,0.05,0.025,0.01\} \) are chosen and the results are presented in Table 2. The theoretical row shows the bias and MSE computed through Eq. (52). It is no surprise that the true MLE method from Eq. (51) performs best, with the lowest MSE and computational time. The quasi-MLE method based on the INAR model, on the other hand, improves as the sampling interval \( \Delta \) decreases, but it still performs no better than the true MLE method and requires much more computational time. The empirical distributions of these estimators are illustrated in Fig. 1; since the shapes of the distributions of \( \tilde{\lambda } \) and \( \tilde{\mu } \) differ little, we only show the distribution of \( \tilde{\lambda }\). It is clear that only the case \( \Delta = 0.01 \) has a satisfactory normal shape compared to all other cases.

Table 2 Properties of different maximum likelihood estimators
Fig. 1
figure 1

The empirical distribution of the estimated parameters. The top panel is the MLE from Eq. (51) and the remaining panels are the MLE from the INAR model. The solid lines are the true values of the parameters listed in Table 1 and the dashed lines stand for the empirical means

To achieve asymptotic normality with the quasi-MLE method from the INAR model, one needs not only a large initial population, but also a small sampling interval \( \Delta \). In the following simulation, we fix the sampling interval \( \Delta = 0.01 \) and investigate how the size of the initial population affects the asymptotic distribution of the estimators and the computational time of the estimation procedure. To explore the effect of \( Z_0 \) on the asymptotic distribution, we choose \( Z_0 \in \{5,10,30,50\}\); it appears from Fig. 2 that to ensure asymptotic normality for both estimators, one needs at least \( Z_0 = 30 \), which is a large sample size in the statistical sense.

Fig. 2
figure 2

Asymptotic distribution of \( \tilde{\lambda }, \tilde{\mu }\) with different \( Z_0 \)

The computational time with respect to \( Z_0 \in \{10,50,100,150,\dots ,500\} \) clearly shows a linear trend in Fig. 3. This is reasonable, as the number of summations involved in Eq. (53) increases linearly with \( Z_0 \).

Fig. 3
figure 3

The computational time for INAR models of 1000 sample paths

In summary, the quasi-MLE method constructed from the INAR model can reach a moderate level of estimation accuracy and asymptotic normality with a large initial population \( Z_0 \ge 30 \) and a small sampling interval \( \Delta \le 0.01 \). However, it requires much more computational time than the true MLE method. This method should therefore only be used when no information on the inter-arrival times of the birth and death events is available.

5.2.2 Quasi-MLE for bivariate LBD

Since the bivariate INAR(1) model is a bivariate Markov chain, the log-likelihood function can be written as a sum of logarithms of transition probabilities. Denote by \( \Theta = \{\alpha _1,\alpha _2,\beta _{11},\beta _{12},\beta _{21},\beta _{22}\} \) the parameter space of the bivariate INAR(1) model; the likelihood function can then be written as

$$\begin{aligned} \begin{aligned} \ell (\Theta )&= \sum _{t=1}^n \log \Pr (X_{t}, Y_{t} \vert X_{t-1}, Y_{t-1}) \\&= \sum _{t=1}^n \left( \log \Pr (X_t \vert X_{t-1}, Y_{t-1}) + \log \Pr (Y_t \vert X_{t-1}, Y_{t-1}) \right) \\&= \ell _x (\Theta _x) + \ell _y(\Theta _y), \end{aligned} \end{aligned}$$
(55)

where \( \Theta _x = \{\alpha _1, \beta _{11}, \beta _{21}\} \) and \( \Theta _y = \{\alpha _2, \beta _{12},\beta _{22}\} \). Because \( X_{t} \) and \( Y_t \) are independent of each other given the last state \( (X_{t-1}, Y_{t-1}) \), the likelihood function separates into two parts, \( \ell _x \) and \( \ell _y \). The transition probability for \( X_t \) is given by

$$ \begin{aligned} \begin{aligned}&\Pr (X_t = z_1 \vert X_{t-1} = x, Y_{t-1} = y) \\&\quad ={\left\{ \begin{array}{ll} 1 \quad &{}z_1 = x = y = 0 \\ (1- \alpha _1)^x \beta _{21}^y \quad &{}z_1 = 0 \\ f_{nb}(z_1;y,\beta _{21}) \quad &{}x = 0 \ \& \ y>0 \\ \sum _{i=1}^{\min \{x,z_1\}} f_b(i;x,\alpha _1) f_{nb}(z_1 - i ; i,\beta _{11}) \quad &{}x> 0 \ \& \ y =0 \\ \sum _{j=1}^{z_1}\sum _{i=1}^{\min \{x,j\}} f_b(i;x,\alpha _1) f_{nb}(j - i; i,\beta _{11}) f_{nb}(z_1 - j; y,\beta _{21}) \\ \quad + (1-\alpha _1)^x f_{nb}(z_1;y,\beta _{21}) \quad &{}x>0 \ \& \ y>0 \end{array}\right. } \end{aligned} \end{aligned}$$

The one for \(Y_t\) is

$$ \begin{aligned} \begin{aligned}&\Pr (Y_t = z_2 \vert X_{t-1} = x, Y_{t-1} = y) \\&\quad = {\left\{ \begin{array}{ll} 1 \quad &{}z_2 = x = y = 0 \\ (1- \alpha _2)^y \beta _{12}^x \quad &{}z_2 = 0 \\ f_{nb}(z_2;x,\beta _{12}) \quad &{}x> 0 \ \& \ y = 0 \\ \sum _{i=1}^{\min \{y,z_2\}} f_b(i;y,\alpha _2) f_{nb}(z_2 - i ; i,\beta _{22}) \quad &{}x = 0 \ \& \ y>0 \\ \sum _{j=1}^{z_2}\sum _{i=1}^{\min \{y,j\}} f_b(i;y,\alpha _2) f_{nb}(j - i; i,\beta _{22}) f_{nb}(z_2 - j; x,\beta _{12}) \\ \quad + (1-\alpha _2)^y f_{nb}(z_2;x,\beta _{12}) \quad &{}x>0 \ \& \ y>0 \end{array}\right. } \end{aligned} \end{aligned}$$

One can then numerically maximize the log-likelihood functions \( \ell _x, \ell _y \) given the random samples \( \{ (X_0, Y_0), (X_1,Y_1), \dots , (X_n,Y_n) \} \). From the estimated parameters \( \hat{\Theta } \), we can solve the following system of equations to obtain the estimates \( \Theta _{bd} =\{\lambda _{11},\lambda _{12},\lambda _{21},\lambda _{22},\mu _1,\mu _2\} \) for the bivariate birth and death process:

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}\alpha _1(\Theta _{bd},\Delta ) - \hat{\alpha }_1 = 0 \\ &{}\alpha _2(\Theta _{bd},\Delta ) -\hat{\alpha }_2 = 0 \\ &{}\beta _{11}(\Theta _{bd},\Delta ) -\hat{\beta }_{11} = 0 \\ &{}\beta _{12}(\Theta _{bd},\Delta ) -\hat{\beta }_{12} = 0 \\ &{}\beta _{21}(\Theta _{bd},\Delta ) -\hat{\beta }_{21} = 0 \\ &{}\beta _{22}(\Theta _{bd},\Delta ) -\hat{\beta }_{22} = 0, \end{array}\right. } \end{aligned}$$
(56)

where the parametrization functions \( \alpha _i(\Theta _{bd},\Delta ) \) and \( \beta _{ij}(\Theta _{bd},\Delta ) \) are given in Eq. (33) and \( \Delta \) is chosen based on the interpretation of the birth and death rates. For example, when the random samples are collected on a daily basis over a year \( t= 1 \), one can define \( \Delta = t/365 \); the parameters \( \Theta _{bd} \) are then interpreted on an annual scale.
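In practice one would typically pass the system (56) to a library root finder; the following dependency-free Python sketch of Newton-Raphson with a finite-difference Jacobian indicates one way to do it (all names ours; `fun` stands in for the stacked residuals of Eq. (56)):

```python
def newton_solve(fun, x0, tol=1e-10, max_iter=50, h=1e-7):
    """Newton-Raphson with a finite-difference Jacobian, one way to solve a
    moment-matching system such as (56): fun(theta) returns the residual
    vector, e.g. (alpha_1(theta, delta) - alpha1_hat, ...)."""
    n = len(x0)
    x = list(x0)
    for _ in range(max_iter):
        f = fun(x)
        if max(abs(v) for v in f) < tol:
            return x
        # finite-difference Jacobian J[i][j] = d f_i / d x_j
        J = [[0.0] * n for _ in range(n)]
        for j in range(n):
            xp = list(x)
            xp[j] += h
            fp = fun(xp)
            for i in range(n):
                J[i][j] = (fp[i] - f[i]) / h
        # solve J * step = f by Gaussian elimination with partial pivoting
        A = [J[i] + [f[i]] for i in range(n)]
        for c in range(n):
            piv = max(range(c, n), key=lambda r: abs(A[r][c]))
            A[c], A[piv] = A[piv], A[c]
            for r in range(c + 1, n):
                m = A[r][c] / A[c][c]
                for k in range(c, n + 1):
                    A[r][k] -= m * A[c][k]
        step = [0.0] * n
        for r in range(n - 1, -1, -1):
            step[r] = (A[r][n] - sum(A[r][k] * step[k]
                                     for k in range(r + 1, n))) / A[r][r]
        x = [x[i] - step[i] for i in range(n)]
    return x

# toy usage: solve x^2 = 4 and x * y = 6
root = newton_solve(lambda v: [v[0]**2 - 4.0, v[0] * v[1] - 6.0], [1.0, 1.0])
```

For the full six-equation system, a good starting value (e.g. crude moment estimates) matters, since the parametrization in Eq. (33) is highly nonlinear in the rates.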

Table 3 Parameter setting for simulation
Fig. 4
figure 4

Empirical distributions of the estimators from the bivariate INAR model. The solid lines are the true values of the parameters listed in Table 3 and the dashed lines stand for the empirical means

In the following, we simulate \( r_2 = 1000 \) sample paths of \( \textbf{M}_t \) based on the pre-specified parameters in Table 3. An equally spaced grid with sampling interval \( \Delta \) is then set up and the random samples \( (\textbf{Y}_0, \textbf{Y}_1, \dots , \textbf{Y}_n ) \) are obtained, as in the univariate case. The likelihood functions \(\ell _x, \ell _y\) are maximized by 'optim' in R (method 'BFGS') and the maximum likelihood estimators \(\hat{\Theta }\) are obtained. Finally, we obtain the estimators \(\hat{\Theta }_{bd}\) by numerically solving the system of equations (56) via a root-finding algorithm (e.g. the Newton-Raphson method). Referring to the estimation results in the univariate case, we focus on the choices \( \Delta \in \{0.02,0.01,0.005\} \) as well as large initial populations (40, 50), in the hope of obtaining asymptotic normality for each estimator. The empirical distributions of the estimators \(\Theta _{bd}\) are illustrated in Fig. 4 and their properties are summarized in Table 4.

Table 4 Properties of different maximum likelihood estimators
Fig. 5
figure 5

Empirical distributions of the estimators from the bivariate INAR model

The bias and MSE of most estimators decrease with \( \Delta \) as expected. However, the MSEs of the birth rate estimators are much larger than those of the death rate estimators. Apart from the death rate estimators, all the birth rate estimators are skewed in different directions and clearly non-normally distributed. This may be caused by the non-normality of some of the estimators of the proposed INAR model illustrated in Fig. 5. In the classical setting where an innovation term is included, one needs a stationarity condition to ensure asymptotic normality for all parameter estimators; see Bu et al. (2008). In our case, the INAR model itself is not stationary and hence some of the estimators can be skewed.

Notice that the pairs of birth rates contributing to the same population, \( (\lambda _{11}, \lambda _{21}) \) and \( (\lambda _{12},\lambda _{22}) \), are skewed in opposite directions. It is then worthwhile to see whether the sums of these pairs of estimators have the desired asymptotic properties, and the results in Fig. 6 confirm our conjecture. Given the simulation procedure of the bivariate birth and death process, the quasi-MLE method may not be able to distinguish the pair of birth rates contributing to the same population. Instead, it provides good estimators of the scale of the total birth rates \(\bar{\lambda }_1 = \hat{\lambda }_{11}r_m + \hat{\lambda }_{21} (1-r_m) \) and \( \bar{\lambda }_2 = \hat{\lambda }_{12} r_m + \hat{\lambda }_{22} (1-r_m)\), where \( r_m = \frac{{\mathbb {E}}[M_{1,t}]}{{\mathbb {E}}[M_{1,t} + M_{2,t}]} \). Furthermore, according to the proof in Appendix A.1, the relationship between the first moments of the two populations is given by

$$\begin{aligned} {\mathbb {E}}[M_{1,t}] = c{\mathbb {E}}[M_{2,t}] + (M_{1,0} - cM_{2,0})e^{-(\lambda _{12}c - \kappa _2)t}. \end{aligned}$$
(57)

As long as the whole process does not become extinct with probability one, i.e. \(\kappa _1 \kappa _2 < \lambda _{12}\lambda _{21} \), the rate \( \lambda _{12}c - \kappa _2 \) is always positive, so the exponential term vanishes and \( {\mathbb {E}}[M_{1,t}] \approx c{\mathbb {E}}[M_{2,t}] \) when t is large. In other words, the ratio

Fig. 6
figure 6

Empirical distributions of the total birth rate estimators. The solid lines are the true values of the parameters listed in Table 3 and the dashed lines stand for the empirical means

Table 5 Properties for total birth rates estimators
Fig. 7
figure 7

Empirical distributions of the total birth rate estimators. The solid lines are the true values of the parameters listed in Table 3 and the dashed lines stand for the empirical means

Table 6 Parameter setting for simulation
$$\begin{aligned} r_m = \frac{{\mathbb {E}}[M_{1,t}]}{{\mathbb {E}}[M_{1,t} + M_{2,t}]} \rightarrow \frac{c}{1 + c}, \end{aligned}$$
(58)

becomes a constant eventually. For the parameter setting in Table 3, \( c = 1.040833 \), \( r_m \approx \frac{1}{2} \), and hence \( \hat{\lambda }_{11} + \hat{\lambda }_{21} \) serves as an estimator for the total birth rate of \( M_{1,t} \). In practice, c is unknown since the true parameters are yet to be estimated. We can then use the values at the end of the sampling period to approximate \( r_m \), i.e.

$$\begin{aligned} r_m \approx \frac{M_{1,T}}{M_{1,T} + M_{2,T} } \end{aligned}$$
(59)
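The constant c and the limiting share \( c/(1+c) \) are cheap to compute from the rates; a Python sketch with hypothetical symmetric rates (chosen purely for illustration, so the two populations split evenly):

```python
import math

def limiting_share(l12, l21, kappa1, kappa2):
    """Constant c from Sect. 4.1 and the limiting share r_m -> c / (1 + c)
    of population 1, Eq. (58)."""
    c = (kappa2 - kappa1
         + math.sqrt((kappa1 - kappa2)**2 + 4 * l21 * l12)) / (2 * l12)
    return c, c / (1 + c)

# hypothetical symmetric setting: equal cross-birth rates and kappa_1 = kappa_2
c, r_m = limiting_share(l12=0.2, l21=0.2, kappa1=0.5, kappa2=0.5)
```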

The properties of \( \bar{\lambda }_1, \bar{\lambda }_2 \) and their empirical distributions are shown in Table 5 and Fig. 7. These new estimators enjoy low bias and MSE, both of which decrease as \( \Delta \) decreases. Most importantly, they are no longer skewed and are asymptotically normal.

Table 7 Properties of different maximum likelihood estimators
Table 8 Properties for total birth rates estimators
Fig. 8
figure 8

Empirical distributions of the total birth rate estimators. The solid lines are the true values of the parameters listed in Table 3 and the dashed lines stand for the empirical means

Let us try another parameter setting, in Table 6, to verify this conjecture. The same simulation and estimation process as in the previous case is used, and the results are shown in Tables 7, 8 and Fig. 8. This time the constant c is 0.576306 and \( r_m = 0.365605 \). As in the last setting, the estimators of all the birth rates are skewed and some of them have large bias and MSE. The estimators of the total birth rates, on the other hand, have low bias and MSE and are again asymptotically normal.

Let us finally try the parameter setting in Table 9, where \( \textbf{M}_t \) eventually becomes extinct. This means that the exponential term in Eq. (57) can no longer be omitted. The results are illustrated in Table 10 and Fig. 9 and look similar to those of the first case: nice properties for the death rate estimators, but skewed and non-normal distributions for the birth rate estimators.

Table 9 Parameter setting for simulation
Table 10 Properties of different maximum likelihood estimators

6 Concluding remarks

In this paper, we propose an integer-valued autoregressive model, INAR(1), to approximate the continuous birth and death process. In the univariate case, we propose a birth-death operator \(p *_1 \alpha \circ X\), which is a sum of zero-modified geometric random variables. The parametrization of p and \(\alpha \) can be determined by matching the first and second moments of the continuous process. We then propose a bivariate INAR(1) model to approximate the bivariate birth and death process, where the birth probabilities also depend on the size of the other population. The parametrization of this model can be obtained in a similar way. The convergence of the discrete process to the continuous process is proved by applying a weak convergence theorem for locally bounded semimartingales. Due to the simple Markov structure of the INAR(1) model, maximum likelihood estimation is feasible; this is not the case for the bivariate and multivariate birth and death processes themselves. In principle, one can extend the results here to the multivariate case, i.e. approximate the multivariate birth and death process in Griffiths (1973) by a multivariate INAR(1) model using only the operators \(*_1, *_2, \circ \) together with an immigration process. However, the difficulty of expressing the parameters of the INAR(1) model in terms of the parameters of the multivariate birth and death process would increase, as we would need to find the first moments of the birth and death process explicitly.

Fig. 9
figure 9

Empirical distributions of the individual birth and death rate estimators. The solid lines are the true values of the parameters listed in Table 9 and the dashed lines stand for the empirical means