## 1 Introduction

A first-order integer-valued autoregressive (INAR(1)) process is defined by a sequence of non-negative integer-valued random variables $$\{X_t\}_{t\in \mathbb {N}}$$ satisfying

\begin{aligned} X_t=\alpha \circ X_{t-1}+\epsilon _t,\quad t\in {\mathbb {N}}, \end{aligned}
(1)

where “$$\circ$$” is the binomial thinning operator of Steutel and van Harn (1979), defined by $$\alpha \circ X_{t-1}\equiv \sum _{k=1}^{X_{t-1}}\zeta _{t,k}$$, where $$\{\zeta _{t,k}\}_{t,k\in \mathbb {N}}$$ is a doubly infinite array of independent and identically distributed (iid) Bernoulli random variables with $${\text {P}}\,(\zeta _{t,k}=1)=1-{\text {P}}\,(\zeta _{t,k}=0)=\alpha \in (0,1)$$, and $$\{\epsilon _t\}_{t\in \mathbb {N}}$$ is assumed to be a sequence of iid non-negative integer-valued random variables, with $$\epsilon _t$$ independent of $$X_{s-1}$$ and $$\zeta _{s,k}$$, for all $$k\ge 1$$ and all $$s\le t$$. INAR processes have been commonly used to fit count time series data. In the context of branching processes, the random variable $$X_t$$ can be seen as the total population at time t, $$\alpha \circ X_{t-1}$$ is the number of survivors from the population at time $$t-1$$, while $$\epsilon _t$$ stands for the immigration at time t. The random variable $$\zeta _{t,k}$$ indicates whether the k-th individual of the population at time $$t-1$$ survives. Observe that, under the assumption that $$\mu _\epsilon \equiv E(\epsilon _t)<\infty$$, the conditional expectation of $$X_t$$ given $$X_{t-1}$$ is linear in $$X_{t-1}$$, that is,

\begin{aligned} E(X_t|X_{t-1})=\alpha X_{t-1}+\mu _\epsilon . \end{aligned}
(2)
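The recursion in (1) is straightforward to simulate. The sketch below (Python/NumPy; the Poisson innovation law and all parameter values are illustrative choices, not from the text) generates a trajectory and checks that the empirical mean approaches the stationary value $$\mu _\epsilon /(1-\alpha )$$ implied by (2):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_inar1(n, alpha, lam, x0=0):
    """Simulate X_t = alpha o X_{t-1} + eps_t with binomial thinning
    and Poisson(lam) innovations (an illustrative choice)."""
    x = np.empty(n, dtype=int)
    prev = x0
    for t in range(n):
        survivors = rng.binomial(prev, alpha)   # alpha o X_{t-1}
        prev = survivors + rng.poisson(lam)     # add the immigration eps_t
        x[t] = prev
    return x

alpha, lam, n = 0.5, 1.0, 200_000
x = simulate_inar1(n, alpha, lam)
# stationary mean is mu_eps / (1 - alpha) = 1 / 0.5 = 2
emp_mean = x.mean()
```
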

The INAR process given in (1) has been studied by Alzaid and Al-Osh (1987), McKenzie (1988), and Dion et al. (1995). Conditional least squares estimation for this model was explored, for instance, by Wei and Winnicki (1990), Ispány et al. (2003), Freeland and McCabe (2005), and Rahimov (2008). For a comprehensive review on thinning-based INAR processes and some of their generalizations, see Scotto et al. (2015).

Alternative integer-valued processes based on non-additive innovation through maximum and minimum operations have been proposed in the literature. Littlejohn (1992) and Littlejohn (1996) considered discrete minification processes based on a thickening operator; see also Kalamkar (1995) for an alternative class of minification processes. Scotto et al. (2016) defined a class of max-INAR processes by $$X_t=\max \{\alpha \circ X_{t-1},\epsilon _t\}$$, for $$t\ge 1$$, with “$$\circ$$” being the binomial thinning operator, while Aleksić and Ristić (2021) used the minimum function and a modified negative binomial INAR-operator to define their processes as $$X_t=\min \{\alpha \diamond X_{t-1},\epsilon _t\}$$, where $$\alpha \diamond X=\sum _{i=1}^{1+X}G_i$$, for a non-negative integer-valued random variable X, with $$\{G_i\}_{i\in {\mathbb {N}}}$$ being a sequence of iid geometrically distributed random variables with mean $$\alpha >0$$, and $$\{\epsilon _t\}_{t\in \mathbb {N}}$$ defined as before.

For the count processes $$\{X_t\}_{t\in {\mathbb {N}}}$$ considered in these works, a certain non-linearity is induced in the sense that the conditional expectation $$E(X_t|X_{t-1})$$ (and also the conditional variance) is non-linear in $$X_{t-1}$$, in contrast with (2). We refer to these models as “non-linear” throughout this paper. On the other hand, the immigration interpretation in a population context is lost due to the non-additive innovation assumption.

Our chief goal in this paper is to propose a novel class of non-linear INAR(1) processes that preserve the additive innovation and therefore retain a practical interpretation, in contrast with the existing non-linear INAR models, where this interpretation is lost. To that end, we introduce a new INAR-operator, the so-called geometric thinning operator, which is of independent interest. The new models are named non-linear INAR(1) (in short NonLINAR(1)) processes. We show that the proposed NonLINAR(1) processes can produce better results in terms of prediction when compared to the linear case commonly considered in the literature. We now highlight other contributions of the present paper:

1. development of inferential procedures and numerical experiments, which are not well-explored for the existing non-linear models mentioned above;

2. properties of the novel geometric thinning operator are established;

3. a particular NonLINAR(1) process with geometric marginals is investigated in detail, including an explicit expression for the autocorrelation function;

4. both stationary and non-stationary cases are explored, the latter being important for allowing the inclusion of covariates, a feature not considered by the aforementioned papers on non-linear INAR models;

5. a generalization of the geometric thinning operator and an associated NonLINAR process are proposed and motivated for dealing with zero-inflated or zero-deflated count time series data.

The paper is organized as follows. In Sect. 2, we introduce the new geometric thinning operator and explore its properties. Section 3 is devoted to the development of the NonLINAR processes based on the new operator, with a focus on the case where the marginals are geometrically distributed. Two methods for estimating the model parameters are discussed in Sect. 4, including Monte Carlo simulations to evaluate the proposed estimators. In Sect. 5, we introduce a non-stationary NonLINAR process allowing for the inclusion of covariates and provide some Monte Carlo studies. Section 6 is devoted to two real data applications. Finally, in Sect. 7, we develop a generalization of the geometric thinning operator and an associated NonLINAR model.

## 2 Geometric thinning operator: definition and properties

In this section, we introduce a new thinning operator and derive its main properties. We begin by introducing some notation. For two random variables X and Y, we write $$X \wedge Y \equiv \min \{X, Y\}$$ to denote the minimum between X and Y. The probability generating function (pgf) of a non-negative integer-valued random variable Y is denoted by

\begin{aligned} \Psi _{Y}(s)= {\text {E}}\left( s^{Y}\right) =\sum _{k=0}^\infty s^k {\text {P}}\,(Y=k), \end{aligned}

for all values of s for which the right-hand side converges absolutely. The n-th derivative of $$\Psi _{Y}(x)$$ with respect to x and evaluated at $$x=x_0$$ is denoted by $$\Psi _{Y}^{(n)}(x_0)$$.

Let Z be a geometric random variable with parameter $$\alpha >0$$ and probability function assuming the form

\begin{aligned} {\text {P}}\,(Z = k) = \frac{\alpha ^k}{(1+\alpha )^{k+1}},\quad k=0,1,\dots . \end{aligned}

In this case, the pgf of Z is

\begin{aligned} \Psi _{Z}(s) = \frac{1}{1+\alpha (1-s)},\quad |s|<1+\alpha ^{-1}, \end{aligned}
(3)

and the parameter $$\alpha$$ has the interpretation $$\alpha =E(Z)>0$$. The shorthand notation $$Z\sim \textrm{Geo}(\alpha )$$ will be used throughout the text. We are ready to introduce the new operator and explore some of its properties.

### Definition 1

(Geometric thinning operator) Let X be a non-negative integer-valued random variable, independent of $$Z\sim \textrm{Geo}(\alpha )$$, with $$\alpha >0$$. The geometric thinning operator $$\triangle$$ is defined by

\begin{aligned} \alpha \,{\triangle }\,X \equiv \min \left( X, Z\right) . \end{aligned}
(4)

### Remark 1

The operator $$\,{\triangle }\,$$ defined in (4) satisfies $$\alpha \,{\triangle }\,X \le X$$, like the classic binomial thinning operator $$\circ$$. Therefore, $$\,{\triangle }\,$$ is indeed a thinning operator.
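As a quick sanity check on Definition 1 and the pgf (3), the geometric pmf can be evaluated numerically and the operator sampled directly. Note that NumPy's geometric sampler is supported on $$\{1,2,\dots \}$$, so a shift by one recovers the parametrization used here (a minimal sketch with illustrative values):

```python
import numpy as np

alpha = 1.7
p = 1.0 / (1.0 + alpha)            # success probability; mean (1-p)/p = alpha

# exact pmf P(Z = k) = alpha^k / (1+alpha)^(k+1), k = 0, 1, ...
k = np.arange(400)
pmf = alpha**k / (1.0 + alpha)**(k + 1)

# check the pgf (3) at a few points inside the radius of convergence
for s in (0.0, 0.3, 0.9, 1.0):
    lhs = np.sum(s**k * pmf)                     # sum_k s^k P(Z = k)
    rhs = 1.0 / (1.0 + alpha * (1.0 - s))
    assert abs(lhs - rhs) < 1e-10

# sampling: NumPy's geometric counts trials >= 1, so shift by one;
# alpha triangle X is then simply an elementwise minimum
rng = np.random.default_rng(1)
z = rng.geometric(p, size=10) - 1
x = rng.poisson(2.0, size=10)      # any count random variable X works here
thinned = np.minimum(x, z)         # alpha triangle X
```
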

In what follows, we present some properties of the proposed geometric thinning operator. We start by obtaining its probability generating function.

### Proposition 1

Let X be a non-negative integer-valued random variable with pgf $$\Psi _X$$. Then, the pgf of $$\alpha \,{\triangle }\,X$$ is given by

\begin{aligned} \Psi _{\alpha \,{\triangle }\,X}(s)=\frac{1 +\alpha (1-s) \Psi _{X}\left( \dfrac{\alpha s}{1+\alpha }\right) }{1+\alpha (1 - s)},\quad |s|<1+\alpha ^{-1}. \end{aligned}

### Proof

By the independence assumption between X and Z, it holds that

\begin{aligned} {\text {P}}\,(\alpha \,{\triangle }\,X = k)&= {\text {P}}\,(\alpha \,{\triangle }\,X \ge k) - {\text {P}}\,(\alpha \,{\triangle }\,X \ge k+1)\\ {}&= {\text {P}}\,( Z \ge k) {\text {P}}\,( X \ge k) - {\text {P}}\,( Z \ge k+1) {\text {P}}\,( X \ge k+1)\\&= \left( \frac{\alpha }{1+\alpha }\right) ^k\left[ {\text {P}}\,( X = k) + \frac{1}{1+\alpha } {\text {P}}\,( X \ge k+1)\right] . \end{aligned}

Hence,

\begin{aligned} \Psi _{\alpha \,{\triangle }\,X}(s)&=\sum _{k=0}^\infty \left( \frac{\alpha s}{1+\alpha }\right) ^k {\text {P}}\,( X = k) + \frac{1}{1+\alpha } \sum _{k=0}^\infty \left( \frac{\alpha s}{1+\alpha }\right) ^k {\text {P}}\,( X \ge k+1)\\&=\Psi _{X}\left( \frac{\alpha s}{1+\alpha }\right) - \frac{1}{1+\alpha } \Psi _{X}\left( \frac{\alpha s}{1+\alpha }\right) + \frac{1}{1+\alpha } \sum _{k=0}^\infty \left( \frac{\alpha s}{1+\alpha }\right) ^k {\text {P}}\,( X \ge k)\\&= \frac{\alpha }{1+\alpha }\Psi _{X}\left( \frac{\alpha s}{1+\alpha }\right) + \frac{1}{1+\alpha } \sum _{k=0}^\infty \left( \frac{\alpha s}{1+\alpha }\right) ^k {\text {P}}\,( X \ge k). \end{aligned}

The second term on the last equality can be expressed as

\begin{aligned} \frac{1}{1+\alpha } \sum _{k=0}^\infty \left( \frac{\alpha s}{1+\alpha }\right) ^k {\text {P}}\,( X \ge k)&= \frac{1}{1+\alpha } \sum _{k=0}^\infty \left( \frac{\alpha s}{1+\alpha }\right) ^k \sum _{l=k}^\infty {\text {P}}\,( X = l)\\ {}&= \frac{1}{1+\alpha } \sum _{l=0}^\infty \sum _{k=0}^l \left( \frac{\alpha s}{1+\alpha }\right) ^k {\text {P}}\,( X = l)\\&= \frac{1}{1+\alpha -\alpha s} \left[ 1- \frac{\alpha s}{1+\alpha } \Psi _{X}\left( \frac{\alpha s}{1+\alpha }\right) \right] . \end{aligned}

The result follows by rearranging the terms. $$\square$$
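Proposition 1 can be verified numerically: for a concrete choice of X (here Poisson, an illustrative choice), compute the exact pmf of $$\min (X,Z)$$ and compare its truncated pgf with the closed-form expression:

```python
import numpy as np
from scipy.stats import poisson

alpha, lam = 2.0, 1.5
K = 200                                          # truncation; tails negligible
k = np.arange(K)

pz = alpha**k / (1.0 + alpha)**(k + 1)           # P(Z = k)
sz = (alpha / (1.0 + alpha))**k                  # P(Z >= k)
px = poisson.pmf(k, lam)                         # P(X = k)
sx = poisson.sf(k - 1, lam)                      # P(X >= k)

# exact pmf of min(X, Z): P(X=k, Z>=k) + P(Z=k, X>=k) - P(X=k, Z=k)
pmin = px * sz + pz * sx - px * pz

def pgf_formula(s):
    """Proposition 1 with Psi_X the Poisson(lam) pgf exp(lam*(s - 1))."""
    psi_x = np.exp(lam * (alpha * s / (1.0 + alpha) - 1.0))
    return (1.0 + alpha * (1.0 - s) * psi_x) / (1.0 + alpha * (1.0 - s))

max_err = max(abs(np.sum(s**k * pmin) - pgf_formula(s))
              for s in (0.0, 0.4, 0.8, 1.0))
```
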

The next result gives the moments of $$\alpha \,{\triangle }\,X$$, which will be important for prediction and forecasting in what follows.

### Proposition 2

Let $$\triangle$$ be the geometric thinning operator in (4). It holds that the n-th factorial moment of $$\alpha \,{\triangle }\,X$$ is given by

\begin{aligned} {\text {E}}\,\big ((\alpha \,{\triangle }\,X)_n\big )=n!\alpha ^n\left\{ 1-\sum _{k=0}^{n-1}\dfrac{\Psi _{X}^{(k)}\left( \frac{\alpha }{1+\alpha }\right) }{k!(1+\alpha )^k}\right\} , \end{aligned}

for $$n\in {\mathbb {N}}$$, where $$(\alpha \,{\triangle }\,X)_n\equiv \alpha \,{\triangle }\,X\times (\alpha \,{\triangle }\,X-1)\times \cdots \times (\alpha \,{\triangle }\,X-n+1)$$.

### Proof

The result follows by using the pgf given in Proposition 1 and the generalized Leibniz rule for derivatives, namely $$(d_1d_2)^{(n)}(s)=\sum _{k=0}^n \left( {\begin{array}{c}n\\ k\end{array}}\right) d_1^{(n-k)}(s)d_2^{(k)}(s)$$, with $$d_1(s)=1 +\alpha (1-s) \Psi _{X}\left( \dfrac{\alpha s}{1+\alpha }\right)$$ and $$d_2(s)=\dfrac{1}{1+\alpha (1 - s)}$$. $$\square$$
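Proposition 2 admits a simple numerical check. The sketch below (illustrative parameter values) takes $$X\sim \textrm{Geo}(\mu )$$, in which case $$\alpha \,{\triangle }\,X$$ is again geometric with mean $$\alpha \mu /(1+\alpha +\mu )$$ (as shown at the end of this section), and compares factorial moments computed by direct summation with the expression of Proposition 2:

```python
import numpy as np
from math import factorial

mu, alpha = 1.2, 2.0
a_star = alpha / (1.0 + alpha)

# for X ~ Geo(mu), alpha triangle X is geometric with mean
# nu = alpha*mu / (1 + alpha + mu); its exact pmf gives ground-truth moments
nu = alpha * mu / (1.0 + alpha + mu)
k = np.arange(2000)
pmf = nu**k / (1.0 + nu)**(k + 1)

def psi_geo_deriv(n, s, m):
    """n-th derivative of the Geo(m) pgf 1/(1 + m(1 - s))."""
    return factorial(n) * m**n / (1.0 + m * (1.0 - s))**(n + 1)

errs = []
for n in (1, 2, 3):
    falling = np.ones_like(k, dtype=float)       # k(k-1)...(k-n+1)
    for j in range(n):
        falling = falling * (k - j)
    direct = np.sum(falling * pmf)               # E[(alpha triangle X)_n]
    tail = sum(psi_geo_deriv(j, a_star, mu) / (factorial(j) * (1.0 + alpha)**j)
               for j in range(n))
    prop2 = factorial(n) * alpha**n * (1.0 - tail)
    errs.append(abs(direct - prop2))
```
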

In what follows, the notation $$X\Rightarrow Y$$ means that X converges weakly to Y.

### Proposition 3

Let $$\triangle$$ be the geometric thinning operator in (4). Then,

1. (i)

$$\alpha \,{\triangle }\,X \Rightarrow 0, \quad \text {as } \alpha \rightarrow 0,$$

2. (ii)

$$\alpha \,{\triangle }\,X \Rightarrow X, \quad \text {as } \alpha \rightarrow \infty .$$

### Proof

The proof follows immediately from Proposition 1 and the Continuity Theorem for pgf’s. $$\square$$

We now present a property of the operator $$\,{\triangle }\,$$ that is of independent interest.

### Proposition 4

Let $$Z_{1},\ldots ,Z_{n}$$ be independent geometric random variables with parameters $$\alpha _1,\ldots ,\alpha _n$$, respectively. Assume that $$X_1, \ldots , X_n$$ are non-negative integer-valued random variables independent of the Z’s, and let $$\alpha _i\,{\triangle }\,X_i=\min \left( X_i,Z_i\right)$$. Then,

\begin{aligned} \wedge _{k=1}^n \alpha _k \,{\triangle }\,X_k = \widetilde{\alpha }_n \,{\triangle }\,\wedge _{k=1}^n X_k, \end{aligned}
(5)

with $$\widetilde{\alpha }_n=\dfrac{\prod _{k=1}^n \alpha _k}{\prod _{k=1}^n(1+ \alpha _k) - \prod _{k=1}^n \alpha _k}$$, $$n\in {\mathbb {N}}$$.

### Proof

We prove (5) by induction on n. For $$n=2$$, it holds that

\begin{aligned} \wedge _{k=1}^2 \alpha _k \,{\triangle }\,X_k = \wedge _{k=1}^2 (X_k \wedge Z_{k})= (X_1\wedge X_2)\wedge (Z_{1}\wedge Z_{2})= \widetilde{\alpha }_2 \,{\triangle }\,\wedge _{k=1}^2 X_k, \end{aligned}

where $$\widetilde{\alpha }_2=\dfrac{\prod _{k=1}^2 \alpha _k}{\prod _{k=1}^2(1+ \alpha _k) - \prod _{k=1}^2 \alpha _k}$$. Assume that $$\wedge _{k=1}^{n-1} \alpha _k \,{\triangle }\,X_k = \widetilde{\alpha }_{n-1} \,{\triangle }\,\wedge _{k=1}^{n-1} X_k$$. Since

\begin{aligned} \wedge _{k=1}^n \alpha _k \,{\triangle }\,X_k= & {} (\wedge _{k=1}^{n-1} \alpha _k \,{\triangle }\,X_k) \wedge (\alpha _n \,{\triangle }\,X_n) = (\,\widetilde{\alpha }_{n-1} \,{\triangle }\,\wedge _{k=1}^{n-1} X_k) \wedge (\alpha _n \,{\triangle }\,X_n) \\= & {} \widetilde{\alpha }_n \,{\triangle }\,\wedge _{k=1}^n X_k, \end{aligned}

the proof is complete. $$\square$$
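The geometric ingredient of Proposition 4 can be checked directly: the survival function of $$\wedge _{k=1}^n Z_k$$ coincides with that of a geometric variable with mean $$\widetilde{\alpha }_n$$. A minimal sketch with illustrative parameters:

```python
import numpy as np

alphas = np.array([0.8, 1.5, 3.0])
prod_a = np.prod(alphas)
prod_1a = np.prod(1.0 + alphas)
alpha_tilde = prod_a / (prod_1a - prod_a)        # as in Proposition 4

# P(min_k Z_k >= z) = prod_k (alpha_k/(1+alpha_k))^z must match the
# survival function of a Geo(alpha_tilde) variable
z = np.arange(40)
surv_min = np.prod([(a / (1.0 + a))**z for a in alphas], axis=0)
surv_tilde = (alpha_tilde / (1.0 + alpha_tilde))**z
```
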

We finish this section by discussing the zero-modified geometric (ZMG) distribution and some of its properties. Such a distribution will play an important role in the construction of our model in Sect. 3. We say that a random variable Y follows a ZMG distribution with parameters $$\mu >0$$ and $$\pi \in (-1/\mu ,1)$$ if its probability function is given by

\begin{aligned} {\text {P}}\,(Y = k) = \left\{ \begin{array}{ll} \pi +(1-\pi )\dfrac{1}{(1+\mu )}, &{} \text {for } k=0,\\ (1-\pi )\dfrac{\mu ^k}{(1+\mu )^{k+1}}, &{} \text {for } k=1,2,\ldots . \end{array}\right. \end{aligned}

We denote $$Y\sim \textrm{ZMG}(\pi ,\mu )$$. The geometric distribution with mean $$\mu$$ is obtained as a particular case when $$\pi =0$$. For $$\pi <0$$ and $$\pi >0$$, the ZMG distribution is zero-deflated or zero-inflated relative to the geometric distribution, respectively. For $$\pi \in (0,1)$$, Y satisfies the following equality in distribution: $$Y{\mathop {=}\limits ^{d}}BZ$$, where B and Z are independent random variables with Bernoulli (with success parameter $$1-\pi$$) and geometric (with mean $$\mu$$) distributions, respectively. The associated pgf can be computed as

\begin{aligned} \Psi _{Y}(s) = \frac{1+\pi \mu (1-s)}{1+\mu (1-s)},\quad |s|<1+\mu ^{-1}. \end{aligned}
(6)

Now, assume that $$X\sim \textrm{Geo}(\mu )$$, with $$\mu >0$$. We have that

\begin{aligned} {\text {P}}\,(\alpha \,{\triangle }\,X> z)&= {\text {P}}\,(X> z){\text {P}}\,(Z > z) = \left[ \left( \frac{\mu }{1+\mu }\right) \left( \frac{\alpha }{1+\alpha }\right) \right] ^{z+1}, \quad z=0,1,\dots , \end{aligned}

which means $$\alpha \,{\triangle }\,X\sim \textrm{Geo}\left( \dfrac{\alpha \mu }{1+\alpha + \mu }\right)$$.

In the next section, we introduce our class of non-linear INAR processes and provide some of their properties.

## 3 Non-linear INAR(1) processes

In this section, we introduce a novel class of non-linear INAR(1) processes based on the new geometric thinning operator $$\,{\triangle }\,$$ defined in Sect. 2 and explore a special case when the marginals are geometrically distributed.

### Definition 2

A sequence of random variables $$\{X_t\}_{t\in \mathbb {N}}$$ is said to be a non-linear INAR(1) process (in short NonLINAR(1)) if it satisfies the stochastic equation

\begin{aligned} X_t = \alpha \,{\triangle }\,X_{t-1} + \epsilon _t,\quad t\in \mathbb {N}, \end{aligned}
(7)

with $$\alpha \,{\triangle }\,X_{t-1} =\min \left( X_{t-1},Z_{t}\right)$$, $$\left\{ Z_{t}\right\} _{t\in \mathbb {N}}$$ being a sequence of iid random variables with $$Z_{1}{\sim }\text{ Geo }(\alpha )$$, $$\{\epsilon _t\}_{t\in \mathbb {N}}$$ being a sequence of iid non-negative integer-valued random variables, called innovations, where $$\epsilon _t$$ is independent of $$X_{t-l}$$ and $$Z_{t-l+1}$$, for all $$l\ge 1$$, and $$X_0$$ is some starting non-negative value/random variable.

### Remark 2

The random variable $$Z_t$$ in Definition 2 determines the number of survivors at time t. Either the previous population is reduced to $$Z_t$$ individuals, if $$Z_t<X_{t-1}$$, or everybody survives, if $$Z_t\ge X_{t-1}$$. As argued in Remark 1, $$\,{\triangle }\,$$ is indeed a thinning operator. We remark that the use of the minimum operation to construct count processes is also known in the literature as minification; for instance, see Littlejohn (1992). Non-linearity for count time series models is achieved by using minimum or maximum operations but, differently from the existing literature, our proposed methodology induces non-linearity and keeps the additive innovation assumption at the same time. Therefore, the population interpretation (with survival and immigration processes) remains valid under our framework.

We now obtain some properties of the NonLINAR(1) processes.

### Proposition 5

The 1-step transition probabilities of the non-linear INAR(1) process, say $${\text {P}}\,(x,y) \equiv {\text {P}}\,(X_t=y \,|\,X_{t-1}=x)$$, are given by

\begin{aligned} {\text {P}}\,(x,y) = {\left\{ \begin{array}{ll} \displaystyle \sum _{k=0}^{x-1}{\text {P}}\,(Z = k){\text {P}}\,(\epsilon _t = y-k) + {\text {P}}\,(Z \ge x){\text {P}}\,(\epsilon _t = y-x), &{} \text {for}\ \, x \le y,\\ \displaystyle \sum _{k=0}^{y}{\text {P}}\,(Z = k){\text {P}}\,(\epsilon _t = y-k), &{} \text {for}\ \, x > y, \\ \end{array}\right. } \end{aligned}
(8)

for $$x,y =0,1,\dots$$. In particular, we have $${\text {P}}\,(0,y) = {\text {P}}\,(\epsilon _t = y)$$.

### Proof

For $$x=0$$, we have that $${\text {P}}\,(0,y) = {\text {P}}\,(\alpha \,{\triangle }\,0 + \epsilon _t = y) = {\text {P}}\,(\epsilon _t = y)$$. For $$x > 0$$, it follows that

\begin{aligned} {\text {P}}\,(x,y) = {\text {P}}\,(\alpha \,{\triangle }\,x + \epsilon = y) = \sum _{k=0}^y {\text {P}}\,(\alpha \,{\triangle }\,x = k){\text {P}}\,(\epsilon = y-k), \end{aligned}

where

\begin{aligned} {\text {P}}\,(\alpha \,{\triangle }\,x = z) = {\left\{ \begin{array}{ll} 0, &{} \text {for}\ \,x<z, \\ {\text {P}}\,(Z \ge z), &{} \text {for}\ \,x=z,\\ {\text {P}}\,(Z=z), &{} \text {for}\ \,x>z. \\ \end{array}\right. } \end{aligned}

This gives the desired transition probabilities in (8). $$\square$$
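The transition probabilities (8) can be checked numerically. The sketch below takes geometric innovations as an illustrative choice (the innovation law is arbitrary at this point) and verifies that each row of the kernel sums to one and that $$P(0,y)$$ reduces to the innovation pmf:

```python
import numpy as np

alpha, mu_eps = 2.0, 1.5          # illustrative: Geo(mu_eps) innovations
Ymax = 600
ks = np.arange(Ymax)
pz = alpha**ks / (1.0 + alpha)**(ks + 1)          # P(Z = k)
sz = (alpha / (1.0 + alpha))**ks                  # P(Z >= k)
pe = mu_eps**ks / (1.0 + mu_eps)**(ks + 1)        # P(eps_t = k)

def trans_prob(x, y):
    """1-step transition probability P(x, y) from Eq. (8)."""
    if x <= y:
        return sum(pz[j] * pe[y - j] for j in range(x)) + sz[x] * pe[y - x]
    return sum(pz[j] * pe[y - j] for j in range(y + 1))

# each row of the transition kernel must sum to one (up to truncation error)
row_err = max(abs(sum(trans_prob(x, y) for y in range(Ymax)) - 1.0)
              for x in (0, 1, 5, 12))
```
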

### Theorem 6

Let $$\{X_t\}_{t\in \mathbb {N}}$$ be a NonLINAR(1) process. Then, $$\{X_t\}_{t\in \mathbb {N}}$$ is stationary and ergodic.

### Proof

Note that $$\{X_t\}_{t\in \mathbb {N}}$$ is a homogeneous discrete-time Markov chain (see Proposition 5). Therefore, using the Markov property, it is not hard to see that $$(X_1, \dots , X_m)$$ and $$(X_k, \dots , X_{k+m})$$ have the same distribution, for all $$m\in \mathbb {N}$$ and for all $$k\in \mathbb {N}$$. This gives the stationarity of the process. To prove ergodicity, let $$\sigma (X_t, X_{t+1}, \dots )$$ be the sigma-field generated by the random variables $$X_t, X_{t+1},\dots$$. Equation (7) leads to

\begin{aligned} \sigma (X_t, X_{t+1}, \dots )\subset \sigma (\alpha \,{\triangle }\,X_t, \epsilon _t, \alpha \,{\triangle }\,X_{t+1}, \epsilon _{t+1}, \dots ), \end{aligned}

where $$\{\alpha \,{\triangle }\,X_{t}\}_{t\in \mathbb {N}}$$ and $$\{\epsilon _t\}_{t\in \mathbb {N}}$$ are independent sequences. Hence

\begin{aligned} \bigcap _{t=1}^{\infty } \sigma (X_t, X_{t+1}, \dots )\subset \bigcap _{t=1}^{\infty } \sigma (\alpha \,{\triangle }\,X_t, \epsilon _t, \alpha \,{\triangle }\,X_{t+1}, \epsilon _{t+1}, \dots ) \equiv \mathcal {T}. \end{aligned}

Since $$\mathcal {T}$$ is a tail sigma-field, it follows by Kolmogorov’s 0–1 Law that every event $$A\in \mathcal {T}$$ has probability 0 or 1. It follows that the process is ergodic (see Shiryaev (2019), Definition 2, pg. 43). $$\square$$

### Proposition 7

The joint pgf of the discrete random vector $$(X_t, X_{t-1})$$ is given by

\begin{aligned} \Psi _{X_t, X_{t-1}}(s_1, s_2) = \frac{\Psi _{\epsilon }(s_1)}{1-\alpha (s_1-1)}\left[ \Psi _X(s_2)-\alpha (s_1-1)\Psi _X\left( \frac{s_1s_2\alpha }{1+\alpha }\right) \right] , \end{aligned}
(9)

with $$\Psi _X(\cdot )$$ being the pgf of X and $$\Psi _\epsilon (\cdot )$$ as in (11), where $$s_1$$ and $$s_2$$ belong to some intervals containing the value 1.

### Proof

We have that

\begin{aligned}{} & {} \Psi _{X_t, X_{t-1}}(s_1, s_2) = {\text {E}}\,\left( s_1^{X_t}s_2^{X_{t-1}}\right) = {\text {E}}\,\left( s_1^{\alpha \,{\triangle }\,X_{t-1}+ \epsilon _t}s_2^{X_{t-1}}\right) \\{} & {} \quad = \Psi _{\epsilon _t}(s_1){\text {E}}\,\left( s_2^{X_{t-1}}{\text {E}}\,\left( s_1^{\alpha \,{\triangle }\,X_{t-1}}\,|\,X_{t-1}\right) \right) , \end{aligned}

where

\begin{aligned} {\text {E}}\left( s_1^{\alpha \,{\triangle }\,X}\,|\,X=x\right)&= \sum _{k=0}^{x-1}s_1^k{\text {P}}\,(Z=k) + s_1^x {\text {P}}\,(Z\ge x) = \frac{1-\alpha (s_1-1)\left[ s_1\alpha /(1+\alpha )\right] ^x}{1-\alpha (s_1-1)}. \end{aligned}
(10)

Therefore,

\begin{aligned} \Psi _{X_t, X_{t-1}}(s_1, s_2)&= \Psi _{\epsilon _t}(s_1){\text {E}}\,\left( \frac{s_2^X}{1-\alpha (s_1-1)}-\frac{\alpha (s_1-1)}{1-\alpha (s_1-1)}\left( \frac{s_1s_2\alpha }{1+\alpha }\right) ^X\right) \\&= \frac{\Psi _{\epsilon _t}(s_1)}{1-\alpha (s_1-1)}\left[ \Psi _X(s_2) - \alpha (s_1-1)\Psi _X\left( \frac{s_1s_2\alpha }{1+\alpha }\right) \right] . \end{aligned}

$$\square$$

### Proposition 8

The 1-step ahead conditional mean and conditional variance are given by

\begin{aligned} {\text {E}}\,(X_t\,|\,X_{t-1})&= \alpha \left[ 1 - \left( \displaystyle \frac{\alpha }{1+\alpha }\right) ^{X_{t-1}}\right] + \mu _\epsilon ,\\ {\text {Var}}\,(X_t\,|\,X_{t-1})&=\alpha \left[ 1-\left( \frac{\alpha }{1+\alpha }\right) ^{X_{t-1}}\right] \left[ 1+\alpha \left( 1+\left( \frac{\alpha }{1+\alpha }\right) ^{X_{t-1}}\right) \right] \\&\quad -2\alpha X_{t-1}\left( \frac{\alpha }{1+\alpha }\right) ^{X_{t-1}} + \sigma _\epsilon ^2, \end{aligned}

respectively.

### Proof

From the definition of the NonLINAR(1) processes, we obtain that

\begin{aligned} {\text {E}}\,(X_t\,|\,X_{t-1}=x) = {\text {E}}\,(\alpha \,{\triangle }\,X_{t-1} + \epsilon _t \,|\,X_{t-1}=x) ={\text {E}}\,(\alpha \,{\triangle }\,X_{t-1}\,|\,X_{t-1}=x)+\mu _{\epsilon }, \end{aligned}

for all $$x=0,1,\dots$$. The conditional expectation above can be obtained from Proposition 2 with X being a degenerate random variable at x (i.e. $$P(X=x)=1$$). Then, it follows that

\begin{aligned} {\text {E}}\,(X_t\,|\,X_{t-1}=x)=\alpha \left[ 1-\left( \frac{\alpha }{1+\alpha }\right) ^{x}\right] + \mu _\epsilon . \end{aligned}

The conditional variance can be derived analogously, so details are omitted. $$\square$$
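The closed forms in Proposition 8 can be checked against direct summation over the exact pmf of $$\alpha \,{\triangle }\,x$$ given in the proof of Proposition 5. The sketch below (illustrative $$\alpha$$) verifies the thinning part, i.e. the mean and variance of $$\alpha \,{\triangle }\,x$$, to which $$\mu _\epsilon$$ and $$\sigma _\epsilon ^2$$ are added in the proposition:

```python
import numpy as np

alpha = 1.8
a_star = alpha / (1.0 + alpha)

mean_err, var_err = 0.0, 0.0
for x in (0, 1, 4, 15):
    z = np.arange(x + 1)
    # exact pmf of alpha triangle x for fixed x:
    # P(Z = z) for z < x, and P(Z >= x) at the endpoint z = x
    pmf = alpha**z / (1.0 + alpha)**(z + 1.0)
    pmf[x] = a_star**x
    m1 = np.sum(z * pmf)
    m2 = np.sum(z**2 * pmf)
    mean_formula = alpha * (1.0 - a_star**x)
    var_formula = (alpha * (1.0 - a_star**x) * (1.0 + alpha * (1.0 + a_star**x))
                   - 2.0 * alpha * x * a_star**x)
    mean_err = max(mean_err, abs(m1 - mean_formula))
    var_err = max(var_err, abs(m2 - m1**2 - var_formula))
```
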

### Remark 3

Note that the conditional expectation and variance given in Proposition 8 are non-linear in $$X_{t-1}$$, in contrast with the classic INAR processes, where they are linear.

From now on, we focus our attention on a special case from our class of non-linear INAR processes when the marginals are geometrically distributed. From (7), it follows that a NonLINAR(1) process with geometric marginals is well-defined if the function $$\Psi _{\epsilon _1}(s)\equiv \dfrac{\Psi _X(s)}{\Psi _{\alpha \,{\triangle }\,X}(s)}$$ is a proper pgf, with s belonging to some interval containing the value 1, where $$\Psi _X(s)$$ and $$\Psi _{\alpha \,{\triangle }\,X}(s)$$ are the pgf’s of geometric distributions with means $$\mu$$ and $$\dfrac{\alpha \mu }{1+\alpha + \mu }$$, respectively. More specifically, we have

\begin{aligned} \Psi _{\epsilon _1}(s) = \dfrac{1+\frac{\alpha }{1+\mu +\alpha }\mu (1-s)}{1+\mu (1-s)},\quad |s|<1+\mu ^{-1}, \end{aligned}
(11)

which corresponds to the pgf of a zero-modified geometric distribution with parameters $$\mu$$ and $$\alpha /(1+\mu +\alpha )$$; see (6). This shows that a NonLINAR(1) process with geometric marginals is well-defined.

### Definition 3

The stationary geometric non-linear INAR (Geo-NonLINAR) process $$\{X_t\}_{t\in \mathbb {N}}$$ is defined by assuming that (7) holds with $$\{\epsilon _t\}_{t\in \mathbb {N}}{\mathop {\sim }\limits ^{iid}}\textrm{ZMG}\left( \dfrac{\alpha }{1+\mu +\alpha },\mu \right)$$ and $$X_0\sim \textrm{Geo}(\mu )$$.

### Remark 4

Note that imposing a geometric distribution for the marginals of the NonLINAR process implies that the innovations are ZMG distributed. Conversely, assuming a ZMG distribution as above for the innovations implies that the marginals are geometrically distributed. Therefore, Definition 3 ensures that the process has geometric marginals.

From (11), we have that the mean and variance of the innovations $$\{\epsilon _t\}_{t\ge 1}$$ are given by

\begin{aligned} \mu _\epsilon := {\text {E}}\,(\epsilon _t) = \frac{\mu (1+\mu )}{1+\mu +\alpha } \quad \text {and} \quad \sigma _\epsilon ^2:= {\text {Var}}\,(\epsilon _t) = \frac{\mu (1+\mu )}{1+\mu +\alpha }\left[ 1+\frac{\mu (1+\mu +2\alpha )}{1+\mu +\alpha }\right] , \end{aligned}

respectively. Additionally, the third and fourth moments of the innovations are

\begin{aligned}{} & {} E(\epsilon _t^3)=\dfrac{\mu (1+\mu )}{1+\mu +\alpha }(6\mu ^2+6\mu +1)\quad \text{ and }\quad \\{} & {} \quad E(\epsilon _t^4)=\dfrac{\mu (1+\mu )}{1+\mu +\alpha }(24\mu ^3+36\mu ^2+14\mu +1). \end{aligned}

### Proposition 9

The autocovariance and autocorrelation functions at lag 1 of the Geo-NonLINAR process are respectively given by

\begin{aligned}{} & {} \gamma (1) \equiv {\text {Cov}}(X_t, X_{t-1}) = \frac{\mu \alpha (1+\mu )(1+\alpha )}{(1+\mu +\alpha )^2}\quad \text{ and }\quad \nonumber \\{} & {} \quad \rho (1) \equiv {\text {Corr}}\,(X_t, X_{t-1}) = \frac{\alpha (1+\alpha )}{(1+\mu +\alpha )^2}. \end{aligned}
(12)

### Proof

We have that $${\text {Cov}}(X_t, X_{t-1}) = {\text {E}}\,(X_tX_{t-1}) - {\text {E}}\,(X_t){\text {E}}\,(X_{t-1})$$, with

\begin{aligned} {\text {E}}\,(X_tX_{t-1})&= {\text {E}}\,\left[ {\text {E}}\,(X_tX_{t-1}\,|\,X_{t-1})\right] = {\text {E}}\,\left[ X_{t-1}{\text {E}}\,(X_t\,|\,X_{t-1})\right] \\&= \alpha {\text {E}}\,(X_{t-1}) - \alpha {\text {E}}\,\left[ X_{t-1}\left( \frac{\alpha }{1+\alpha }\right) ^{X_{t-1}}\right] + \mu _\epsilon {\text {E}}\,(X_{t-1})\\&= \mu \alpha - \frac{\mu \alpha ^2(1+\alpha )}{(1+\mu +\alpha )^2} + \frac{\mu ^2(1+\mu )}{1+\mu +\alpha }. \end{aligned}

After some algebra, the result follows. $$\square$$
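A Monte Carlo check of (12): simulating the Geo-NonLINAR process of Definition 3 from a stationary start, the sample lag-1 autocorrelation should approach $$\alpha (1+\alpha )/(1+\mu +\alpha )^2$$. A sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, alpha, n = 2.0, 3.0, 400_000
pi = alpha / (1.0 + mu + alpha)
p_mu = 1.0 / (1.0 + mu)
p_al = 1.0 / (1.0 + alpha)

# ZMG innovations: with probability pi a structural zero, else Geo(mu)
eps = np.where(rng.random(n) < pi, 0, rng.geometric(p_mu, n) - 1)
z = rng.geometric(p_al, n) - 1                 # Geo(alpha) thinning variables

x = np.empty(n + 1, dtype=int)
x[0] = rng.geometric(p_mu) - 1                 # stationary start: X_0 ~ Geo(mu)
for t in range(n):
    x[t + 1] = min(x[t], z[t]) + eps[t]        # X_t = alpha triangle X_{t-1} + eps_t

# empirical lag-1 autocorrelation vs rho(1) = alpha(1+alpha)/(1+mu+alpha)^2
rho_hat = np.corrcoef(x[:-1], x[1:])[0, 1]
rho_theory = alpha * (1.0 + alpha) / (1.0 + mu + alpha)**2   # = 1/3 here
```
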

In the following proposition, we obtain an expression for the conditional expectation $$E(X_t|X_{t-k}=\ell )$$. This function will be important to find the autocovariance function at lag $$k\in {\mathbb {N}}$$ and to perform prediction and/or forecasting.

### Proposition 10

For $$\alpha >0$$, define $$h_j=\frac{(1+\alpha )^{j-1}}{(1+\alpha )^j-\alpha ^j}$$ and $$g_j=\frac{\alpha (1+\alpha )^{j-1}-\alpha ^j}{(1+\alpha )^j-\alpha ^j}$$, and the real functions

\begin{aligned} f_j(x)= \Psi _{\epsilon _1}({\alpha _*}^{j-1})\left( h_j+g_jx\right) , \end{aligned}

$$j=2,3,\dots$$, where $$\alpha _*\equiv \frac{\alpha }{1+\alpha }$$ and $$x\in \mathbb {R}$$, and $$\Psi _{\epsilon _1}$$ is given in (11). Finally, let $$H_{k}(x)=f_2(\dots (f_{k-1}(f_k(x))))$$. Then, for all $$\ell \in {\mathbb {N}}_*\equiv {\mathbb {N}}\cup \{0\}$$,

\begin{aligned} E(X_t|X_{t-k}=\ell )=\alpha \left( 1-H_k\left( {\alpha _*}^{k\ell }\right) \right) +\mu _{\epsilon }, \end{aligned}
(13)

for all integer $$k\ge 2$$.

### Proof

Let $$\mathcal {F}_t=\sigma (X_1,\dots ,X_t)$$ denote the sigma-field generated by the random variables $$X_1,\dots ,X_t$$. By the Markov property it is clear that

\begin{aligned} E(X_t|X_{t-k})=E(X_t|\mathcal {F}_{t-k})=E[E(X_t|\mathcal {F}_{t-k+1})|\mathcal {F}_{t-k}], \end{aligned}
(14)

for all $$k\ge 1$$. The proof proceeds by induction on k. Equation (14) and Proposition 8 give us that

\begin{aligned} E(X_t|X_{t-2})&=E[E(X_t|X_{t-1})|X_{t-2}]=E\left[ \alpha (1-{\alpha _*}^{X_{t-1}})+\mu _{\epsilon }|X_{t-2}\right] \\&=\alpha [1-E(\alpha _*^{\alpha \,{\triangle }\,X_{t-2}+\epsilon _{t-1}}|X_{t-2})]+\mu _{\epsilon }\\&=\alpha [1-\Psi _{\epsilon _{1}}(\alpha _*)E({\alpha _*}^{\alpha \,{\triangle }\,X_{t-2}}|X_{t-2})]+\mu _{\epsilon }, \end{aligned}

with $$\alpha _*=\frac{\alpha }{1+\alpha }$$. Using (10), we obtain that

\begin{aligned} E(X_t|X_{t-2}=\ell )&=\alpha [1-\Psi _{\epsilon _{1}}(\alpha _*)E({\alpha _*}^{\alpha \,{\triangle }\,X_{t-2}}|X_{t-2}=\ell )]+\mu _{\epsilon }\\&=\alpha \left[ 1-\Psi _{\epsilon _{1}}(\alpha _*)\frac{1-\alpha (\alpha _*-1)[\alpha _*\alpha /(1+\alpha )]^{\ell }}{1-\alpha (\alpha _*-1)}\right] +\mu _{\epsilon }\\&=\alpha \left[ 1-\Psi _{\epsilon _{1}}(\alpha _*)\left( \frac{1+\alpha }{(1+\alpha )^2-\alpha ^2}+\frac{\alpha (1+\alpha )-\alpha ^2}{(1+\alpha )^2-\alpha ^2}{\alpha _*}^{2\ell }\right) \right] +\mu _{\epsilon }\\&=\alpha \left( 1-H_2\left( {\alpha _*}^{2\ell }\right) \right) +\mu _{\epsilon }. \end{aligned}

Assume that (13) is true for $$k=n-1$$. Using (14), we have

\begin{aligned} E(X_t|X_{t-n})= & {} E(X_t|\mathcal {F}_{t-n})=E[E(X_t|\mathcal {F}_{t-(n-1)})|\mathcal {F}_{t-n}]\\= & {} \alpha \left( 1-E\left( H_{n-1}\left( {\alpha _*}^{(n-1)X_{t-(n-1)}}\right) \Big |X_{t-n}\right) \right) + \mu _{\epsilon }. \end{aligned}

From the definition of $$H_n$$, we obtain

\begin{aligned}{} & {} E\left( H_{n-1}\left( {\alpha _*}^{(n-1)X_{t-(n-1)}}\right) \Big |X_{t-n}=\ell \right) \\{} & {} \quad =f_2\left( \dots \left( f_{n-2}\left( E\left( f_{n-1}\left( {\alpha _*}^{(n-1)X_{t-(n-1)}}\right) \Big |X_{t-n}=\ell \right) \right) \right) \right) . \end{aligned}

Note that

\begin{aligned}&E\left( f_{n-1}\left( {\alpha _*}^{(n-1)X_{t-(n-1)}}\right) |X_{t-n}=\ell \right) =h_{n-1}+g_{n-1}E\left[ {\alpha _*}^{(n-1)X_{t-(n-1)}}|X_{t-n}=\ell \right] \\&\quad =h_{n-1}+g_{n-1}E\left[ {\alpha _*}^{(n-1)(\alpha \,{\triangle }\,X_{t-n})}|X_{t-n}=\ell \right] \\&\quad =h_{n-1}+g_{n-1}\left[ \Psi _{\epsilon _{1}}({\alpha _*}^{n-1})\frac{1-\alpha ({\alpha _*}^{n-1}-1)[{\alpha _*}^{n-1}\alpha /(1+\alpha )]^{\ell }}{1-\alpha ({\alpha _*}^{n-1}-1)}\right] \\&\quad =h_{n-1}+g_{n-1}\left[ \Psi _{\epsilon _{1}}({\alpha _*}^{n-1})\left( \frac{(1+\alpha )^{n-1}}{(1+\alpha )^n-\alpha ^n}+\frac{\alpha (1+\alpha )^{n-1}-\alpha ^n}{(1+\alpha )^n-\alpha ^n}{\alpha _*}^{n\ell }\right) \right] \\&\quad =f_{n-1}(f_n({\alpha _*}^{n\ell })). \end{aligned}

Therefore, $$E\left( H_{n-1}\left( {\alpha _*}^{(n-1)X_{t-(n-1)}}\right) \Big |X_{t-n}=\ell \right) =f_2(\dots (f_{n-1}(f_n({\alpha _*}^{n\ell }))))=H_n({\alpha _*}^{n\ell })$$, and hence we get the desired expression $$E(X_t|X_{t-n}=\ell )=\alpha \big (1-H_n\big ({\alpha _*}^{n\ell }\big )\big )+ \mu _{\epsilon }$$, which completes the proof. $$\square$$
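The recursion defining $$H_k$$ can be implemented directly. The sketch below (illustrative parameter values) evaluates (13) for $$k=2$$ and checks it against a brute-force computation that averages the one-step conditional mean of Proposition 8 over the exact law of $$X_{t-1}$$ given $$X_{t-2}=\ell$$ (thinning pmf convolved with the ZMG pmf):

```python
import numpy as np

mu, alpha = 1.5, 2.0
a_star = alpha / (1.0 + alpha)
pi = alpha / (1.0 + mu + alpha)
mu_eps = mu * (1.0 + mu) / (1.0 + mu + alpha)

def psi_eps(s):
    """Innovation pgf (11)."""
    return (1.0 + pi * mu * (1.0 - s)) / (1.0 + mu * (1.0 - s))

def cond_mean(k, ell):
    """E(X_t | X_{t-k} = ell) via (13): apply f_k first, then f_{k-1}, ..., f_2."""
    x = a_star**(k * ell)
    for j in range(k, 1, -1):
        denom = (1.0 + alpha)**j - alpha**j
        h = (1.0 + alpha)**(j - 1) / denom
        g = (alpha * (1.0 + alpha)**(j - 1) - alpha**j) / denom
        x = psi_eps(a_star**(j - 1)) * (h + g * x)
    return alpha * (1.0 - x) + mu_eps

K = 800
ks = np.arange(K)
pe = (1.0 - pi) * mu**ks / (1.0 + mu)**(ks + 1)
pe[0] += pi                                      # ZMG innovation pmf
pz = alpha**ks / (1.0 + alpha)**(ks + 1)         # Geo(alpha) pmf

err = 0.0
for ell in (0, 1, 3, 7):
    pthin = pz[:ell + 1].copy()
    pthin[ell] = a_star**ell                     # P(Z >= ell) at the endpoint
    pm = np.convolve(pthin, pe)[:K]              # law of X_{t-1} | X_{t-2} = ell
    one_step = alpha * (1.0 - a_star**np.arange(K)) + mu_eps
    err = max(err, abs(np.sum(pm * one_step) - cond_mean(2, ell)))
```
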

### Proposition 11

Let $$h_j$$, $$g_j$$ be as in Proposition 10 and write $$\tilde{h}_j=\mu h_j$$, for $$j\in {\mathbb {N}}$$. It holds that

\begin{aligned} \gamma (k):=Cov(X_t,X_{t-k})=\alpha \mu \left[ 1-H_k(G(\alpha ,\mu ,k))\right] +\mu \left( \mu _{\epsilon }-\mu \right) , \end{aligned}

where $$G(\alpha ,\mu ,k)=\frac{{\alpha _*}^{k}}{(1+\mu (1-{\alpha _*}^k))^2}$$, and $$H_k(\cdot )$$ is as defined in Proposition 10, for $$k\in {\mathbb {N}}$$.

### Proof

Note that

\begin{aligned} \gamma (k)= & {} E(E(X_t X_{t-k}|X_{t-k}))-\mu ^2=E(X_{t-k}E(X_t| X_{t-k}))-\mu ^2\nonumber \\= & {} \alpha \left( \mu -E\left( X_{t-k}H_{k}\left( {\alpha _*}^{kX_{t-k}}\right) \right) \right) +\mu (\mu _{\epsilon }-\mu ), \end{aligned}
(15)

where the third equality follows by (13). A thorough inspection of the definition of $$H_k$$ gives

\begin{aligned} E\left( X_{t-k}H_k\left( {\alpha _*}^{kX_{t-k}}\right) \right) =\tilde{f}_2\left( \dots \left( \tilde{f}_k\left( E\left( X_{t-k}{\alpha _*}^{kX_{t-k}}\right) \right) \right) \right) , \end{aligned}
(16)

where we have defined $$\tilde{f}_j(x)=\Psi _{\epsilon _1}({\alpha _*}^{j-1})\left( \tilde{h}_j+g_jx\right)$$, for $$j\in {\mathbb {N}}$$.

Note that the argument of the function in (16) is just a constant times the derivative of $$\Psi _{X_1}(s)$$ with respect to s and evaluated at $$s={\alpha _*}^{k}$$. More specifically,

\begin{aligned} E\left( X_{t-k}{\alpha _*}^{kX_{t-k}}\right) ={\alpha _*}^{k}\Psi '_{X_1}({\alpha _*}^{k})=\mu \frac{{\alpha _*}^{k}}{(1+\mu (1-{\alpha _*}^k))^2}=\mu G(\alpha ,\mu ,k).\qquad \end{aligned}
(17)

The second equality follows from (3). Plugging (17) in (16), we obtain

\begin{aligned} E\left( X_{t-k}H_k\left( {\alpha _*}^{kX_{t-k}}\right) \right)= & {} \tilde{f}_2(\dots (\tilde{f}_k (\mu G(\alpha ,\mu ,k))))=\mu f_2\left( \dots \left( {f}_k \left( G\left( \alpha ,\mu ,k\right) \right) \right) \right) \nonumber \\= & {} \mu H_k(G(\alpha ,\mu ,k)). \end{aligned}
(18)

The result follows by plugging (18) in (15). $$\square$$

## 4 Parameter estimation

In this section, we discuss estimation procedures for the geometric NonLINAR process through conditional least squares (CLS) and maximum likelihood methods. We assume that $$X_1,\ldots ,X_n$$ is a trajectory from the Geo-NonLINAR model with observed values $$x_1,\ldots ,x_n$$, where n stands for the sample size. We denote the parameter vector by $$\varvec{\theta }\equiv (\mu , \alpha )^\top$$.

For the CLS method, we define the function $$Q_n(\varvec{\theta })$$ as

\begin{aligned} Q_n(\varvec{\theta })\equiv & {} \sum _{t=2}^n\left\{ x_t-{\text {E}}\,(X_t\,|\,X_{t-1}=x_{t-1})\right\} ^2 = \sum _{t=2}^n\left\{ x_t-\alpha \left[ 1-\left( \frac{\alpha }{1+\alpha }\right) ^{x_{t-1}}\right] \right. \nonumber \\{} & {} \quad \left. - \frac{\mu (1+\mu )}{1+\mu +\alpha }\right\} ^2. \end{aligned}
(19)

The CLS estimators are obtained as the argument that minimizes $$Q_n(\varvec{\theta })$$, i.e.

\begin{aligned} \hat{\varvec{\theta }}_{cls} = {\mathop {\mathrm{arg\, min}}\limits _{\varvec{\theta }}}\, Q_n(\varvec{\theta }). \end{aligned}
(20)

Since we do not have an explicit expression for $$\hat{\varvec{\theta }}_{cls}$$, numerical optimization methods are required to solve (20). This can be done through optimizer routines implemented in software such as R (R Core Team 2021) and MATLAB. The gradient function associated with $$Q_n(\cdot )$$ can be provided to these numerical optimizers and is given by

\begin{aligned} \frac{\partial Q_n(\varvec{\theta })}{\partial \mu } =&-2\left[ 1 - \frac{\alpha \left( 1+\alpha \right) }{(1+\mu +\alpha )^2}\right] \sum _{t=2}^n \left[ x_t - \alpha \left( 1-\left( \frac{\alpha }{1+\alpha }\right) ^{x_{t-1}}\right) \right. \\&\left. - \frac{\mu (1+\mu )}{1+\mu +\alpha }\right] \quad \text {and}\\ \frac{\partial Q_n(\varvec{\theta })}{\partial \alpha } =&-2 \sum _{t=2}^n \left[ x_t - \alpha \left( 1-\left( \frac{\alpha }{1+\alpha }\right) ^{x_{t-1}}\right) - \frac{\mu (1+\mu )}{1+\mu +\alpha }\right] \\&\left[ 1 - \left( \frac{\alpha }{1+\alpha }\right) ^{x_{t-1}}\left( 1+\frac{x_{t-1}}{1+\alpha }\right) - \frac{\mu (1+\mu )}{(1+\mu +\alpha )^2}\right] . \end{aligned}

A bootstrap strategy to obtain the standard errors of the CLS estimates is proposed and illustrated in our empirical applications; see Sect. 6.
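
As an illustration, the CLS fit in (19)–(20) can be sketched as follows; the paper's experiments use R, so this Python version is only a sketch, with `q_n`, `cls_fit`, and the starting value being our own choices, and `scipy.optimize.minimize` standing in for any numerical optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def q_n(theta, x):
    """CLS objective Q_n in (19) for the Geo-NonLINAR model."""
    mu, alpha = theta
    xprev = x[:-1]
    # one-step conditional mean E(X_t | X_{t-1}) of the Geo-NonLINAR process
    cond_mean = alpha * (1.0 - (alpha / (1.0 + alpha)) ** xprev) \
        + mu * (1.0 + mu) / (1.0 + mu + alpha)
    return np.sum((x[1:] - cond_mean) ** 2)

def cls_fit(x, start=(1.0, 1.0)):
    """Minimize Q_n numerically; mu, alpha > 0 is enforced via bounds."""
    res = minimize(q_n, start, args=(np.asarray(x, dtype=float),),
                   method="L-BFGS-B", bounds=[(1e-6, None), (1e-6, None)])
    return res.x  # (mu_hat, alpha_hat)
```

The analytical gradient above could be passed via the `jac` argument to speed up convergence, but the finite-difference default already works for this two-parameter problem.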

We now discuss the maximum likelihood estimation (MLE) method. Note that our proposed Geo-NonLINAR process is a Markov chain (by definition), and therefore the likelihood function can be expressed in terms of the 1-step transition probabilities derived in Proposition 5. The maximum likelihood estimators are obtained as the argument that maximizes the log-likelihood function, that is, $$\hat{\varvec{\theta }}_{mle} = {{\,\mathrm{arg\,max}\,}}_{\varvec{\theta }}\ell _n(\varvec{\theta })$$, with

\begin{aligned} \ell _n(\varvec{\theta })&= \sum _{t=2}^n \log {\text {P}}\,(X_t = x_t \,|\,X_{t-1} = x_{t-1})+\log {\text {P}}\,(X_1=x_1), \end{aligned}
(21)

where the conditional probabilities in (21) are given by (8) and $${\text {P}}\,(X_1=x_1)$$ is the probability function of a geometric distribution with mean $$\mu$$. There is no closed-form expression available for $$\hat{\varvec{\theta }}_{mle}$$. The maximization of (21) can be accomplished through numerical methods such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm implemented in the R function optim. The standard errors of the maximum likelihood estimates can be obtained from the Hessian matrix associated with (21), which can be evaluated numerically.
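
The Hessian-based standard errors mentioned above can be computed generically by finite differences. The sketch below is not tied to (21), whose transition probabilities are derived earlier in the paper; it takes an arbitrary log-likelihood function as input, and the function names are our own.

```python
import numpy as np

def num_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            e_i, e_j = np.zeros(p), np.zeros(p)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(theta + e_i + e_j) - f(theta + e_i - e_j)
                       - f(theta - e_i + e_j) + f(theta - e_i - e_j)) / (4 * h * h)
    return H

def mle_std_errors(loglik, theta_hat):
    """Standard errors from the observed information, i.e. the negative
    Hessian of the log-likelihood evaluated at the maximum."""
    info = -num_hessian(loglik, theta_hat)
    return np.sqrt(np.diag(np.linalg.inv(info)))
```

In practice one would plug in the Geo-NonLINAR log-likelihood $$\ell _n$$ and its maximizer $$\hat{\varvec{\theta }}_{mle}$$; numerical optimizers such as R's optim can also return this Hessian directly.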

In the remainder of this section, we examine and compare the finite-sample behavior of the CLS and MLE methods via Monte Carlo (MC) simulation with 1000 replications per parameter configuration, with the parameter estimates computed under both approaches. All the numerical experiments presented in this paper were carried out using the R programming language.

We consider four simulation scenarios with different values for $$\varvec{\theta }= (\mu , \alpha )^\top$$, namely: (I) $$\varvec{\theta }= (2.0, 1.0)^\top$$, (II) $$\varvec{\theta }= (1.2, 0.5)^\top$$, (III) $$\varvec{\theta }= (0.5, 1.5)^\top$$, and (IV) $$\varvec{\theta }= (0.3,0.5)^\top$$. To illustrate these configurations, we display in Fig. 1 simulated trajectories from the Geo-NonLINAR process and their associated autocorrelation function (ACF) and partial autocorrelation function (PACF) under Scenarios I and IV. In Table 1, we report the empirical mean and root mean squared error (RMSE) of the parameter estimates obtained from the MC simulation based on the MLE and CLS methods. We observe that both approaches produce satisfactory results, with a slight advantage of the MLE estimators over the CLS ones for estimating $$\alpha$$, mainly in terms of RMSE, as expected. This advantage can also be seen in Fig. 2, which presents boxplots of the parameter estimates of $$\mu$$ and $$\alpha$$ under Scenarios I and IV with sample sizes $$n = 100$$ and $$n = 500$$. In general, the estimation procedures considered here produce estimates with bias and RMSE decreasing towards zero as the sample size increases, giving evidence of consistency. We also present in Fig. 3 the histograms of the standardized Monte Carlo estimates of $$\mu$$ and $$\alpha$$ under the MLE and CLS approaches, along with standard normal density curves. From this figure, we observe a good normal approximation for the proposed estimators.
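
The Monte Carlo trajectories can be generated directly from the model definition: the thinning is $$\min (X_{t-1},Z_t)$$ with $$Z_t\sim \text{ Geo }(\alpha )$$, and the innovation is $$\text{ ZMG }\left( \frac{\alpha }{1+\alpha +\mu },\mu \right)$$, which we sample as a mixture of an extra point mass at zero (weight $$\alpha /(1+\alpha +\mu )$$) and a Geo($$\mu$$) draw. A minimal Python sketch, with function names of our own (the paper's experiments use R):

```python
import numpy as np

def rgeo(rng, mean, size=None):
    """Geometric on {0,1,...} with the given mean (success prob 1/(1+mean))."""
    return rng.geometric(1.0 / (1.0 + mean), size=size) - 1

def simulate_geo_nonlinar(n, mu, alpha, seed=0):
    """Simulate X_t = min(X_{t-1}, Z_t) + eps_t with Z_t ~ Geo(alpha)
    and ZMG innovations, starting from X_1 ~ Geo(mu)."""
    rng = np.random.default_rng(seed)
    p0 = alpha / (1.0 + alpha + mu)       # extra ZMG mass at zero
    x = np.empty(n, dtype=int)
    x[0] = rgeo(rng, mu)                  # X_1 ~ Geo(mu)
    for t in range(1, n):
        z = rgeo(rng, alpha)              # Z_t ~ Geo(alpha)
        eps = 0 if rng.uniform() < p0 else rgeo(rng, mu)
        x[t] = min(x[t - 1], z) + eps
    return x
```

Since the process is stationary with Geo($$\mu$$) marginals, the sample mean of a long simulated trajectory should be close to $$\mu$$, which provides a quick sanity check of the generator.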

## 5 Dealing with non-stationarity

In many practical situations, stationarity can be an unrealistic assumption; for instance, see Brännäs (1995), Enciso-Mora et al. (2009), and Wang (2020) for works that investigate non-stationary Poisson INAR processes. Motivated by this, in this section we propose a non-stationary version of the Geo-NonLINAR process allowing for time-varying parameters. Consider

\begin{aligned} \mu _t=\exp (\textbf{w}_t^\top \varvec{\beta })\quad \text{ and }\quad \alpha _t=\exp (\textbf{v}_t^\top \varvec{\gamma }), \end{aligned}

where $$\textbf{w}_t$$ and $$\textbf{v}_t$$ are $$p\times 1$$ and $$q\times 1$$ covariate vectors for $$t\ge 1$$, and $$\varvec{\beta }$$ and $$\varvec{\gamma }$$ are $$p\times 1$$ and $$q\times 1$$ vectors of associated regression coefficients.

We define a time-varying or non-stationary Geo-NonLINAR process by

\begin{aligned} X_t=\alpha _t\,{\triangle }\,X_{t-1}+\epsilon _t, \,\,\,t=2,3,\ldots , \end{aligned}
(22)

and $$X_1\sim \text{ Geo }(\mu _1)$$, where $$\alpha _t\,{\triangle }\,X_{t-1}=\min (X_{t-1},Z_t)$$, $$\{Z_t\}_{t\in \mathbb {N}}$$ is an independent sequence with $$Z_t\sim \text{ Geo }(\alpha _t)$$, $$\{\epsilon _t\}_{t\ge 1}$$ are independent random variables with $$\epsilon _t\sim \text{ ZMG }\left( \dfrac{\alpha _t}{1+\alpha _t+\mu _t},\mu _t\right)$$, for $$t\ge 2$$. It is also assumed that $$\epsilon _t$$ is independent of $$X_{t-l}$$ and $$Z_{t-l+1}$$, for all $$l\ge 1$$. Under these assumptions, the marginals of the process (22) are $$\text{ Geo }(\mu _t)$$ distributed, for $$t\in {\mathbb {N}}$$. This claim can be proved by following the same steps as for the stationary case.

### Remark 5

The transition probabilities and conditional mean and variance for the non-stationary Geo-NonLINAR(1) process are given by expressions in Propositions 5 and 8, respectively, just by replacing $$\mu$$ and $$\alpha$$ by $$\mu _t$$ and $$\alpha _t$$.

We consider two estimation methods for the parameter vector $$\varvec{\theta }=(\varvec{\beta },\varvec{\gamma })^\top$$. The first one is based on conditional least squares. The CLS estimator of $$\varvec{\theta }$$ is obtained by minimizing (19) with $$\mu _t$$ and $$\alpha _t$$ in place of $$\mu$$ and $$\alpha$$, respectively. According to Wang (2020), this procedure might not be accurate in the sense that non-significant covariates can be included in the model. In that paper, a penalized CLS (PCLS) method is considered. Hence, a more accurate estimator is obtained by minimizing $${\widetilde{Q}}_n(\varvec{\theta })=Q_n(\varvec{\theta })+n\sum _{j=1}^{p+q} P_\delta (|\theta _j|)$$, where $$P_\delta (\cdot )$$ is a penalty function and $$\delta$$ is a tuning parameter. See Wang (2020) for possible choices of penalty function. This can be used as a selection criterion, and we hope to explore it in a future paper. A second method for estimating the parameters is the maximum likelihood method. The log-likelihood function assumes the form (21) with $$\mu$$ and $$\alpha$$ replaced by $$\mu _t$$ and $$\alpha _t$$, respectively.

For the non-stationary case, we carry out a second set of Monte Carlo simulations by considering trend and seasonal covariates in the model as follows:

\begin{aligned} \mu _t = \exp (\beta _0 + \beta _1 t/n + \beta _2 \cos (2\pi t/12)) \quad \text{ and } \quad \alpha _t = \exp (\gamma _0 + \gamma _1 t/n), \end{aligned}

for $$t = 1, \ldots , n$$. The above structure aims to mimic realistic situations when dealing with epidemic diseases. We here set the following scenarios: (V) $$(\beta _0, \beta _1, \beta _2, \gamma _0, \gamma _1) = (2.0, 1.0, 0.7, 2.0, 1.0)$$ and (VI) $$(\beta _0, \beta _1, \beta _2, \gamma _0, \gamma _1) = (3.0, 1.0, 0.5, 3.0, 2.0)$$. We consider 500 Monte Carlo replications and the sample sizes $$n = 100, 200, 500, 1000$$. Table 2 reports the empirical mean and the RMSE (within parentheses) of the parameter estimates based on the MLE and CLS methods. We observe that the MLE method outperforms the CLS method for all configurations considered, as expected, since we are generating time series data from the “true” model. This can also be seen in Fig. 4, which presents the boxplots of MLE and CLS estimates under Scenario V with sample sizes $$n=200,500$$. Regardless, note that the bias and RMSE of the CLS estimates decrease as the sample size increases. Figure 5 displays the standardized Monte Carlo estimates under the MLE and CLS methods along with the standard normal density curve. As in the stationary case, we observe a good normal approximation, which is more satisfactory under the MLE method since it uses the full distributional assumption and we are generating data from the correct model.
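
A trajectory under this time-varying design can be sketched as follows: $$\mu _t$$ and $$\alpha _t$$ are computed from the log-linear links above, the thinning is $$\min (X_{t-1},Z_t)$$ with $$Z_t\sim \text{ Geo }(\alpha _t)$$, and the $$\text{ ZMG }\left( \frac{\alpha _t}{1+\alpha _t+\mu _t},\mu _t\right)$$ innovation of (22) is sampled as a zero/geometric mixture. This Python illustration, with our own function names and seed, is a sketch rather than the paper's R code.

```python
import numpy as np

def rgeo(rng, mean):
    """Geometric draw on {0,1,...} with the given mean."""
    return rng.geometric(1.0 / (1.0 + mean)) - 1

def simulate_nonstationary(n, beta, gamma, seed=0):
    """One trajectory of the time-varying Geo-NonLINAR process (22) with the
    trend-plus-seasonality links of the Monte Carlo design."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1)
    mu = np.exp(beta[0] + beta[1] * t / n + beta[2] * np.cos(2 * np.pi * t / 12))
    alpha = np.exp(gamma[0] + gamma[1] * t / n)
    x = np.empty(n, dtype=int)
    x[0] = rgeo(rng, mu[0])                       # X_1 ~ Geo(mu_1)
    for i in range(1, n):
        z = rgeo(rng, alpha[i])                   # Z_t ~ Geo(alpha_t)
        p0 = alpha[i] / (1.0 + alpha[i] + mu[i])  # extra ZMG mass at zero
        eps = 0 if rng.uniform() < p0 else rgeo(rng, mu[i])
        x[i] = min(x[i - 1], z) + eps
    return x, mu, alpha
```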

## 6 Real data applications

In this section, we discuss the usefulness of our methodology under stationary and non-stationary conditions. In the first empirical example, we consider the monthly number of polio cases reported to the U.S. Centers for Disease Control and Prevention from January 1970 to December 1983, with 168 observations. The data were obtained through the gamlss package in R. Polio (or poliomyelitis) is a disease caused by poliovirus. Symptoms associated with polio can vary from mild flu-like symptoms to paralysis and possibly death, mainly affecting children under 5 years of age. The second example concerns the monthly number of Hansen’s disease cases in the state of Paraíba, Brazil, reported by DATASUS - Information Technology Department of the Brazilian Public Health Care System (SUS), from January 2001 to December 2020, totaling 240 observations. Hansen’s disease (or leprosy) is a curable infectious disease caused by M. leprae. It mainly affects the skin, the peripheral nerves, the mucosa of the upper respiratory tract, and the eyes. According to the World Health Organization, about 208,000 people worldwide are infected with Hansen’s disease. The data are displayed in Table 3.

### 6.1 Polio data analysis

We begin the analysis of the polio data by providing plots of the observed time series and the corresponding sample ACF and PACF plots in Fig. 6. These plots give us evidence that the count time series is stationary. Table 4 provides a summary of the polio data with descriptive statistics, including mean, median, variance, skewness, and kurtosis. From the results in Table 4, we can observe that counts vary between 0 and 14, with the sample mean and variance equal to 1.333 and 3.505, respectively, which suggests overdispersion of the data.

For comparison purposes, we consider the classic first-order INAR process with $$E(X_t|X_{t-1})=\kappa X_{t-1}+\mu (1-\kappa )$$, where $$\mu =E(X_{t})$$ and $$\kappa =\text{ corr }(X_t,X_{t-1})\in (0,1)$$. This linear conditional expectation on $$X_{t-1}$$ holds for the classic stationary INAR processes such as the binomial thinning-based ones, in particular, the Poisson INAR(1) model by Alzaid and Al-Osh (1987). The aim is to evaluate the effect of the nonlinearity of our proposed models on the prediction in comparison to the classic INAR(1) processes.

We consider the CLS estimation procedure, which relies only on the conditional expectation. This allows for a more flexible approach since no further assumptions are required. To obtain the standard errors of the CLS estimates, we consider a parametric bootstrap under some model satisfying the specific form of the conditional expectation. In this first application, for our NonLINAR process, we consider the geometric model derived in Sect. 3. For the classic INAR, the Poisson model by Alzaid and Al-Osh (1987) is considered in the bootstrap approach. This strategy for obtaining standard errors has been used, for example, by Maia et al. (2021) for a class of semiparametric time series models driven by a latent factor. In order to compare the predictive performance of the competing models, we compute the sum of squared prediction errors (SSPE) defined by $$\text {SSPE} = \sum _{t = 2}^n(x_t - {\hat{\mu }}_t)^2$$, where $${\hat{\mu }}_t = {\widehat{E}}(X_t|X_{t-1})$$ is the predicted mean at time t (see Proposition 8), for $$t = 2, \ldots , n$$. Table 5 summarizes the fitted models by providing the CLS estimates and their respective standard errors, and the SSPE values. The SSPE results in Table 5 show the superior performance of the NonLINAR process over the classic INAR process in terms of prediction. This can also be observed in Fig. 7, where the NonLINAR process shows a better agreement between the observed and predicted values.
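
The SSPE reduces to evaluating the fitted one-step conditional mean of the Geo-NonLINAR model, as in (19), at each observed lag. A small sketch (function names are ours):

```python
import numpy as np

def one_step_mean(xprev, mu, alpha):
    """Fitted conditional mean E(X_t | X_{t-1} = xprev) of the Geo-NonLINAR model."""
    return alpha * (1.0 - (alpha / (1.0 + alpha)) ** xprev) \
        + mu * (1.0 + mu) / (1.0 + mu + alpha)

def sspe(x, mu_hat, alpha_hat):
    """Sum of squared prediction errors over t = 2, ..., n."""
    x = np.asarray(x, dtype=float)
    preds = one_step_mean(x[:-1], mu_hat, alpha_hat)
    return float(np.sum((x[1:] - preds) ** 2))
```

The same function, with the PINAR conditional mean $$\kappa x_{t-1}+\mu (1-\kappa )$$ substituted, yields the competing model's SSPE.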

To evaluate the adequacy of our proposed NonLINAR process, we consider the Pearson residuals defined by $$R_t \equiv (X_t - {\hat{\mu }}_t) / {\hat{\sigma }}_t$$, with $${\hat{\sigma }}_t = \sqrt{\widehat{\text{ Var }}(X_t|X_{t-1})}$$, for $$t = 2, \ldots , n$$, where we assume that the conditional variance takes the form given in Proposition 8. Figure 8 presents the Pearson residuals against time, their ACF, and the qq-plot against the normal quantiles. These plots show that the data correlation was well captured. On the other hand, the qq-plot suggests that the Pearson residuals are not normally distributed. Actually, this discrepancy is not unusual, especially when dealing with low counts; for instance, see Zhu (2011) and Silva and Barreto-Souza (2019). As an alternative way to check the adequacy, we use the normal pseudo-residuals introduced by Dunn and Smyth (1996), defined by $$R^*_t = \Phi ^{-1}(U_t)$$, where $$\Phi (\cdot )$$ is the standard normal distribution function and $$U_t$$ is uniformly distributed on the interval $$(F_{\varvec{{\hat{\theta }}}}(x_t-1), F_{\varvec{{\hat{\theta }}}}(x_t))$$, with $$F_{\varvec{{\hat{\theta }}}}(\cdot )$$ the fitted predictive cumulative distribution function of the NonLINAR process. Figure 9 shows the pseudo-residuals against time, their ACF, and qq-plot. We observe that the pseudo-residuals are not correlated and are approximately normally distributed. Therefore, we conclude that the NonLINAR process provides an adequate fit to the polio count time series data.

We now analyze the predictive performance of the proposed model by conducting an out-of-sample forecasting exercise through a rolling estimation window approach. More specifically, we split the data $$X_1,\ldots ,X_n$$ into the first $$n_0$$ observations $$X_1,\ldots ,X_{n_0}$$ and the remaining time series $$X_{n_0+1},\ldots ,X_n$$, where $$n_0<n$$. Hence, we estimate the model parameters using the trajectory $$X_1,\ldots ,X_{n_0}$$ and forecast $$X_{n_0+1}$$ by using the conditional expectation given in Proposition 8. Thereafter, we update the training dataset to include the observation $$X_{n_0+1}$$ and re-estimate the model parameters using $$X_1,\ldots ,X_{n_0}, X_{n_0+1}$$. Based on this fitted model, we forecast $$X_{n_0+2}$$ using the conditional 1-step ahead expectation as before. This procedure is repeated until we reach the last observation.
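
The rolling-window exercise just described can be sketched generically: at each step the model is refit on all data observed so far and the next value is forecast by the fitted one-step conditional mean. In this Python sketch, `fit` and `cond_mean` are placeholders of ours for any estimator and conditional-mean formula (e.g. the CLS fit and the Geo-NonLINAR conditional expectation).

```python
import numpy as np

def rolling_one_step_forecasts(x, n0, fit, cond_mean):
    """Rolling-window out-of-sample exercise: for each t >= n0, refit on
    x[:t] and forecast x[t] via the fitted one-step conditional mean.
    `fit` maps a series to a parameter estimate; `cond_mean` maps
    (last observation, estimate) to E(X_t | X_{t-1})."""
    x = np.asarray(x, dtype=float)
    preds = []
    for t in range(n0, len(x)):
        theta_hat = fit(x[:t])            # re-estimate on the enlarged window
        preds.append(cond_mean(x[t - 1], theta_hat))
    return np.array(preds)
```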

For the polio data, we consider $$n_0=84$$, which corresponds to December 1976. Figure 10 displays the polio time series and the 1-step ahead predicted values. This figure reveals a satisfactory out-of-sample forecasting performance of the NonLINAR(1) process since there is a good agreement between the observed and predicted values.

### 6.2 Hansen’s disease data analysis

We now analyze Hansen’s disease data. A descriptive data analysis is provided in Table 6. Figure 11 presents the Hansen’s disease count data and the corresponding sample ACF and PACF plots. This figure provides evidence that the count time series is non-stationary. In particular, we can observe a negative trend. This motivates us to use non-stationary approaches to handle these data. We consider our non-stationary NonLINAR process with conditional mean

\begin{aligned} E(X_t|X_{t-1})=\alpha _t\left[ 1-\left( \dfrac{\alpha _t}{1+\alpha _t}\right) ^{X_{t-1}}\right] +\dfrac{\mu _t(1+\mu _t)}{1+\mu _t+\alpha _t}, \end{aligned}
(23)

where the following regression structure is assumed:

\begin{aligned} \mu _t = \exp \left( \beta _0 + \beta _1t/252 \right) \quad \text {and} \quad \alpha _t = \exp \left( \gamma _0 + \gamma _1t/252 \right) , \,\,\, \text {for}\,\,\, t = 1, \ldots , 252, \end{aligned}

with the term t/252 being a linear trend. For comparison purposes, we also consider the Poisson INAR(1) process allowing for covariates (Brännäs 1995) with conditional expectation $$E(X_t|X_{t-1})=\kappa _tX_{t-1}+\mu _t(1-\kappa _t)$$, where

\begin{aligned} \mu _t = \exp \left( \beta _0 + \beta _1t/252 \right) \quad \text {and} \quad \kappa _t = \dfrac{\exp \left( \xi _0 + \xi _1t/252 \right) }{1+\exp \left( \xi _0 + \xi _1t/252 \right) }, \,\,\, \text {for}\,\,\, t = 1, \ldots , 252. \end{aligned}

We consider the CLS estimation procedure for both approaches considered here. Table 7 gives the parameter estimates under the NonLINAR and PINAR(1) processes, standard errors obtained via bootstrap, and the SSPE values (we use Eq. (23) and $$E(X_t|X_{t-1})=\kappa _tX_{t-1}+\mu _t(1-\kappa _t)$$ to obtain the predicted values according to the non-stationary Geo-NonLINAR and Poisson INAR models, respectively). To get the standard errors of the parameter estimates, we proceed similarly as in the first application, with a slight difference. Since the counts here are high, the geometric assumption may not be valid. Therefore, we consider a non-stationary NonLINAR process with innovations following a Poisson distribution with mean $$\dfrac{\mu _t(1+\mu _t)}{1+\mu _t+\alpha _t}$$ in our bootstrap scheme. This ensures that the conditional mean is the same as in (23). From Table 7, we have that the trend is significant (using, for example, a significance level of 5%) to explain the marginal mean $$\mu _t$$, but not the parameter $$\alpha _t$$, under the NonLINAR model. Furthermore, we note that the sign of the estimate of $$\beta _1$$ is negative, which is in agreement with the observed negative trend. We highlight that the parameter $$\mu _t$$ also appears in the autocorrelation structure under our approach; therefore, the trend is also significant to explain the autocorrelation of the NonLINAR process. By looking at the results from the PINAR fitting, we see that the trend is significant to explain $$\kappa _t$$ (the parameter related to the autocorrelation) but not the marginal mean $$\mu _t$$. Once again, the model producing the smallest SSPE is the NonLINAR process. Hence, our proposed methodology performs better than the classic PINAR model in terms of prediction. The predicted values according to both models, along with the observed counts, are exhibited in Fig. 12.

We now check whether the non-stationary NonLINAR process fits the data well. Figure 13 provides the Pearson residuals against time, their ACF plot, and the qq-plot of the residuals. By looking at this figure, we have evidence of the adequacy of the NonLINAR process for fitting Hansen’s disease data.

We conclude this data analysis by exploring the predictive power of the NonLINAR(1) process, performing the out-of-sample forecasting exercise through a rolling estimation window as described at the end of Sect. 6.1. Here, we consider $$n_0=168$$ (December 2014). Figure 14 provides the plot of Hansen’s data and the 1-step ahead predictions. Once again, we notice a good agreement between the observed time series and the predictions.

## 7 Generalization

In this section, we provide an extension of the geometric thinning operator and propose a non-linear INAR process based on this generalization. As we will see, distributions other than the geometric for the operation in (4) can provide flexible approaches for dealing with different features of count time series. We also discuss how to handle zero-inflation or zero-deflation with respect to the geometric model.

### Definition 4

(Zero-modified geometric (ZMG) thinning operator) Assume that X is a non-negative integer-valued random variable, independent of $$Z^{(\eta ,\alpha )}\sim \textrm{ZMG}(1-\eta ,\alpha )$$, with $$\alpha >0$$ and $$1-\eta \in (-1/\alpha ,1)$$. We define the zero-modified geometric thinning operator $$(\eta ,\alpha ) \,{\triangle }\,$$ by

\begin{aligned} (\eta ,\alpha ) \,{\triangle }\,X {\mathop {=}\limits ^{d}} \min \left( X, Z^{(\eta ,\alpha )}\right) . \end{aligned}
(24)

### Remark 6

Note that the ZMG operator given in (24) has the geometric thinning operator as a special case when $$\eta =1$$ since $$Z^{(1,\alpha )}\sim \textrm{Geo}(\alpha )$$. Further, we stress that the parameterization of the ZMG distribution in terms of $$1-\eta$$ instead of $$\eta$$ will be convenient in what follows. Also, we will omit the dependence of Z on $$(\eta ,\alpha )$$ to simplify the notation.

Based on the ZMG operator, we can define a non-linear INAR process $$\{X_t\}_{t\in \mathbb {N}}$$ (similarly as done in Sect. 3) by

\begin{aligned} X_t = (\eta ,\alpha ) \,{\triangle }\,X_{t-1} + \epsilon _t,\quad t\in {\mathbb {N}}, \end{aligned}
(25)

where $$(\eta ,\alpha ) \,{\triangle }\,X_{t-1}=\min \left( X_{t-1},Z_t\right)$$, with $$\{Z_t\}_{t\in \mathbb {N}}{\mathop {\sim }\limits ^{iid}}\text{ ZMG }(1-\eta ,\alpha )$$, $$\{\epsilon _t\}_{t\ge 1}$$ is a sequence of iid non-negative integer-valued random variables, called innovations, with $$\epsilon _t$$ independent of $$X_{t-l}$$ and $$Z_{t-l+1}$$, for all $$l\ge 1$$, with $$X_0$$ being some starting value/random variable. This is basically the same idea as before; we are just replacing the geometric assumption by the zero-modified geometric law in the thinning operation.

We now show that it is possible to construct a stationary Markov chain satisfying (25) and having marginals ZMG-distributed; this could be seen as an alternative model to the zero-modified geometric INAR(1) process proposed by Barreto-Souza (2015). Furthermore, we argue that such construction is not possible under the geometric thinning operator defined in Sect. 2 (see Remark 7 below), which motivates the ZMG thinning introduced here.

Let $$X{\sim }\text{ ZMG }(1-\pi ,\mu )$$ with $$\mu >0$$ and $$1-\pi \in (-1/\mu ,1)$$. For $$z=0,1,\dots$$, it holds that

\begin{aligned} {\text {P}}\,((\eta ,\alpha ) \,{\triangle }\,X> z)&= {\text {P}}\,(X> z){\text {P}}\,(Z^{(\eta ,\alpha )} > z) = \pi \eta \left[ \left( \frac{\mu }{1+\mu }\right) \left( \frac{\alpha }{1+\alpha }\right) \right] ^{z+1}. \end{aligned}

In other words, $$(\eta ,\alpha ) \,{\triangle }\,X\sim \textrm{ZMG}\left( 1-\eta \pi ,\frac{\mu \alpha }{1+\mu +\alpha }\right)$$. Writing $$\Psi _\epsilon (s)\equiv \dfrac{\Psi _X(s)}{\Psi _{(\eta ,\alpha ) \,{\triangle }\,X}(s)}$$, we obtain

\begin{aligned} \Psi _\epsilon (s)= & {} \left\{ \dfrac{1+(1-\pi )\mu (1-s)}{1+\mu (1-s)}\right\} \Bigg /\left\{ \dfrac{1+(1-\pi \eta )\frac{\mu \alpha }{1+\mu +\alpha }(1-s)}{1+\frac{\mu \alpha }{1+\mu +\alpha }(1-s)}\right\} \nonumber \\= & {} \left\{ \dfrac{1+(1-\pi )\mu (1-s)}{1+(1-\pi \eta )\frac{\mu \alpha }{1+\mu +\alpha }(1-s)}\right\} \left\{ \dfrac{1+\frac{\mu \alpha }{1+\mu +\alpha }(1-s)}{1+\mu (1-s)}\right\} \equiv \varphi _1(s)\varphi _2(s),\nonumber \\ \end{aligned}
(26)

for all s such that $$|s|<1+\min (\mu ^{-1},\alpha ^{-1})$$, where $$\varphi _2(\cdot )$$ denotes the pgf of a $$\textrm{ZMG}\left( \frac{\alpha }{1+\mu +\alpha },\mu \right)$$ distribution. In addition to the restrictions on $$\pi$$ and $$\eta$$ above, assume that $$\pi \eta <1$$, $$\eta \ne 1$$, and $$\frac{1-\pi }{1-\pi \eta }\left( 1+\frac{1+\mu }{\alpha }\right) <1$$. Under these conditions, $$\varphi _1(\cdot )$$ is the pgf of a $$\textrm{ZMG}\left( \frac{1-\pi }{1-\pi \eta }\left( 1+\frac{1+\mu }{\alpha }\right) ,(1-\pi \eta )\frac{\mu \alpha }{1+\mu +\alpha }\right)$$ distribution. This implies that $$\Psi _\epsilon (\cdot )$$ is a proper pgf associated with the convolution of two independent ZMG random variables. Hence, we are able to introduce a NonLINAR process with ZMG marginals as follows.
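
The closure property derived above, $$(\eta ,\alpha ) \,{\triangle }\,X\sim \textrm{ZMG}\left( 1-\eta \pi ,\frac{\mu \alpha }{1+\mu +\alpha }\right)$$, can be checked by simulation. In the Python sketch below (function names and parameter values are our own), a ZMG variable is sampled as a zero/geometric mixture, which is valid for the zero-inflated range of the parameterization, and the empirical survival function of $$\min (X,Z)$$ is compared with $$\pi \eta \left[ \left( \frac{\mu }{1+\mu }\right) \left( \frac{\alpha }{1+\alpha }\right) \right] ^{z+1}$$.

```python
import numpy as np

def rzmg(rng, one_minus_w, mean, size):
    """Sample ZMG(1 - w, m): point mass at zero with weight 1 - w, plus a
    geometric on {0,1,...} with the given mean otherwise (valid for w in (0,1])."""
    w = 1.0 - one_minus_w
    geo = rng.geometric(1.0 / (1.0 + mean), size=size) - 1
    return np.where(rng.uniform(size=size) < w, geo, 0)

def check_zmg_closure(pi_, mu, eta, alpha, n=200_000, seed=1):
    """Compare the empirical and theoretical survival functions of
    min(X, Z) for X ~ ZMG(1 - pi, mu) and Z ~ ZMG(1 - eta, alpha)."""
    rng = np.random.default_rng(seed)
    x = rzmg(rng, 1.0 - pi_, mu, n)
    z = rzmg(rng, 1.0 - eta, alpha, n)
    y = np.minimum(x, z)
    q = (mu / (1.0 + mu)) * (alpha / (1.0 + alpha))
    return [((y > k).mean(), pi_ * eta * q ** (k + 1)) for k in range(4)]
```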

### Definition 5

A stationary NonLINAR process $$\{X_t\}_{t\in \mathbb {N}}$$ with $$\text{ ZMG }(1-\pi ,\mu )$$ marginals (ZMG-NonLINAR) is defined by assuming that (25) holds with $$\{\epsilon _t\}_{t\ge 1}$$ being an iid sequence of random variables with pgf given by (26), and $$X_0{\sim }\text{ ZMG }(1-\pi ,\mu )$$, with $$\mu >0$$ and $$1-\pi \in (-1/\mu ,1)$$.

### Remark 7

Note that we are excluding the case $$\eta =1$$ (which corresponds to the geometric thinning operator) since the required inequality $$\frac{1-\pi }{1-\pi \eta }\left( 1+\frac{1+\mu }{\alpha }\right) <1$$ does not hold in this case ($$1+\frac{1+\mu }{\alpha }>1$$). This shows that a NonLINAR process with ZMG marginals cannot be constructed based on the geometric thinning operator defined previously, which therefore motivates the ZMG operator. We highlight the importance of the ZMG thinning operator (an extension of the geometric thinning operator) since it permits us to construct a NonLINAR(1) process with ZMG marginals. As a consequence, this model can handle inflation or deflation of zeros that cannot be accommodated by the geometric NonLINAR(1) model introduced in Sect. 3.