1 Introduction

One of the important causes of overdispersion in count data is an inflated number of zeros, in excess of the number expected under the assumed distribution. In such cases, an appropriate model is the zero-inflated Poisson (ZIP) distribution. There are numerous papers in the literature dealing with the ZIP model. The earliest studies on the ZIP model were done by Cohen [6] and Yoneda [37]. Lambert [14] introduced and studied the ZIP regression model using the Expectation-Maximization (EM) approach. Vandenbroek [34] gave a score test for testing a standard Poisson regression model against the ZIP alternative. Jansakul and Hinde [11] extended this test to the more general situation where the zero probability depends on covariates. Ridout et al. [27] derived a score test for testing a ZIP regression model against zero-inflated negative binomial alternatives. Agarwal et al. [1] applied the ZIP regression model to the analysis of spatial count data sets. Because the score test based on the normal approximation might underestimate the nominal significance level in small samples, Jung et al. [12] proposed a parametric bootstrap method for this problem; they also observed that the ZIP regression model is more robust for prediction than the usual Poisson regression model. Long et al. [20] developed a marginalized ZIP regression model for independent responses that models the population mean count directly, allowing straightforward inference for overall exposure effects, and derived an empirical robust variance estimator for overall incidence density ratios. Zhu et al. [40] extended zero-inflated count models to account for random effects. Lim et al. [16] proposed the ZIP regression mixture model to account for both excess zeros and overdispersion caused by unobserved heterogeneity. A Bayesian latent factor ZIP model was proposed by Neelon and Chung [25] to analyze molecular differences among breast cancer patients. Furthermore, ZIP models for censored data were studied by Saffari and Adnan [28] and Yang and Simpson [36]. Research on these models is still in progress: for instance, the ZIP model’s parameters have been estimated using the Genetic Algorithm by Dikheel and Jouda [9] and the shrinkage estimator by Zandi et al. [38], and ZIP models continue to be frequently utilized in many studies, even with the development of more comprehensive models [24, 32]. Empirical evidence shows that inflation may occur at more than one point. For example, Lin and Tsai [17] discussed a model that accommodates both excess zeros and excess ones, known as the zero–one inflated Poisson (ZOIP) model. Zhang et al. [39] initially studied the likelihood-based ZOIP model without covariates. When covariates are available, it is essential to build a ZOIP regression model to clarify the relationship between the covariates and the response variable; Tang et al. [33], Liu et al. [18], Liu et al. [19], and Arora and Chaganty [5] studied statistical inference for the ZOIP model. Melkersson and Rooth [23] proposed a zero–two inflated Poisson model, which accounts for a relative excess of both zero and two children in modeling complete female fertility.

In this article, we provide a generalization of inflated regression models based on the Poisson distribution. For this purpose, we generalize the inflated points to \(0,\ldots ,k,\) for \(k=0,1,2,\ldots .\) This generalization offers several benefits for modeling inflated and non-inflated data. For example, it includes all previous inflated regression models based on the Poisson distribution as special cases. It also provides the researcher with a wide range of models from which the most appropriate one can be chosen for the data at hand. In summary, the originality of this paper lies in the development of the family of inflated Poisson distributions and its application to generalized linear models. This generalization is significant because it gives the researcher access to the entire family of Poisson-based inflated distributions (the most widely used family of discrete inflated distributions) within a single model. Thus, the researcher can select the appropriate model for a data set by choosing the value of k according to the inflation at one point or at any number of points. Moreover, with other choices of k and distribution weights, new models can be introduced. Selecting the appropriate value of k may be a challenge when using the ZKIP models; an effective algorithm for this choice is described in Sect. 7, along with illustrative examples.

The rest of the paper is organized as follows. In Sect. 2, we introduce the zero to k inflated Poisson (ZKIP) distribution and some of its special cases. Estimation by the maximum likelihood (ML) method and the standard errors of the ML estimates are outlined in Sect. 3. Section 4 deals with the ZKIP regression model, including ML estimation, the EM algorithm, and hypothesis testing. In Sect. 5, the randomized quantile residual (RQR) method for assessing the adequacy of the proposed model is introduced. Simulations are conducted in Sect. 6 to assess the usefulness of the model. In Sect. 7, two real data sets are used to demonstrate the flexibility and superiority of the proposed model over existing ones. Finally, conclusions are given in Sect. 8.

2 Proposing the New Model

We can build the zero to k inflated distribution by introducing the function g(y) as

$$\begin{aligned} g(y)= \left\{ \begin{array}{ll} w_{0}+(1-\eta ({\textbf{w}}))f(0), &{}\quad y=0,\\ w_{1}+(1-\eta ({\textbf{w}}))f(1),&{} \quad y=1,\\ \vdots &{} \quad \vdots \\ w_{k}+(1-\eta ({\textbf{w}}))f(k), &{}\quad y=k, \\ (1-\eta ({\textbf{w}}))f(y), &{}\quad y>k, \end{array} \right. \end{aligned}$$
(1)

where f(y) is a discrete distribution, say Poisson, \(0 \le w_{i}, i=0,\ldots ,k,\) \(\eta ({\textbf{w}})=\sum _{i=0}^{k} w_{i}\) and \(0\le \eta ({\textbf{w}}) \le 1.\) By substituting the Poisson probability mass function (PMF) \(f(y)=\exp (-\lambda )\lambda ^y/y!,\) \(y=0, 1, \ldots ,\) into Equation (1), we have

$$\begin{aligned} g(y;\,{\textbf{w}},\lambda )= \left\{ \begin{array}{ll} w_{0}+(1-\eta ({\textbf{w}}))\exp (-\lambda ), &{}\quad y=0,\\ w_{1}+(1-\eta ({\textbf{w}}))\lambda \exp (-\lambda ), &{}\quad y=1, \\ \vdots &{} \quad \vdots \\ w_{k}+(1-\eta ({\textbf{w}}))\frac{\lambda ^{k}}{k!} \exp (-\lambda ), &{} \quad y=k, \\ (1-\eta ({\textbf{w}}))\frac{\lambda ^{y}}{y!} \exp (-\lambda ), &{}\quad y>k, \end{array} \right. \end{aligned}$$
(2)

where \({\textbf{w}}=(w_{0},\ldots ,w_{k}).\) The discrete random variable Y with the PMF (2) is said to follow the ZKIP distribution, denoted by \(Y\sim \)ZKIP\(({\textbf{w}},\lambda ).\) Some special cases of the family of ZKIP models are as follows:

  • \(k=0\) \(\rightarrow \) ZIP distribution [9, 24, 38] and [32].

  • \(k=1\) \(\rightarrow \) zero–one inflated Poisson distribution [17, 19, 33, 39], and [13].

  • \(k=2\) \(\rightarrow \) zero–one–two inflated Poisson distribution [32].

  • \(k>1,\ w_{1}=\cdots =w_{k-1}=0\) \(\rightarrow \) zero and k inflated Poisson distribution [5, 23, 30].
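As a concrete illustration of the PMF (2), the following Python sketch evaluates \(g(y;\,{\textbf{w}},\lambda )\) numerically (the helper name zkip_pmf and the use of scipy are our own choices, not part of the paper):

```python
import numpy as np
from scipy.stats import poisson

def zkip_pmf(y, w, lam):
    """Evaluate the ZKIP PMF of Eq. (2) at the integer points y.

    w   : inflation weights (w_0, ..., w_k), non-negative with sum in [0, 1]
    lam : Poisson mean lambda
    """
    w = np.asarray(w, dtype=float)
    y = np.atleast_1d(np.asarray(y, dtype=int))
    eta = w.sum()                                  # eta(w)
    p = (1.0 - eta) * poisson.pmf(y, lam)          # (1 - eta(w)) f(y), all y
    low = y <= len(w) - 1                          # y in {0, ..., k} gains w_y
    p[low] += w[y[low]]
    return p

# ZOIP (k = 1) with w = (0.2, 0.1) and lambda = 2; the PMF sums to one
print(zkip_pmf(range(6), [0.2, 0.1], 2.0))
print(zkip_pmf(np.arange(60), [0.2, 0.1], 2.0).sum())  # ~ 1.0
```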

Fig. 1
figure 1

PMF of the ZIP, ZOIP, ZOTIP, and ZOTTIP distributions, for \(\lambda =1\) and \(\lambda =2\)

If \(Y \sim \)ZKIP\(({\textbf{w}},\lambda ),\) then the moment generating function is obtained from Eq. (2) as

$$\begin{aligned} M_Y (t)=E[\exp (tY)]=\sum _{i=0}^{k}w_{i}\exp (ti)+(1-\eta ({\textbf{w}}))\exp [\lambda (\exp (t)-1)]. \end{aligned}$$
(3)

Equation (3) gives the expectation and variance of the random variable Y as

$$\begin{aligned} E(Y)= & {} \sum _{i=1}^{k}iw_{i}+(1-\eta ({\textbf{w}}))\lambda , \end{aligned}$$
(4)
$$\begin{aligned} Var(Y)= & {} \sum _{i=1}^{k}i^2 w_{i}-\left( \sum _{i=1}^{k}iw_{i}\right) ^2-\lambda (1-\eta ({\textbf{w}}))\left[ 2\sum _{i=1}^{k}iw_{i}-\lambda \eta ({\textbf{w}})-1 \right] . \end{aligned}$$
(5)

Figure 1 shows the PMF of the ZKIP distribution for \(\lambda =1\) and \(\lambda =2\) and some selected values of k and \({\textbf{w}}.\)
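The moment formulas (4) and (5) can be verified by simulation from the mixture representation of the ZKIP distribution. The following minimal Monte Carlo sketch (assuming numpy; the sampler zkip_rvs is our own helper) compares the empirical mean and variance with the theoretical values:

```python
import numpy as np

rng = np.random.default_rng(0)

def zkip_rvs(n, w, lam):
    """Sample n ZKIP variates: choose a mixture component, then its value."""
    w = np.asarray(w, dtype=float)
    k = len(w) - 1
    probs = np.append(w, 1.0 - w.sum())        # (w_0, ..., w_k, 1 - eta(w))
    comp = rng.choice(k + 2, size=n, p=probs)
    y = comp.astype(float)                     # component i <= k is degenerate at i
    is_pois = comp == k + 1
    y[is_pois] = rng.poisson(lam, size=is_pois.sum())
    return y

w, lam = np.array([0.15, 0.10, 0.05]), 2.0     # a Z2IP example (k = 2)
y = zkip_rvs(200_000, w, lam)

i = np.arange(len(w))
eta = w.sum()
mean_th = np.sum(i * w) + (1 - eta) * lam                           # Eq. (4)
var_th = (np.sum(i**2 * w) - np.sum(i * w) ** 2
          - lam * (1 - eta) * (2 * np.sum(i * w) - lam * eta - 1))  # Eq. (5)
print(y.mean(), mean_th)   # both close to 1.60
print(y.var(), var_th)     # both close to 1.94
```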

3 Estimation

Let \({\textbf{Y}}=(Y_1,\ldots ,Y_n)\) be a random sample from the ZKIP distribution. In this section, we study the problem of estimating the unknown parameter vector \({\boldsymbol{\theta }} = ({\textbf{w}},\lambda )\) on the basis of \({\textbf{Y}}.\) Let

$$\begin{aligned} I_{i}(y)= \left\{ \begin{array}{ll} 1, &{}\quad y=i ,\\ 0, &{} \quad y\ne i, \end{array}\right. \qquad i=0,\ldots ,k. \end{aligned}$$
(6)

Then, we can rewrite the PMF (2) as

$$\begin{aligned}{} {} g(y;\,{\textbf{w}},\lambda )&=\prod _{i=0}^{k}\left\{ w_{i}+(1-\eta ({\textbf{w}}))\frac{\lambda ^{y}}{y!} \exp (-\lambda )\right\} ^{I_{i}(y)} \\{} & {} \quad \times \left\{ (1-\eta ({\textbf{w}})) \frac{\lambda ^{y}}{y!} \exp (-\lambda ) \right\} ^{I(y>k)}, \end{aligned}$$

where \(I(y>k)=1-\sum _{i=0}^{k} I_{i}(y).\) Thus, the likelihood function (LF) of the observed sample \({\textbf{Y}}={{\textbf{y}}}\) reads

$$\begin{aligned}{} & {} L_{obs}({\textbf{y}};\,{\textbf{w}},\lambda )=\prod _{j=1}^{n}\left\{ \prod _{i=0}^{k}\left( w_{i}+(1-\eta ({\textbf{w}}))\frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda )\right) ^{I_{i}(y_{j})}\right. \\{} & {} \quad \left. \times \left( (1-\eta ({\textbf{w}})) \frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda ) \right) ^{I(y_{j}>k)}\right\} . \end{aligned}$$
(7)

Then the logarithm of the LF (LLF) is

$$\begin{aligned} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})&\varpropto&\sum _{j=1}^{n} \bigg \{ \sum _{i=0}^{k}\left[ I_{i}(y_{j})\log \left( w_{i}+(1-\eta ({\textbf{w}}))\frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda )\right) \right] \\{} & {} +I(y_{j}>k)(\log (1-\eta ({\textbf{w}}))-\lambda + y_{j}\log (\lambda )) \bigg \}. \end{aligned}$$
(8)

From Eq. (8), the ML estimates are derived by solving the following equations with respect to the parameters:

$$\begin{aligned} \frac{\partial \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial w_{l}}= & {} \sum _{j=1}^{n} \bigg \{ \sum _{i=0,i\ne l}^{k} \bigg [ I_{i}(y_{j}) \times K_{1i}(y_{j},{\textbf{w}},\lambda )\bigg ] + I_{l}(y_{j}) \times K_{2l}(y_{j},{\textbf{w}},\lambda ) \end{aligned}$$
(9)
$$\begin{aligned}{} & {} -I(y_{j}>k)\left( \frac{1}{1-\eta ({\textbf{w}})} \right) \bigg \}=0,\qquad l=0,\ldots ,k. \end{aligned}$$
(10)
$$\begin{aligned} \frac{\partial \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial \lambda }= & {} \sum _{j=1}^{n} \left\{ \sum _{i=0}^{k}\bigg [ I_{i}(y_{j})\times K_{3i}(y_{j},{\textbf{w}},\lambda )\bigg ]+I(y_{j}>k)(y_{j}/\lambda -1)\right\} =0, \end{aligned}$$
(11)

where

$$\begin{aligned} K_{1i}(y_{j},{\textbf{w}},\lambda )= & {} \frac{-\frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda )}{w_{i}+(1-\eta ({\textbf{w}}))\frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda )},\\ K_{2i}(y_{j},{\textbf{w}},\lambda )= & {} \frac{1-\frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda )}{w_{i}+(1-\eta ({\textbf{w}}))\frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda )},\\ K_{3i}(y_{j},{\textbf{w}},\lambda )= & {} -\frac{ \nu _{1j}({\textbf{w}})}{w_{i}+(1-\eta ({\textbf{w}}))\frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda )},\\ \nu _{1j}({\textbf{w}})= & {} \frac{(1-\eta ({\textbf{w}}))(\lambda - y_{j})\exp (-\lambda ) \lambda ^{y_{j}-1}}{y_{j}!}. \end{aligned}$$

Since there is no closed-form solution to this system of equations, numerical methods may be used to estimate the parameters. To do this, we need the elements of the observed information, which are given by

$$\begin{aligned} \frac{\partial ^{2} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial w_{l}^{2}}= & {} \sum _{j=1}^{n} \bigg \{- \sum _{i=0, i\ne l}^{k} \bigg [ I_{i}(y_{j})\times K_{1i}^{2}(y_{j},{\textbf{w}},\lambda )\bigg ]\\{} & {} -I_{l}(y_{j})\times K_{2l}^{2}(y_{j},{\textbf{w}},\lambda ) -\frac{I(y_{j}>k)}{(1-\eta ({\textbf{w}}))^2}\bigg \},\quad l=0,\ldots ,k,\\ \frac{\partial ^{2} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial w_{l}\partial w_{s}}= & {} \sum _{j=1}^{n} \bigg \{ \sum _{i=0,i\ne l,s}^{k} \bigg [ I_{i}(y_{j}) \times K_{1i}^{2}(y_{j},{\textbf{w}},\lambda ) \bigg ]+ I_{l}(y_{j}) \\{} & {} \times \frac{\nu _{3j}({\textbf{w}},\lambda )(1-\nu _{3j}({\textbf{w}},\lambda ))}{(\nu _{2lj}({\textbf{w}},\lambda ))^{2}}\\{} & {} +I_{s}(y_{j}) \times \frac{\nu _{3j}({\textbf{w}},\lambda )(1-\nu _{3j}({\textbf{w}},\lambda ))}{(\nu _{2sj}({\textbf{w}},\lambda ))^{2}} -\frac{I(y_{j}>k)}{(1-\eta ({\textbf{w}}))^2}\bigg \},\quad l,s=0,\ldots ,k,\ l\ne s,\\ \frac{\partial ^{2} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial w_{l}\partial \lambda }= & {} \sum _{j=1}^{n}\bigg \{ \sum _{i=0, i\ne l}^{k} I_{i}(y_{j})\frac{{\lambda ^{y_{j}-1}\exp (-\lambda )} (\lambda - y_{j})w_{i}}{y_{j}! (\nu _{2ij}({\textbf{w}},\lambda ))^2}\\{} & {} +I_{l}(y_{j})\frac{{\lambda ^{y_{j}-1}\exp (-\lambda )} (\lambda - y_{j})(w_{l}+1-\eta ({\textbf{w}}))}{y_{j}! (\nu _{2lj}({\textbf{w}},\lambda ))^2}\bigg \},\quad l=0,\ldots ,k,\\ \frac{\partial ^{2} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial \lambda ^{2}}= & {} \sum _{j=1}^{n} \bigg \{ \sum _{i=0}^{k}\bigg [ I_{i}(y_{j})\times \frac{ \nu _{1j}({\textbf{w}})\exp (-\lambda )(\lambda -y_{j})(1-\eta ({\textbf{w}}))}{(\nu _{2ij}({\textbf{w}},\lambda ))^2\times y_{j}!}\bigg ] \\{} & {} - I(y_{j}>k)(y_{j}/\lambda ^{2}) \bigg \}, \end{aligned}$$

where

$$\begin{aligned} \nu _{2ij}({\textbf{w}},\lambda )= w_{i}+(1-\eta ({\textbf{w}}))\frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda ), \end{aligned}$$

and

$$\begin{aligned} \nu _{3j}({\textbf{w}},\lambda )= \frac{\lambda ^{y_{j}}}{y_{j}!} \exp (-\lambda ). \end{aligned}$$

For interval estimation and hypothesis testing, the Fisher information (FI) matrix is useful. The FI matrix \({\texttt {I}}({\boldsymbol{\theta }})=({\texttt {I}}_{ij}),\ i,j=0,\ldots ,k+1,\) obtained from the observed information matrix by taking the expected value of each entry, is given as follows:

$$\begin{aligned}{} & {} {\texttt {I}}_{ll}=-E\left( \frac{\partial ^{2} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial w_{l}^{2}}\right) \\{} & {} \quad =n \sum _{i=0, i\ne l}^{k} \left[ K_{1i}^{2}(i,{\textbf{w}},\lambda )\times g(i;\,{\textbf{w}},\lambda ) \right] +n K_{2l}^{2}(l,{\textbf{w}},\lambda )\times g(l;\,{\textbf{w}},\lambda )\\{} & {} \qquad +n Pr(Y>k)/(1-\eta ({\textbf{w}}))^2 ,\qquad l=0,\ldots ,k, \\{} & {} {\texttt {I}}_{ls}=-E\left( \frac{\partial ^{2} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial w_{l}\partial w_{s}}\right) \\{} & {} \quad =-n \sum _{i=0,i\ne l,s}^{k} \bigg [ K_{1i}^{2}(i,{\textbf{w}},\lambda )\times g(i;\,{\textbf{w}},\lambda ) \bigg ]\\{} & {} \qquad - \frac{n \nu _{3l}({\textbf{w}},\lambda )(1-\nu _{3l}({\textbf{w}},\lambda ))}{(\nu _{2ll}({\textbf{w}},\lambda ))^{2}}\times g(l;\,{\textbf{w}},\lambda )\\{} & {} \qquad - \frac{n \nu _{3s}({\textbf{w}},\lambda )(1-\nu _{3s}({\textbf{w}},\lambda ))}{(\nu _{2ss}({\textbf{w}},\lambda ))^{2}}\times g(s;\,{\textbf{w}},\lambda )\\{} & {} \qquad +\frac{ n Pr(Y>k) }{(1-\eta ({\textbf{w}}))^2}, \qquad l, s=0,\ldots ,k,\ and\ l\ne s, \\{} & {} {\texttt {I}}_{l(k+1)}={\texttt {I}}_{(k+1)l}=-E\left( \frac{\partial ^{2} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial w_{l} \partial \lambda }\right) \\{} & {} \quad =-n \sum _{i=0, i\ne l}^{k} \left[ \frac{{\lambda ^{i-1}\exp (-\lambda )} (\lambda - i)w_{i}}{i! (\nu _{2ii}({\textbf{w}},\lambda ))^2}\times g(i;\,{\textbf{w}},\lambda )\right] \\{} & {} \qquad -\frac{{n \lambda ^{l-1}\exp (-\lambda )} (\lambda - l)(w_{l}+1-\eta ({\textbf{w}}))}{l! (\nu _{2ll}({\textbf{w}},\lambda ))^2}\\{} & {} \qquad \times g(l;\,{\textbf{w}},\lambda ) ,\qquad l=0,\ldots ,k, \\{} & {} {\texttt {I}}_{(k+1)(k+1)}=-E\left( \frac{\partial ^{2} \ell _{obs} ({\textbf{w}},\lambda ;\,{\textbf{y}})}{\partial \lambda ^{2}}\right) \\{} & {} \quad =-n\sum _{i=0}^{k}\bigg [ \frac{ \nu _{1i}({\textbf{w}})\exp (-\lambda )(\lambda -i)(1-\eta ({\textbf{w}}))}{(\nu _{2ii}({\textbf{w}},\lambda ))^2 i!}\times g(i;\,{\textbf{w}},\lambda )\bigg ] \\{} & {} \qquad +\frac{nPr(Y>k)}{\lambda ^{2}} . \end{aligned}$$

If \(\theta \) is an arbitrary parameter of the model and \({\hat{\theta }}\) is its ML estimate, then from Lehmann et al. [15], we have

$$\begin{aligned} n^{1/2}({\hat{\theta }}-\theta ) \xrightarrow {d} N(0,I^{-1}(\theta )). \end{aligned}$$
(12)

Therefore, we can use the asymptotic distribution (12) to construct asymptotic confidence intervals and hypothesis tests for the parameters of the proposed model (see Lehmann et al. [15] for more details).
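In practice, rather than solving the system (9)–(11) directly, one typically maximizes the LLF (8) numerically. The sketch below is one possible implementation, with choices that are ours rather than the paper's: scipy's BFGS optimizer, an unconstrained parameterization through baseline-category logits (anticipating Eq. (15) of Sect. 4), and the BFGS inverse-Hessian as a rough stand-in for \(I^{-1}(\theta )\) in (12):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def negloglik(theta, y, k):
    """Negative LLF (8), parameterized by (gamma_0, ..., gamma_k, log lambda)
    so that the weights stay in the simplex and lambda stays positive."""
    gam, lam = theta[:k + 1], np.exp(theta[-1])
    w = np.exp(gam) / (1.0 + np.exp(gam).sum())   # inverse of the logit map
    p = (1.0 - w.sum()) * poisson.pmf(y, lam)
    low = y <= k
    p[low] += w[y[low]]
    return -np.log(p).sum()

y = np.array([0, 0, 0, 1, 1, 2, 0, 3, 1, 4, 0, 2, 5, 1, 0])  # toy sample
k = 1
res = minimize(negloglik, x0=np.zeros(k + 2), args=(y, k), method="BFGS")
print(res.x)                           # (gamma_0, ..., gamma_k, log lambda)
# rough SEs on the working scale; BFGS's hess_inv approximates the inverse
# observed information, in the spirit of (12)
print(np.sqrt(np.diag(res.hess_inv)))
```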

4 ZKIP Regression Model

In Sect. 2, we introduced and motivated the ZKIP distribution. Recall that the PMF of the ZKIP distribution is

$$\begin{aligned} g(y;\,{\textbf{w}},\lambda )= \left\{ \begin{array}{ll} w_{y}+(1-\eta ({\textbf{w}}))P_{y}(\lambda ), &{}\quad y=0,\ldots ,k, \\ (1-\eta ({\textbf{w}}))P_{y}(\lambda ), &{}\quad y>k, \end{array} \right. \end{aligned}$$
(13)

where \(P_{y}(\lambda ) = \exp (-\lambda )\lambda ^{y}/y!\) for \(y=0, 1, \ldots .\) So the ZKIP is a distribution with \(k+2\) parameters, \({\textbf{w}}=(w_{0},\ldots ,w_{k})\) and \(\lambda .\) In fact, it is a mixture of \(k+2\) distributions: the first is degenerate at zero with weight \(w_{0},\) the second is degenerate at one with weight \(w_{1},\) and so on up to the \((k+1)\)th, which is degenerate at k with weight \(w_{k}.\) Finally, the \((k+2)\)th distribution is the Poisson with mean \(\lambda \) and weight \((1-\eta ({\textbf{w}})).\)

Suppose that we have a vector \({\textbf{y}}=(y_{1},\ldots ,y_{n})\) of n independent count responses from the ZKIP distribution. We assume that, associated with each \(y_{j},\) a vector of covariates \({\textbf{x}}_{j}^{T}\) has been observed. The layout of the observed data is shown in Table 1.

Table 1 Form of data for regression analyses on inflated ZKIP distribution

From (13), the LF of the available data in Table 1 can be rewritten as

$$\begin{aligned} L_{obs}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}})=\prod _{j=1}^{n}\left\{ \prod _{i=0}^{k}\left( \left[ w_{i}+[1-\eta ({\textbf{w}})]P_{y_{j}}(\lambda _{j}) \right] ^{I_{i}(y_{j})}\right) \times \left( [1-\eta ({\textbf{w}})] P_{y_{j}}(\lambda _{j}) \right) ^{I(y_{j}>k)}\right\} , \end{aligned}$$
(14)

where \({\boldsymbol{\lambda }}=(\lambda _{1},\ldots ,\lambda _{n})\) and \(P_{y_{j}}(\lambda _{j}) = \exp (-\lambda _{j})\lambda _{j}^{y_{j}}/y_{j}!,\) \((y_{j} \ge 0).\) To connect the parameters with the covariates, we follow the standard generalized linear model (GLM) framework for the multinomial distribution; see Agresti [2] for further reading. The \(k+2\) mixing distributions can be viewed as \(k+2\) nominal categories. Thus, the probabilities of the \(k+2\) categories (degenerate(0), degenerate(1), \(\ldots ,\) degenerate(k),  Poisson) are \(w_{0}, w_{1}, \ldots , w_{k},\) and \((1-\eta ({\textbf{w}})),\) respectively. Following the GLM baseline-category logit model for the multinomial, let

$$\begin{aligned} \gamma _{0}=\log \left( \frac{w_{0}}{1-\eta ({\textbf{w}})}\right) , \gamma _{1}=\log \left( \frac{w_{1}}{1-\eta ({\textbf{w}})}\right) , \ldots , \gamma _{k}=\log \left( \frac{w_{k}}{1-\eta ({\textbf{w}})}\right) . \end{aligned}$$
(15)

Here, we treat the Poisson distribution as the baseline category, and thus we have \((k+2) - 1 = k+1\) equations for the other \(k+1\) categories. As in log-linear models, the ZKIP regression model assumes that the Poisson parameter \(\lambda _{j}\) is a log-linear function of the covariates; that is,

$$\begin{aligned} \log (\lambda _{j})={\textbf{x}}_{j}^{T}{\boldsymbol{\beta }},\quad j=1,\ldots ,n, \end{aligned}$$

where \({\boldsymbol{\beta }} = (\beta _{1},\ldots ,\beta _{p})^{T}\) is a p-dimensional vector of unknown regression parameters. For the sake of brevity, we assume that the parameter vector \({\boldsymbol{\gamma }}= (\gamma _{0},\ldots ,\gamma _{k})\) is constant; the generalization where these \(k+1\) parameters are functions of the covariates is straightforward. Thus, the parameters of the ZKIP regression model are \({\boldsymbol{\beta }}\) and \({\boldsymbol{\gamma }}.\) In what follows, the problems of parameter estimation and hypothesis testing are discussed in detail.
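The reparameterization (15) is simply the baseline-category (softmax) map between \({\textbf{w}}\) and \({\boldsymbol{\gamma }},\) as the following small sketch makes explicit (helper names are ours):

```python
import numpy as np

def w_to_gamma(w):
    """Eq. (15): gamma_i = log(w_i / (1 - eta(w))), Poisson as baseline."""
    w = np.asarray(w, dtype=float)
    return np.log(w / (1.0 - w.sum()))

def gamma_to_w(gamma):
    """Inverse map: w_i = exp(gamma_i) / (1 + sum_l exp(gamma_l))."""
    e = np.exp(np.asarray(gamma, dtype=float))
    return e / (1.0 + e.sum())

w = np.array([0.15, 0.10, 0.05])
print(gamma_to_w(w_to_gamma(w)))   # recovers (0.15, 0.10, 0.05)
```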

4.1 Estimation of Regression Parameters

In the following, we study methods for estimating the parameters of the ZKIP regression model. Two popular methods are the ML and EM approaches. The ML method involves optimizing the LF (14), or its logarithm, with respect to the unknown parameters \({\boldsymbol{\beta }}\) and \({\boldsymbol{\gamma }}.\) By substituting the reparameterizations (15) into the LF (14), we can write the LLF as follows:

$$\begin{aligned} \ell _{obs} ({\boldsymbol{\beta }},{\boldsymbol{\gamma }})= & {} \log L_{obs}({\boldsymbol{\beta }},{\boldsymbol{\gamma }}|{\textbf{y}}) \\= & {} \sum _{j=1}^{n} \bigg \{ \sum _{i=0}^{k}\left[ I_{i}(y_{j})\log \left( \exp (\gamma _{i})+P_{y_{j}} (\lambda _{j})\right) \right] \\{} & {} +I(y_{j}>k) \log \left( P_{y_{j}}(\lambda _{j})\right) - \log \bigg ( 1+\sum _{l=0}^{k} \exp (\gamma _{l})\bigg )\bigg \}, \end{aligned}$$
(16)

where \(\log (\lambda _{j})={\textbf{x}}_{j}^{T}{\boldsymbol{\beta }}.\) The ML estimates can be obtained by maximizing the LLF (16) directly with respect to the parameters. Alternatively, one can take the partial derivatives of the LLF with respect to \({\boldsymbol{\gamma }}\) and \({\boldsymbol{\beta }}\) (the latter via the chain rule \(\partial \lambda _{j}/\partial {\boldsymbol{\beta }}=\lambda _{j}{\textbf{x}}_{j}\)) and solve the resulting score equations:

$$\begin{aligned}{} & {} \sum _{j=1}^{n}\left( I_{i}(y_{j})\times \frac{\exp (\gamma _{i})}{\exp (\gamma _{i})+P_{y_{j}}(\lambda _{j})} \right) =\frac{n \exp (\gamma _{i})}{1+\sum _{l=0}^{k} \exp (\gamma _{l})},\quad i=0,\ldots , k, \\{} & {} \sum _{j=1}^{n}\bigg \{ \sum _{i=0}^{k}\left( I_{i}(y_{j})\times \frac{P_{y_{j}}(\lambda _{j})(y_{j}-\lambda _{j})}{\exp (\gamma _{i})+P_{y_{j}}(\lambda _{j})} \right) + I(y_{j}>k)\,(y_{j}-\lambda _{j}) \bigg \}\,{\textbf{x}}_{j}={\textbf{0}}. \end{aligned}$$

An alternative and popular method for parameter estimation is the EM approach. The EM approach treats the observed data \({\textbf{y}}=(y_{1},\ldots ,y_{n})\) as part of the complete data, which also include \({\textbf{z}}=({\textbf{z}}_{1},\ldots ,{\textbf{z}}_{n}),\) regarded as missing. Here each \({\textbf{z}}_{j}=(z_{0j},\ldots ,z_{(k+1)j})\) is a \((k+2)\)-component vector with the PMF (17). By the definition of the latent variable \({\textbf{z}},\) we have

$$\begin{aligned} Pr\bigg ({\textbf{z}}=(z_{0},\ldots ,z_{k+1})\bigg )= \left\{ \begin{array}{ll} w_{0}, &{}\quad {\textbf{z}}=(1,0,\ldots ,0),\\ w_{1}, &{} \quad {\textbf{z}}=(0,1,0,\ldots ,0),\\ \vdots &{} \quad \vdots \\ w_{k}, &{} \quad {\textbf{z}}=(0,0,0,\ldots ,1,0), \\ w_{k+1}, &{} \quad {\textbf{z}}=(0,0,0,\ldots ,0,1), \end{array} \right. \end{aligned}$$
(17)

where \(w_{k+1}=(1-\eta ({\textbf{w}})).\)

In other words, the latent variable \({\textbf{z}}=(z_{0},\ldots ,z_{k+1})\) is distributed as a multinomial with parameters \((1,w_{0},\ldots ,w_{k+1}),\) taking the value \((0,\ldots ,0,1_{i},0,\ldots ,0)\) with probability \(w_{i},\) \(i=0,\ldots ,k+1.\) Combining this with the conditional PMF of \(y_{j}\) given \({\textbf{z}}_{j}\) shows that the joint distribution of the observed and missing data yields the ZKIP distribution, a mixture of a Poisson distribution with \(k+1\) degenerate distributions at the points zero to k. That is,

$$\begin{aligned} Pr(y_{j},{\textbf{z}}_{j})= \left\{ \begin{array}{ll} w_{0}, &{} \quad z_{0j}=1, y_{j}=0,\\ w_{1}, &{} \quad z_{1j}=1, y_{j}=1,\\ \vdots &{} \quad \vdots \\ w_{k}, &{} \quad z_{kj}=1, y_{j}=k, \\ w_{k+1}\times \frac{\exp (-\lambda _{j}) \lambda _{j} ^{y_{j}}}{y_{j}!}, &{} \quad z_{(k+1)j}=1, y_{j}\ge 0. \end{array} \right. \end{aligned}$$
(18)

Thus the conditional distribution of Y given \({\textbf{z}}=(z_0, \ldots , z_{k+1})\) is

$$\begin{aligned} Pr\bigg (Y=y|{\textbf{z}}=(z_{0},\ldots ,z_{k+1})\bigg )= \left\{ \begin{array}{ll} 1, &{}\quad z_{0}=1, y=0,\\ 1, &{} \quad z_{1}=1, y=1,\\ \vdots &{} \quad \vdots \\ 1, &{} \quad z_{k}=1, y=k, \\ \frac{\exp (-\lambda ) \lambda ^{y}}{y!}, &{}\quad z_{k+1}=1, y\ge 0. \end{array} \right. \end{aligned}$$
(19)

Finally, the joint PMF of \((Y, {\textbf{z}})\) is obtained from (17) and (19) as

$$\begin{aligned}{} & {} Pr \bigg ( Y=y,{\textbf{z}}=(z_{0},\ldots ,z_{k+1}) \bigg ) \\{} & {} \quad =Pr\bigg (Y=y|{\textbf{z}}=(z_{0},\ldots ,z_{k+1})\bigg )\times Pr\bigg ({\textbf{z}}=(z_{0},\ldots ,z_{k+1})\bigg ) \\{} & {} \quad =\left\{ \begin{array}{ll} w_{0}, &{} \quad z_{0}=1, y=0,\\ w_{1}, &{} \quad z_{1}=1, y=1,\\ \vdots &{} \quad \vdots \\ w_{k}, &{} \quad z_{k}=1, y=k, \\ w_{k+1} \times \frac{\exp (-\lambda ) \lambda ^{y}}{y!}, &{} \quad z_{k+1}=1, y\ge 0. \end{array} \right. \end{aligned}$$
(20)

Therefore from (20), the complete LF of the ZKIP model is

$$\begin{aligned} L_{comp}({\textbf{w}},{\boldsymbol{\lambda }};\,{\textbf{y}},{\textbf{z}})=\prod _{j=1}^{n}\left\{ \prod _{i=0}^{k}\left[ w_{i}^{z_{ij}\times I_{i}(y_{j})}\right] \times \left( w_{k+1} P_{y_{j}}(\lambda _{j}) \right) ^{z_{(k+1)j}}\right\} . \end{aligned}$$
(21)

By (15), the LLF of the complete data \(({\textbf{y}}, {\textbf{z}})\) reads

$$\begin{aligned} \ell _{comp} ({\textbf{w}},{\boldsymbol{\lambda }};\,{\textbf{y}},{\textbf{z}})= & {} \log L_{comp}({\boldsymbol{\beta }},{\boldsymbol{\gamma }}|{\textbf{y}},{\textbf{z}}) \\= & {} \sum _{j=1}^{n} \bigg \{ \sum _{i=0}^{k} z_{ij}I_{i}(y_{j})\bigg [ \gamma _{i} - \log (1+\sum _{l=0}^{k} e^{\gamma _{l}}) \bigg ]+z_{(k+1)j}\log w_{k+1} \\{} & {} +z_{(k+1)j}\log P_{y_{j}}(\lambda _{j})\bigg \}. \end{aligned}$$
(22)

For \(w_{1}=\cdots =w_{k}=0,\) the ZKIP model reduces to the ZIP model. From (22), the LLF of the ZIP model for the complete data is derived as

$$\begin{aligned}{} & {} \ell _{comp} ({\textbf{w}},{\boldsymbol{\lambda }};\,{\textbf{y}},{\textbf{z}})= \sum _{j=1}^{n} \bigg \{ z_{0j}\left[ \gamma _{0} - \log \left( 1+\exp (\gamma _{0})\right) \right] \\{} & {} \quad +(1-z_{0j})\log (1-w_{0})+(1-z_{0j})\log P_{y_{j}}(\lambda _{j})\bigg \}. \end{aligned}$$

Lambert [14] used this complete-data LLF for the ZIP model to obtain the EM estimates.

We now proceed to describe the EM algorithm (Dempster et al. [8]; Wu [35]) for the ZKIP model. The first step of the EM algorithm involves selecting initial values for the unknown parameters. The choice of the initial values is important for the convergence of the algorithm; a poor choice could result in slow convergence or breakdown of the algorithm. We recommend using the observed proportions of zeros, \(\ldots ,\) k's as initial values for the parameters \(w_{0},\ldots ,w_{k},\) and then using the relations (15) to obtain initial values \(\gamma _{00},\ldots ,\gamma _{k0}\) for the parameters \(\gamma _{0},\ldots ,\gamma _{k},\) respectively. The next step, the E-step, involves filling in the latent values \({\textbf{z}}_{j}\) with their conditional expectations \(E({\textbf{z}}|{\textbf{y}}),\) given in Table 2.

Table 2 \(E({\textbf{z}}|{\textbf{y}})\) for the ZKIP regression model

We use Table 2 to estimate the missing values in the expectation step of the EM algorithm as follows:

$$\begin{aligned}{} & {} {\hat{z}}_{ij}=E(z_{ij}|y_{j}=i)=\frac{\exp (\gamma _{i})}{\exp (\gamma _{i})+P_{i}(\lambda _{j})} \end{aligned}$$
(23)
$$\begin{aligned}{} & {} {\hat{z}}_{ij}=E(z_{ij}|y_{j} \ne i)=0,\qquad i=0,\ldots ,k,\quad j=1,\ldots ,n. \end{aligned}$$
(24)

For the maximization step in the EM algorithm, we solve the following score equations instead of maximizing the complete LF directly:

$$\begin{aligned} \frac{\partial \ell _{comp}}{\partial {\boldsymbol{\beta }}}= & {} \sum _{j=1}^{n}{\hat{z}}_{(k+1)j}(y_{j}-\exp [{\textbf{x}}_{j}^{T}{\boldsymbol{\beta }}]){\textbf{x}}_{j}={\textbf{0}}, \end{aligned}$$
(25)
$$\begin{aligned} \frac{\partial \ell _{comp}}{\partial \gamma _{i}}= & {} \sum _{j=1}^{n}{\hat{z}}_{ij}I_i(y_j)-\frac{n \exp (\gamma _{i})}{1+\sum _{l=0}^{k}\exp (\gamma _{l})}=0,\quad i=0,\ldots ,k, \end{aligned}$$
(26)

where \({\hat{z}}_{(k+1)j}=1-\sum _{i=0}^{k}{\hat{z}}_{ij}.\) In summary, the EM algorithm for estimating the parameters \((\gamma _{0},\ldots ,\gamma _{k})\) and the regression parameter \({\boldsymbol{\beta }}\) of the ZKIP regression model is as follows (a code sketch of these steps is given after the list).

  1. Select initial values \({\boldsymbol{\beta}}_{{\bf 0}}\) and \((\gamma _{00},\ldots ,\gamma _{k0})\) for the parameter \({\boldsymbol{\beta }}\) and the vector of parameters \((\gamma _{0},\ldots ,\gamma _{k}),\) respectively.

  2. E-step: Estimate the \({\hat{z}}_{ij}\)’s, \(i=0,\ldots ,k,\) using Eqs. (23) and (24).

  3. M-step: Solve Eqs. (25) and (26) to obtain the updated estimates \({\boldsymbol{\beta}}_{{\bf 1}}\) and \((\gamma _{01},\ldots ,\gamma _{k1}).\)

  4. Repeat the E-step and the M-step until the parameter estimates converge.
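A compact Python sketch of these four steps is given below (numpy/scipy assumed; the function em_zkip and its numerical safeguards are our own). Equation (26) gives the \(\gamma \)-update in closed form, while Eq. (25) is a weighted Poisson score that the sketch handles with a single Newton step per iteration, yielding a generalized EM algorithm:

```python
import numpy as np
from scipy.stats import poisson

def em_zkip(y, X, k, n_iter=500, tol=1e-8):
    """EM for the ZKIP regression model (a sketch of the steps above).

    y : (n,) integer counts;  X : (n, p) design matrix (first column ones);
    k : largest inflated point.  Returns (beta, gamma)."""
    n, p = X.shape
    # Step 1: observed proportions of 0, ..., k, shrunk toward the interior
    # (our tweak) so that 1 - sum(w) stays positive
    w = np.array([np.mean(y == i) for i in range(k + 1)]) * 0.5 + 1e-6
    gamma = np.log(w / (1.0 - w.sum()))                 # Eq. (15)
    beta = np.zeros(p)
    beta[0] = np.log(y.mean() + 1e-6)
    for _ in range(n_iter):
        lam = np.exp(X @ beta)
        # Step 2 (E-step): zhat_ij from Eqs. (23) and (24)
        zhat = np.zeros((k + 1, n))
        for i in range(k + 1):
            at_i = y == i
            zhat[i, at_i] = (np.exp(gamma[i])
                             / (np.exp(gamma[i]) + poisson.pmf(i, lam[at_i])))
        z_pois = 1.0 - zhat.sum(axis=0)                 # zhat_{(k+1)j}
        # Step 3 (M-step): Eq. (26) gives gamma in closed form ...
        w = np.clip(zhat.mean(axis=1), 1e-12, None)
        gamma_new = np.log(w / (1.0 - w.sum()))
        # ... and Eq. (25) is a weighted Poisson score; one Newton step
        Wt = z_pois * lam
        beta_new = beta + np.linalg.solve(X.T @ (Wt[:, None] * X),
                                          X.T @ (z_pois * (y - lam)))
        done = max(np.abs(beta_new - beta).max(),
                   np.abs(gamma_new - gamma).max()) < tol
        beta, gamma = beta_new, gamma_new               # Step 4: repeat
        if done:
            break
    return beta, gamma
```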

In the next subsection, we discuss how to obtain the standard errors of the estimates obtained by the EM algorithm.

4.2 Standard Errors for the EM Algorithm

The most commonly used method to obtain the standard errors in mixture models is to compute the matrix of second-order partial derivatives of the LLF for the observed data, that is, to calculate the information matrix from the observed data. Lambert [14] used this method to compute the standard errors for the ZIP regression model. Lin and Tsai [17] used the Hessian matrix to obtain the standard errors for the zero–one inflated Poisson model without actually computing the second-order partial derivatives of the LLF analytically; recall that the Hessian matrix comes out as a byproduct of the nonlinear optimization routines in common statistical packages.

To compute the standard errors of the estimates obtained by the EM algorithm, we follow the approach described by Louis [22]. The relation between the likelihoods of the complete, observed, and missing data is given by

$$\begin{aligned} L_{comp}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}},{\textbf{z}})=L_{obs}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}})L_{miss}({\textbf{w}},{\boldsymbol{\lambda }}|({\textbf{z}}|{\textbf{y}})), \end{aligned}$$
(27)

where \({\textbf{y}}\) and \({\textbf{z}}\) stand for the observed and missing data, respectively. Taking the logarithm of Equation (27), we get

$$\begin{aligned} \ell _{obs}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}})=\ell _{comp}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}},{\textbf{z}})-\ell _{miss}({\textbf{w}},{\boldsymbol{\lambda }}|({\textbf{z}}|{\textbf{y}})), \end{aligned}$$
(28)

where \(\ell _{obs}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}})=\log \left\{ L_{obs}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}}) \right\} ,\) \(\ell _{comp}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}},{\textbf{z}})=\log \left\{ L_{comp}({\textbf{w}},{\boldsymbol{\lambda }}|{\textbf{y}},{\textbf{z}}) \right\} \) and \(\ell _{miss}({\textbf{w}},{\boldsymbol{\lambda }}|({\textbf{z}}|{\textbf{y}}))=\log \left\{ L_{miss}({\textbf{w}},{\boldsymbol{\lambda }}|({\textbf{z}}|{\textbf{y}}))\right\} .\) Taking second-order partial derivatives of Eq. (28), the information matrices for the complete, observed, and missing data satisfy the following identity:

$$\begin{aligned} I_{obs}=I_{comp}-I_{miss}, \end{aligned}$$
(29)

where the matrices \(I_{comp}\) and \(I_{miss}\) are, respectively, the negative Hessian matrices of the LLF of the complete data and of the missing data. Since the right-hand side of Eq. (29) depends on the missing data, Louis [22] suggested taking the expected value given the observed data. This gives us the identity

$$\begin{aligned} I_{obs}=E(I_{obs}|{\textbf{y}})=E(I_{comp}|{\textbf{y}})-E(I_{miss}|{\textbf{y}}). \end{aligned}$$
(30)

In other words, Louis [22] estimated the observed information matrix, evaluated at the EM estimates, by

$$\begin{aligned} {\hat{I}}_{obs}=E(I_{comp}|{\textbf{y}})-E(I_{miss}|{\textbf{y}}). \end{aligned}$$
(31)
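As a practical check on (31), note that both sides equal the curvature of the observed-data LLF at the estimates, so the analytically assembled Louis matrix can be compared against a finite-difference Hessian of the observed LLF. A rough sketch follows (loglik_obs and theta_hat are placeholders for a user-supplied observed LLF and the EM estimates):

```python
import numpy as np

def num_hessian(f, theta, h=1e-5):
    """Central-difference Hessian of a scalar function f at theta."""
    p = len(theta)
    H = np.zeros((p, p))
    E = np.eye(p) * h
    for a in range(p):
        for b in range(p):
            H[a, b] = (f(theta + E[a] + E[b]) - f(theta + E[a] - E[b])
                       - f(theta - E[a] + E[b]) + f(theta - E[a] - E[b])) / (4 * h * h)
    return H

# I_obs = -Hessian of the observed LLF at the EM estimates; then
# se = np.sqrt(np.diag(np.linalg.inv(-num_hessian(loglik_obs, theta_hat))))
```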

Note that the LLF of the complete data for the ZKIP regression model is given by Eq. (22), and the corresponding first-order derivatives are shown in Eqs. (25) and (26). The elements of the matrix \(E(I_{comp}|{\textbf{y}})\) are the expected values of the negative of second-order partial derivatives of the complete LLF (22) given by

$$\begin{aligned} E\left[ -\frac{\partial ^{2} \ell _{comp}}{\partial \gamma _{i} ^{2}}\right]= & {} n\times \frac{\exp (\gamma _{i})(1+\sum _{l=0}^{k} \exp (\gamma _{l}))-\exp (2\gamma _{i})}{(1+\sum _{l=0}^{k} \exp (\gamma _{l}))^{2}},\qquad i=0,\ldots ,k,\\ E\left[ -\frac{\partial ^{2} \ell _{comp}}{\partial \gamma _{l} \partial \gamma _{s} }\right]= & {} -n\times \frac{\exp (\gamma _{l}+\gamma _s)}{(1+\sum _{m=0}^{k} \exp (\gamma _{m}))^{2}},\qquad l,s=0,\ldots ,k,\ l\ne s,\\ E\left[ -\frac{\partial ^{2} \ell _{comp}}{\partial {\boldsymbol{\beta }}\partial {\boldsymbol{\beta}}^{\varvec{T}}}\right]= & {} \sum _{j=1}^{n}\frac{\left[ P_{0}( \lambda _{j})\times \cdots \times P_{k}(\lambda _{j})-\exp (\gamma _{0}+\cdots +\gamma _{k})\right] \lambda _{j}}{[\exp (\gamma _{0})+P_{0}(\lambda _{j})]\times \cdots \times [\exp (\gamma _{k})+P_{k}(\lambda _{j})]}({\textbf{x}}_{\textbf{j}}{} {\textbf{x}}_{\textbf{j}}^{\textbf{T}}). \end{aligned}$$

To see this, from Eq. (15), we have

$$\begin{aligned} 1-\eta ({\textbf{w}})=w_{k+1}=\frac{1}{1+\sum _{l=0}^{k}\exp (\gamma _l)}, \end{aligned}$$

and then Eq. (22) yields

$$\begin{aligned} \frac{\partial \ell _{comp}}{\partial \gamma _{i}} =\sum _{j=1}^{n} \bigg \{ z_{ij}I_{i}(y_{j})-\bigg [ \sum _{m=0}^{k} z_{mj}I_{m}(y_{j})+z_{(k+1)j}\bigg ] \frac{e^{\gamma _{i}}}{1+\sum _{l=0}^{k} e^{\gamma _{l}}}\bigg \} =\sum _{j=1}^{n} \bigg \{ z_{ij}I_{i}(y_{j})-\frac{e^{\gamma _{i}}}{1+\sum _{l=0}^{k} e^{\gamma _{l}}}\bigg \}, \end{aligned}$$
(32)

where the second equality holds because exactly one component of each \({\textbf{z}}_{j}\) equals one. From Eq. (32), we conclude that

$$\begin{aligned} \frac{\partial ^2 \ell _{comp}}{\partial \gamma _{i}^2}= -\sum _{j=1}^{n} \frac{\exp (\gamma _{i})(1+\sum _{l=0}^{k} \exp (\gamma _{l}))-\exp (2\gamma _{i})}{(1+\sum _{l=0}^{k} \exp (\gamma _{l}))^{2}} = -n\,\frac{\exp (\gamma _{i})(1+\sum _{l=0}^{k} \exp (\gamma _{l}))-\exp (2\gamma _{i})}{(1+\sum _{l=0}^{k} \exp (\gamma _{l}))^{2}}. \end{aligned}$$
(33)

Since this second derivative does not depend on the data, taking its negative expectation immediately gives

$$\begin{aligned} E\left[ -\frac{\partial ^{2} \ell _{comp}}{\partial \gamma _{i} ^{2}}\right] =n\times \frac{\exp (\gamma _{i})(1+\sum _{l=0}^{k} \exp (\gamma _{l}))-\exp (2\gamma _{i})}{(1+\sum _{l=0}^{k} \exp (\gamma _{l}))^{2}}. \end{aligned}$$
(34)

Similarly, \(E\left[ -{\partial ^{2} \ell _{comp}}/{\partial \gamma _{l} \partial \gamma _{s} }\right] \) and \(E\left[ -{\partial ^{2} \ell _{comp}}/{\partial {\boldsymbol{\beta }}\partial {{\boldsymbol{\beta }}}^{\varvec{T}}}\right] \) are obtained. The LLF of the missing data for the ZKIP regression model is

$$\begin{aligned} \ell _{miss} ({\boldsymbol{\beta }},{\boldsymbol{\gamma }})= & {} \sum _{j=1}^{n} \bigg \{ \sum _{i=0}^{k}\left[ z_{ij} \gamma _{i} \right] + z_{(k+1)j}\log P_{y_{j}}(\lambda _{j})\bigg \} \\{} & {} -\sum _{j=1}^{n} \bigg \{ \sum _{i=0}^{k}I_{i}(y_{j})\log \left( \exp (\gamma _{i})+P_{y_{j}}(\lambda _{j})\right) \bigg \}-\sum _{j=1}^{n}\left( I(y_{j}>k)\log P_{y_{j}}(\lambda _{j}) \right) . \end{aligned}$$
(35)

Finally, the elements of the matrix \(E(I_{miss}|{\textbf{y}})\) are the negative of the expected value of the second-order derivatives of (35). These are given by

$$\begin{aligned} E\left[ -\frac{\partial ^{2} \ell _{miss}}{\partial \gamma _{i} ^{2}}\right]= & {} \sum _{j=1}^{n}\left\{ I_{i}(y_{j}) \frac{\exp (\gamma _{i})P_{y_{j}}(\lambda _{j})}{(\exp (\gamma _{i})+P_{y_{j}}(\lambda _{j}))^2} \right\} ,\qquad i=0,\ldots ,k, \\ E\left[ -\frac{\partial ^{2} \ell _{miss}}{\partial \gamma _{l} \partial \gamma _{s} }\right]= & {} 0,\qquad l,s=0,\ldots ,k\ and\ l\ne s,\\ E\left[ -\frac{\partial ^{2} \ell _{miss}}{\partial {\boldsymbol{\beta }}\partial \gamma _{i}}\right]= & {} -\sum _{j=1}^{n}\left\{ I_{i}(y_{j}) \frac{\exp (\gamma _{i})P_{y_{j}}(\lambda _{j})(y_{j}-\lambda _{j})}{(\exp (\gamma _{i})+P_{y_{j}}(\lambda _{j}))^2}\,{\textbf{x}}_{j} \right\} ,\qquad i=0,\ldots ,k,\\ E\left[ -\frac{\partial ^{2} \ell _{miss}}{\partial {\boldsymbol{\beta }}\partial {\boldsymbol{\beta}}^{\varvec{T}}}\right]= & {} \sum _{j=1}^{n}\frac{\left[ P_{0}( \lambda _{j})\times \cdots \times P_{k}(\lambda _{j})-\exp (\gamma _{0}+\cdots +\gamma _{k})\right] \lambda _{j}{} {{\bf x}_{{\bf j}}{\bf x}_{{\bf j}}^{{\bf T}}}}{[\exp (\gamma _{0})+P_{0}(\lambda _{j})]\times \cdots \times [\exp (\gamma _{k})+P_{k}(\lambda _{j})]}\\{} & {} -\sum _{j=1}^{n}\bigg \{\sum _{i=0}^{k}I_{i}(y_{j})\frac{\left[ \exp (\gamma _{i})P_{y_{j}}(\lambda _{j})\left( \lambda _{j}-(y_{j}-\lambda _{j})^2 \right) +P^{2}_{y_{j}}(\lambda _{j})\lambda _{j}\right] {\textbf{x}}_{\textbf{j}}{} {\textbf{x}}_{\textbf{j}}^{\textbf{T}}}{(\exp (\gamma _{i})+P_{y_{j}}(\lambda _{j}))^2} \bigg \}. \end{aligned}$$

4.3 Hypothesis Testing

Testing the impact of the jth covariate on the count response is equivalent to testing \(H_0: \beta _j= 0\) vs. \(H_1: \beta _j\ne 0.\) This test is straightforward and can be carried out using the standard Wald statistic \( z={\hat{\beta }}_j/SE({\hat{\beta }}_j),\) which has an asymptotically standard normal distribution under the null hypothesis \(H_0.\) Here, \(SE({\hat{\beta }}_j)\) stands for the standard error of the estimate \({\hat{\beta }}_j.\) An alternative approach for testing \(H_0: \beta _j = 0\) is to use the generalized likelihood ratio test (LRT) statistic defined by

$$\begin{aligned} -2\log \Lambda = -2 \log \frac{L_{obs}(\widetilde{{\boldsymbol{\beta }}},\widetilde{\boldsymbol{\gamma }},\beta _j=0)}{L_{obs}(\hat{{\boldsymbol{\beta }}}, \hat{\boldsymbol{\gamma }})}, \end{aligned}$$
(36)

which has asymptotically the chi-square distribution with one degree of freedom, where \(\widetilde{{\boldsymbol{\beta }}}\) and \(\widetilde{\boldsymbol{\gamma }}\) denote the ML estimates under the hypothesis \(\beta _j=0,\) and \(\hat{{\boldsymbol{\beta }}}\) and \(\hat{\boldsymbol{\gamma }}\) are the ML estimates under the full ZKIP model.

Since \(0 \le w_i \le 1,\ (i = 0,\ldots ,k),\) the null hypothesis \(H_0: w_{i_1}=\cdots =w_{i_r}= 0,\) \(0\le i_r \le k,\) corresponds to testing whether the corresponding inflation parameters are needed in the model. In this case, the null values lie on the boundary of the parameter space, so the regularity conditions are not met. That is, the standard asymptotic theory for the LRT statistic (36) is not applicable; in fact, its asymptotic distribution is a mixture of chi-square distributions.
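Both tests are easy to code once the maximized log-likelihoods are available. In the sketch below (scipy assumed; the function names are ours), the boundary flag applies the \(0.5\chi ^{2}_0 + 0.5\chi ^{2}_1\) mixture appropriate for testing a single \(w_i=0\) on the boundary; the mixture for several boundary parameters is more involved:

```python
import numpy as np
from scipy.stats import chi2, norm

def wald_pvalue(beta_hat, se):
    """Two-sided Wald test of H0: beta_j = 0."""
    return 2.0 * norm.sf(abs(beta_hat / se))

def lrt_pvalue(ll_null, ll_alt, df=1, boundary=False):
    """Generalized LRT of Eq. (36) from the two maximized log-likelihoods.

    boundary=True uses the 0.5*chi2_0 + 0.5*chi2_1 mixture for testing a
    single w_i = 0 on the edge of the parameter space (df must be 1)."""
    stat = -2.0 * (ll_null - ll_alt)
    if boundary:
        return 1.0 if stat <= 0 else 0.5 * chi2.sf(stat, 1)
    return chi2.sf(stat, df)
```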

5 The Model Diagnosis

Model diagnosis is an essential step to ensure that a fitted model is adequate for the observed data set. However, diagnosing count models is still a challenging research problem. Pearson and deviance residuals are often used in practice for diagnosing count models, despite wide recognition that these residuals are far from normality when applied to count data. RQRs, proposed by Dunn and Smyth [10] and Park et al. [26], are used to overcome the above-mentioned problems. The key idea of the RQR is to randomize the lower tail probability into a uniform random number over the discontinuity gap of the cumulative distribution function (CDF). It can be shown that the RQRs are normally distributed under the true model. To the best of our knowledge, the RQR has not been applied to residual analysis for zero-inflated or modified mixed-effects models.

To construct these residuals, we follow the approach of Dunn and Smyth [10]. The RQR inverts the fitted distribution function at each response value and finds the equivalent standard normal quantile. Let \(G(y;\,{\textbf{w}},\lambda )\) denote the CDF of the random variable Y. If the CDF is continuous, then \(G(y_i;\, {\textbf{w}},\lambda )\) is uniformly distributed on the unit interval. RQRs can thus be defined as

$$\begin{aligned} q_i=\Phi ^{-1}[G(y_i;\,\hat{{\textbf{w}}}_i,{\hat{\lambda }}_i)], \end{aligned}$$
(37)

where \(\Phi ^{-1}(\cdot )\) is the quantile function of the standard normal distribution. However, if the CDF is discrete, then randomization is added to modify \(q_i\) in Eq. (37). To be more specific, let \(g(y_i;\, {\textbf{w}},\lambda )\) denote the PMF of Y. The CDF can then be redefined as

$$\begin{aligned} G^*(y;\, {\textbf{w}},\lambda )=G(y^-;\,{\textbf{w}},\lambda )+U\times g(y;\, {\textbf{w}},\lambda ), \end{aligned}$$

where U is a uniform random variable on [0, 1] and \(G(y^-;\, {\textbf{w}},\lambda )\) is the lower limit of G at y. When G is discrete, we let \(a_i = \lim _{y\rightarrow y_i^-} G(y;\, \hat{{\textbf{w}}}_i,{\hat{\lambda }}_i)\) and \(b_i = G(y_i;\, \hat{{\textbf{w}}}_i,{\hat{\lambda }}_i).\) Then, the randomized quantile residual is defined by

$$\begin{aligned} q_i=\Phi ^{-1}[G^{*}_{i}], \end{aligned}$$

where \(G^*_i\) is a uniform random variable on the interval \((a_i,b_i],\) so that \(q_i \sim N(0,1).\) Here, N(0, 1) stands for the standard normal distribution. Therefore, the only information required for calculating RQRs is the CDF of the response variable. In the numerical analyses, we will use RQRs to investigate the adequacy of the fitted models.
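The construction is straightforward to implement once the fitted CDF is available. A sketch for the covariate-free ZKIP fit (helper names are ours, with a scalar \(\lambda \)) is:

```python
import numpy as np
from scipy.stats import norm, poisson

def zkip_cdf(t, w, lam):
    """ZKIP CDF at integer points t (vectorized; t = -1 gives 0)."""
    t = np.atleast_1d(np.asarray(t, dtype=int))
    k = len(w) - 1
    infl = np.array([w[: min(ti, k) + 1].sum() if ti >= 0 else 0.0 for ti in t])
    return infl + (1.0 - np.sum(w)) * poisson.cdf(t, lam)

def rqr(y, w, lam, rng=np.random.default_rng(1)):
    """Randomized quantile residuals: a_i = G(y_i - 1), b_i = G(y_i),
    then q_i = Phi^{-1}(U_i) with U_i uniform on (a_i, b_i]."""
    y = np.asarray(y, dtype=int)
    a, b = zkip_cdf(y - 1, w, lam), zkip_cdf(y, w, lam)
    u = rng.uniform(a, b)          # endpoint handling differs on a null set
    return norm.ppf(u)

# under the true model the residuals look standard normal, e.g.
w, lam = np.array([0.2, 0.1]), 2.0
y = np.array([0, 0, 1, 2, 0, 3, 1, 5, 0, 2])
print(rqr(y, w, lam))
```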

6 Simulation Studies

In this simulation scenario, we evaluate the efficiency of the ZKIP regression model with the link function \(\log (\lambda )=\beta _0 +\beta _1 {\textbf{x}}_{1}+\beta _2 {\textbf{x}}_{2},\) where \(\beta _0=0.5,\) \(\beta _1=\beta _2=-0.05,\) \(w_0=w_1=w_2=w_3=0.2,\) and \({\textbf{x}}_{1}\) and \({\textbf{x}}_{2}\) are generated from N(0, 1) and the Bernoulli distribution with parameter 0.5, respectively, for \(k=0,1,2,3.\) In each step, we draw samples of sizes \(n=100, 500, 1000\) from the ZKIP regression \((k=0,1,2,3)\) models and estimate the parameters of the assumed models by the ML method; this step is repeated 10,000 times. If the model and its related definitions work correctly, the parameter estimates should be close to the true values, and as the sample size increases, the mean bias and standard errors should decrease. These assertions are supported by the results in Table 3. A data-generating sketch for this design is given below.
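For reference, the data-generating step of this design can be sketched as follows (numpy assumed; sim_zkip_reg is our own helper):

```python
import numpy as np

rng = np.random.default_rng(42)

def sim_zkip_reg(n, beta, w):
    """One data set from the Sect. 6 design: log-link ZKIP regression."""
    x1 = rng.standard_normal(n)                     # x1 ~ N(0, 1)
    x2 = rng.binomial(1, 0.5, n)                    # x2 ~ Bernoulli(0.5)
    X = np.column_stack([np.ones(n), x1, x2])
    lam = np.exp(X @ np.asarray(beta))              # log(lambda) = X beta
    w = np.asarray(w, dtype=float)
    k = len(w) - 1
    comp = rng.choice(k + 2, size=n, p=np.append(w, 1.0 - w.sum()))
    y = np.where(comp <= k, comp, rng.poisson(lam)) # degenerate or Poisson
    return y, X

# the Z3IP (k = 3) setting of Table 3
y, X = sim_zkip_reg(500, beta=[0.5, -0.05, -0.05], w=[0.2, 0.2, 0.2, 0.2])
```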

Table 3 Bias of ML estimates and standard errors (in parentheses), and model diagnostics in the simulation study
Table 4 Frequency comparisons associated with simulated data from the ZKIP regression distribution

For the next simulation scenario, we drew samples of sizes \(n=100, 500, 1000\) from the Z3IP regression model (with the parameters of Table 3). Then, we fitted the ZKIP regression models with \(k=0,1,2\) and the Poisson regression model to the same samples. As anticipated, the proposed model performed admirably: the Z3IP regression model shows the best outcomes according to the comparative criteria in Table 4.

7 Real Data Analysis

To assess the performance of the proposed ZKIP model and the corresponding regression version, two real data sets are analyzed.

7.1 Example 1 (Alcohol Consumption)

DeHart et al. [7] described a study in which “moderate to heavy drinkers” (at least 12 alcoholic drinks/week for women, 15 for men) were recruited to keep a daily record of each drink that they consumed over a 30-day study period. Participants also completed a variety of rating scales covering daily events in their lives and items related to self-esteem. Among the researchers’ hypotheses was that negative events, particularly those involving romantic relationships, are related to the amount of alcohol consumed, especially among individuals with low self-esteem.

In this example, we consider numall (the number of alcoholic beverages, or “drinks,” consumed in one day) as the response variable. In Tables 5 and 6, we fit the ZKIP distribution to this data set. In the following, negevent (an index combining the total number and intensity of negative events experienced during the day) and nrel (a measure of negative relationship interactions) are considered as covariates. Tables 7 and 8 show the fitted ZKIP regression models.

Table 5 Estimates, standard errors (in parentheses), and model diagnostics for the Alcohol consumption data set
Table 6 Observed number of illness and the corresponding expected values under the fitted models in Table 5
Table 7 Estimates, standard errors (in parentheses), and model diagnostics (log-likelihood, and AIC) for the Alcohol consumption data set
Table 8 Observed number of illness and the corresponding expected values under the fitted models in Table 7

The results in Tables 5 and 6 show that the ZKIP distribution with \(k=6\) performs well and dominates the other models. Tables 7 and 8 also confirm that the ZKIP regression model with \(k=6\) is the best among the considered models. In Table 9, we consider testing \(H_0 :w_6 = 0\) against \(H_1 :w_6> 0,\) and \(H_0 :w_7 = 0\) against \(H_1 :w_7> 0,\) with significance level \(\alpha = 0.05.\) Under the null hypothesis, the asymptotic distribution of the LRT statistic is \(0.5\chi ^{2}_0 + 0.5\chi ^{2}_1,\) where \(\chi ^{2}_0 \equiv 0\) (for more details, see [29]). We see that \(k=6\) is the best choice for the ZKIP model for the Alcohol consumption data set.

With these explanations, an algorithm can be presented for choosing the appropriate value of k: starting from \(k=0,\) we fit the inflated model and increase k step by step, stopping at the first k for which the fit with \(k+1\) inflation points offers no significant improvement over the fit with k. A code sketch of this procedure follows.
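A sketch of this forward-selection rule, assuming a user-supplied fitting routine fit_zkip that returns the maximized log-likelihood (e.g., via the EM sketch of Sect. 4.1) and the lrt_pvalue helper of Sect. 4.3, is:

```python
def choose_k(y, X, alpha=0.05, k_max=10):
    """Increase k until the (k+1)-point model no longer improves the fit.

    The boundary LRT of H0: w_{k+1} = 0 uses the 0.5*chi2_0 + 0.5*chi2_1
    mixture; fit_zkip(y, X, k) is a placeholder fitting routine."""
    ll = fit_zkip(y, X, k=0)
    for k in range(k_max):
        ll_next = fit_zkip(y, X, k=k + 1)
        if lrt_pvalue(ll, ll_next, df=1, boundary=True) >= alpha:
            return k
        ll = ll_next
    return k_max
```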

Table 9 Testing to choose k in Tables 7 and 8

7.2 Example 2 (Lung Data Set)

This data set concerns survival in patients with advanced lung cancer from the “North Central Cancer Treatment Group” (Loprinzi et al. [21]). Performance scores rate how well the patient can perform usual daily activities. The Eastern Cooperative Oncology Group (ECOG) performance score (ph.ecog) is assessed by the physician as 0 = asymptomatic, 1 = symptomatic but completely ambulatory, 2 = in bed \(<50\%\) of the day, 3 = in bed \(>50\%\) of the day but not bedbound, and 4 = bedbound. We consider ph.ecog as the response variable and the Age and Sex of the patients as covariates. The fitted ZKIP model for the response variable in the Lung data set is shown in Tables 10 and 11. It is observed that the ZKIP model with \(k=2\) is the best one. Tables 12 and 13 confirm the same results for the ZKIP regression model, and Table 14 also identifies \(k=2\) as an appropriate value for the ZKIP regression model for this data set.

Table 10 Estimates, standard errors (in parentheses), and model diagnostics for the Lung data set
Table 11 Observed number of illness and the corresponding expected values under the fitted models in Table 10
Table 12 Estimates, standard errors (in parentheses), and model diagnostics (log-likelihood, and AIC) for Lung data set
Table 13 Observed number of illness and the corresponding expected values under the fitted models in Table 12
Table 14 Testing to choose k in Tables 12 and 13

Figures 2 and 4 show the RQRs of the fitted ZKIP regression models for all observations of real data sets 1 and 2, respectively. For the fits with the selected k, these plots show an approximately random scatter around zero, which seems reasonable. The QQ plots in Figs. 3 and 5 further show that the RQRs approximately follow the N(0, 1) distribution for the selected k (Figs. 2, 3, 4, 5).

Fig. 2
figure 2

RQR’s plots for the alcohol consumption data set

Fig. 3
figure 3

QQ plots for the RQR’s of the alcohol consumption data set

Fig. 4
figure 4

RQR’s plots for the Lung data set

Fig. 5
figure 5

QQ plots for the RQR’s of the Lung data set

8 Conclusions

Previous research on the inflated Poisson distribution has assumed that inflation occurs at one or two points. By introducing the ZKIP distribution, in addition to covering all previous cases, we allow the inflated points to be three or even more. We also studied the properties of the ZKIP distribution and the ZKIP regression model. The flexibility of the ZKIP distribution, illustrated with the real data examples, shows that it can serve as a suitable model for analyzing real data sets with discrete responses. The same flexibility was observed for the regression version of the ZKIP distribution.

The introduced model can be used in decision trees, random forests, statistical quality control, or any method designed for inflated discrete data. The model can also be extended to neutrosophic statistics [3, 4, 31].