A spatial scan statistic for zero-inflated Poisson process

Environmental and Ecological Statistics

Abstract

The scan statistic is widely used to detect spatial clusters in inhomogeneous Poisson processes. However, real data may depart substantially from the underlying Poisson process. One such departure is an excess of zeros. Some studies point out that, when applied to data with excess zeros, the spatial scan statistic may produce biased inferences. In this work, we develop a closed-form scan statistic for cluster detection in spatial zero-inflated count data. We apply our methodology to simulated and real data. Our simulations show that the performance of the Scan-Poisson statistic deteriorates steadily as the number of zeros increases, producing biased inferences. In contrast, our proposed Scan-ZIP and Scan-ZIP+EM statistics are, most of the time, either superior or comparable to the Scan-Poisson statistic.

References

  • Agarwal DK, Gelfand AE, Citron-Pousty S (2002) Zero-inflated models with application to spatial count data. Environ Ecol Stat 9:341–355

  • Agresti A (1990) Categorical data analysis. Wiley, New York

  • Böhning D, Dietz E, Schlattmann P, Mendonça L, Kirchner U (1999) The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Stat Soc 162(2):195–209

  • Casella G, Berger RL (1990) Statistical inference. Duxbury Press, Belmont, CA

  • Gómez-Rubio V, López-Quílez A (2010) Statistical methods for the geographical analysis of rare diseases. Adv Exp Med Biol 686:151–171

  • Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26(6):1481–1496

  • Kulldorff M, Tango T, Park P (2003) Power comparisons for disease clustering tests. Comput Stat Data Anal 42:665–684

  • Kulldorff M, Huang L, Konty K (2009) A scan statistic for continuous data based on the normal probability model. Int J Health Geogr 8:58

  • Lambert D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34:1–14

  • Özmen I, Famoye F (2007) Count regression models with an application to zoological data containing structural zeros. J Data Sci 5:491–502

  • Rathbun SL, Fei S (2006) A spatial zero-inflated Poisson regression model for oak regeneration. Environ Ecol Stat 13:409–426

Acknowledgments

The authors would like to thank the anonymous referees for their careful reading of the manuscript and for constructive suggestions that considerably improved the article. André L. F. Cançado was partially supported by DPP/UnB. Cibele Q. da-Silva was supported by the National Research Council (CNPq-Brazil, BPPesq) and by the Office to Improve University Research (CAPES-Brazil) via Project PROCAD-NF 2008.

Author information

Corresponding author

Correspondence to André L. F. Cançado.

Additional information

Handling Editor: Pierre Dutilleul.

Appendix

Proof of Theorem 1

Let \(\lambda (D)\) and \(\lambda (D^{\prime })\) denote the values of the test statistic for the two data sets \(D\) and \(D^{\prime }\). Further, consider

$$\begin{aligned} I(Z)=\left[ \,\frac{\sum _{i \in {Z}}^{}x_i(1-d_i)}{\sum _{i \in {Z}}^{}n_i(1-d_i)}\right] \quad \hbox {and} \quad O(Z)=\left[ \,\frac{\sum _{i \notin {Z}}^{}x_i(1-d_i)}{\sum _{i \notin {Z}}^{}n_i(1-d_i)}\right] . \end{aligned}$$

Similarly, let

$$\begin{aligned} I'(Z)=\left[ \,\frac{\sum _{i \in {Z}}^{}x_i'(1-d_i')}{\sum _{i \in {Z}}^{}n_i'(1-d_i')}\right] \quad \hbox {and} \quad O'(Z)=\left[ \,\frac{\sum _{i \notin {Z}}^{}x_i'(1-d_i')}{\sum _{i \notin {Z}}^{}n_i'(1-d_i')}\right] , \end{aligned}$$

and let

$$\begin{aligned} C&= \sum _{i=1}^{k}x_i(1-d_i) = \sum _{j=1}^{k}x_j'(1-d_j'),\\ R&= \sum _{i=1}^{k}n_i(1-d_i)=\sum _{j=1}^{k}n_j'(1-d_j'),\\ c(Z)&= \sum _{i \in {Z}}^{}x_i(1-d_i),\\ r(Z)&= \sum _{i \in {Z}}^{}n_i(1-d_i),\\ c'(Z)&= \sum _{i \in {Z}}^{}x_i'(1-d_i'),\\ r'(Z)&= \sum _{i \in {Z}}^{}n_i'(1-d_i'),\\ K&= \left[ \,\frac{\sum _{i=1}^{k}x_i(1-d_i)}{\sum _{i=1}^{k}n_i(1-d_i)}\right] ^{\sum _{i=1}^{k}x_i(1-d_i)}. \end{aligned}$$

As stated previously, these quantities are related to the sufficient statistics of the MLEs.
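To make the notation concrete, the following is a minimal Python sketch that evaluates the quantities defined above for a single candidate zone. It rests on assumptions that this appendix does not state: `x[i]` is read as the observed count in region `i`, `n[i]` as the population at risk, and `d[i]` as a 0/1 indicator that region `i` is treated as a structural zero. The function name `zone_quantities` and the toy data are hypothetical.

```python
import math


def zone_quantities(x, n, d, zone):
    """Compute c(Z), r(Z), C, R, I(Z), O(Z) and log K for one candidate zone.

    Assumed meanings (not stated in this appendix):
      x[i]: observed count in region i
      n[i]: population at risk in region i
      d[i]: 1 if region i is treated as a structural zero, else 0
      zone: iterable of region indices forming Z
    """
    w = [1 - di for di in d]                      # weights (1 - d_i)
    C = sum(xi * wi for xi, wi in zip(x, w))      # total weighted cases
    R = sum(ni * wi for ni, wi in zip(n, w))      # total weighted population
    zone = set(zone)
    c = sum(x[i] * w[i] for i in zone)            # weighted cases inside Z
    r = sum(n[i] * w[i] for i in zone)            # weighted population inside Z
    if r <= 0 or R - r <= 0:
        raise ValueError("zone and complement need positive weighted population")
    return {
        "c": c, "r": r, "C": C, "R": R,
        "I": c / r,                               # rate inside Z
        "O": (C - c) / (R - r),                   # rate outside Z
        "log_K": C * math.log(C / R) if C > 0 else 0.0,  # log of K = (C/R)^C
    }


# Toy data: region 1 is flagged as a structural zero and drops out of every sum.
q = zone_quantities(x=[5, 0, 3, 2], n=[10, 10, 10, 10], d=[0, 1, 0, 0], zone=[0])
print(q)
```

Working with \(\log K\) rather than \(K\) avoids numerical underflow when \(C\) is large.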

Under the null hypothesis, \(\lambda (D)=1,\) and this implies that

$$\begin{aligned} K&= \hbox {sup}_{Z \in \mathcal {Z}} \left[ I(Z)\right] ^{c(Z)} \left[ O(Z)\right] ^{C-c(Z)}= \left[ I(\hat{Z})\right] ^{c(\hat{Z})} \left[ O(\hat{Z})\right] ^{C-c(\hat{Z})} \\&= \hbox {sup}_{Z \in \mathcal {Z}} \left[ I'(Z)\right] ^{c'(Z)} \left[ O'(Z)\right] ^{C-c'(Z)} = \left[ I'(\tilde{Z}^{\prime })\right] ^{c'(\tilde{Z}^{\prime })} \left[ O'(\tilde{Z}^{\prime })\right] ^{C-c'(\tilde{Z}^{\prime })}. \end{aligned}$$

Thus, \(c(\hat{Z})=C=c'(\tilde{Z}^{\prime })\), and the distributions of \(\lambda (D)\) and \(\lambda (D^{\prime })\) are the same.
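The supremum over candidate zones can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation: `x`, `n` and `d` are read as counts, populations and 0/1 structural-zero indicators; the collection `zones` is supplied by the caller; and, in the style of scan statistics for high-rate clusters, only zones with \(I(Z) > O(Z)\) are considered, so that \(\log \lambda = 0\) (that is, \(\lambda = 1\)) when no zone has an elevated rate. The name `scan_log_lambda` is hypothetical.

```python
import math


def _xlogy(a, b):
    """Return a * log(b), with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def scan_log_lambda(x, n, d, zones):
    """Return (most likely zone, log lambda) over the candidate zones.

    log lambda(Z) = c log I(Z) + (C - c) log O(Z) - C log(C / R),
    evaluated only for zones with I(Z) > O(Z) (an assumption of this sketch).
    """
    w = [1 - di for di in d]
    C = sum(xi * wi for xi, wi in zip(x, w))
    R = sum(ni * wi for ni, wi in zip(n, w))
    best_zone, best = None, 0.0
    for zone in zones:
        c = sum(x[i] * w[i] for i in zone)
        r = sum(n[i] * w[i] for i in zone)
        if r <= 0 or R - r <= 0:
            continue                      # rate undefined inside or outside
        inside, outside = c / r, (C - c) / (R - r)
        if inside <= outside:
            continue                      # not a high-rate zone
        ll = _xlogy(c, inside) + _xlogy(C - c, outside) - _xlogy(C, C / R)
        if ll > best:
            best_zone, best = zone, ll
    return best_zone, best


# Toy data: regions 1 and 4 are flagged as structural zeros.
x = [9, 0, 2, 3, 0, 2]
n = [10, 10, 10, 10, 10, 10]
d = [0, 1, 0, 0, 1, 0]
zones = [(0,), (0, 1), (2, 3), (3, 4, 5), (0, 1, 2)]
print(scan_log_lambda(x, n, d, zones))
```

Note that zones `(0,)` and `(0, 1)` yield the same value in this toy example, because a region with `d[i] = 1` contributes nothing to any of the sums; the sketch keeps the first zone encountered.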

Under the alternative hypothesis, \(\lambda (D)>1\), and we need to show that \(\lambda (D') \ge \lambda (D)\). Notice that, under the conditions of the theorem, \(c'(\tilde{Z}^{\prime }) \ge c(\hat{Z})\), since

$$\begin{aligned} \sum _{i \in \tilde{Z}^{\prime }}^{}x_i'(1-d_i')= \sum _{i \in \hat{Z}}^{}x_i'(1-d_i')+ \sum _{i \in \overline{\hat{Z}}\cap \tilde{Z}^{\prime }}^{}x_i'(1-d_i') \ge \sum _{j \in \hat{Z}} x_j(1-d_j), \end{aligned}$$

as \(\overline{\hat{Z}}\cap \tilde{Z}^{\prime }\) might be different from the empty set. When \(\lambda (D) > 1\), we have from Eq. (9) that

$$\begin{aligned} \lambda (D)&= \hbox {sup}_{Z \in \mathcal {Z}} \frac{1}{K} \left[ I(Z)\right] ^{c(Z)} \left[ O(Z)\right] ^{C-c(Z)}\\&= \frac{1}{K} \left[ I(\hat{Z})\right] ^{c(\hat{Z})} \left[ O(\hat{Z})\right] ^{C-c(\hat{Z})} \\&= \frac{1}{K} \left[ \frac{c(\hat{Z})}{r(\hat{Z})}\right] ^{c(\hat{Z})} \left[ \frac{C-c(\hat{Z})}{R-r(\hat{Z})}\right] ^{C-c(\hat{Z})}\\&\le \frac{1}{K}\left[ \frac{c'(\hat{Z})}{r'(\hat{Z})}\right] ^{c'(\hat{Z})} \left[ \frac{C-c'(\hat{Z})}{R-r'(\hat{Z})}\right] ^{C-c'(\hat{Z})}\\&\le \hbox {sup}_{Z \in \mathcal {Z}} \frac{1}{K} \left[ I'(Z)\right] ^{c'(Z)} \left[ O'(Z)\right] ^{C-c'(Z)}\\&= \frac{1}{K} \left[ I'(\tilde{Z}^{\prime })\right] ^{c'(\tilde{Z}^{\prime })} \left[ O'(\tilde{Z}^{\prime })\right] ^{C-c'(\tilde{Z}^{\prime })}= \lambda (D^{\prime }). \end{aligned}$$

The first inequality holds because, for any positive constants \(\alpha \), \(\beta \), and \(N\), \((\alpha n)^n(\beta (N-n))^{N-n}\) is an increasing function of \(n\) when \(\alpha n > \beta (N-n)\). This condition is met here because \(\lambda (D) > 1\) implies that \(I(\hat{Z})>O(\hat{Z})\), that is, \(\frac{c(\hat{Z})}{r(\hat{Z})}>\frac{C}{R}\). It also follows that \(I'(\hat{Z}) > O'(\hat{Z})\). To verify this by contradiction, suppose that \(I'(\hat{Z}) \le O'(\hat{Z})\). Then \(\frac{c'(\hat{Z})}{r'(\hat{Z})}\le \frac{C}{R}\). Since \(c'(\hat{Z}) \ge c(\hat{Z})\), this implies that \(\frac{c(\hat{Z})}{r(\hat{Z})} \le \frac{c'(\hat{Z})}{r(\hat{Z})} \le \frac{C}{R}\) whenever \(r(\hat{Z})=r'(\hat{Z})\), which contradicts \(\frac{c(\hat{Z})}{r(\hat{Z})}>\frac{C}{R}\). \(\square \)
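For completeness, the monotonicity fact used for the first inequality in the proof above can be checked by differentiating the logarithm. The following is a sketch under the assumptions that \(\alpha , \beta > 0\) and that \(n\) is treated as a continuous variable in \((0,N)\); the symbol \(f\) is introduced here only for illustration.

$$\begin{aligned} f(n)&= (\alpha n)^{n}\,(\beta (N-n))^{N-n},\\ \log f(n)&= n\log (\alpha n)+(N-n)\log (\beta (N-n)),\\ \frac{d}{dn}\log f(n)&= \log (\alpha n)+1-\log (\beta (N-n))-1 = \log \frac{\alpha n}{\beta (N-n)}. \end{aligned}$$

The derivative is positive exactly when \(\alpha n > \beta (N-n)\). In the proof, \(n=c(\hat{Z})\), \(N=C\), \(\alpha =1/r(\hat{Z})\) and \(\beta =1/(R-r(\hat{Z}))\) with \(r(\hat{Z})\) held fixed, so the condition reads \(I(\hat{Z})>O(\hat{Z})\).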

Proof of Theorem 2

According to Definition 1, in order to prove that \(\lambda \) is an IMP test, it is necessary to show that if statements (1) and (2) are true, then (3) cannot hold. This is equivalent to showing that for any \((Z,\theta _0,\theta _Z) \ \in A_Z\),

$$\begin{aligned} P( w \in R^{\prime }_Z \mid (Z,\theta _0,\theta _Z))-P( w \in R_Z \mid (Z,\theta _0,\theta _Z))\le 0. \end{aligned} \tag{13}$$

For an arbitrary \(Z\), let \(D_{-}=\{w: w \in R_Z, w \notin R^{\prime }_Z \}\) and \(D_{+}=\{w: w \in R^{\prime }_Z, w \notin R_Z \}\). Define

$$\begin{aligned} M = \hbox {sup}_{ \ w \in D_{+}} \frac{L(Z,\theta _Z,\theta _0 \mid w)}{L(\theta _0 \mid w)}. \end{aligned}$$

By the definition of \(D_{+}\) and \(D_{-}\), since \(R_Z\) is described in terms of \(Z\), which is the most likely cluster in a subset of the sample space, we have that each \(w\) in \(D_{-}\) has a higher likelihood ratio than any \(w\) in \(D_{+}\); that is,

$$\begin{aligned} M&= \hbox {sup}_{ \ w \in D_{+}} \frac{L(Z,\theta _Z,\theta _0 \mid w)}{L(\theta _0 \mid w)} \le \hbox {inf}_{ \ w \in D_{-}} \frac{L(Z,\theta _Z,\theta _0 \mid w)}{L(\theta _0 \mid w)},\\ M&= \hbox {sup}_{ \ w \in D_{+}} \frac{\left[ \frac{\sum _{i \in Z_{w}}^{}x_i(1-d_i)}{\sum _{i \in Z_{w}}^{}n_i(1-d_i)} \right] ^{\sum _{i \in Z_{w}}^{}x_i(1-d_i)} \left[ \frac{\sum _{j \notin Z_{w}}^{}x_j(1-d_j)}{\sum _{j \notin Z_{w}}^{}n_j(1-d_j)} \right] ^{\sum _{j \notin Z_{w}}^{}x_j(1-d_j)}}{\left[ \frac{\sum _{i=1}^{k}x_i(1-d_i)}{\sum _{i=1}^{k}n_i(1-d_i)} \right] ^{\sum _{i=1}^{k}x_i(1-d_i)}}\\&\le \hbox {inf}_{ \ w \in D_{-}} \frac{\left[ \frac{\sum _{i \in Z_{w}}^{}x_i(1-d_i)}{\sum _{i \in Z_{w}}^{}n_i(1-d_i)} \right] ^{\sum _{i \in Z_{w}}^{}x_i(1-d_i)} \left[ \frac{\sum _{j \notin Z_{w}}^{}x_j(1-d_j)}{\sum _{j \notin Z_{w}}^{}n_j(1-d_j)} \right] ^{\sum _{j \notin Z_{w}}^{}x_j(1-d_j)}}{\left[ \frac{\sum _{i=1}^{k}x_i(1-d_i)}{\sum _{i=1}^{k}n_i(1-d_i)} \right] ^{\sum _{i=1}^{k}x_i(1-d_i)}}\\&= \hbox {inf}_{ \ w \in D_{-}} \frac{L(Z,\theta _Z,\theta _0 \mid w)}{L(\theta _0 \mid w)}. \end{aligned}$$

The proof of inequality (13) for any \((Z,\theta _0,\theta _Z) \in A_Z\) follows largely from Kulldorff (1997), where it is verified that

$$\begin{aligned}&P( w \in R^{\prime }_Z \mid (Z,\theta _0,\theta _Z))-P( w \in R_Z \mid (Z,\theta _0,\theta _Z)) \\&\quad \le M(P(w \in R^{\prime } \mid H_0) - P(w \in R \mid H_0))=0. \end{aligned}$$

The last equality holds since \(R_j=R_j^{\prime }\) for all \(j \ne Z\), according to statement 2 in Definition 1. \(\square \)

Cite this article

Cançado, A.L.F., da-Silva, C.Q. & da Silva, M.F. A spatial scan statistic for zero-inflated Poisson process. Environ Ecol Stat 21, 627–650 (2014). https://doi.org/10.1007/s10651-013-0272-1
