Abstract
The scan statistic is widely used in spatial cluster detection applications of inhomogeneous Poisson processes. However, real data may depart substantially from the underlying Poisson process. One such departure is an excess of zeros. Some studies point out that, when applied to data with excess zeros, the spatial scan statistic may produce biased inferences. In this work, we develop a closed-form scan statistic for cluster detection in spatial zero-inflated count data. We apply our methodology to simulated and real data. Our simulations reveal that the performance of the Scan-Poisson statistic steadily deteriorates as the number of zeros increases, producing biased inferences. In contrast, our proposed Scan-ZIP and Scan-ZIP+EM statistics are, most of the time, either superior or comparable to the Scan-Poisson statistic.
References
Agarwal DK, Gelfand AE, Citron-Pousty S (2002) Zero-inflated models with application to spatial count data. Environ Ecol Stat 9:341–355
Agresti A (1990) Categorical data analysis. Wiley, New York
Böhning D, Dietz E, Schlattmann P, Mendonça L, Kirchner U (1999) The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Stat Soc 162(2):195–209
Casella G, Berger RL (1990) Statistical inference. Duxbury Press, Belmont, CA
Gómez-Rubio V, López-Quílez A (2010) Statistical methods for the geographical analysis of rare diseases. Adv Exp Med Biol 686:151–171
Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26(6):1481–1496
Kulldorff M, Tango T, Park P (2003) Power comparisons for disease clustering tests. Comput Stat Data Anal 42:665–684
Kulldorff M, Huang L, Konty K (2009) A scan statistic for continuous data based on the normal probability model. Int J Health Geogr 8:58
Lambert D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34:1–14
Özmen I, Famoye F (2007) Count regression models with an application to zoological data containing structural zeros. J Data Sci 5:491–502
Rathbun SL, Fei S (2006) A spatial zero-inflated Poisson regression model for oak regeneration. Environ Ecol Stat 13:409–426
Acknowledgments
The authors would like to thank the anonymous referees for their careful reading of the manuscript and for constructive suggestions that considerably improved the article. André L. F. Cançado was partially supported by DPP/UnB. Cibele Q. da-Silva was supported by the National Research Council (CNPq-Brazil, BPPesq) and by the Office to Improve University Research (CAPES-Brazil) via Project PROCAD-NF 2008.
Additional information
Handling Editor: Pierre Dutilleul.
Appendix
Proof of Theorem 1
Let \(\lambda (D)\) and \(\lambda (D^{\prime })\) denote the values for the test statistic for the two different data sets. Further, consider
Similarly, let
and let
As stated previously, these quantities are related to the sufficient statistics of the MLEs.
Under the null hypothesis, \(\lambda (D)=1,\) and this implies that
Thus, \(c(\hat{Z})=C=c'(\tilde{Z}^{\prime })\), and the distributions of \(\lambda (D)\) and \(\lambda (D^{\prime })\) are the same.
Under the alternative hypothesis, \(\lambda (D)>1\), and we need to show that \(\lambda (D^{\prime }) \ge \lambda (D)\). Notice that under the conditions of the theorem, \(c'(\tilde{Z}^{\prime }) \ge c(\hat{Z})\), since
as \(\overline{\hat{Z}}\cap \tilde{Z}^{\prime }\) might be different from the empty set. When \(\lambda (D) > 1\), we have from Eq. (9) that
The first inequality holds since, for any constants \(\alpha \), \(\beta \), and \(N\), \((\alpha n)^n(\beta (N-n))^{N-n}\) is an increasing function of \(n\) when \(\alpha n > \beta (N-n)\). This condition is satisfied here, since \(\lambda (D) > 1\) implies that \(I(\hat{Z})>O(\hat{Z})\), that is, \(\frac{c(\hat{Z})}{r(\hat{Z})}>\frac{C}{R}\). It also follows that \(I'(\hat{Z}) > O'(\hat{Z})\). To verify this by contradiction, suppose that \(I'(\hat{Z}) \le O'(\hat{Z})\). Then, \(\frac{c'(\hat{Z})}{r'(\hat{Z})}\le \frac{C}{R}\). Since \(c'(\hat{Z}) \ge c(\hat{Z})\), this implies that \(\frac{c(\hat{Z})}{r(\hat{Z})} \le \frac{c'(\hat{Z})}{r(\hat{Z})} \le \frac{C}{R}\) whenever \(r(\hat{Z})=r'(\hat{Z})\), which contradicts \(\frac{c(\hat{Z})}{r(\hat{Z})}>\frac{C}{R}\). \(\square \)
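The monotonicity claim behind the first inequality can be checked directly by differentiating the logarithm. The following is a minimal sketch, assuming \(\alpha , \beta > 0\) and treating \(n\) as a continuous variable on \((0, N)\); the name \(f\) is introduced here for illustration only and does not appear in the original proof.

```latex
\begin{align*}
f(n) &= (\alpha n)^{n}\,\bigl(\beta (N-n)\bigr)^{N-n}, \qquad 0 < n < N, \\
\log f(n) &= n \log(\alpha n) + (N-n)\log\bigl(\beta (N-n)\bigr), \\
\frac{d}{dn}\log f(n) &= \log(\alpha n) + 1 - \log\bigl(\beta (N-n)\bigr) - 1
  = \log\frac{\alpha n}{\beta (N-n)}.
\end{align*}
```

The derivative is positive exactly when \(\alpha n > \beta (N-n)\), that is, when \(n > \beta N/(\alpha + \beta )\). Because this condition continues to hold as \(n\) grows toward \(N\), \(f\) is increasing on that whole range.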
Proof of Theorem 2
According to Definition 1, in order to prove that \(\lambda \) is an IMP test, it is necessary to show that if statements (1) and (2) are true, then (3) cannot hold. This is equivalent to showing that for any \((Z,\theta _0,\theta _Z) \ \in A_Z\),
For an arbitrary \(Z\), let \(D_{-}=\{w: w \in R_Z, w \notin R^{\prime }_Z \}\) and \(D_{+}=\{w: w \in R^{\prime }_Z, w \notin R_Z \}\). Define
By the definition of \(D_{+}\) and \(D_{-}\), since \(R_Z\) is described in terms of \(Z\), which is the most likely cluster in a subset of the sample space, we have that each \(w\) in \(D_{-}\) has a higher likelihood ratio than any \(w\) in \(D_{+}\); that is,
The proof of inequality (13) for any \((Z, \theta _Z, \theta _0) \in A_Z\) follows largely from Kulldorff (1997), where it is verified that
The last equality holds since \(R_j=R_j^{\prime }\) for all \(j \ne Z\), according to statement 2 in Definition 1. \(\square \)
Cite this article
Cançado, A.L.F., da-Silva, C.Q. & da Silva, M.F. A spatial scan statistic for zero-inflated Poisson process. Environ Ecol Stat 21, 627–650 (2014). https://doi.org/10.1007/s10651-013-0272-1