Abstract
For the modeling of bounded counts, the binomial distribution is a common choice. In applications, however, one often observes an excessive number of zeros and extra-binomial variation, which cannot be explained by a binomial distribution. We propose statistics to evaluate the number of zeros and the dispersion with respect to a binomial model, which are based on the sample binomial index of dispersion and the sample binomial zero index. We apply these statistics to autocorrelated counts generated by a binomial autoregressive process of order one, which also includes the special case of independent and identically distributed (i.i.d.) bounded counts. The limiting null distributions of the proposed test statistics are derived. A Monte Carlo study evaluates their size and power under various alternatives. Finally, we present two real-data applications as well as the derivation of effective sample sizes to illustrate the proposed methodology.
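The effective sample sizes mentioned above build on the idea of Bayley and Hammersley (1946). As a minimal sketch, assuming only that the counts have the BAR(1)-type autocorrelation function \(\rho ^k\), the following code computes the effective number of independent observations for the sample mean. The function names are hypothetical, and the sketch does not reproduce the paper's effective sample sizes for the zero and dispersion statistics.

```python
def effective_sample_size_mean(T: int, rho: float) -> float:
    """Bayley-Hammersley effective sample size for the sample mean of a
    stationary series of length T whose ACF is rho(k) = rho**k."""
    if T < 1:
        raise ValueError("T must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie in (-1, 1)")
    # Var(mean) = sigma^2 / T * inflation, with the exact finite-T inflation factor
    inflation = 1.0 + 2.0 * sum((1.0 - k / T) * rho**k for k in range(1, T))
    return T / inflation


def effective_sample_size_limit(T: int, rho: float) -> float:
    """Large-T approximation, since 1 + 2 * sum_k rho^k = (1 + rho) / (1 - rho)."""
    return T * (1.0 - rho) / (1.0 + rho)


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6):
        print(rho, round(effective_sample_size_mean(250, rho), 1),
              round(effective_sample_size_limit(250, rho), 1))
```

For positive \(\rho \), the effective sample size is clearly smaller than T, which is why asymptotic variances of statistics computed from autocorrelated counts exceed their i.i.d. counterparts.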
Notes
www.forecastingprinciples.com/index.php/crimedata, file PghCarBeat.csv.
In the Bavarian school system, Hauptschule constitutes the lowest level of secondary school, Gymnasium the highest level, and Realschule is ranked between them.
References
Ainsworth LM, Dean CB, Joy R (2016) Zero-inflated spatial models: application and interpretation. In: Sutradhar BC (ed) Advances and challenges in parametric and semi-parametric analysis for correlated data. Lecture notes in statistics, vol 218. Springer, Basel, pp 75–96
Bayley GV, Hammersley JM (1946) The “effective” number of independent observations in an autocorrelated time series. Suppl J R Stat Soc 8(2):184–197
Böhning D, Dietz E, Schlattmann P, Mendonca L, Kirchner U (1999) The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Stat Soc Ser A 162(2):195–209
Britt CL, Rocque M, Zimmerman GM (2017) The analysis of bounded count data in criminology. J Quant Criminol 34:591–607. https://doi.org/10.1007/s10940-017-9346-9
Davis RA, Holan SH, Lund R, Ravishanker N (eds) (2016) Handbook of discrete-valued time series. Chapman & Hall/CRC Press, Boca Raton
Falk M, Hain J, Marohn F, Fischer H, Michel R (2014) Statistik in Theorie und Praxis - Mit Anwendungen in R. Springer, Berlin (in German)
Fernández-Fontelo A, Cabaña A, Puig P, Moriña D (2016) Under-reported data analysis with INAR-hidden Markov chains. Stat Med 35(26):4875–4890
Guillera-Arroita G, Lahoz-Monfort JJ (2017) Species occupancy estimation and imperfect detection: Shall surveys continue after the first detection? AStA Adv Stat Anal 101(4):381–398
McKenzie E (1985) Some simple models for discrete variate time series. Water Resour Bull 21(4):645–650
Möller TA, Weiß CH, Kim H-Y, Sirchenko A (2018) Modeling zero inflation in count data time series with bounded support. Methodol Comput Appl Probab 20(2):589–609. https://doi.org/10.1007/s11009-017-9577-0
Mwalili SM, Lesaffre E, Declerck D (2008) The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research. Stat Methods Med Res 17(2):123–139
Puig P, Valero J (2006) Count data distributions: some characterizations with applications. J Am Stat Assoc 101(473):332–340
Steutel FW, van Harn K (1979) Discrete analogues of self-decomposability and stability. Ann Probab 7(5):893–899
Weiß CH (2018) An introduction to discrete-valued time series. Wiley, Chichester
Weiß CH, Homburg A, Puig P (2016) Testing for zero inflation and overdispersion in INAR(1) models. Stat Papers. https://doi.org/10.1007/s00362-016-0851-y
Weiß CH, Kim H-Y (2013) Binomial AR(1) processes: moments, cumulants, and estimation. Statistics 47(3):494–510
Weiß CH, Kim H-Y (2014) Diagnosing and modeling extra-binomial variation for time-dependent counts. Appl Stoch Models Bus Ind 30(5):588–608
Weiß CH, Pollett PK (2012) Chain binomial models and binomial autoregressive processes. Biometrics 68(3):815–824
Yang M, Cavanaugh JE, Zamba GKD (2015) State-space models for count time series with excess zeros. Stat Model 15(1):70–90
Acknowledgements
The authors thank the editor and the referees for carefully reading the article and for their comments, which greatly improved it. Major parts of this research were completed while the first author was a guest professor at the Helmut Schmidt University in Hamburg. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2018R1D1A1B07045707).
Appendices
A Proofs
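In the sequel, the following properties of a BAR(1) process are used: \((X_t)_{\mathbb {N}}\) is a stationary, ergodic and \(\phi \)-mixing finite Markov chain with marginal distribution \(\text {Bin}(n,\pi )\) (McKenzie 1985; Weiß and Kim 2013). Denoting \(\beta _h:=\pi \, (1-\rho ^h)\) and \(\alpha _h:=\beta _h +\rho ^h\), the (strictly positive) h-step-ahead transition probabilities \(p_{k|l}^{(h)}=P(X_t=k\ |\ X_{t-h}=l)\) are given by

\[ p_{k|l}^{(h)} = \sum _{m=\max \{0,\,k+l-n\}}^{\min \{k,l\}} \binom{l}{m}\binom{n-l}{k-m}\, \alpha _h^{m}(1-\alpha _h)^{l-m}\, \beta _h^{k-m}(1-\beta _h)^{n-l-k+m}, \]

see Weiß and Pollett (2012), and conditional mean and variance are both linear in \(X_{t-h}\):

\[ E[X_t\ |\ X_{t-h}] = \rho ^h\, X_{t-h} + n\beta _h, \qquad V[X_t\ |\ X_{t-h}] = \rho ^h(1-\rho ^h)(1-2\pi )\, X_{t-h} + n\beta _h(1-\beta _h). \]

The ACF equals \(\rho _X(k)=\rho ^k\).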
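Given \(X_{t-h}=l\), the count \(X_t\) is the sum of independent \(\text {Bin}(l,\alpha _h)\) survivors and \(\text {Bin}(n-l,\beta _h)\) revivals. The following minimal sketch (function names are hypothetical) builds the h-step transition matrix from this convolution, so that row sums, stationarity of \(\text {Bin}(n,\pi )\), and the linear conditional mean and variance can be checked numerically.

```python
from math import comb


def bar1_transition(n: int, pi: float, rho: float, h: int = 1) -> list[list[float]]:
    """h-step transition matrix P[l][k] = P(X_t = k | X_{t-h} = l) of a BAR(1) process.

    Given X_{t-h} = l, X_t is the sum of independent Bin(l, alpha_h) survivors and
    Bin(n - l, beta_h) revivals, with beta_h = pi (1 - rho^h), alpha_h = beta_h + rho^h.
    """
    beta_h = pi * (1.0 - rho**h)
    alpha_h = beta_h + rho**h
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for l in range(n + 1):
        for k in range(n + 1):
            # m survivors out of l, and k - m revivals out of n - l
            for m in range(max(0, k + l - n), min(k, l) + 1):
                P[l][k] += (comb(l, m) * alpha_h**m * (1 - alpha_h) ** (l - m)
                            * comb(n - l, k - m) * beta_h ** (k - m)
                            * (1 - beta_h) ** (n - l - k + m))
    return P


def binom_pmf(n: int, pi: float) -> list[float]:
    """Probability mass function of Bin(n, pi) on {0, ..., n}."""
    return [comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]


if __name__ == "__main__":
    n, pi, rho, h = 5, 0.3, 0.4, 2
    P = bar1_transition(n, pi, rho, h)
    b = binom_pmf(n, pi)
    # Stationarity: starting from Bin(n, pi), the distribution after h steps is unchanged
    print([round(sum(b[l] * P[l][k] for l in range(n + 1)), 6) for k in range(n + 1)])
    print([round(p, 6) for p in b])
```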
A.1 Proof of Theorem 1
We have \(E[{\varvec{Y}}_t]={\varvec{0}}\), and by arguments analogous to those in “Appendix A.6” of Weiß and Kim (2013), we conclude that \(T^{-1/2}\cdot \sum _{t=1}^T {\varvec{Y}}_t\) is asymptotically normally distributed with covariance matrix \({\varvec{\varSigma }}= (\sigma _{ij})\) given by

\[ \sigma _{ij} = E[Y_{0,i}\cdot Y_{0,j}] + \sum _{k=1}^{\infty }\big (E[Y_{0,i}\cdot Y_{k,j}] + E[Y_{k,i}\cdot Y_{0,j}]\big ). \]

Since a BAR(1) process is time-reversible (McKenzie 1985), \(E[Y_{0,i}\cdot Y_{k,j}] = E[Y_{k,i} \cdot Y_{0,j}]\) always holds. So we compute \(\sigma _{11}\) as

\[ \sigma _{11} = E[Y_{0,1}\cdot Y_{0,1}] + 2\sum _{k=1}^{\infty } E[Y_{0,1}\cdot Y_{k,1}]. \]
To get \( E[Y_{0,1} \cdot Y_{0,1}]\) and \(E[Y_{0,1} \cdot Y_{k,1}]\), we evaluate
Hence,
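As a numerical cross-check of the long-run variance of the zero component, the sketch below assumes, for illustration only, that \(Y_{t,1} = \mathbb {1}\{X_t=0\} - (1-\pi )^n\) (the definition of \({\varvec{Y}}_t\) in the main text is not reproduced here). For a BAR(1) process, \(P(X_k=0\ |\ X_0=0) = (1-\beta _k)^n\), and expanding \((1-\pi +\pi \rho ^k)^n\) and summing the geometric series in k gives \(\sigma _{11} = p_0(1-p_0) + 2p_0\sum _{j=1}^n \binom{n}{j}(1-\pi )^{n-j}\pi ^j\rho ^j/(1-\rho ^j)\) with \(p_0=(1-\pi )^n\). Both routes are computed; the function names are hypothetical.

```python
from math import comb


def sigma11_series(n: int, pi: float, rho: float, kmax: int = 10_000) -> float:
    """Long-run variance of the centred zero indicator by summing autocovariances.

    ASSUMPTION: Y_{t,1} = 1{X_t = 0} - (1 - pi)^n. For a BAR(1) process,
    P(X_k = 0 | X_0 = 0) = (1 - beta_k)^n with beta_k = pi (1 - rho^k).
    """
    p0 = (1 - pi) ** n
    total = p0 * (1 - p0)  # lag-0 term E[Y_{0,1}^2]
    for k in range(1, kmax + 1):
        p00 = (1 - pi * (1 - rho**k)) ** n
        total += 2 * p0 * (p00 - p0)  # 2 E[Y_{0,1} Y_{k,1}], using time reversibility
    return total


def sigma11_closed(n: int, pi: float, rho: float) -> float:
    """Same quantity after the binomial expansion and the geometric sum over k."""
    p0 = (1 - pi) ** n
    s = sum(comb(n, j) * (1 - pi) ** (n - j) * pi**j * rho**j / (1 - rho**j)
            for j in range(1, n + 1))
    return p0 * (1 - p0) + 2 * p0 * s


if __name__ == "__main__":
    print(sigma11_series(10, 0.2, 0.5), sigma11_closed(10, 0.2, 0.5))
```

For \(\rho =0\) both reduce to the i.i.d. Bernoulli variance \(p_0(1-p_0)\).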
For \(\sigma _{12}\), it follows that
To compute \(E[Z_{0,1} \cdot Z_{0,2}]\) and \(E[Z_{0,1} \cdot Z_{k,2}]\), note that
Hence,
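The cross-covariance between the zero component and the mean component can be checked in the same way. Assuming, for illustration only, \(Y_{t,1} = \mathbb {1}\{X_t=0\} - (1-\pi )^n\) and \(Y_{t,2} = X_t - n\pi \), the linear conditional mean gives \(E[Y_{0,1}Y_{k,2}] = p_0\,(n\beta _k - n\pi ) = -n\pi \rho ^k p_0\), hence \(\sigma _{12} = -n\pi (1-\pi )^n\,(1+\rho )/(1-\rho )\) under these assumptions. The sketch below evaluates the lag terms from the explicit transition row \(p_{\cdot |0}^{(h)}\) instead; names are hypothetical.

```python
from math import comb


def zero_row(n: int, pi: float, rho: float, h: int) -> list[float]:
    """P(X_h = k | X_0 = 0) for a BAR(1) process, i.e. Bin(n, beta_h) with beta_h = pi (1 - rho^h)."""
    beta_h = pi * (1 - rho**h)
    return [comb(n, k) * beta_h**k * (1 - beta_h) ** (n - k) for k in range(n + 1)]


def sigma12_series(n: int, pi: float, rho: float, kmax: int = 2_000) -> float:
    """ASSUMPTION: Y_{t,1} = 1{X_t = 0} - (1 - pi)^n and Y_{t,2} = X_t - n pi."""
    p0, mu = (1 - pi) ** n, n * pi

    def cross(h: int) -> float:
        # E[Y_{0,1} Y_{h,2}] = P(X_0 = 0) * (E[X_h | X_0 = 0] - n pi), as E[Y_{h,2}] = 0;
        # for h = 0 the row is the point mass at 0
        return p0 * sum((k - mu) * p for k, p in enumerate(zero_row(n, pi, rho, h)))

    # time reversibility: both cross-lag terms coincide, hence the factor 2
    return cross(0) + 2 * sum(cross(h) for h in range(1, kmax + 1))


def sigma12_closed(n: int, pi: float, rho: float) -> float:
    return -n * pi * (1 - pi) ** n * (1 + rho) / (1 - rho)


if __name__ == "__main__":
    print(sigma12_series(10, 0.2, 0.5), sigma12_closed(10, 0.2, 0.5))
```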
For
we need to calculate
Next
Since
we obtain
Hence,
By inserting the corresponding terms, we obtain
The remaining entries of \({\varvec{\varSigma }}\) are given in Section 3 of Weiß and Kim (2014).
A.2 Proof of Theorem 2
Define the function \({\varvec{g}}:\mathbb {R}^3\rightarrow \mathbb {R}^2\) with components
where
The gradients of \(f_{\text {z}}\) and \(f_{\text {d}}\) are
Evaluated for \((x_1, x_2, x_3)=\big ((1-\pi )^n, n\pi , n\pi (n \pi +1-\pi )\big )\), we obtain
Theorem 2 follows from Theorem 1 and the Delta method. The matrix \(\tilde{{\varvec{\varSigma }}} := {\mathbf{D }}{\varvec{\varSigma }}{\mathbf{D }}^{\top }\) has entries
where
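The delta-method step can be mimicked numerically: evaluate \({\varvec{g}}\) at the binomial moments, approximate its Jacobian \({\mathbf{D }}\), and form \({\mathbf{D }}{\varvec{\varSigma }}{\mathbf{D }}^{\top }\). The sketch below takes \(f_{\text {d}}(x_2,x_3) = (x_3-x_2^2)/\big (x_2(1-x_2/n)\big )\), the binomial index of dispersion written in terms of moments, and, as a purely hypothetical stand-in for the zero index, \(f_{\text {z}}(x_1,x_2) = x_1/(1-x_2/n)^n\); both equal 1 under the binomial model. Finite differences replace the analytic gradients, and all names are hypothetical.

```python
from math import comb


def f_z(x1: float, x2: float, n: int) -> float:
    # HYPOTHETICAL zero index: zero frequency divided by the binomial-implied (1 - mu/n)^n
    return x1 / (1.0 - x2 / n) ** n


def f_d(x2: float, x3: float, n: int) -> float:
    # binomial index of dispersion: variance x3 - x2^2 over mu (1 - mu/n)
    return (x3 - x2**2) / (x2 * (1.0 - x2 / n))


def g(x: list[float], n: int) -> list[float]:
    x1, x2, x3 = x
    return [f_z(x1, x2, n), f_d(x2, x3, n)]


def jacobian(fun, x: list[float], eps: float = 1e-6) -> list[list[float]]:
    """Central-difference Jacobian of fun: R^d -> R^m at x."""
    base = fun(x)
    J = [[0.0] * len(x) for _ in base]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = fun(xp), fun(xm)
        for i in range(len(base)):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J


def sandwich(D: list[list[float]], S: list[list[float]]) -> list[list[float]]:
    """Return D S D^T, the delta-method covariance."""
    d, m = len(S), len(D)
    DS = [[sum(D[i][k] * S[k][j] for k in range(d)) for j in range(d)] for i in range(m)]
    return [[sum(DS[i][k] * D[j][k] for k in range(d)) for j in range(m)] for i in range(m)]


def sample_statistics(xs: list[int], n: int) -> list[float]:
    """Plug the sample moments (zero frequency, mean, mean of squares) into g."""
    T = len(xs)
    moments = [sum(x == 0 for x in xs) / T, sum(xs) / T, sum(x * x for x in xs) / T]
    return g(moments, n)


if __name__ == "__main__":
    n, pi = 5, 0.3
    pmf = [comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]
    feats = [(float(k == 0), float(k), float(k * k)) for k in range(n + 1)]
    mu = [sum(p * f[i] for p, f in zip(pmf, feats)) for i in range(3)]
    # i.i.d. case (rho = 0): Sigma is the covariance matrix of (1{X=0}, X, X^2)
    Sigma = [[sum(p * (f[i] - mu[i]) * (f[j] - mu[j]) for p, f in zip(pmf, feats))
              for j in range(3)] for i in range(3)]
    D = jacobian(lambda x: g(x, n), mu)
    print(g(mu, n))            # both indices equal 1 under the binomial model
    print(sandwich(D, Sigma))  # asymptotic covariance of sqrt(T) * (statistics - 1)
```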
A.3 Proof of Theorem 3
The approximate bias of is obtained in analogy to the approach of Weiß et al. (2016), by using the second-order Taylor expansion of \(f_{\text {z}}\) in “Appendix A.2”. The derivatives of \(f_{\text {z}}\) are
So the Hessian of \(f_{\text {z}}\) evaluated at \(\big ((1-\pi )^n , n \pi \big )\) is given by
Therefore, we obtain , with \({\varvec{Z}}_T=\frac{1}{\sqrt{T}}\sum _{t=1}^T{\varvec{Y}}_t\) satisfying \(E[{\varvec{Z}}_T]={\varvec{0}}\), and
In order to get the approximate bias of , by analogous computation, we obtain the Hessian of \(f_{\text {d}}\) evaluated at \(\big (n \pi , n\pi (n\pi +1-\pi )\big )\):
where
Therefore,
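Both bias approximations rest on the same second-order argument: writing the vector of sample means as \({\varvec{\mu }} + {\varvec{Z}}_T/\sqrt{T}\) with \(E[{\varvec{Z}}_T]={\varvec{0}}\) and \(E[{\varvec{Z}}_T{\varvec{Z}}_T^{\top }]\approx {\varvec{\varSigma }}\), a Taylor expansion of a smooth function f gives \(E[f(\cdot )]-f({\varvec{\mu }}) \approx \frac{1}{2T}\sum _{i,j} h_{ij}\sigma _{ij}\), where \((h_{ij})\) is the Hessian of f at \({\varvec{\mu }}\). The sketch below evaluates this generic formula with a numerical Hessian; the helper names are hypothetical, and the closed-form Hessians derived above are not reproduced.

```python
from math import comb


def hessian(f, x, eps=1e-4):
    """Central-difference Hessian of a scalar function f at the point x."""
    d = len(x)

    def shifted(i, j, si, sj):
        y = list(x)
        y[i] += si * eps
        y[j] += sj * eps  # for i == j the same coordinate moves by 2 * eps
        return f(y)

    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
              - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4 * eps**2)
             for j in range(d)] for i in range(d)]


def approx_bias(f, mu, Sigma, T):
    """Second-order bias approximation: sum_ij H_ij * Sigma_ij / (2T)."""
    H = hessian(f, mu)
    d = len(mu)
    return sum(H[i][j] * Sigma[i][j] for i in range(d) for j in range(d)) / (2 * T)


def binomial_moment_cov(n, pi):
    """Mean vector and covariance matrix of (X, X^2) for X ~ Bin(n, pi) (i.i.d. case)."""
    pmf = [comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]
    feats = [(float(k), float(k * k)) for k in range(n + 1)]
    mu = [sum(p * f[i] for p, f in zip(pmf, feats)) for i in range(2)]
    Sigma = [[sum(p * (f[i] - mu[i]) * (f[j] - mu[j]) for p, f in zip(pmf, feats))
              for j in range(2)] for i in range(2)]
    return mu, Sigma


if __name__ == "__main__":
    n, pi, T = 10, 0.3, 200
    mu, Sigma = binomial_moment_cov(n, pi)

    def bid(x):
        # binomial index of dispersion as a function of (mean, mean of squares)
        return (x[1] - x[0] ** 2) / (x[0] * (1 - x[0] / n))

    print(approx_bias(bid, mu, Sigma, T))
```

As a sanity check, applying `approx_bias` to the 1/T sample variance \(x_2-x_1^2\) with i.i.d. binomial data reproduces the familiar bias \(-V[X]/T\).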
B Summary of the models used for the power study in Sect. 3.3
The BB-AR(1) model used for DGP1 was proposed by Weiß and Kim (2014) and extends the BAR(1) model to account for extra-binomial variation in the time series. This model is based on beta-binomial thinning: let \(\alpha _{\phi }\) be a random variable, independent of X, that follows the beta distribution BETA\(\big (\frac{1-\phi }{\phi }\cdot \alpha ,\ \frac{1-\phi }{\phi }\cdot (1-\alpha )\big )\), where \(\alpha ,\phi \in (0;1)\). Then the random variable \(\alpha _{\phi }\circ X\) is said to be obtained from X by beta-binomial thinning, where “\(\circ \)” denotes the binomial thinning operator, performed independently of X and \(\alpha _{\phi }\). The BB-AR(1) process is defined by the recursion

\[ X_t = \alpha _{\phi }\circ X_{t-1} + \beta _{\phi }\circ (n-X_{t-1}). \]

Here, \(\beta _{\phi }\) is defined analogously to \(\alpha _{\phi }\), with \(\beta \in (0;1)\) in place of \(\alpha \); all \(\alpha _{\phi },\beta _{\phi }\) and all thinnings are independent of each other, and \(\alpha _{\phi },\beta _{\phi }\) and the thinnings at time t are independent of \((X_s)_{s<t}\). In analogy to the interpretation of (1.1), “\(\alpha _{\phi }\circ X_{t-1}\)” expresses a survival mechanism and “\(\beta _{\phi }\circ (n-X_{t-1})\)” a revival mechanism.
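A minimal simulation sketch of the BB-AR(1) recursion \(X_t = \alpha _{\phi }\circ X_{t-1} + \beta _{\phi }\circ (n-X_{t-1})\), drawing fresh beta-distributed thinning probabilities at every time point. It assumes the BAR(1) parametrization \(\beta = \pi (1-\rho )\), \(\alpha = \beta +\rho \) (the case h = 1 of the notation in Appendix A) and that \(\beta _{\phi }\) follows BETA\(\big (\frac{1-\phi }{\phi }\beta ,\ \frac{1-\phi }{\phi }(1-\beta )\big )\) analogously to \(\alpha _{\phi }\). The function names, the burn-in, and the starting value are hypothetical choices.

```python
import random


def binomial_thinning(x: int, p: float, rng: random.Random) -> int:
    """Each of the x units survives independently with probability p."""
    return sum(rng.random() < p for _ in range(x))


def beta_binomial_thinning(x: int, a: float, phi: float, rng: random.Random) -> int:
    """Binomial thinning with a random probability a_phi ~ BETA(c*a, c*(1-a)), c = (1-phi)/phi."""
    c = (1 - phi) / phi
    a_phi = rng.betavariate(c * a, c * (1 - a))  # mean a, extra variation governed by phi
    return binomial_thinning(x, a_phi, rng)


def simulate_bbar1(n, pi, rho, phi, T, seed=None, burn_in=200):
    """Simulate T observations of a BB-AR(1) process (sketch under the stated assumptions)."""
    beta = pi * (1 - rho)
    alpha = beta + rho
    if not (0 < alpha < 1 and 0 < beta < 1 and 0 < phi < 1):
        raise ValueError("parameters must give alpha, beta, phi in (0, 1)")
    rng = random.Random(seed)
    x = binomial_thinning(n, pi, rng)  # start from a Bin(n, pi) draw
    path = []
    for t in range(burn_in + T):
        # survival of the x "successes" plus revival of the n - x "failures" (old x on the right)
        x = beta_binomial_thinning(x, alpha, phi, rng) + beta_binomial_thinning(n - x, beta, phi, rng)
        if t >= burn_in:
            path.append(x)
    return path


if __name__ == "__main__":
    path = simulate_bbar1(10, 0.3, 0.4, 0.2, 5000, seed=1)
    m = sum(path) / len(path)
    v = sum((x - m) ** 2 for x in path) / len(path)
    print(m, v / (m * (1 - m / 10)))  # mean near n*pi; dispersion index above 1
```

The stationary mean remains \(n\pi \), since the beta-distributed probabilities have means \(\alpha \) and \(\beta \); only the variance is inflated, which is why the dispersion index exceeds 1.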
The models used for DGP2–DGP5 have been proposed by Möller et al. (2018). These four extensions of the BAR(1) model can accommodate a broad variety of zero inflation patterns. The RZ-BAR(1) process (DGP2),
and the IZ-BAR(1) process (DGP3),
are defined by distinguishing between the underlying BAR(1) process (kernel) and the resulting zero-inflated processes, by denoting the BAR(1) kernel by \((X_t)_{\mathbb {N}}\) and the zero-inflated process by \((Z_t)_{\mathbb {N}}\).
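The ZIB-AR(1) process (DGP4) uses the concept of zero-inflated binomial thinning (ZIB thinning) “\(\odot \)”, which is defined as \((\alpha ,\omega )\odot Z\ |\ Z\ \sim {\text {ZIB}}(Z,\alpha , \omega )\); that is, conditionally on Z, the thinned count equals zero with probability \(\omega \) and otherwise follows \(\text {Bin}(Z,\alpha )\). It follows the recursion

\[ Z_t = (\alpha , \omega _{\alpha })\odot Z_{t-1} + (\beta , \omega _{\beta })\odot (n-Z_{t-1}), \]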
where we considered the special case \(\omega _{\alpha }=\omega _{\beta }\) for the simulations. In analogy to the interpretation of (1.1), “\((\alpha , \omega _{\alpha })\odot Z_{t-1}\)” expresses a survival mechanism and “\((\beta , \omega _{\beta })\odot (n-Z_{t-1})\)” a revival mechanism.
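A minimal simulation sketch of the ZIB-AR(1) recursion, assuming that \({\text {ZIB}}(Z,\alpha ,\omega )\) places an extra mass \(\omega \) at zero on top of \((1-\omega )\cdot \text {Bin}(Z,\alpha )\) and that all thinnings are independent. The parameters \(\alpha ,\beta \) are used directly rather than through any reparametrization; the function names, burn-in, and starting value are hypothetical.

```python
import random


def zib_thinning(x: int, a: float, omega: float, rng: random.Random) -> int:
    """Draw from ZIB(x, a, omega): 0 with probability omega, otherwise Bin(x, a)."""
    if rng.random() < omega:
        return 0
    return sum(rng.random() < a for _ in range(x))


def simulate_zibar1(n, alpha, beta, omega_a, omega_b, T, seed=None, burn_in=200):
    """Simulate Z_t = (alpha, omega_a) . Z_{t-1} + (beta, omega_b) . (n - Z_{t-1})."""
    rng = random.Random(seed)
    z = 0
    path = []
    for t in range(burn_in + T):
        # survival part and revival part, each zero-inflated independently (old z on the right)
        z = zib_thinning(z, alpha, omega_a, rng) + zib_thinning(n - z, beta, omega_b, rng)
        if t >= burn_in:
            path.append(z)
    return path


if __name__ == "__main__":
    for omega in (0.0, 0.3):  # special case omega_alpha = omega_beta, as in the simulations
        path = simulate_zibar1(10, 0.58, 0.18, omega, omega, 5000, seed=2)
        print(omega, sum(z == 0 for z in path) / len(path))
```

With \(\omega _{\alpha }=\omega _{\beta }=0\), the recursion reduces to a BAR(1) process with stationary mean \(n\beta /(1-\alpha +\beta )\); positive \(\omega \) values visibly raise the zero frequency.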
The ZT-BAR(1) process (DGP5) with the additional model parameter \(\beta _{0} \in (0;1)\) is defined by a self-exciting threshold mechanism with threshold value 0:
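The defining equation of the ZT-BAR(1) process is not reproduced in this appendix. Purely as an assumption made for illustration, and not taken from Möller et al. (2018), the sketch below reads the threshold mechanism as follows: if \(Z_{t-1}>0\), the BAR(1) recursion with \(\alpha \) and \(\beta \) applies, whereas after a zero the count restarts as \(\beta _0\circ n\). With \(\beta _0=\beta \) this reduces to a BAR(1) process, and \(\beta _0<\beta \) makes zeros more persistent. The function names and burn-in are hypothetical.

```python
import random


def thin(x: int, p: float, rng: random.Random) -> int:
    """Binomial thinning: each of the x units is retained with probability p."""
    return sum(rng.random() < p for _ in range(x))


def simulate_ztbar1(n, alpha, beta, beta0, T, seed=None, burn_in=200):
    """ASSUMED threshold form: after a zero, the revival probability beta is replaced by beta0."""
    rng = random.Random(seed)
    z = 0
    path = []
    for t in range(burn_in + T):
        if z == 0:
            z = thin(n, beta0, rng)  # regime at the threshold value 0
        else:
            z = thin(z, alpha, rng) + thin(n - z, beta, rng)  # ordinary BAR(1) step
        if t >= burn_in:
            path.append(z)
    return path


if __name__ == "__main__":
    for beta0 in (0.18, 0.05):
        path = simulate_ztbar1(10, 0.58, 0.18, beta0, 5000, seed=3)
        print(beta0, sum(z == 0 for z in path) / len(path))
```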
About this article
Cite this article
Kim, HY., Weiß, C.H. & Möller, T.A. Testing for an excessive number of zeros in time series of bounded counts. Stat Methods Appl 27, 689–714 (2018). https://doi.org/10.1007/s10260-018-00431-z
Keywords
- Binomial AR(1) model
- Binomial index of dispersion
- Binomial zero index
- Extra-binomial dispersion
- Extra-binomial zeros
- Adjusted sample size