Abstract
When the number of zeros in a count dataset exceeds what the probability mass at zero of a regular Poisson distribution can accommodate, the zero-inflated Poisson (ZIP) distribution is often used. To characterize the potential non-linear effects of covariates and avoid the “curse of dimensionality”, we propose a spline-based ZIP regression single-index model. B-splines are employed to estimate the unknown smooth function. A modified Fisher scoring method is proposed to simultaneously estimate the linear coefficients and the regression function. It is shown that the spline estimator of the nonparametric component is uniformly consistent and achieves the optimal convergence rate under the smoothness condition, and that the estimators of the regression parameters are asymptotically normal and efficient. A spline-based semiparametric likelihood ratio test is also established. Moreover, a direct and consistent variance estimation method based on least-squares estimation is proposed. Simulations are performed to evaluate the proposed method.
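For concreteness, the ZIP probability mass function mixes a point mass at zero (weight \(\pi\)) with a Poisson(\(\lambda\)) count, so that \(P(Y=0)=\pi +(1-\pi )e^{-\lambda }\) and \(P(Y=k)=(1-\pi )e^{-\lambda }\lambda ^k/k!\) for \(k\ge 1\) (Lambert 1992). The following Python sketch evaluates this mass function and the corresponding log-likelihood. The function names `zip_pmf` and `zip_loglik` are illustrative only and are not part of the proposed method; \(\pi\) and \(\lambda\) are passed as plain numbers rather than through the regression structure of the model.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) for a zero-inflated Poisson with zero-inflation weight pi and Poisson mean lam."""
    if k < 0:
        return 0.0
    # Mass of the ordinary Poisson(lam) component at k.
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Structural zeros (weight pi) plus Poisson zeros (weight 1 - pi).
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_loglik(ys, pis, lams) -> float:
    """Log-likelihood of counts ys given observation-specific pi_i and lambda_i (hypothetical helper)."""
    return sum(math.log(zip_pmf(y, p, l)) for y, p, l in zip(ys, pis, lams))
```

With \(\pi =0.3\) and \(\lambda =2\), the mass at zero is \(0.3+0.7e^{-2}\approx 0.395\), compared with \(e^{-2}\approx 0.135\) under the regular Poisson distribution.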
References
Böhning, D., Dietz, E., Schlattmann, P., Mendonca, L., & Kirchner, U. (1999). The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society: Series A (Statistics in Society), 162, 195–209.
Carroll, R. J., Fan, J., Gijbels, I., & Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92, 477–489.
Cheung, Y. B. (2002). Zero-inflated models for regression analysis of count data: a study of growth and development. Statistics in Medicine, 21, 1461–1469.
de Boor, C. (2001). A practical guide to splines. New York: Springer.
Dietz, K., & Böhning, D. (1997). The use of two-component mixture models with one completely or partly known component. Computational Statistics, 12, 219–234.
Delecroix, M., Härdle, W., & Hristache, M. (2003). Efficient estimation in conditional single-index regression. Journal of Multivariate Analysis, 86, 213–216.
Härdle, W., & Stoker, E. M. (1989). Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association, 84, 986–995.
Härdle, W., Hall, P., & Ichimura, H. (1993). Optimal smoothing in single-index models. Annals of Statistics, 21, 157–178.
Hastie, T., & Tibshirani, R. (1990). Generalized additive models. New York: Chapman & Hall/CRC.
Horowitz, J. L., & Härdle, W. (1996). Direct semiparametric estimation of single-index models with discrete covariate. Journal of the American Statistical Association, 91, 1632–1640.
Huang, J. Z., & Liu, L. (2006). Polynomial spline estimation and inference of proportional hazards regression models with flexible relative risk form. Biometrics, 62, 793–802.
Huang, J., Zhang, Y., & Hua, L. (2008). A least-squares approach to consistent information estimation in semiparametric models. Technical Report 2008–3, University of Iowa, Department of Biostatistics.
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58, 71–120.
Johnson, N. L., Kotz, S., & Kemp, A. W. (2005). Univariate discrete distributions (3rd ed.). New York: Wiley.
Kosorok, M. R. (2008). Introduction to empirical processes and semiparametric inference. Dordrecht: Springer.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1–14.
Lu, M., & Loomis, D. (2013). Spline-based semiparametric estimation of partially linear Poisson regression with single-index models. Journal of Nonparametric Statistics, 25, 905–922.
Lu, S. E., Lin, Y., & Shih, W. C. J. (2004). Analyzing excessive no changes in clinical trials with clustered data. Biometrics, 60, 257–267.
Murphy, S. A., & van der Vaart, A. W. (1997). Semiparametric likelihood ratio inference. Annals of Statistics, 25, 1471–1509.
Murphy, S. A., & van der Vaart, A. W. (1999). Observed information in semi-parametric models. Bernoulli, 5, 381–412.
Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sörensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.
Rosenberg, P. S. (1995). Hazard function estimation using B-splines. Biometrics, 51, 874–887.
Schumaker, L. (1981). Spline functions: basic theory. New York: Wiley.
Shen, X., & Wong, W. H. (1994). Convergence rate of sieve estimates. Annals of Statistics, 22, 580–615.
Singh, S. (1963). A note on inflated Poisson distribution. Journal of the Indian Statistical Association, 1, 140–144.
Stone, C. J. (1985). Additive regression and other nonparametric models. Annals of Statistics, 13, 689–705.
Stone, C. J. (1986). The dimensionality reduction principle for generalized additive models. Annals of Statistics, 14, 590–606.
Sun, J., Kopciuk, K. A., & Lu, X. (2008). Polynomial spline estimation of partially linear single-index proportional hazards regression models. Computational Statistics and Data Analysis, 53, 176–188.
van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge: Cambridge University Press.
van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes. New York: Springer.
Yau, K. K. W., & Lee, A. H. (2001). Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Statistics in Medicine, 20, 2907–2920.
Yu, Y., & Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97, 1042–1054.
Acknowledgments
The authors thank an associate editor and two referees, whose constructive comments improved the presentation. The project described was supported by the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), through Grant #UL1 TR000002 (C.S. Li).
A Appendix
A.1 Notations and lemmas
Let \(P_{\varvec{\tau }}\) be the distribution of \((y,\mathbf {w}^\mathtt{{T}})^\mathtt{{T}}\) under the parameter vector \(\varvec{\tau }\) and \(p_{\varvec{\tau }}\) the corresponding density. Denote \(P_0\equiv {P}_{\varvec{\tau }_0}\) and \(p_0\equiv {p}_{\varvec{\tau }_0}\). For a measurable function f, define Pf as the expectation of f under P. For any class of measurable functions \(\mathcal {F}\), the bracketing number \(N_{[]}(\varepsilon ,\mathcal {F},L_2(P))\) is defined as the minimum number of brackets \([f_i^L,f_i^R],i=1,\ldots ,m\), such that, for any \(f\in \mathcal {F}\), there exists \(1\le {i}\le {m}\) such that \(f_i^L\le {f}\le {f}_i^R\) and \(\Vert f_i^R-f_i^L\Vert _2\le \varepsilon \). Define the bracketing integral \(J_{[]}(\delta ,\mathcal {F},L_2(P))=\int _0^\delta \left[ 1+\log N_{[]}(\varepsilon ,\mathcal {F},L_2(P))\right] ^{1/2}d\varepsilon \). Denote \(\mathbb {G}_nf=\sqrt{n}( {\mathbb P}_{n}-P)f\) and \(\Vert \mathbb {G}_n\Vert _{\mathcal {F}}=\sup _{f\in \mathcal {F}}|\mathbb {G}_nf|\). In the following, C represents a positive constant that may vary from place to place.
Lemma 1
For any \(\delta > 0\), define \(\mathcal {L}=\{\ell (\varvec{\tau };y,\mathbf {w}):\psi \in \mathcal {S}_{0,n},{\varvec{\theta }}\in \mathcal {R}^{2p+2d-1},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \}\). Then, for any \(0 < \varepsilon \le \delta \), \(\log N_{[]}(\varepsilon ,\mathcal {L},\Vert \cdot \Vert _{P,B}) \le C q_n\log (\delta /\varepsilon )\), and, hence, \(J_{[]}(\delta ,\mathcal {L},\Vert \cdot \Vert _{P,B})\le C q_n^{1/2}\delta \), where \(\Vert \cdot \Vert _{P,B}\) is the Bernstein norm defined as \(\Vert f\Vert _{P,B}^2 = 2P\left[ \exp (|f|)-|f|-1\right] \) in van der Vaart and Wellner (1996) and \(q_n=m_n + l\) is the number of spline basis functions.
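To make the count \(q_n=m_n+l\) concrete, the following Python sketch evaluates a B-spline basis of order \(l\) with \(m_n\) interior knots using the Cox–de Boor recursion. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `bspline_basis`, the interval \([a,b]=[0,1]\), and the choice of \(l\)-fold repeated boundary knots are all assumptions made here.

```python
def bspline_basis(x, interior_knots, order, a=0.0, b=1.0):
    """Evaluate all B-spline basis functions of the given order at x in [a, b].

    Assumes interior knots lie strictly inside (a, b) and that each boundary
    knot is repeated `order` times, which gives len(interior_knots) + order
    basis functions.
    """
    degree = order - 1
    knots = [a] * order + sorted(interior_knots) + [b] * order
    n_intervals = len(knots) - 1

    # Degree-0 functions: indicators of the half-open knot intervals.
    basis = [0.0] * n_intervals
    for i in range(n_intervals):
        if knots[i] <= x < knots[i + 1]:
            basis[i] = 1.0
    if x == b:
        # Assign the right endpoint to the last non-degenerate interval.
        basis[len(knots) - order - 1] = 1.0

    # Cox-de Boor recursion: raise the degree one step at a time.
    for k in range(1, degree + 1):
        raised = [0.0] * (n_intervals - k)
        for i in range(n_intervals - k):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + k + 1] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            raised[i] = left + right
        basis = raised
    return basis
```

For example, cubic splines (\(l=4\)) with \(m_n=3\) interior knots yield \(q_n=7\) basis functions, which are nonnegative and sum to one at every point of \([a,b]\).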
Lemma 2
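If conditions C1–C6 hold, then there exists \(C > 0\) such that
\[ P\ell (\varvec{\tau }_0;y,\mathbf {w})-P\ell (\varvec{\tau };y,\mathbf {w})\ge {C}\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2^2 \]
for \(\varvec{\tau }\) in a neighborhood of \(\varvec{\tau }_0\).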
Lemma 3
(Consistency) If conditions C1–C6 hold, then \(\Vert \hat{\varvec{\tau }}-\varvec{\tau }_0 \Vert _2 = o_p(1)\).
Remark 3
Lemma 1 and similar entropy calculations are used to derive the consistency of \(\hat{\varvec{\tau }}\) and to prove Theorems 1–4. Lemma 2 is a key result for deriving the consistency and rate of convergence of \(\hat{\varvec{\tau }}\). Lemma 3 shows that \(\hat{\varvec{\tau }}\) is a consistent estimator of \(\varvec{\tau }_0\).
A.2 Proof of Lemma 1
Proof
According to the bracketing calculation in Shen and Wong (1994), for any \(\delta > 0\) and \(0 < \varepsilon \le \delta \), the logarithm of the bracketing number of \(\mathcal {S}_{0,n}\), computed with \(L_2(P)\), is bounded by \(q_n \log (\delta /\varepsilon )\) up to a constant. It is known that the neighborhoods \(\mathbf {A}(\delta )=\{\varvec{\alpha }:\Vert \varvec{\alpha }-\varvec{\alpha }_{0}\Vert _2\le \delta \}\), \(\mathbf {Z}(\delta )=\{\varvec{\zeta }:\Vert \varvec{\zeta }-\varvec{\zeta }_0\Vert _2\le \delta \}\), and \(\mathbf {B}(\delta )=\{\varvec{\beta }_1:\Vert \varvec{\beta }_1-\varvec{\beta }_{10}\Vert _2\le \delta \}\) can be covered by \(O((\delta /\varepsilon )^{p+d})\), \(O((\delta /\varepsilon )^d)\), and \(O((\delta /\varepsilon )^p)\) balls with radius \(\varepsilon \), respectively. In view of Theorem 9.23 of Kosorok (2008), the bracketing numbers for \(\{\mathbf {w}^\mathtt{{T}}\varvec{\alpha }:\Vert \varvec{\alpha }-\varvec{\alpha }_{0}\Vert _2\le \delta \}\), \(\{\mathbf {z}^\mathtt{{T}}\varvec{\zeta }:\Vert \varvec{\zeta }-\varvec{\zeta }_0\Vert _2\le \delta \}\), and \(\{\mathbf {x}^\mathtt{{T}}\varvec{\beta }_1:\Vert \varvec{\beta }_1 -\varvec{\beta }_{10}\Vert _2\le \delta \}\) are bounded by \(O((\delta /\varepsilon )^{p+d})\), \(O((\delta /\varepsilon )^d)\), and \(O((\delta /\varepsilon )^{p})\), respectively. It follows that, for sufficiently large n,
and, hence,
because the function \(x\mapsto \exp (x)\) is Lipschitz and monotonic. By inequality \(2[\exp (|x|)-1-|x|] \le x^2\exp (|x|)\),
Similarly, we can show that
The transformation \(\left( \pi (\mathbf {w};\varvec{\alpha }),\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi )\right) \mapsto \ell (\pi (\mathbf {w};\varvec{\alpha }),\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi );\varvec{\tau })\) is essentially Lipschitz, so it follows that \(\log {N}_{[]}\left( \varepsilon ,\mathcal {L},\Vert \cdot \Vert _{P,B}\right) \le {C}q_n\log (\delta /\varepsilon )\), and, hence, the bracketing integral is bounded by \(q_n^{1/2}\delta \), up to a constant. \(\square \)
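The final step, passing from the entropy bound to the bound on the bracketing integral, can be spelled out as follows. This is a sketch that takes the bracketing integral with integrand \(\sqrt{1+\log N_{[]}}\), as in van der Vaart and Wellner (1996), and uses \(\sqrt{a+b}\le \sqrt{a}+\sqrt{b}\) together with \(q_n\ge 1\); the constant \(C'\) is introduced here for illustration.

```latex
\[
\begin{aligned}
J_{[]}(\delta,\mathcal{L},\Vert\cdot\Vert_{P,B})
 &= \int_0^\delta \sqrt{1+\log N_{[]}(\varepsilon,\mathcal{L},\Vert\cdot\Vert_{P,B})}\,d\varepsilon
 \le \int_0^\delta \sqrt{1+Cq_n\log(\delta/\varepsilon)}\,d\varepsilon \\
 &= \delta\int_0^1 \sqrt{1+Cq_n\log(1/u)}\,du
 \qquad (\varepsilon=\delta u) \\
 &\le \delta\left[1+(Cq_n)^{1/2}\int_0^1\sqrt{\log(1/u)}\,du\right]
 = \delta\left[1+(Cq_n)^{1/2}\,\Gamma(3/2)\right] \\
 &\le C' q_n^{1/2}\delta .
\end{aligned}
\]
```

Here \(\int _0^1\sqrt{\log (1/u)}\,du=\int _0^\infty s^{1/2}e^{-s}\,ds=\Gamma (3/2)=\sqrt{\pi }/2\), by the substitution \(u=e^{-s}\).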
A.3 Proof of Lemma 2
Proof
Let \( {\mathbb M}_{n}(\varvec{\tau })= {\mathbb P}_{n}\ell (\varvec{\tau };y,\mathbf {w})\) and \( {\mathbb M}(\varvec{\tau })=P\ell (\varvec{\tau };y,\mathbf {w})\). For any \(\varvec{\tau }\) in a neighborhood of \(\varvec{\tau }_0\), a Taylor expansion yields
up to a constant. Let \(\textit{g}_1(\mathbf {x})=\mathbf {x}^\mathtt{{T}}(\varvec{\beta }_1-\varvec{\beta }_{10})\) and \(\textit{g}_2(\mathbf {z})=\psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta })-\psi _0(\mathbf {z}^\mathtt{{T}}\varvec{\zeta }_0)\). According to the law of total expectation and the Cauchy-Schwarz inequality, \(\{E[\textit{g}_1(\mathbf {x})\textit{g}_2(\mathbf {z})]\}^2\le {E}_{\mathbf {z}}[\textit{g}_2^2(\mathbf {z})]E_{\mathbf {z}}[\{E_{\mathbf {x}|\mathbf {z}}[\textit{g}_1(\mathbf {x})|\mathbf {z}]\}^2]\). By the orthogonality of a conditional expectation, there exists \(0<\xi <1\) such that \(E_{\mathbf {z}}[\{E_{\mathbf {x}|\mathbf {z}}[\textit{g}_1(\mathbf {x})|\mathbf {z}]\}^2]=\xi {E}_\mathbf {x}[\textit{g}_1^2(\mathbf {x})]\). Hence, \(\{E[\textit{g}_1(\mathbf {x})\textit{g}_2(\mathbf {z})]\}^2 \le \xi {E}[\textit{g}_1^2(\mathbf {x})]E[\textit{g}_2^2(\mathbf {z})]\). In view of Lemma 25.86 of van der Vaart (2000),
up to a constant. By Lemma 1 of Stone (1985) and conditions C3 and C4, it follows that \( {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau })\ge {C}\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2^2\). \(\square \)
A.4 Proof of Lemma 3
Proof
We verify the conditions of Theorem 5.7 in van der Vaart (2000) to prove the consistency of \(\hat{\varvec{\tau }}\). According to Lemma 1, \(\mathcal {L}\) is a Donsker class, and is therefore a Glivenko-Cantelli class. Thus, \(\sup _{\varvec{\tau }}|( {\mathbb P}_{n}-P)\ell (\varvec{\tau };y,\mathbf {w})|=o_p(1)\) for \(\varvec{\tau }\) in a neighborhood of \(\varvec{\tau }_0\). The first condition of the theorem holds. It follows from Lemma 2 that \(\sup _{\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\ge \varepsilon } {\mathbb M}(\varvec{\tau })\le {\mathbb M}(\varvec{\tau }_0)-C\varepsilon ^2 < {\mathbb M}(\varvec{\tau }_0)\). Hence, the second condition of the theorem also holds.
According to Jackson’s theorem for polynomials (de Boor 2001), there exists a spline \(\psi _{0,n}\in \mathcal {S}_{0,n}\) of order \({l}\ge {2}\) such that \(\Vert \psi _{0,n}-\psi _0\Vert _\infty =O(n^{-r\nu })\) for \(1/(2r+2)<\nu < 1/(2r)\). Let \(\varvec{\tau }_{0,n}=({\varvec{\theta }}_0,\psi _{0,n})\). By definition of \(\hat{\varvec{\tau }}\),
\[ {\mathbb M}_{n}(\hat{\varvec{\tau }})- {\mathbb M}_{n}(\varvec{\tau }_0)\ge {\mathbb M}_{n}(\varvec{\tau }_{0,n})- {\mathbb M}_{n}(\varvec{\tau }_0)=I_{n1}+I_{n2}, \]
where \(I_{n1}=( {\mathbb P}_{n}-P)[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]\) and \(I_{n2}= P[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]\). As shown in the proof of Lemma 1, \(\mathcal {S}_{0,n}\) is a Donsker class. Because \(\ell ({\varvec{\theta }}_0,\psi ;y,\mathbf {w})\) is essentially Lipschitz with respect to \(\psi \), the preservation theorem for Donsker classes implies that the class of functions \(\ell ({\varvec{\theta }}_0,\psi ;y,\mathbf {w})-\ell ({\varvec{\theta }}_0,\psi _0;y,\mathbf {w})\), for \(\psi \in \mathcal {S}_{0,n}\) and \(\Vert \psi -\psi _0\Vert _2\le \delta \), is a Donsker class. Moreover, by the mean value theorem, \(P[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]^2\le {C}\Vert \psi _{0,n}-\psi _0\Vert _\infty ^2\rightarrow 0\) as \(n\rightarrow \infty \). In view of Lemma 19.24 of van der Vaart (2000), \(I_{n1}=o_p(n^{-1/2})\). Observe that \(I_{n2}\ge {-C}\Vert \psi _{0,n}-\psi _0\Vert _\infty ^2=-O(n^{-2r\nu })\). It follows that \( {\mathbb M}_{n}(\hat{\varvec{\tau }})- {\mathbb M}_{n}(\varvec{\tau }_0) > -o_p(1)\). Therefore, Theorem 5.7 of van der Vaart (2000) applies and yields the consistency of \(\hat{\varvec{\tau }}\). \(\square \)
A.5 Proof of Theorem 1
Proof
We apply Theorem 3.4.1 of van der Vaart and Wellner (1996) to prove the rate of convergence. Let \(\varvec{\tau }\in \mathbf {\Theta }_n=\{({\varvec{\phi }},\varvec{\alpha },\varvec{\beta }_1,\psi ):{\varvec{\phi }}\in \mathcal {R}^{d-1},\varvec{\alpha }\in \mathcal {R}^{p+d},\varvec{\beta }_1\in \mathcal {R}^{p},\psi \in \mathcal {S}_{0,n}\}\). Choose \(d_n(\varvec{\tau },\varvec{\tau }_{0,n})\) and \(M_n(\varvec{\tau })\) defined in the theorem to be \(\Vert \varvec{\tau }-\varvec{\tau }_{0,n}\Vert _2\) and \( {\mathbb M}(\varvec{\tau })\), respectively. By definition of \(\hat{\varvec{\tau }}\), \( {\mathbb M}_n(\hat{\varvec{\tau }}) \ge {\mathbb M}_n(\varvec{\tau }_{0,n})\). In the proof of Lemma 3 for consistency, we have already shown that \( {\mathbb M}(\varvec{\tau })- {\mathbb M}(\varvec{\tau }_0)\le -Cd_n^2(\varvec{\tau },\varvec{\tau }_0)\). Because \( {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau })\le {C}d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)\le C\Vert \psi _{0,n}-\psi _0\Vert _\infty =O(n^{-r\nu })\), for any \(\varvec{\tau }\in \mathbf {\Theta }_n\) such that \(\delta /2\le {d}_n(\varvec{\tau },\varvec{\tau }_{0,n})\le \delta \), we have \(d_n(\varvec{\tau },\varvec{\tau }_0) \ge d_n(\varvec{\tau },\varvec{\tau }_{0,n})-d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0) >C\delta \) for sufficiently large n. It follows that
\[ \sup _{\delta /2\le {d}_n(\varvec{\tau },\varvec{\tau }_{0,n})\le \delta ,\,\varvec{\tau }\in \mathbf {\Theta }_n}\left[ {\mathbb M}(\varvec{\tau })- {\mathbb M}(\varvec{\tau }_{0,n})\right] \le -C\delta ^2 \]
for sufficiently large n.
For any \(\delta >0\), in view of Lemma 1,
\[ J_{[]}(\delta ,\mathcal {L},\Vert \cdot \Vert _{P,B})\le {C}q_n^{1/2}\delta . \]
Moreover, for \(\varvec{\tau }\in \mathbf {\Theta }_n\) and \(\delta /2\le {d}_n(\varvec{\tau },\varvec{\tau }_{0,n}) \le \delta \), by inequality \(2[\exp (|x|)-|x|-1]\le {x}^2\exp (|x|)\) and conditions C3–C5, \(\Vert \ell (\varvec{\tau };y,\mathbf {w})-\ell (\varvec{\tau }_{0,n};y,\mathbf {w})\Vert _{P,B}^2\le {C}\delta ^2\). Lemma 3.4.3 of van der Vaart and Wellner (1996) yields
\[ E_P\sup _{\delta /2\le {d}_n(\varvec{\tau },\varvec{\tau }_{0,n})\le \delta ,\,\varvec{\tau }\in \mathbf {\Theta }_n}\left| \mathbb {G}_n\left[ \ell (\varvec{\tau };y,\mathbf {w})-\ell (\varvec{\tau }_{0,n};y,\mathbf {w})\right] \right| \le {C}\phi _n(\delta ) \]
with \(\phi _n(\delta )=q_n^{1/2}\delta +n^{-1/2}q_n\). Obviously, \(\phi _n(\delta )/\delta \) is decreasing in \(\delta \). It can be readily shown that \(r_n^2\phi _n(1/r_n)\le {n}^{1/2}\) with \(r_n=n^{\min (r\nu ,(1-\nu )/2)}\). Theorem 3.4.1 of van der Vaart and Wellner (1996) is applied to yield \(r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_{0,n})=O_p(1)\). Because \(d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)=O(n^{-r\nu })\), it follows that \(r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_0) \le r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_{0,n})+r_nd_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)=O_p(1)+r_nO(n^{-r\nu })=O_p(1)\). This completes the proof of the rate of convergence. \(\square \)
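The claim \(r_n^2\phi _n(1/r_n)\le {n}^{1/2}\) (up to a constant) can be checked numerically. The sketch below assumes that the number of basis functions grows as \(q_n=n^{\nu }\) exactly; the text only defines \(q_n=m_n+l\), so this growth rate, like the function names, is an assumption made for illustration. Under it, \(r_n^2\phi _n(1/r_n)=r_nq_n^{1/2}+r_n^2n^{-1/2}q_n\), and each term is at most \(n^{1/2}\) because the exponent of \(r_n\) does not exceed \((1-\nu )/2\).

```python
def rate_exponent(r: float, nu: float) -> float:
    """Exponent e such that r_n = n ** e, i.e. e = min(r * nu, (1 - nu) / 2)."""
    return min(r * nu, (1.0 - nu) / 2.0)


def phi_n(delta: float, n: float, nu: float) -> float:
    """phi_n(delta) = q_n^{1/2} * delta + n^{-1/2} * q_n, assuming q_n = n ** nu."""
    q_n = n ** nu  # assumed growth of the number of spline basis functions
    return q_n ** 0.5 * delta + n ** -0.5 * q_n


def check_rate_condition(n: float, r: float, nu: float) -> bool:
    """Check r_n^2 * phi_n(1 / r_n) <= 2 * n^{1/2}, with a small floating-point tolerance."""
    r_n = n ** rate_exponent(r, nu)
    return r_n ** 2 * phi_n(1.0 / r_n, n, nu) <= 2.0 * n ** 0.5 * (1.0 + 1e-9)
```

The exponent \(\min (r\nu ,(1-\nu )/2)\) is largest when the two terms are equal, that is, at \(\nu =1/(1+2r)\), where it equals \(r/(1+2r)\).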
A.6 Proof of Theorem 2
Proof
Theorem 2 follows from the same arguments as those in the proof of Theorem 1(b) of Lu and Loomis (2013) and entropy calculations similar to those in Lemma 1. \(\square \)
A.7 Proof of Theorem 3
Proof
We apply Theorem 3.1 of Murphy and van der Vaart (1997) to prove the asymptotic distribution of the spline likelihood ratio test statistic. Let \(\mathbf {t}=\left( \mathbf {t}_1^\mathtt{{T}},\mathbf {t}_2^\mathtt{{T}},\mathbf {t}_3^\mathtt{{T}}\right) ^\mathtt{{T}}\), where \(\mathbf {t}_1\in \mathbb {R}^{d-1}\), \(\mathbf {t}_2\in \mathbb {R}^{p+d}\), and \(\mathbf {t}_3\in \mathbb {R}^{p}\). Define an approximately least favorable submodel
where \(\psi _{\mathbf {t}}({\varvec{\theta }},\psi )=\psi +({\varvec{\theta }}-\mathbf {t})^\mathtt{{T}}\mathbf {h}^*\circ \psi _0^{-1}\circ \psi \). Let \(p(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) and \(\ell (\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) be density and log density functions under parameters \((\mathbf {t},\psi _{\mathbf {t}}({\varvec{\theta }},\psi ))\), respectively. Also denote by \(\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) the first derivatives of \(\ell (\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) with respect to \(\mathbf {t}\). Some derivative calculations then yield
Here \(\xi _{\mathbf {t}}\), \(\eta _{\mathbf {t}}\), \(\psi _{\mathbf {t}}\), and \(\psi ^\prime _{\mathbf {t}}\) represent \(\xi \), \(\eta \), \(\psi \), and \(\psi ^\prime \) evaluated at \((\mathbf {t},\psi _{\mathbf {t}}(\mathbf {t},\psi ))\), respectively. \(\nabla _{\mathbf {t}_1}(\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t})\) is the gradient of \(\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t}\) with respect to \(\mathbf {t}_1\). Observe that \(\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) converges to \(\ell _{{\varvec{\theta }}}^*(\varvec{\tau }_0;y,\mathbf {w})\) as \((\mathbf {t},{\varvec{\theta }},\psi )\rightarrow ({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)\). Moreover, using arguments similar to those in the proof of Lemma 1, we can show that, for any \(\delta >0\), the class of functions \(\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) with \(\psi \in \mathcal {S}_{0,n}\), \(\Vert \psi -\psi _0\Vert _2\le \delta \), \(\Vert \mathbf {t}-{\varvec{\theta }}_0\Vert _2 \le \delta \), and \(\Vert {\varvec{\theta }}-{\varvec{\theta }}_0\Vert _2\le \delta \) is P-Donsker. Thus, Lemma 3.2 of Murphy and van der Vaart (1997) is applicable.
Using the same arguments as above, we can show that, for \((\mathbf {t},{\varvec{\theta }},\psi )\) in a neighborhood of \(({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)\), the class of \(p^{-1}(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\partial ^2p(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})/\partial \mathbf {t}\partial \mathbf {t}^\mathtt{{T}}\) is P-Donsker and is therefore P-Glivenko-Cantelli. Furthermore, as \((\mathbf {t},{\varvec{\theta }},\psi )\rightarrow ({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)\),
It follows that condition 3.14 in Murphy and van der Vaart (1997) holds. Thus the conditions in Theorem 3.1 of Murphy and van der Vaart (1997) reduce to the unbiasedness condition
where \(\hat{\psi }_0\) is the estimator of \(\psi _0\) under \({\varvec{\theta }}={\varvec{\theta }}_0\). Using the same arguments as those in the proof of the convergence rate of \(\hat{\varvec{\tau }}\), we can deduce that \(\Vert \hat{\psi }_0-\psi _0\Vert _2=O_p(n^{-r/(1+2r)})\).
Abbreviate \(\dot{\ell }({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi ;y,\mathbf {w})\) to \(\dot{\ell }(\psi ;y,\mathbf {w})\). In view of the fact that \(P_{{\varvec{\theta }},\psi }\dot{\ell }({\varvec{\theta }},{\varvec{\theta }},\psi ;y,\mathbf {w})=0\) for all \(({\varvec{\theta }},\psi )\), we can decompose \(P_0\dot{\ell }(\hat{\psi }_0;y,\mathbf {w})\) as \(I_{n5}+I_{n6}\), where \(I_{n5}=(P_0-P_{{\varvec{\theta }}_0,\hat{\psi }_0})\dot{\ell }(\psi _0;y,\mathbf {w})\) and \(I_{n6}=(P_0-P_{{\varvec{\theta }}_0,\hat{\psi }_0})[\dot{\ell }(\hat{\psi }_0;y,\mathbf {w})-\dot{\ell }(\psi _0;y,\mathbf {w})]\). Observe that \(I_{n5}=P_0\{\dot{\ell }(\psi _0;y,\mathbf {w})[(p_0-p_{{\varvec{\theta }}_0,\hat{\psi }_0})/p_0-\dot{\ell }_\psi (\varvec{\tau }_0;y,\mathbf {w})[\psi _0-\hat{\psi }_0]]\}\). By a Taylor expansion, \(I_{n5}\) can be expressed as
where \(0<t^*<1\). According to conditions C3–C5 and the rate of convergence of \(\hat{\psi }_0\), \(I_{n5}=o_p(n^{-1/2})\). Similarly, a Taylor expansion and the rate of convergence of \(\hat{\psi }_0\) yield \(I_{n6}=o_p(n^{-1/2})\). This completes the proof of Theorem 3. \(\square \)
A.8 Proof of Theorem 4
Proof
In view of the consistency of \(\hat{\varvec{\tau }}\) and Proposition 2.1 of Huang et al. (2008), we can show that \( {\mathbb P}_{n}[\dot{\ell }_{{\varvec{\theta }}}(\hat{\varvec{\tau }};y,\mathbf {w})-\dot{\ell }_\psi (\hat{\varvec{\tau }};y,\mathbf {w})[\hat{\mathbf {h}}^*]]^{\otimes 2}\rightarrow \mathbf {I}({\varvec{\theta }}_0)\) in probability. According to some entropy calculations and the law of large numbers, it follows that \(\hat{E}_{{\varvec{\theta }}{\varvec{\theta }}}=\hat{A}_{{\varvec{\theta }}{\varvec{\theta }}}+o_p(1)\), \(\hat{E}_{{\varvec{\theta }}\psi }=\hat{A}_{{\varvec{\theta }}\psi }+o_p(1)\), and \(\hat{E}_{\psi \psi }=\hat{A}_{\psi \psi }+o_p(1)\). We conclude that \(\mathcal {E}_n\rightarrow \mathbf {I}({\varvec{\theta }}_0)\) in probability. This completes the proof of Theorem 4. \(\square \)
About this article
Cite this article
Lu, M., Li, CS. Spline-based semiparametric estimation of a zero-inflated Poisson regression single-index model. Ann Inst Stat Math 68, 1111–1134 (2016). https://doi.org/10.1007/s10463-015-0527-8