Spline-based semiparametric estimation of a zero-inflated Poisson regression single-index model

Annals of the Institute of Statistical Mathematics

Abstract

When the number of zeros in a count dataset exceeds what the probability mass at zero of a regular Poisson distribution can accommodate, the zero-inflated Poisson (ZIP) distribution is often used. To characterize the potential non-linear effects of covariates and avoid the “curse of dimensionality”, we propose a spline-based ZIP regression single-index model. B-splines are employed to estimate the unknown smooth function. A modified Fisher scoring method is proposed to simultaneously estimate the linear coefficients and the regression function. It is shown that the spline estimator of the nonparametric component is uniformly consistent and achieves the optimal convergence rate under the smoothness condition, and that the estimators of the regression parameters are asymptotically normal and efficient. The spline-based semiparametric likelihood ratio test is also established. Moreover, a direct and consistent variance estimation method based on least-squares estimation is proposed. Simulations are performed to evaluate the proposed method.
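As a minimal illustration of the ZIP distribution mentioned in the abstract, the sketch below evaluates its probability mass function in the standard mixture form of Lambert (1992): a point mass at zero with probability `pi`, mixed with a Poisson distribution with mean `lam`. The function names `zip_pmf` and `zip_mean` and the parameter values are illustrative assumptions, not notation taken from the article.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) for a zero-inflated Poisson variable.

    pi  : probability of a structural zero (assumed in [0, 1))
    lam : mean of the Poisson component (assumed > 0)
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from the point mass or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(pi: float, lam: float) -> float:
    """E(Y) = (1 - pi) * lam for the mixture above."""
    return (1.0 - pi) * lam


if __name__ == "__main__":
    pi, lam = 0.3, 2.0  # hypothetical values, chosen only for illustration
    print("P(Y = 0) under ZIP:    ", zip_pmf(0, pi, lam))
    print("P(Y = 0) under Poisson:", math.exp(-lam))
    print("ZIP mean:              ", zip_mean(pi, lam))
```

Setting `pi = 0` recovers the ordinary Poisson probabilities, which is one way to see that the extra mass sits only at zero.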

References

  • Böhning, D., Dietz, E., Schlattmann, P., Mendonca, L., & Kirchner, U. (1999). The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society: Series A (Statistics in Society), 162, 195–209.

  • Carroll, R. J., Fan, J., Gijbels, I., & Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92, 477–489.

  • Cheung, Y. B. (2002). Zero-inflated models for regression analysis of count data: a study of growth and development. Statistics in Medicine, 21, 1461–1469.

  • de Boor, C. (2001). A practical guide to splines. New York: Springer.

  • Dietz, K., & Böhning, D. (1997). The use of two-component mixture models with one completely or partly known component. Computational Statistics, 12, 219–234.

  • Delecroix, M., Härdle, W., & Hristache, M. (2003). Efficient estimation in conditional single-index regression. Journal of Multivariate Analysis, 86, 213–216.

  • Härdle, W., & Stoker, T. M. (1989). Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association, 84, 986–995.

  • Härdle, W., Hall, P., & Ichimura, H. (1993). Optimal smoothing in single-index models. Annals of Statistics, 21, 157–178.

  • Hastie, T., & Tibshirani, R. (1990). Generalized additive models. New York: Chapman & Hall/CRC.

  • Horowitz, J. L., & Härdle, W. (1996). Direct semiparametric estimation of single-index models with discrete covariate. Journal of the American Statistical Association, 91, 1632–1640.

  • Huang, J. Z., & Liu, L. (2006). Polynomial spline estimation and inference of proportional hazards regression models with flexible relative risk form. Biometrics, 62, 793–802.

  • Huang, J., Zhang, Y., & Hua, L. (2008). A least-squares approach to consistent information estimation in semiparametric models. Technical Report 2008–3, University of Iowa, Department of Biostatistics.

  • Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58, 71–120.

  • Johnson, N. L., Kotz, S., & Kemp, A. W. (2005). Univariate discrete distributions (3rd ed.). New York: Wiley.

  • Kosorok, M. R. (2008). Introduction to empirical processes and semiparametric inference. Dordrecht: Springer.

  • Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1–14.

  • Lu, M., & Loomis, D. (2013). Spline-based semiparametric estimation of partially linear Poisson regression with single-index models. Journal of Nonparametric Statistics, 25, 905–922.

  • Lu, S. E., Lin, Y., & Shih, W. C. J. (2004). Analyzing excessive no changes in clinical trials with clustered data. Biometrics, 60, 257–267.

  • Murphy, S. A., & van der Vaart, A. W. (1997). Semiparametric likelihood ratio inference. Annals of Statistics, 25, 1471–1509.

  • Murphy, S. A., & van der Vaart, A. W. (1999). Observed information in semi-parametric models. Bernoulli, 5, 381–412.

  • Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sörensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.

  • Rosenberg, P. S. (1995). Hazard function estimation using B-splines. Biometrics, 51, 874–887.

  • Schumaker, L. (1981). Spline functions: basic theory. New York: Wiley.

  • Shen, X., & Wong, W. H. (1994). Convergence rate of sieve estimates. Annals of Statistics, 22, 580–615.

  • Singh, S. (1963). A note on inflated Poisson distribution. Journal of the Indian Statistical Association, 1, 140–144.

  • Stone, C. J. (1985). Additive regression and other nonparametric models. Annals of Statistics, 13, 689–705.

  • Stone, C. J. (1986). The dimensionality reduction principle for generalized additive models. Annals of Statistics, 14, 590–606.

  • Sun, J., Kopciuk, K. A., & Lu, X. (2008). Polynomial spline estimation of partially linear single-index proportional hazards regression models. Computational Statistics and Data Analysis, 53, 176–188.

  • van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge: Cambridge University Press.

  • van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes. New York: Springer.

  • Yau, K. K. W., & Lee, A. H. (2001). Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Statistics in Medicine, 20, 2907–2920.

  • Yu, Y., & Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97, 1042–1054.

Acknowledgments

The authors express their thanks to an associate editor and two referees whose constructive comments improved the presentation. The project described was supported by the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), through Grant #UL1 TR000002 (C.S. Li).

Author information

Corresponding author

Correspondence to Chin-Shang Li.

A Appendix

A.1 Notations and lemmas

Let \(P_{\varvec{\tau }}\) be the distribution of \((y,\mathbf {w}^\mathtt{{T}})^\mathtt{{T}}\) under the parameter vector \(\varvec{\tau }\) and \(p_{\varvec{\tau }}\) the corresponding density. Denote \(P_0\equiv {P}_{\varvec{\tau }_0}\) and \(p_0\equiv {p}_{\varvec{\tau }_0}\). For a measurable function f, define Pf as the expectation of f under P. For any class of measurable functions \(\mathcal {F}\), the bracketing number \(N_{[]}(\varepsilon ,\mathcal {F},L_2(P))\) is defined as the minimum number of brackets \([f_i^L,f_i^R],i=1,\ldots ,m\), such that, for every \(f\in \mathcal {F}\), there exists \(1\le {i}\le {m}\) such that \(f_i^L\le {f}\le {f}_i^R\) and \(\Vert f_i^R-f_i^L\Vert _2\le \varepsilon \). Define the bracketing integral \(J_{[]}(\delta ,\mathcal {F},L_2(P))=\int _0^\delta \left[ 1+\log N_{[]}(\varepsilon ,\mathcal {F},L_2(P))\right] ^{1/2}d\varepsilon \). Denote \(\mathbb {G}_nf=\sqrt{n}( {\mathbb P}_{n}-P)f\) and \(\Vert \mathbb {G}_n\Vert _{\mathcal {F}}=\sup _{f\in \mathcal {F}}|\mathbb {G}_nf|\). In the following, C represents a positive constant that may vary from place to place.
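To make the empirical-process notation concrete, the following sketch computes \(\mathbb {G}_nf\) for a single function and the supremum \(\Vert \mathbb {G}_n\Vert _{\mathcal {F}}\) over a small finite class. It assumes the population expectation Pf is known and passed in by the caller; the helper names `empirical_process` and `sup_norm` are hypothetical and serve only as illustration.

```python
import math


def empirical_process(f, sample, pf):
    """Return G_n f = sqrt(n) * (P_n f - P f).

    f      : a real-valued function of one observation
    sample : list of observations defining the empirical measure P_n
    pf     : the population expectation P f, supplied by the caller
    """
    n = len(sample)
    pn_f = sum(f(x) for x in sample) / n  # empirical mean P_n f
    return math.sqrt(n) * (pn_f - pf)


def sup_norm(functions_with_means, sample):
    """Return sup over a finite class of |G_n f|.

    functions_with_means : list of (f, P f) pairs
    """
    return max(abs(empirical_process(f, sample, pf))
               for f, pf in functions_with_means)


if __name__ == "__main__":
    # Observations treated as draws from Uniform(0, 1), where
    # P x = 1/2 and P x^2 = 1/3.
    sample = [0.25, 0.75, 0.5, 0.5]
    family = [(lambda x: x, 0.5), (lambda x: x * x, 1.0 / 3.0)]
    print(empirical_process(family[0][0], sample, family[0][1]))
    print(sup_norm(family, sample))
```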

Lemma 1

For any \(\delta > 0\), define \(\mathcal {L}=\{\ell (\varvec{\tau };y,\mathbf {w}):\psi \in \mathcal {S}_{0,n},{\varvec{\theta }}\in \mathcal {R}^{2p+2d-1},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \}\). Then, for any \(0 < \varepsilon \le \delta \), \(\log N_{[]}(\varepsilon ,\mathcal {L},\Vert \cdot \Vert _{P,B}) \le C q_n\log (\delta /\varepsilon )\), and, hence, \(J_{[]}(\delta ,\mathcal {L},\Vert \cdot \Vert _{P,B})\le C q_n^{1/2}\delta \), where \(\Vert \cdot \Vert _{P,B}\) is the Bernstein norm defined as \(\Vert f\Vert _{P,B}^2 = 2P\left[ \exp (|f|)-|f|-1\right] \) in van der Vaart and Wellner (1996) and \(q_n=m_n + l\) is the number of spline basis functions.
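The count \(q_n=m_n+l\) in Lemma 1 can be illustrated numerically. The sketch below assumes that \(m_n\) is the number of interior knots, that l is the spline order, and that each boundary knot is repeated l times (a common convention that is an assumption here). It evaluates the B-spline basis by the Cox-de Boor recursion; the function names are illustrative.

```python
def make_knots(a, b, m_interior, order):
    """Knot vector on [a, b] with equally spaced interior knots and
    boundary knots repeated `order` times (assumed convention)."""
    step = (b - a) / (m_interior + 1)
    interior = [a + step * (j + 1) for j in range(m_interior)]
    return [a] * order + interior + [b] * order


def bspline_basis(x, knots, order):
    """Values at x of all B-spline basis functions of the given order,
    computed with the Cox-de Boor recursion."""
    last = knots[-1]
    # Order-1 basis: indicators of the half-open knot intervals; the
    # right endpoint is assigned to the last non-degenerate interval.
    b = []
    for i in range(len(knots) - 1):
        left, right = knots[i], knots[i + 1]
        inside = (left <= x < right) or (x == last and left < right == last)
        b.append(1.0 if inside else 0.0)
    # Raise the order one step at a time.
    for k in range(2, order + 1):
        nxt = []
        for i in range(len(knots) - k):
            d1 = knots[i + k - 1] - knots[i]
            d2 = knots[i + k] - knots[i + 1]
            t1 = (x - knots[i]) / d1 * b[i] if d1 > 0 else 0.0
            t2 = (knots[i + k] - x) / d2 * b[i + 1] if d2 > 0 else 0.0
            nxt.append(t1 + t2)
        b = nxt
    return b


if __name__ == "__main__":
    m_n, l = 3, 4  # hypothetical: 3 interior knots, cubic splines (order 4)
    knots = make_knots(0.0, 1.0, m_n, l)
    values = bspline_basis(0.3, knots, l)
    print("number of basis functions:", len(values))
    print("sum of basis values at 0.3:", sum(values))
```

The basis functions are nonnegative and sum to one at every point of the interval, so the two printed quantities give a quick sanity check on the knot vector and the recursion.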

Lemma 2

If conditions C1–C6 hold, then there exists \(C > 0\) such that

$$\begin{aligned} P[\ell (\varvec{\tau }_0;y,\mathbf {w})-\ell (\varvec{\tau };y,\mathbf {w})]\ge C\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2^2, \end{aligned}$$

for \(\varvec{\tau }\) in a neighborhood of \(\varvec{\tau }_0\).

Lemma 3

(Consistency) If conditions C1–C6 hold, then \(\Vert \hat{\varvec{\tau }}-\varvec{\tau }_0 \Vert _2 = o_p(1)\).

Remark 3

Lemma 1 and similar entropy calculations are used to derive the consistency of \(\hat{\varvec{\tau }}\) and to prove Theorems 1–4. Lemma 2 is a key result for deriving the consistency and rate of convergence of \(\hat{\varvec{\tau }}\). Lemma 3 shows that \(\hat{\varvec{\tau }}\) is consistent for \(\varvec{\tau }_0\).

A.2 Proof of Lemma 1

Proof

According to the bracketing calculation in Shen and Wong (1994), for any \(\delta > 0\) and \(0 < \varepsilon \le \delta \), the logarithm of the bracketing number of \(\mathcal {S}_{0,n}\), computed with \(L_2(P)\), is bounded by \(q_n \log (\delta /\varepsilon )\) up to a constant. It is known that the neighborhoods \(\mathbf {A}(\delta )=\{\varvec{\alpha }:\Vert \varvec{\alpha }-\varvec{\alpha }_{0}\Vert _2\le \delta \}\), \(\mathbf {Z}(\delta )=\{\varvec{\zeta }:\Vert \varvec{\zeta }-\varvec{\zeta }_0\Vert _2\le \delta \}\), and \(\mathbf {B}(\delta )=\{\varvec{\beta }_1:\Vert \varvec{\beta }_1-\varvec{\beta }_{10}\Vert _2\le \delta \}\) can be covered by \(O((\delta /\varepsilon )^{p+d})\), \(O((\delta /\varepsilon )^d)\), and \(O((\delta /\varepsilon )^p)\) balls with radius \(\varepsilon \), respectively. In view of Theorem 9.23 of Kosorok (2008), the bracketing numbers for \(\{\mathbf {w}^\mathtt{{T}}\varvec{\alpha }:\Vert \varvec{\alpha }-\varvec{\alpha }_{0}\Vert _2\le \delta \}\), \(\{\mathbf {z}^\mathtt{{T}}\varvec{\zeta }:\Vert \varvec{\zeta }-\varvec{\zeta }_0\Vert _2\le \delta \}\), and \(\{\mathbf {x}^\mathtt{{T}}\varvec{\beta }_1:\Vert \varvec{\beta }_1 -\varvec{\beta }_{10}\Vert _2\le \delta \}\) are bounded by \(O((\delta /\varepsilon )^{p+d})\), \(O((\delta /\varepsilon )^d)\), and \(O((\delta /\varepsilon )^{p})\), respectively. It follows that, for sufficiently large n,

$$\begin{aligned} \log {N}_{[]}\left( \varepsilon ,\{\mathbf {x}^\mathtt{{T}}\varvec{\beta }_1+\psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta }):\psi \in \mathcal {S}_{0,n},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \},L_2(P)\right) \le {C}q_n\log (\delta /\varepsilon ), \end{aligned}$$

and, hence,

$$\begin{aligned} \log {N}_{[]}\left( \varepsilon ,\{\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi ):\psi \in \mathcal {S}_{0,n},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \},L_2(P)\right) \le {C}q_n\log (\delta /\varepsilon ) \end{aligned}$$

because the function \(x\mapsto \exp (x)\) is Lipschitz and monotonic. By the inequality \(2[\exp (|x|)-1-|x|] \le x^2\exp (|x|)\),

$$\begin{aligned} \log {N}_{[]}\left( \varepsilon ,\{\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi ):\psi \in \mathcal {S}_{0,n},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \},\Vert \cdot \Vert _{P,B}\right) \le {C}q_n\log (\delta /\varepsilon ). \end{aligned}$$

Similarly, we can show that

$$\begin{aligned} \log {N}_{[]}\left( \varepsilon ,\{\pi (\mathbf {w};\varvec{\alpha }):\Vert \varvec{\alpha }-\varvec{\alpha }_0\Vert _2\le \delta \},\Vert \cdot \Vert _{P,B}\right) \le {C}\log (\delta /\varepsilon ). \end{aligned}$$

The transformation \(\left( \pi (\mathbf {w};\varvec{\alpha }),\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi )\right) \mapsto \ell (\pi (\mathbf {w};\varvec{\alpha }),\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi );\varvec{\tau })\) is essentially Lipschitz, so it follows that \(\log {N}_{[]}\left( \varepsilon ,\mathcal {L},\Vert \cdot \Vert _{P,B}\right) \le {C}q_n\log (\delta /\varepsilon )\), and, hence, the bracketing integral is bounded by \(q_n^{1/2}\delta \), up to a constant. \(\square \)
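The elementary inequality \(2[\exp (|x|)-1-|x|]\le x^2\exp (|x|)\) invoked in this proof can be checked term by term from the exponential series. The short derivation below is a supplementary sketch and is not part of the original argument; the bound M is a hypothetical constant introduced only for illustration.

$$\begin{aligned} 2[\exp (|x|)-1-|x|]=2\sum _{k=2}^{\infty }\frac{|x|^k}{k!}=x^2\sum _{j=0}^{\infty }\frac{2|x|^j}{(j+2)!}\le x^2\sum _{j=0}^{\infty }\frac{|x|^j}{j!}=x^2\exp (|x|), \end{aligned}$$

because \((j+2)!=(j+2)(j+1)j!\ge 2j!\) for every \(j\ge 0\). Consequently \(\Vert f\Vert _{P,B}^2\le P[f^2\exp (|f|)]\), so that for a function with \(|f|\le M\) the Bernstein norm satisfies \(\Vert f\Vert _{P,B}\le \exp (M/2)\Vert f\Vert _2\). This is how a bracketing bound in \(L_2(P)\) carries over to the Bernstein norm for uniformly bounded classes.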

A.3 Proof of Lemma 2

Proof

Let \( {\mathbb M}_{n}(\varvec{\tau })= {\mathbb P}_{n}\ell (\varvec{\tau };y,\mathbf {w})\) and \( {\mathbb M}(\varvec{\tau })=P\ell (\varvec{\tau };y,\mathbf {w})\). For any \(\varvec{\tau }\) in a neighborhood of \(\varvec{\tau }_0\), a Taylor expansion yields

$$\begin{aligned} {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau }){\,\ge }{P}[(\mathbf {w}^\mathtt{{T}}\varvec{\alpha }-\mathbf {w}^\mathtt{{T}}\varvec{\alpha }_0)^2] {+}P\{[\mathbf {x}^\mathtt{{T}}\varvec{\beta }_1-\mathbf {x}^\mathtt{{T}}\varvec{\beta }_{10}{+}\psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta })-\psi _0(\mathbf {z}^\mathtt{{T}}\varvec{\zeta }_0)]^2\}, \end{aligned}$$

up to a constant. Let \(\textit{g}_1(\mathbf {x})=\mathbf {x}^\mathtt{{T}}(\varvec{\beta }_1-\varvec{\beta }_{10})\) and \(\textit{g}_2(\mathbf {z})=\psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta })-\psi _0(\mathbf {z}^\mathtt{{T}}\varvec{\zeta }_0)\). According to the law of total expectation and the Cauchy-Schwarz inequality, \(\{E[\textit{g}_1(\mathbf {x})\textit{g}_2(\mathbf {z})]\}^2\le {E}_{\mathbf {z}}[\textit{g}_2^2(\mathbf {z})]E_{\mathbf {z}}[\{E_{\mathbf {x}|\mathbf {z}}[\textit{g}_1(\mathbf {x})|\mathbf {z}]\}^2]\). By the orthogonality of a conditional expectation, there exists \(0<\xi <1\) such that \(E_{\mathbf {z}}[\{E_{\mathbf {x}|\mathbf {z}}[\textit{g}_1(\mathbf {x})|\mathbf {z}]\}^2]=\xi {E}_\mathbf {x}[\textit{g}_1^2(\mathbf {x})]\). Hence, \(\{E[\textit{g}_1(\mathbf {x})\textit{g}_2(\mathbf {z})]\}^2 \le \xi {E}[\textit{g}_1^2(\mathbf {x})]E[\textit{g}_2^2(\mathbf {z})]\). In view of Lemma 25.86 of van der Vaart (2000),

$$\begin{aligned} {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau })\ge \Vert \psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta })-\psi _0(\mathbf {z}^\mathtt{{T}}\varvec{\zeta }_0)\Vert _2^2 +\Vert \mathbf {w}^\mathtt{{T}}\varvec{\alpha }-\mathbf {w}^\mathtt{{T}}\varvec{\alpha }_{0}\Vert _2^2+\Vert \mathbf {x}^\mathtt{{T}}\varvec{\beta }_1-\mathbf {x}^\mathtt{{T}}\varvec{\beta }_{10}\Vert _2^2, \end{aligned}$$

up to a constant. By Lemma 1 of Stone (1985) and conditions C3 and C4, it follows that \( {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau })\ge {C}\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2^2\). \(\square \)

A.4 Proof of Lemma 3

Proof

We verify the conditions of Theorem 5.7 in van der Vaart (2000) to prove the consistency of \(\hat{\varvec{\tau }}\). According to Lemma 1, \(\mathcal {L}\) is a Donsker class, and is therefore a Glivenko-Cantelli class. Thus, \(\sup _{\varvec{\tau }}|( {\mathbb P}_{n}-P)\ell (\varvec{\tau };y,\mathbf {w})|=o_p(1)\) for \(\varvec{\tau }\) in a neighborhood of \(\varvec{\tau }_0\). The first condition of the theorem holds. It follows from Lemma 2 that \(\sup _{\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\ge \varepsilon } {\mathbb M}(\varvec{\tau })\le {\mathbb M}(\varvec{\tau }_0)-C\varepsilon ^2 < {\mathbb M}(\varvec{\tau }_0)\). Hence, the second condition of the theorem also holds.

According to Jackson’s theorem for polynomials (de Boor 2001), there exists a spline \(\psi _{0,n}\in \mathcal {S}_{0,n}\) of order \({l}\ge {2}\) such that \(\Vert \psi _{0,n}-\psi _0\Vert _\infty =O(n^{-r\nu })\) for \(1/(2r+2)<\nu < 1/(2r)\). Let \(\varvec{\tau }_{0,n}=({\varvec{\theta }}_0,\psi _{0,n})\). By definition of \(\hat{\varvec{\tau }}\),

$$\begin{aligned} {\mathbb M}_{n}(\hat{\varvec{\tau }})- {\mathbb M}_{n}(\varvec{\tau }_0)\ge {\mathbb M}_{n}(\varvec{\tau }_{0,n})- {\mathbb M}_{n}(\varvec{\tau }_0)=I_{n1}+I_{n2}, \end{aligned}$$

where \(I_{n1}=( {\mathbb P}_{n}-P)[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]\) and \(I_{n2}= P[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]\). As shown in the proof of Lemma 1, \(\mathcal {S}_{0,n}\) is a Donsker class. Because \(\ell ({\varvec{\theta }}_0,\psi ;y,\mathbf {w})\) is essentially Lipschitz with respect to \(\psi \), a preservation theorem for Donsker classes implies that the class of functions \(\ell ({\varvec{\theta }}_0,\psi ;y,\mathbf {w})-\ell ({\varvec{\theta }}_0,\psi _0;y,\mathbf {w})\), for \(\psi \in \mathcal {S}_{0,n}\) and \(\Vert \psi -\psi _0\Vert _2\le \delta \), is a Donsker class. Moreover, by the mean value theorem, \(P[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]^2\le {C}\Vert \psi _{0,n}-\psi _0\Vert _\infty ^2\rightarrow 0\) as \(n\rightarrow \infty \). In view of Lemma 19.24 of van der Vaart (2000), \(I_{n1}=o_p(n^{-1/2})\). Observe that \(I_{n2}\ge {-C}\Vert \psi _{0,n}-\psi _0\Vert _\infty ^2=-O(n^{-2r\nu })\). It follows that \( {\mathbb M}_{n}(\hat{\varvec{\tau }})- {\mathbb M}_{n}(\varvec{\tau }_0) > -o_p(1)\). Therefore, Theorem 5.7 of van der Vaart (2000) applies and yields the consistency of \(\hat{\varvec{\tau }}\). \(\square \)

A.5 Proof of Theorem 1

Proof

We apply Theorem 3.4.1 of van der Vaart and Wellner (1996) to prove the rate of convergence. Let \(\varvec{\tau }\in \mathbf {\Theta }_n=\{({\varvec{\phi }},\varvec{\alpha },\varvec{\beta }_1,\psi ):{\varvec{\phi }}\in \mathcal {R}^{d-1},\varvec{\alpha }\in \mathcal {R}^{p+d},\varvec{\beta }_1\in \mathcal {R}^{p},\psi \in \mathcal {S}_{0,n}\}\). Choose \(d_n(\varvec{\tau },\varvec{\tau }_{0,n})\) and \(M_n(\varvec{\tau })\) defined in the theorem to be \(\Vert \varvec{\tau }-\varvec{\tau }_{0,n}\Vert _2\) and \( {\mathbb M}(\varvec{\tau })\), respectively. By definition of \(\hat{\varvec{\tau }}\), \( {\mathbb M}_n(\hat{\varvec{\tau }}) \ge {\mathbb M}_n(\varvec{\tau }_{0,n})\). In the proof of Lemma 3 for consistency, we have already shown that \( {\mathbb M}(\varvec{\tau })- {\mathbb M}(\varvec{\tau }_0)\le -Cd_n^2(\varvec{\tau },\varvec{\tau }_0)\). Because \( {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau })\le {C}d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)\le C\Vert \psi _{0,n}-\psi _0\Vert _\infty =O(n^{-r\nu })\), for any \(\varvec{\tau }\in \mathbf {\Theta }_n\) such that \(\delta /2\le {d}_n(\varvec{\tau },\varvec{\tau }_{0,n})\le \delta \), we have \(d_n(\varvec{\tau },\varvec{\tau }_0) \ge d_n(\varvec{\tau },\varvec{\tau }_{0,n})-d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0) >C\delta \) for sufficiently large n. It follows that

$$\begin{aligned} {\mathbb M}(\varvec{\tau })- {\mathbb M}(\varvec{\tau }_{0,n})= & {} {\mathbb M}(\varvec{\tau })- {\mathbb M}(\varvec{\tau }_0)+ {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau }_{0,n}) \\\le & {} -C\delta ^2+O(n^{-2r\nu })=-C\delta ^2 \end{aligned}$$

for sufficiently large n.

For any \(\delta >0\), in view of Lemma 1,

$$\begin{aligned} J_{[]}\{\delta ,\{\ell (\varvec{\tau };y,\mathbf {w})-\ell (\varvec{\tau }_{0,n};y,\mathbf {w}):\varvec{\tau }\in \mathbf {\Theta }_n,\delta /2\le d_n(\varvec{\tau },\varvec{\tau }_{0,n})\le \delta \},\Vert \cdot \Vert _{P,B}\} \le Cq_n^{1/2}\delta . \end{aligned}$$

Moreover, for \(\varvec{\tau }\in \mathbf {\Theta }_n\) and \(\delta /2\le {d}_n(\varvec{\tau },\varvec{\tau }_{0,n}) \le \delta \), by the inequality \(2[\exp (|x|)-|x|-1]\le {x}^2\exp (|x|)\) and conditions C3–C5, \(\Vert \ell (\varvec{\tau };y,\mathbf {w})-\ell (\varvec{\tau }_{0,n};y,\mathbf {w})\Vert _{P,B}^2\le {C}\delta ^2\). Lemma 3.4.3 of van der Vaart and Wellner (1996) yields

$$\begin{aligned} E\left[ \sup _{\delta /2\le \Vert \varvec{\tau }-\varvec{\tau }_{0,n}\Vert _2\le \delta ,\varvec{\tau }\in \mathbf {\Theta }_n}n^{1/2}|( {\mathbb M}_{n}- {\mathbb M})(\varvec{\tau })-( {\mathbb M}_{n}- {\mathbb M})(\varvec{\tau }_{0,n})| \right] \le {C}\phi _n(\delta ) \end{aligned}$$

with \(\phi _n(\delta )=q_n^{1/2}\delta +n^{-1/2}q_n\). Obviously, \(\phi _n(\delta )/\delta \) is decreasing in \(\delta \). It can be readily shown that \(r_n^2\phi _n(1/r_n)\le {n}^{1/2}\) with \(r_n=n^{\min (r\nu ,(1-\nu )/2)}\). Theorem 3.4.1 of van der Vaart and Wellner (1996) is applied to yield \(r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_{0,n})=O_p(1)\). Because \(d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)=O(n^{-r\nu })\), it follows that \(r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_0) \le r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_{0,n})+r_nd_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)=O_p(1)+r_nO(n^{-r\nu })=O_p(1)\). This completes the proof of the rate of convergence. \(\square \)
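The claim \(r_n^2\phi _n(1/r_n)\le {n}^{1/2}\) can be checked directly. The sketch below assumes \(q_n=O(n^{\nu })\), which is an assumption made here for illustration (it is consistent with the approximation rate \(O(n^{-r\nu })\) used in the proof of Lemma 3), and reads the inequality up to a constant factor.

$$\begin{aligned} r_n^2\phi _n(1/r_n)=r_nq_n^{1/2}+r_n^2n^{-1/2}q_n\le C\left( r_nn^{\nu /2}+r_n^2n^{\nu -1/2}\right) . \end{aligned}$$

Since \(r_n\le n^{(1-\nu )/2}\), the first term is at most \(n^{(1-\nu )/2}n^{\nu /2}=n^{1/2}\) and the second is at most \(n^{1-\nu }n^{\nu -1/2}=n^{1/2}\), so \(r_n^2\phi _n(1/r_n)\le 2Cn^{1/2}\). The other component of the minimum, \(r_n\le n^{r\nu }\), is what keeps \(r_nO(n^{-r\nu })=O(1)\) in the final step. The two exponents coincide when \(r\nu =(1-\nu )/2\), that is, when \(\nu =1/(1+2r)\), which lies in the admissible range \(1/(2r+2)<\nu <1/(2r)\) and gives \(r_n=n^{r/(1+2r)}\).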

A.6 Proof of Theorem 2

Proof

Theorem 2 follows from the same arguments as those in the proof of Theorem 1(b) of Lu and Loomis (2013) and entropy calculations similar to those in Lemma 1. \(\square \)

A.7 Proof of Theorem 3

Proof

We apply Theorem 3.1 of Murphy and van der Vaart (1997) to prove the asymptotic distribution of the spline likelihood ratio test statistic. Let \(\mathbf {t}=\left( \mathbf {t}_1^\mathtt{{T}},\mathbf {t}_2^\mathtt{{T}},\mathbf {t}_3^\mathtt{{T}}\right) ^\mathtt{{T}}\), where \(\mathbf {t}_1\in \mathbb {R}^{d-1}\), \(\mathbf {t}_2\in \mathbb {R}^{p+d}\), and \(\mathbf {t}_3\in \mathbb {R}^{p}\). Define an approximately least favorable submodel

$$\begin{aligned} \mathrm {\Psi }_{\mathbf {t}}({\varvec{\theta }},\psi )=\left( \mathbf {t},\psi _{\mathbf {t}}({\varvec{\theta }},\psi )\right) , \end{aligned}$$

where \(\psi _{\mathbf {t}}({\varvec{\theta }},\psi )=\psi +({\varvec{\theta }}-\mathbf {t})^\mathtt{{T}}\mathbf {h}^*\circ \psi _0^{-1}\circ \psi \). Let \(p(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) and \(\ell (\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) be density and log density functions under parameters \((\mathbf {t},\psi _{\mathbf {t}}({\varvec{\theta }},\psi ))\), respectively. Also denote by \(\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) the first derivatives of \(\ell (\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) with respect to \(\mathbf {t}\). Some derivative calculations then yield

$$\begin{aligned} \dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w}) =\left( \begin{array}{l} \eta _{\mathbf {t}}[\psi ^\prime _{\mathbf {t}}\mathbf {D}_{\mathbf {t}}\mathbf {z}-\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t}+({\varvec{\phi }}-\mathbf {t}_1)^\mathtt{{T}}\nabla _{\mathbf {t}_1} (\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t})] \\ -\eta _{\mathbf {t}}\mathbf {h}_2^*\circ \psi _0\circ \psi _\mathbf {t}+\xi _{\mathbf {t}}\mathbf {w}\\ \eta _{\mathbf {t}}(\mathbf {x}-\mathbf {h}_3^*\circ \psi _0\circ \psi _\mathbf {t}) \end{array}\right) . \end{aligned}$$

Here \(\xi _{\mathbf {t}}\), \(\eta _{\mathbf {t}}\), \(\psi _{\mathbf {t}}\), and \(\psi ^\prime _{\mathbf {t}}\) represent \(\xi \), \(\eta \), \(\psi \), and \(\psi ^\prime \) evaluated at \((\mathbf {t},\psi _{\mathbf {t}}(\mathbf {t},\psi ))\), respectively. \(\nabla _{\mathbf {t}_1}(\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t})\) is the gradient of \(\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t}\) with respect to \(\mathbf {t}_1\). Observe that \(\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) converges to \(\ell _{{\varvec{\theta }}}^*(\varvec{\tau }_0;y,\mathbf {w})\) as \((\mathbf {t},{\varvec{\theta }},\psi )\rightarrow ({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)\). Moreover, using arguments similar to those in the proof of Lemma 1, we can show that, for any \(\delta >0\), the class of functions \(\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\) with \(\psi \in \mathcal {S}_{0,n}\), \(\Vert \psi -\psi _0\Vert _2\le \delta \), \(\Vert \mathbf {t}-{\varvec{\theta }}_0\Vert _2 \le \delta \), and \(\Vert {\varvec{\theta }}-{\varvec{\theta }}_0\Vert _2\le \delta \) is P-Donsker. Thus, Lemma 3.2 of Murphy and van der Vaart (1997) is applicable.

Using the same arguments as above, we can show that, for \((\mathbf {t},{\varvec{\theta }},\psi )\) in a neighborhood of \(({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)\), the class of \(p^{-1}(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\partial ^2p(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})/\partial \mathbf {t}\partial \mathbf {t}^\mathtt{{T}}\) is P-Donsker and is therefore P-Glivenko-Cantelli. Furthermore, as \((\mathbf {t},{\varvec{\theta }},\psi )\rightarrow ({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)\),

$$\begin{aligned} E[{p^{-1}(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\partial ^2p(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})}/\partial \mathbf {t}\partial \mathbf {t}^\mathtt{{T}}]\rightarrow - E[\ell _{{\varvec{\theta }}}^*(\varvec{\tau }_0;y,\mathbf {w})]^{\otimes 2}+\mathbf {I}({\varvec{\theta }}_0)={\varvec{0}}. \end{aligned}$$

It follows that condition 3.14 in Murphy and van der Vaart (1997) holds. Thus, the conditions in Theorem 3.1 of Murphy and van der Vaart (1997) reduce to the unbiasedness condition

$$\begin{aligned} \sqrt{n}P_0\dot{\ell }({\varvec{\theta }}_0,{\varvec{\theta }}_0,\hat{\psi }_0;y,\mathbf {w})=o_p(1), \end{aligned}$$

where \(\hat{\psi }_0\) is the estimator of \(\psi _0\) under \({\varvec{\theta }}={\varvec{\theta }}_0\). Using the same arguments as those in the proof of the convergence rate of \(\hat{\varvec{\tau }}\), we can deduce that \(\Vert \hat{\psi }_0-\psi _0\Vert _2=O_p(n^{-r/(1+2r)})\).

Abbreviate \(\dot{\ell }({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi ;y,\mathbf {w})\) to \(\dot{\ell }(\psi ;y,\mathbf {w})\). In view of the fact that \(P_{{\varvec{\theta }},\psi }\dot{\ell }({\varvec{\theta }},{\varvec{\theta }},\psi ;y,\mathbf {w})=0\) for all \(({\varvec{\theta }},\psi )\), we can decompose \(P_0\dot{\ell }(\hat{\psi }_0;y,\mathbf {w})\) as \(I_{n5}+I_{n6}\), where \(I_{n5}=(P_0-P_{{\varvec{\theta }}_0,\hat{\psi }_0})\dot{\ell }(\psi _0;y,\mathbf {w})\) and \(I_{n6}=(P_0-P_{{\varvec{\theta }}_0,\hat{\psi }_0})[\dot{\ell }(\hat{\psi }_0;y,\mathbf {w})-\dot{\ell }(\psi _0;y,\mathbf {w})]\). Observe that \(I_{n5}=P_0\{\dot{\ell }(\psi _0;y,\mathbf {w})[(p_0-p_{{\varvec{\theta }}_0,\hat{\psi }_0})/p_0-\dot{\ell }_\psi (\varvec{\tau }_0;y,\mathbf {w})[\psi _0-\hat{\psi }_0]]\}\). By a Taylor expansion, \(I_{n5}\) can be expressed as

$$\begin{aligned} I_{n5}=-(1/2)P_0[p^{-1}_0\dot{\ell }(\psi _0;y,\mathbf {w})d^2p({{\varvec{\theta }}_0,\psi _0+t(\hat{\psi }_0-\psi _0);y,\mathbf {w}})/dt^2]|_{t=t^*}, \end{aligned}$$

where \(0<t^*<1\). According to conditions C3–C5 and the rate of convergence of \(\hat{\psi }_0\), \(I_{n5}=o_p(n^{-1/2})\). Similarly, a Taylor expansion and the rate of convergence of \(\hat{\psi }_0\) yield \(I_{n6}=o_p(n^{-1/2})\). This completes the proof of Theorem 3. \(\square \)

A.8 Proof of Theorem 4

Proof

In view of the consistency of \(\hat{\varvec{\tau }}\) and Proposition 2.1 of Huang et al. (2008), we can show that \( {\mathbb P}_{n}[\dot{\ell }_{{\varvec{\theta }}}(\hat{\varvec{\tau }};y,\mathbf {w})-\dot{\ell }_\psi (\hat{\varvec{\tau }};y,\mathbf {w})[\hat{\mathbf {h}}^*]]^{\otimes 2}\rightarrow \mathbf {I}({\varvec{\theta }}_0)\) in probability. According to some entropy calculations and the law of large numbers, it follows that \(\hat{E}_{{\varvec{\theta }}{\varvec{\theta }}}=\hat{A}_{{\varvec{\theta }}{\varvec{\theta }}}+o_p(1)\), \(\hat{E}_{{\varvec{\theta }}\psi }=\hat{A}_{{\varvec{\theta }}\psi }+o_p(1)\), and \(\hat{E}_{\psi \psi }=\hat{A}_{\psi \psi }+o_p(1)\). We conclude that \(\mathcal {E}_n\rightarrow \mathbf {I}({\varvec{\theta }}_0)\) in probability. This completes the proof of Theorem 4. \(\square \)

About this article

Cite this article

Lu, M., Li, CS. Spline-based semiparametric estimation of a zero-inflated Poisson regression single-index model. Ann Inst Stat Math 68, 1111–1134 (2016). https://doi.org/10.1007/s10463-015-0527-8

  • DOI: https://doi.org/10.1007/s10463-015-0527-8
