Spline-based semiparametric estimation of a zero-inflated Poisson regression single-index model

Lu, Minggen; Li, Chin-Shang

doi:10.1007/s10463-015-0527-8

Published: 02 July 2015

Annals of the Institute of Statistical Mathematics, Volume 68, pages 1111–1134 (2016)

Minggen Lu¹ &
Chin-Shang Li²

Abstract

When the number of zeros in a count dataset exceeds what the probability mass at zero of a regular Poisson distribution can accommodate, the zero-inflated Poisson (ZIP) distribution is often used. To characterize the potential non-linear effects of covariates and avoid the “curse of dimensionality”, we propose a spline-based ZIP regression single-index model. B-splines are employed to estimate the unknown smooth function. A modified Fisher scoring method is proposed to simultaneously estimate the linear coefficients and the regression function. It is shown that the spline estimator of the nonparametric component is uniformly consistent and achieves the optimal convergence rate under the smoothness condition, and that the estimators of the regression parameters are asymptotically normal and efficient. The spline-based semiparametric likelihood ratio test is also established. Moreover, a direct and consistent variance estimation method based on least-squares estimation is proposed. Simulations are performed to evaluate the proposed method.
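As a rough illustration of the model class described in the abstract, the following sketch evaluates a ZIP log-likelihood contribution for one observation. It assumes a logit link for the zero-inflation probability $\pi$ and the log link $\lambda =\exp \{\mathbf {x}^\mathtt{{T}}\varvec{\beta }_1+\psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta })\}$ for the Poisson mean; the logit link and the function names `zip_log_pmf` and `zip_loglik_single_index` are illustrative assumptions, not taken from the article.

```python
import math


def zip_log_pmf(y, pi, lam):
    """Log pmf of a zero-inflated Poisson: extra mass pi at zero, Poisson(lam) otherwise."""
    if y == 0:
        # P(Y = 0) = pi + (1 - pi) * exp(-lam)
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # P(Y = y) = (1 - pi) * exp(-lam) * lam**y / y!  for y >= 1
    return math.log1p(-pi) - lam + y * math.log(lam) - math.lgamma(y + 1)


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def zip_loglik_single_index(y, x, z, w, alpha, beta1, zeta, psi):
    """Log-likelihood of one observation under the assumed links.

    pi  = logistic(w' alpha)            (assumed logit link)
    lam = exp(x' beta1 + psi(z' zeta))  (single-index log link)
    psi is any callable standing in for the smooth function.
    """
    pi = 1.0 / (1.0 + math.exp(-dot(w, alpha)))
    lam = math.exp(dot(x, beta1) + psi(dot(z, zeta)))
    return zip_log_pmf(y, pi, lam)
```

In the article's notation $\mathbf {w}$ stacks $\mathbf {x}$ and $\mathbf {z}$; here `psi` can be any callable, for example a B-spline expansion.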

References

Böhning, D., Dietz, E., Schlattmann, P., Mendonca, L., & Kirchner, U. (1999). The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society: Series A (Statistics in Society), 162, 195–209.
Carroll, R. J., Fan, J., Gijbels, I., & Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92, 477–489.
Cheung, Y. B. (2002). Zero-inflated models for regression analysis of count data: a study of growth and development. Statistics in Medicine, 21, 1461–1469.
de Boor, C. (2001). A practical guide to splines. New York: Springer.
Dietz, K., & Böhning, D. (1997). The use of two-component mixture models with one completely or partly known component. Computational Statistics, 12, 219–234.
Delecroix, M., Härdle, W., & Hristache, M. (2003). Efficient estimation in conditional single-index regression. Journal of Multivariate Analysis, 86, 213–216.
Härdle, W., & Stoker, E. M. (1989). Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association, 84, 986–995.
Härdle, W., Hall, P., & Ichimura, H. (1993). Optimal smoothing in single-index models. Annals of Statistics, 21, 157–178.
Hastie, T., & Tibshirani, R. (1990). Generalized additive models. New York: Chapman & Hall/CRC.
Horowitz, J. L., & Härdle, W. (1996). Direct semiparametric estimation of single-index models with discrete covariate. Journal of the American Statistical Association, 91, 1632–1640.
Huang, J. Z., & Liu, L. (2006). Polynomial spline estimation and inference of proportional hazards regression models with flexible relative risk form. Biometrics, 62, 793–802.
Huang, J., Zhang, Y., & Hua, L. (2008). A least-squares approach to consistent information estimation in semiparametric models. Technical Report 2008–3, University of Iowa, Department of Biostatistics.
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58, 71–120.
Johnson, N. L., Kotz, S., & Kemp, A. W. (2005). Univariate discrete distributions (3rd ed.). New York: Wiley.
Kosorok, M. R. (2008). Introduction to empirical processes and semiparametric inference. Dordrecht: Springer.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1–14.
Lu, M., & Loomis, D. (2013). Spline-based semiparametric estimation of partially linear Poisson regression with single-index models. Journal of Nonparametric Statistics, 25, 905–922.
Lu, S. E., Lin, Y., & Shih, W. C. J. (2004). Analyzing excessive no changes in clinical trials with clustered data. Biometrics, 60, 257–267.
Murphy, S. A., & van der Vaart, A. W. (1997). Semiparametric likelihood ratio inference. Annals of Statistics, 25, 1471–1509.
Murphy, S. A., & van der Vaart, A. W. (1999). Observed information in semi-parametric models. Bernoulli, 5, 381–412.
Nielsen, G. G., Gill, R. D., Andersen, P. K., & Sörensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics, 19, 25–43.
Rosenberg, P. S. (1995). Hazard function estimation using B-splines. Biometrics, 51, 874–887.
Schumaker, L. (1981). Spline functions: basic theory. New York: Wiley.
Shen, X., & Wong, W. H. (1994). Convergence rate of sieve estimates. Annals of Statistics, 22, 580–615.
Singh, S. (1963). A note on inflated Poisson distribution. Journal of the Indian Statistical Association, 1, 140–144.
Stone, C. J. (1985). Additive regression and other nonparametric models. Annals of Statistics, 13, 689–705.
Stone, C. J. (1986). The dimensionality reduction principle for generalized additive models. Annals of Statistics, 14, 590–606.
Sun, J., Kopciuk, K. A., & Lu, X. (2008). Polynomial spline estimation of partially linear single-index proportional hazards regression models. Computational Statistics and Data Analysis, 53, 176–188.
van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge: Cambridge University Press.
van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes. New York: Springer.
Yau, K. K. W., & Lee, A. H. (2001). Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Statistics in Medicine, 20, 2907–2920.
Yu, Y., & Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97, 1042–1054.

Acknowledgments

The authors express their thanks to an associate editor and two referees whose constructive comments improved the presentation. The project described was supported by the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), through Grant #UL1 TR000002 (C.S. Li).

Author information

Authors and Affiliations

School of Community Health Sciences, University of Nevada, Reno, NV, 89557, USA
Minggen Lu
Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, 95616, USA
Chin-Shang Li

Corresponding author

Correspondence to Chin-Shang Li.

A Appendix

1.1 A.1 Notations and lemmas

Let $P_{\varvec{\tau }}$ be the distribution of $(y,\mathbf {w}^\mathtt{{T}})^\mathtt{{T}}$ under the parameter vector $\varvec{\tau }$ and $p_{\varvec{\tau }}$ the corresponding density. Denote $P_0\equiv {P}_{\varvec{\tau }_0}$ and $p_0\equiv {p}_{\varvec{\tau }_0}$. For a measurable function f, define Pf as the expectation of f under P. For any class of measurable functions $\mathcal {F}$, the bracketing number $N_{[]}(\varepsilon ,\mathcal {F},L_2(P))$ is defined as the minimum number of brackets $[f_i^L,f_i^R],i=1,\ldots ,m$, such that, for any $f\in \mathcal {F}$, there exists $1\le {i}\le {m}$ such that $f_i^L\le {f}\le {f}_i^R$ and $\Vert f_i^R-f_i^L\Vert _2\le \varepsilon $. Define the bracketing integral $J_{[]}(\delta ,\mathcal {F},L_2(P))=\int _0^\delta \left[ 1+\log N_{[]}(\varepsilon ,\mathcal {F},L_2(P))\right] ^{1/2}d\varepsilon $. Denote $\mathbb {G}_nf=\sqrt{n}( {\mathbb P}_{n}-P)f$ and $\Vert \mathbb {G}_n\Vert _{\mathcal {F}}=\sup _{f\in \mathcal {F}}|\mathbb {G}_nf|$. In the following, C represents a positive constant that may vary from place to place.

Lemma 1

For any $\delta > 0$, define $\mathcal {L}=\{\ell (\varvec{\tau };y,\mathbf {w}):\psi \in \mathcal {S}_{0,n},{\varvec{\theta }}\in \mathcal {R}^{2p+2d-1},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \}$. Then, for any $0 < \varepsilon \le \delta $, $\log N_{[]}(\varepsilon ,\mathcal {L},\Vert \cdot \Vert _{P,B}) \le C q_n\log (\delta /\varepsilon )$, and, hence, $J_{[]}(\delta ,\mathcal {L},\Vert \cdot \Vert _{P,B})\le C q_n^{1/2}\delta $, where $\Vert \cdot \Vert _{P,B}$ is the Bernstein norm defined as $\Vert f\Vert _{P,B}^2 = 2P\left[ \exp (|f|)-|f|-1\right] $ in van der Vaart and Wellner (1996) and $q_n=m_n + l$ is the number of spline basis functions.
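To make the count $q_n=m_n+l$ concrete, here is a minimal sketch of a B-spline basis built with the Cox–de Boor recursion. It assumes that $m_n$ counts interior knots and $l$ is the spline order, with boundary knots repeated $l$ times; the function names and the knot placement are hypothetical and are not the authors' implementation.

```python
def bspline_basis(u, interior_knots, order, lo=0.0, hi=1.0):
    """Values at u of all B-spline basis functions of the given order on [lo, hi].

    Returns a list of length len(interior_knots) + order, i.e. q_n = m_n + l.
    """
    degree = order - 1
    knots = [lo] * order + sorted(interior_knots) + [hi] * order
    n_basis = len(interior_knots) + order
    n_spans = len(knots) - 1
    # Order-1 (piecewise constant) functions: indicator of the span containing u.
    b = [1.0 if knots[j] <= u < knots[j + 1] else 0.0 for j in range(n_spans)]
    if u == hi:
        # Close the last non-empty span on the right so that u = hi is covered.
        b[n_basis - 1] = 1.0
    # Cox-de Boor recursion, raising the order by one at each pass (0/0 := 0).
    for k in range(1, degree + 1):
        nxt = []
        for j in range(n_spans - k):
            left_den = knots[j + k] - knots[j]
            right_den = knots[j + k + 1] - knots[j + 1]
            left = (u - knots[j]) / left_den * b[j] if left_den > 0 else 0.0
            right = (knots[j + k + 1] - u) / right_den * b[j + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        b = nxt
    return b


def spline_value(u, coefs, interior_knots, order, lo=0.0, hi=1.0):
    """Evaluate the spline sum_j coefs[j] * B_j(u)."""
    basis = bspline_basis(u, interior_knots, order, lo, hi)
    return sum(c * v for c, v in zip(coefs, basis))
```

With three interior knots and order 4 (cubic splines) the basis has $3+4=7$ functions, and at every point of $[0,1]$ they are nonnegative and sum to one.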

Lemma 2

If conditions C1–C6 hold, then there exists $C > 0$ such that

$$\begin{aligned} P[\ell (\varvec{\tau }_0;y,\mathbf {w})-\ell (\varvec{\tau };y,\mathbf {w})]\ge C\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2^2, \end{aligned}$$

for $\varvec{\tau }$ in a neighborhood of $\varvec{\tau }_0$.

Lemma 3

(Consistency) If conditions C1–C6 hold, then $\Vert \hat{\varvec{\tau }}-\varvec{\tau }_0 \Vert _2 = o_p(1)$.

Remark 3

Lemma 1 and similar entropy calculations are used to derive the consistency of $\hat{\varvec{\tau }}$ and to prove Theorems 1–4. Lemma 2 is the key result for deriving the consistency and rate of convergence of $\hat{\varvec{\tau }}$. Lemma 3 shows that $\hat{\varvec{\tau }}$ is consistent for $\varvec{\tau }_0$.

1.2 A.2 Proof of Lemma 1

Proof

According to the bracketing calculation in Shen and Wong (1994), for any $\delta > 0$ and $0 < \varepsilon \le \delta $, the logarithm of bracketing number of $\mathcal {S}_{0,n}$, computed with $L_2(P)$, is bounded by $q_n \log (\delta /\varepsilon )$ up to a constant. It is known that the neighborhoods $\mathbf {A}(\delta )=\{\varvec{\alpha }:\Vert \varvec{\alpha }-\varvec{\alpha }_{0}\Vert _2\le \delta \}$, $\mathbf {Z}(\delta )=\{\varvec{\zeta }:\Vert \varvec{\zeta }-\varvec{\zeta }_0\Vert _2\le \delta \}$, and $\mathbf {B}(\delta )=\{\varvec{\beta }_1:\Vert \varvec{\beta }_1-\varvec{\beta }_{10}\Vert _2\le \delta \}$ can be covered by $O((\delta /\varepsilon )^{p+d})$, $O((\delta /\varepsilon )^d)$, and $O((\delta /\varepsilon )^p)$ balls with radius $\varepsilon $, respectively. In view of Theorem 9.23 of Kosorok (2008), the bracketing numbers for $\{\mathbf {w}^\mathtt{{T}}\varvec{\alpha }:\Vert \varvec{\alpha }-\varvec{\alpha }_{0}\Vert _2\le \delta \}$, $\{\mathbf {z}^\mathtt{{T}}\varvec{\zeta }:\Vert \varvec{\zeta }-\varvec{\zeta }_0\Vert _2\le \delta \}$, and $\{\mathbf {x}^\mathtt{{T}}\varvec{\beta }_1:\Vert \varvec{\beta }_1 -\varvec{\beta }_{10}\Vert _2\le \delta \}$ are bounded by $O((\delta /\varepsilon )^{p+d})$, $O((\delta /\varepsilon )^d)$, and $O((\delta /\varepsilon )^{p})$, respectively. It follows that, for sufficiently large n,

$$\begin{aligned} \log {N}_{[]}\left( \varepsilon ,\{\mathbf {x}^\mathtt{{T}}\varvec{\beta }_1+\psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta }):\psi \in \mathcal {S}_{0,n},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \},L_2(P)\right) \le {C}q_n\log (\delta /\varepsilon ). \end{aligned}$$

and, hence,

$$\begin{aligned} \log {N}_{[]}\left( \varepsilon ,\{\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi ):\psi \in \mathcal {S}_{0,n},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \},L_2(P)\right) \le {C}q_n\log (\delta /\varepsilon ) \end{aligned}$$

because the function $x\mapsto \exp (x)$ is monotonic and is Lipschitz on bounded sets. By the inequality $2[\exp (|x|)-1-|x|] \le x^2\exp (|x|)$,

$$\begin{aligned} \log {N}_{[]}\left( \varepsilon ,\{\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi ):\psi \in \mathcal {S}_{0,n},\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\le \delta \},\Vert \cdot \Vert _{P,B}\right) \le {C}q_n\log (\delta /\varepsilon ). \end{aligned}$$

Similarly, we can show that

$$\begin{aligned} \log {N}_{[]}\left( \varepsilon ,\{\pi (\mathbf {w};\varvec{\alpha }):\Vert \varvec{\alpha }-\varvec{\alpha }_0\Vert _2\le \delta \},\Vert \cdot \Vert _{P,B}\right) \le {C}\log (\delta /\varepsilon ). \end{aligned}$$

The transformation $\left( \pi (\mathbf {w};\varvec{\alpha }),\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi )\right) \mapsto \ell (\pi (\mathbf {w};\varvec{\alpha }),\lambda (\mathbf {w};\varvec{\beta }_1,{\varvec{\phi }},\psi );\varvec{\tau })$ is essentially Lipschitz, so it follows that $\log {N}_{[]}\left( \varepsilon ,\mathcal {L},\Vert \cdot \Vert _{P,B}\right) \le {C}q_n\log (\delta /\varepsilon )$, and, hence, the bracketing integral is bounded by $q_n^{1/2}\delta $, up to a constant. $\square $
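Two steps of this proof are stated without derivation; the following is a sketch of how they can be checked, not the authors' own argument. First, the elementary inequality used to pass from $L_2(P)$ to the Bernstein norm follows from the series expansion of the exponential, since $(k+2)!\ge 2\,k!$:

$$\begin{aligned} 2[\exp (|x|)-1-|x|]=2x^2\sum _{k=0}^\infty \frac{|x|^k}{(k+2)!}\le x^2\sum _{k=0}^\infty \frac{|x|^k}{k!}=x^2\exp (|x|). \end{aligned}$$

Second, taking the bracketing integral to be $\int _0^\delta [1+\log N_{[]}]^{1/2}d\varepsilon $, the entropy bound $\log N_{[]}\le Cq_n\log (\delta /\varepsilon )$, the substitution $\varepsilon =\delta u$, and $\sqrt{a+b}\le \sqrt{a}+\sqrt{b}$ give

$$\begin{aligned} J_{[]}(\delta ,\mathcal {L},\Vert \cdot \Vert _{P,B})\le \delta \int _0^1\left[ 1+Cq_n\log (1/u)\right] ^{1/2}du\le \delta \left[ 1+(Cq_n)^{1/2}\int _0^1\{\log (1/u)\}^{1/2}du\right] =\delta \left[ 1+(Cq_n)^{1/2}\frac{\sqrt{\pi }}{2}\right] , \end{aligned}$$

where the last integral equals $\Gamma (3/2)=\sqrt{\pi }/2$. Because $q_n\ge 1$, the right-hand side is at most a constant multiple of $q_n^{1/2}\delta $.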

1.3 A.3 Proof of Lemma 2

Proof

Let $ {\mathbb M}_{n}(\varvec{\tau })= {\mathbb P}_{n}\ell (\varvec{\tau };y,\mathbf {w})$ and $ {\mathbb M}(\varvec{\tau })=P\ell (\varvec{\tau };y,\mathbf {w})$. For any $\varvec{\tau }$ in a neighborhood of $\varvec{\tau }_0$, a Taylor’s expansion yields

$$\begin{aligned} {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau }){\,\ge }{P}[(\mathbf {w}^\mathtt{{T}}\varvec{\alpha }-\mathbf {w}^\mathtt{{T}}\varvec{\alpha }_0)^2] {+}P\{[\mathbf {x}^\mathtt{{T}}\varvec{\beta }_1-\mathbf {x}^\mathtt{{T}}\varvec{\beta }_{10}{+}\psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta })-\psi _0(\mathbf {z}^\mathtt{{T}}\varvec{\zeta }_0)]^2\}, \end{aligned}$$

up to a constant. Let $\textit{g}_1(\mathbf {x})=\mathbf {x}^\mathtt{{T}}(\varvec{\beta }_1-\varvec{\beta }_{10})$ and $\textit{g}_2(\mathbf {z})=\psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta })-\psi _0(\mathbf {z}^\mathtt{{T}}\varvec{\zeta }_0)$. According to the law of total expectation and the Cauchy-Schwarz inequality, $\{E[\textit{g}_1(\mathbf {x})\textit{g}_2(\mathbf {z})]\}^2\le {E}_{\mathbf {z}}[\textit{g}_2^2(\mathbf {z})]E_{\mathbf {z}}[\{E_{\mathbf {x}|\mathbf {z}}[\textit{g}_1(\mathbf {x})|\mathbf {z}]\}^2]$. By the orthogonality of a conditional expectation, there exists $0<\xi <1$ such that $E_{\mathbf {z}}[\{E_{\mathbf {x}|\mathbf {z}}[\textit{g}_1(\mathbf {x})|\mathbf {z}]\}^2]=\xi {E}_\mathbf {x}[\textit{g}_1^2(\mathbf {x})]$. Hence, $\{E[\textit{g}_1(\mathbf {x})\textit{g}_2(\mathbf {z})]\}^2 \le \xi {E}[\textit{g}_1^2(\mathbf {x})]E[\textit{g}_2^2(\mathbf {z})]$. In view of Lemma 25.86 of van der Vaart (2000),

$$\begin{aligned} {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau })\ge \Vert \psi (\mathbf {z}^\mathtt{{T}}\varvec{\zeta })-\psi _0(\mathbf {z}^\mathtt{{T}}\varvec{\zeta }_0)\Vert _2^2 +\Vert \mathbf {w}^\mathtt{{T}}\varvec{\alpha }-\mathbf {w}^\mathtt{{T}}\varvec{\alpha }_{0}\Vert _2^2+\Vert \mathbf {x}^\mathtt{{T}}\varvec{\beta }_1-\mathbf {x}^\mathtt{{T}}\varvec{\beta }_{10}\Vert _2^2, \end{aligned}$$

up to a constant. By Lemma 1 of Stone (1985) and conditions C3 and C4, it follows that $ {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau })\ge {C}\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2^2$. $\square $
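For readers without the reference at hand, the step attributed to Lemma 25.86 of van der Vaart (2000) can be sketched as follows, assuming the bound $\{E[\textit{g}_1\textit{g}_2]\}^2\le \xi E[\textit{g}_1^2]E[\textit{g}_2^2]$ with $0<\xi <1$. Expanding the square and using $2ab\le a^2+b^2$,

$$\begin{aligned} E[(\textit{g}_1+\textit{g}_2)^2]= & {} E[\textit{g}_1^2]+E[\textit{g}_2^2]+2E[\textit{g}_1\textit{g}_2] \\\ge & {} E[\textit{g}_1^2]+E[\textit{g}_2^2]-2\sqrt{\xi }\{E[\textit{g}_1^2]E[\textit{g}_2^2]\}^{1/2} \\\ge & {} (1-\sqrt{\xi })\{E[\textit{g}_1^2]+E[\textit{g}_2^2]\}. \end{aligned}$$

This is what allows the cross term between the linear part and the single-index part to be dropped, at the cost of the constant $1-\sqrt{\xi }>0$.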

1.4 A.4 Proof of Lemma 3

Proof

We verify the conditions of Theorem 5.7 in van der Vaart (2000) to prove the consistency of $\hat{\varvec{\tau }}$. According to Lemma 1, $\mathcal {L}$ is a Donsker class, and is therefore a Glivenko-Cantelli class. Thus, $\sup _{\varvec{\tau }}|( {\mathbb P}_{n}-P)\ell (\varvec{\tau };y,\mathbf {w})|=o_p(1)$ for $\varvec{\tau }$ in a neighborhood of $\varvec{\tau }_0$. The first condition of the theorem holds. It follows from Lemma 2 that $\sup _{\Vert \varvec{\tau }-\varvec{\tau }_0\Vert _2\ge \varepsilon } {\mathbb M}(\varvec{\tau })\le {\mathbb M}(\varvec{\tau }_0)-C\varepsilon ^2 < {\mathbb M}(\varvec{\tau }_0)$. Hence, the second condition of the theorem also holds.

According to Jackson’s theorem for polynomials (de Boor 2001), there exists a spline $\psi _{0,n}\in \mathcal {S}_{0,n}$ of order ${l}\ge {2}$ such that $\Vert \psi _{0,n}-\psi _0\Vert _\infty =O(n^{-r\nu })$ for $1/(2r+2)<\nu < 1/(2r)$. Let $\varvec{\tau }_{0,n}=({\varvec{\theta }}_0,\psi _{0,n})$. By definition of $\hat{\varvec{\tau }}$,

$$\begin{aligned} {\mathbb M}_{n}(\hat{\varvec{\tau }})- {\mathbb M}_{n}(\varvec{\tau }_0)\ge {\mathbb M}_{n}(\varvec{\tau }_{0,n})- {\mathbb M}_{n}(\varvec{\tau }_0)=I_{n1}+I_{n2}, \end{aligned}$$

where $I_{n1}=( {\mathbb P}_{n}-P)[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]$ and $I_{n2}= P[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]$. As shown in the proof of Lemma 1, $\mathcal {S}_{0,n}$ is a Donsker class. Because $\ell ({\varvec{\theta }}_0,\psi ;y,\mathbf {w})$ is essentially Lipschitz with respect to $\psi $, the preservation theorem for Donsker classes implies that the class of functions $\ell ({\varvec{\theta }}_0,\psi ;y,\mathbf {w})-\ell ({\varvec{\theta }}_0,\psi _0;y,\mathbf {w})$, for $\psi \in \mathcal {S}_{0,n}$ and $\Vert \psi -\psi _0\Vert _2\le \delta $, is a Donsker class. Moreover, by the mean value theorem, $P[\ell (\varvec{\tau }_{0,n};y,\mathbf {w})-\ell (\varvec{\tau }_0;y,\mathbf {w})]^2\le {C}\Vert \psi _{0,n}-\psi _0\Vert _\infty ^2\rightarrow 0$ as $n\rightarrow \infty $. In view of Lemma 19.24 of van der Vaart (2000), $I_{n1}=o_p(n^{-1/2})$. Observe that $I_{n2}\ge {-C}\Vert \psi _{0,n}-\psi _0\Vert _\infty ^2=-O(n^{-2r\nu })$. It follows that $ {\mathbb M}_{n}(\hat{\varvec{\tau }})- {\mathbb M}_{n}(\varvec{\tau }_0) \ge -o_p(1)$. Therefore, Theorem 5.7 of van der Vaart (2000) applies and yields the consistency of $\hat{\varvec{\tau }}$. $\square $

1.5 A.5 Proof of Theorem 1

Proof

We apply Theorem 3.4.1 of van der Vaart and Wellner (1996) to prove the rate of convergence. Let $\varvec{\tau }\in \mathbf {\Theta }_n=\{({\varvec{\phi }},\varvec{\alpha },\varvec{\beta }_1,\psi ):{\varvec{\phi }}\in \mathcal {R}^{d-1},\varvec{\alpha }\in \mathcal {R}^{p+d},\varvec{\beta }_1\in \mathcal {R}^{p},\psi \in \mathcal {S}_{0,n}\}$. Choose $d_n(\varvec{\tau },\varvec{\tau }_{0,n})$ and $M_n(\varvec{\tau })$ defined in the theorem to be $\Vert \varvec{\tau }-\varvec{\tau }_{0,n}\Vert _2$ and $ {\mathbb M}(\varvec{\tau })$, respectively. By definition of $\hat{\varvec{\tau }}$, $ {\mathbb M}_n(\hat{\varvec{\tau }}) \ge {\mathbb M}_n(\varvec{\tau }_{0,n})$. In the proof of Lemma 3 for consistency, we have already shown that $ {\mathbb M}(\varvec{\tau })- {\mathbb M}(\varvec{\tau }_0)\le -Cd_n^2(\varvec{\tau },\varvec{\tau }_0)$. Because $d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)\le C\Vert \psi _{0,n}-\psi _0\Vert _\infty =O(n^{-r\nu })$, for any $\varvec{\tau }\in \mathbf {\Theta }_n$ such that $\delta /2\le {d}_n(\varvec{\tau },\varvec{\tau }_{0,n})\le \delta $, we have $d_n(\varvec{\tau },\varvec{\tau }_0) \ge d_n(\varvec{\tau },\varvec{\tau }_{0,n})-d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0) >C\delta $ for sufficiently large n. It follows that

$$\begin{aligned} {\mathbb M}(\varvec{\tau })- {\mathbb M}(\varvec{\tau }_{0,n})= & {} {\mathbb M}(\varvec{\tau })- {\mathbb M}(\varvec{\tau }_0)+ {\mathbb M}(\varvec{\tau }_0)- {\mathbb M}(\varvec{\tau }_{0,n}) \\\le & {} -C\delta ^2+O(n^{-2r\nu })=-C\delta ^2 \end{aligned}$$

for sufficiently large n.

For any $\delta >0$, in view of Lemma 1,

$$\begin{aligned} J_{[]}\{\delta ,\{\ell (\varvec{\tau };y,\mathbf {w})-\ell (\varvec{\tau }_{0,n};y,\mathbf {w}):\varvec{\tau }\in \mathbf {\Theta }_n,\delta /2\le d_n(\varvec{\tau },\varvec{\tau }_{0,n})\le \delta \},\Vert \cdot \Vert _{P,B}\} \le Cq_n^{1/2}\delta . \end{aligned}$$

Moreover, for $\varvec{\tau }\in \mathbf {\Theta }_n$ and $\delta /2\le {d}_n(\varvec{\tau },\varvec{\tau }_{0,n}) \le \delta $, by the inequality $2[\exp (|x|)-|x|-1]\le {x}^2\exp (|x|)$ and conditions C3–C5, $\Vert \ell (\varvec{\tau };y,\mathbf {w})-\ell (\varvec{\tau }_{0,n};y,\mathbf {w})\Vert _{P,B}^2\le {C}\delta ^2$. Lemma 3.4.3 of van der Vaart and Wellner (1996) yields

$$\begin{aligned} E\left[ \sup _{\delta /2\le \Vert \varvec{\tau }-\varvec{\tau }_{0,n}\Vert _2\le \delta ,\varvec{\tau }\in \mathbf {\Theta }_n}n^{1/2}|( {\mathbb M}_{n}- {\mathbb M})(\varvec{\tau })-( {\mathbb M}_{n}- {\mathbb M})(\varvec{\tau }_{0,n})| \right] \le {C}\phi _n(\delta ) \end{aligned}$$

with $\phi _n(\delta )=q_n^{1/2}\delta +n^{-1/2}q_n$. Obviously, $\phi _n(\delta )/\delta $ is decreasing in $\delta $. It can be readily shown that $r_n^2\phi _n(1/r_n)\le {n}^{1/2}$ with $r_n=n^{\min (r\nu ,(1-\nu )/2)}$. Theorem 3.4.1 of van der Vaart and Wellner (1996) is applied to yield $r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_{0,n})=O_p(1)$. Because $d_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)=O(n^{-r\nu })$, it follows that $r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_0) \le r_nd_n(\hat{\varvec{\tau }},\varvec{\tau }_{0,n})+r_nd_n(\varvec{\tau }_{0,n},\varvec{\tau }_0)=O_p(1)+r_nO(n^{-r\nu })=O_p(1)$. This completes the proof of the rate of convergence. $\square $
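The condition $r_n^2\phi _n(1/r_n)\le {n}^{1/2}$ and the resulting rate can be checked numerically. The sketch below assumes that the number of basis functions grows as $q_n=n^{\nu }$ (an assumption made here for illustration; constants are ignored) and uses hypothetical function names. It shows that the exponent $\min (r\nu ,(1-\nu )/2)$ is largest at $\nu =1/(1+2r)$, where it equals $r/(1+2r)$.

```python
def rate_exponent(r, nu):
    """Exponent e with r_n = n**e, where e = min(r*nu, (1 - nu)/2)."""
    return min(r * nu, (1.0 - nu) / 2.0)


def balancing_nu(r):
    """Value of nu at which r*nu = (1 - nu)/2, i.e. nu = 1/(1 + 2r)."""
    return 1.0 / (1.0 + 2.0 * r)


def modulus_ratio(n, r, nu):
    """Return r_n**2 * phi_n(1/r_n) / sqrt(n), assuming q_n = n**nu.

    phi_n(delta) = sqrt(q_n) * delta + q_n / sqrt(n), as in the proof.
    The ratio stays bounded (here by 2) when r_n = n**rate_exponent(r, nu).
    """
    r_n = n ** rate_exponent(r, nu)
    q_n = n ** nu  # assumed growth of the number of spline basis functions
    phi = q_n ** 0.5 / r_n + q_n / n ** 0.5
    return r_n ** 2 * phi / n ** 0.5
```

For example, with smoothness $r=2$ the balancing choice is $\nu =1/5$, which lies in the admissible range $1/(2r+2)<\nu <1/(2r)$ and gives the exponent $2/5$.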

1.6 A.6 Proof of Theorem 2

Proof

Theorem 2 follows from the same arguments as those in the proof of Theorem 1(b) of Lu and Loomis (2013) and entropy calculations similar to those in Lemma 1. $\square $

1.7 A.7 Proof of Theorem 3

Proof

We apply Theorem 3.1 of Murphy and van der Vaart (1997) to prove the asymptotic distribution of the spline likelihood ratio test statistic. Let $\mathbf {t}=\left( \mathbf {t}_1^\mathtt{{T}},\mathbf {t}_2^\mathtt{{T}},\mathbf {t}_3^\mathtt{{T}}\right) ^\mathtt{{T}}$, where $\mathbf {t}_1\in \mathbb {R}^{d-1}$, $\mathbf {t}_2\in \mathbb {R}^{p+d}$, and $\mathbf {t}_3\in \mathbb {R}^{p}$. Define an approximately least favorable submodel

$$\begin{aligned} \mathrm {\Psi }_{\mathbf {t}}({\varvec{\theta }},\psi )=\left( \mathbf {t},\psi _{\mathbf {t}}({\varvec{\theta }},\psi )\right) , \end{aligned}$$

where $\psi _{\mathbf {t}}({\varvec{\theta }},\psi )=\psi +({\varvec{\theta }}-\mathbf {t})^\mathtt{{T}}\mathbf {h}^*\circ \psi _0^{-1}\circ \psi $. Let $p(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})$ and $\ell (\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})$ be density and log density functions under parameters $(\mathbf {t},\psi _{\mathbf {t}}({\varvec{\theta }},\psi ))$, respectively. Also denote by $\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})$ the first derivatives of $\ell (\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})$ with respect to $\mathbf {t}$. Some derivative calculations then yield

$$\begin{aligned} \dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w}) =\left( \begin{array}{l} \eta _{\mathbf {t}}[\psi ^\prime _{\mathbf {t}}\mathbf {D}_{\mathbf {t}}\mathbf {z}-\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t}+({\varvec{\phi }}-\mathbf {t}_1)^\mathtt{{T}}\nabla _{\mathbf {t}_1} (\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t})] \\ -\eta _{\mathbf {t}}\mathbf {h}_2^*\circ \psi _0\circ \psi _\mathbf {t}+\xi _{\mathbf {t}}\mathbf {w}\\ \eta _{\mathbf {t}}(\mathbf {x}-\mathbf {h}_3^*\circ \psi _0\circ \psi _\mathbf {t}) \end{array}\right) . \end{aligned}$$

Here $\xi _{\mathbf {t}}$, $\eta _{\mathbf {t}}$, $\psi _{\mathbf {t}}$, and $\psi ^\prime _{\mathbf {t}}$ represent $\xi $, $\eta $, $\psi $, and $\psi ^\prime $ evaluated at $(\mathbf {t},\psi _{\mathbf {t}}(\mathbf {t},\psi ))$, respectively. $\nabla _{\mathbf {t}_1}(\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t})$ is the gradient of $\mathbf {h}_1^*\circ \psi _0\circ \psi _\mathbf {t}$ with respect to $\mathbf {t}_1$. Observe that $\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})$ converges to $\ell _{{\varvec{\theta }}}^*(\varvec{\tau }_0;y,\mathbf {w})$ as $(\mathbf {t},{\varvec{\theta }},\psi )\rightarrow ({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)$. Moreover, using arguments similar to those in the proof of Lemma 1, we can show that, for any $\delta >0$, the class of functions $\dot{\ell }(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})$ with $\psi \in \mathcal {S}_{0,n}$, $\Vert \psi -\psi _0\Vert _2\le \delta $, $\Vert \mathbf {t}-{\varvec{\theta }}_0\Vert _2 \le \delta $, and $\Vert {\varvec{\theta }}-{\varvec{\theta }}_0\Vert _2\le \delta $ is P-Donsker. Thus, Lemma 3.2 of Murphy and van der Vaart (1997) is applicable.

Using the same arguments as above, we can show that, for $(\mathbf {t},{\varvec{\theta }},\psi )$ in a neighborhood of $({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)$, the class of $p^{-1}(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\partial ^2p(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})/\partial \mathbf {t}\partial \mathbf {t}^\mathtt{{T}}$ is P-Donsker and is therefore P-Glivenko-Cantelli. Furthermore, as $(\mathbf {t},{\varvec{\theta }},\psi )\rightarrow ({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi _0)$,

$$\begin{aligned} E[{p^{-1}(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})\partial ^2p(\mathbf {t},{\varvec{\theta }},\psi ;y,\mathbf {w})}/\partial \mathbf {t}\partial \mathbf {t}^\mathtt{{T}}]\rightarrow - E[\ell _{{\varvec{\theta }}}^*(\varvec{\tau }_0;y,\mathbf {w})]^{\otimes 2}+\mathbf {I}({\varvec{\theta }}_0)={\varvec{0}}. \end{aligned}$$

It follows that condition 3.14 in Murphy and van der Vaart (1997) holds. Thus the conditions in Theorem 3.1 of Murphy and van der Vaart (1997) reduce to the unbiasedness condition

$$\begin{aligned} \sqrt{n}P_0\dot{\ell }({\varvec{\theta }}_0,{\varvec{\theta }}_0,\hat{\psi }_0;y,\mathbf {w})=o_p(1), \end{aligned}$$

where $\hat{\psi }_0$ is the estimator of $\psi _0$ under ${\varvec{\theta }}={\varvec{\theta }}_0$. Using the same arguments as those in the proof of the convergence rate of $\hat{\varvec{\tau }}$, we can deduce that $\Vert \hat{\psi }_0-\psi _0\Vert _2=O_p(n^{-r/(1+2r)})$.

Abbreviate $\dot{\ell }({\varvec{\theta }}_0,{\varvec{\theta }}_0,\psi ;y,\mathbf {w})$ to $\dot{\ell }(\psi ;y,\mathbf {w})$. In view of the fact that $P_{{\varvec{\theta }},\psi }\dot{\ell }({\varvec{\theta }},{\varvec{\theta }},\psi ;y,\mathbf {w})=0$ for all $({\varvec{\theta }},\psi )$, we can decompose $P_0\dot{\ell }(\hat{\psi }_0;y,\mathbf {w})$ as $I_{n5}+I_{n6}$, where $I_{n5}=(P_0-P_{{\varvec{\theta }}_0,\hat{\psi }_0})\dot{\ell }(\psi _0;y,\mathbf {w})$ and $I_{n6}=(P_0-P_{{\varvec{\theta }}_0,\hat{\psi }_0})[\dot{\ell }(\hat{\psi }_0;y,\mathbf {w})-\dot{\ell }(\psi _0;y,\mathbf {w})]$. Observe that $I_{n5}=P_0\{\dot{\ell }(\psi _0;y,\mathbf {w})[(p_0-p_{{\varvec{\theta }}_0,\hat{\psi }_0})/p_0-\dot{\ell }_\psi (\varvec{\tau }_0;y,\mathbf {w})[\psi _0-\hat{\psi }_0]]\}$. By a Taylor’s expansion, $I_{n5}$ can be expressed as

$$\begin{aligned} I_{n5}=-(1/2)P_0[p^{-1}_0\dot{\ell }(\psi _0;y,\mathbf {w})d^2p({{\varvec{\theta }}_0,\psi _0+t(\hat{\psi }_0-\psi _0);y,\mathbf {w}})/dt^2]|_{t=t^*}, \end{aligned}$$

where $0<t^*<1$. According to conditions C3–C5 and the rate of convergence of $\hat{\psi }_0$, $I_{n5}=o_p(n^{-1/2})$. Similarly, a Taylor’s expansion and the rate of convergence of $\hat{\psi }_0$ yield $I_{n6}=o_p(n^{-1/2})$. This completes the proof of Theorem 3. $\square $

1.8 A.8 Proof of Theorem 4

Proof

In view of the consistency of $\hat{\varvec{\tau }}$ and Proposition 2.1 of Huang et al. (2008), we can show that $ {\mathbb P}_{n}[\dot{\ell }_{{\varvec{\theta }}}(\hat{\varvec{\tau }};y,\mathbf {w})-\dot{\ell }_\psi (\hat{\varvec{\tau }};y,\mathbf {w})[\hat{\mathbf {h}}^*]]^{\otimes 2}\rightarrow \mathbf {I}({\varvec{\theta }}_0)$ in probability. According to some entropy calculations and the law of large numbers, it follows that $\hat{E}_{{\varvec{\theta }}{\varvec{\theta }}}=\hat{A}_{{\varvec{\theta }}{\varvec{\theta }}}+o_p(1)$, $\hat{E}_{{\varvec{\theta }}\psi }=\hat{A}_{{\varvec{\theta }}\psi }+o_p(1)$, and $\hat{E}_{\psi \psi }=\hat{A}_{\psi \psi }+o_p(1)$. We conclude that $\mathcal {E}_n\rightarrow \mathbf {I}({\varvec{\theta }}_0)$ in probability. This completes the proof of Theorem 4. $\square $
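The least-squares idea behind this variance estimator can be sketched in a few lines. The definitions of $\hat{E}_{{\varvec{\theta }}{\varvec{\theta }}}$, $\hat{E}_{{\varvec{\theta }}\psi }$, $\hat{E}_{\psi \psi }$ and $\mathcal {E}_n$ are given in the main text and are not reproduced here; the sketch assumes the usual form $\mathcal {E}_n=\hat{E}_{{\varvec{\theta }}{\varvec{\theta }}}-\hat{E}_{{\varvec{\theta }}\psi }\hat{E}_{\psi \psi }^{-1}\hat{E}_{\psi {\varvec{\theta }}}$, built from per-observation scores for ${\varvec{\theta }}$ and for the spline coefficients. All function names and inputs are hypothetical.

```python
def mean_outer(a_rows, b_rows):
    """Empirical average of the outer products a_i b_i' over observations i."""
    n = len(a_rows)
    return [[sum(a[i] * b[j] for a, b in zip(a_rows, b_rows)) / n
             for j in range(len(b_rows[0]))]
            for i in range(len(a_rows[0]))]


def solve(a, b):
    """Solve a X = b for a small square matrix a (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ls_information(score_theta, score_psi):
    """Assumed estimator E_tt - E_tp E_pp^{-1} E_pt from per-observation scores.

    score_theta: n rows of scores for theta; score_psi: n rows of scores
    for the spline coefficients. Equivalent to the average squared residual
    from regressing the theta-scores on the spline-coefficient scores.
    """
    e_tt = mean_outer(score_theta, score_theta)
    e_tp = mean_outer(score_theta, score_psi)
    e_pp = mean_outer(score_psi, score_psi)
    e_pt = mean_outer(score_psi, score_theta)
    coef = solve(e_pp, e_pt)  # least-squares projection coefficients
    k, q = len(e_tt), len(e_pp)
    return [[e_tt[i][j] - sum(e_tp[i][l] * coef[l][j] for l in range(q))
             for j in range(k)] for i in range(k)]
```

Inverting the returned matrix and dividing by $n$ would then give a variance estimate for $\hat{{\varvec{\theta }}}$ under these assumptions.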

About this article

Cite this article

Lu, M., Li, CS. Spline-based semiparametric estimation of a zero-inflated Poisson regression single-index model. Ann Inst Stat Math 68, 1111–1134 (2016). https://doi.org/10.1007/s10463-015-0527-8

Received: 18 January 2014
Revised: 06 April 2015
Published: 02 July 2015
Issue Date: October 2016
DOI: https://doi.org/10.1007/s10463-015-0527-8
