Abstract
A generalized partially linear single-index model (GPLSIM) is proposed in which the unknown smooth function of the single index is approximated by a spline function that can be expressed as a linear combination of B-spline basis functions. The regression coefficients and the unknown smooth function are estimated simultaneously via a modified Fisher-scoring method. The estimators of the regression parameters are shown to be asymptotically normal, and their asymptotic covariance matrix can be estimated directly and consistently by the least-squares method. As an application, the proposed GPLSIM can be employed to assess the lack of fit of a postulated generalized linear model (GLM): a likelihood ratio test is constructed by comparing the goodness of fit of the GPLSIM with that of the postulated GLM. An extensive simulation study examines the finite-sample performance of the likelihood ratio test, and the practicality of the proposed methodology is illustrated with a real-life data set from a study of nesting horseshoe crabs.
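As a rough illustration of the model just described, the following Python sketch simulates data from a Poisson GPLSIM and fits the linear coefficients jointly with a B-spline approximation of \(\varphi\) by Fisher scoring (iteratively reweighted least squares). It is a minimal sketch, not the paper's algorithm: the index coefficients are held fixed at their true value, whereas the proposed method also updates them, and the choice \(\varphi_0(u)=\sin u\), the knot placement, and all simulated quantities are illustrative assumptions.

```python
# Minimal GPLSIM sketch: Poisson responses, log link, index coefficients fixed.
# The paper's modified Fisher scoring also updates alpha; omitted for brevity.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))                      # linear-part covariates
z = rng.normal(size=(n, 2))                      # index covariates
alpha0 = np.array([1.0, 1.0]) / np.sqrt(2.0)     # ||alpha0|| = 1 for identifiability
beta0 = np.array([0.5, -0.5])
u = z @ alpha0
y = rng.poisson(np.exp(x @ beta0 + np.sin(u)))   # phi_0(u) = sin(u), illustrative

# Cubic B-spline design matrix for the unknown phi, evaluated at the index u
deg, m = 3, 5                                    # degree and interior-knot count
interior = np.quantile(u, np.linspace(0, 1, m + 2)[1:-1])
t = np.r_[[u.min() - 1e-6] * (deg + 1), interior, [u.max() + 1e-6] * (deg + 1)]
q = len(t) - deg - 1                             # number of basis functions
B = np.column_stack([BSpline(t, np.eye(q)[j], deg)(u) for j in range(q)])
D = np.hstack([x, B])                            # joint design for (beta, gamma)

# Fisher scoring = iteratively reweighted least squares for the Poisson family
coef = np.zeros(D.shape[1])
for _ in range(50):
    eta = np.clip(D @ coef, -30, 30)
    mu = np.exp(eta)
    work = eta + (y - mu) / mu                   # working response
    A = D.T @ (mu[:, None] * D) + 1e-8 * np.eye(D.shape[1])
    coef = np.linalg.solve(A, D.T @ (mu * work))

beta_hat = coef[:2]
```

Because the B-spline basis sums to one at every index value, the spline part absorbs the intercept, so no separate constant column is needed in the design.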
References
Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New York
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
Brockmann HJ (1996) Satellite male groups in horseshoe crabs, Limulus polyphemus. Ethology 102:1–21
Carroll RJ, Fan J, Gijbels I, Wand MP (1997) Generalized partially linear single-index models. J Am Stat Assoc 92:477–489
de Boor C (2001) A practical guide to splines. Springer-Verlag, New York
Delecroix M, Härdle W, Hristache M (2003) Efficient estimation in conditional single-index regression. J Multivar Anal 86:213–216
Ding Y, Nan B (2011) A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data. Ann Stat 39:3032–3061
Efron B, Tibshirani R (1994) An introduction to the bootstrap. Chapman & Hall, New York
Härdle W, Stoker TM (1989) Investigating smooth multiple regression by the method of average derivatives. J Am Stat Assoc 84:986–995
Härdle W, Hall P, Ichimura H (1993) Optimal smoothing in single-index models. Ann Stat 21:157–178
Hart JD (1997) Nonparametric smoothing and lack-of-fit tests. Springer-Verlag, New York
Hastie T, Tibshirani R (1990) Generalized additive models. Chapman & Hall, New York
Horowitz JL, Härdle W (1996) Direct semiparametric estimation of single-index models with discrete covariate. J Am Stat Assoc 91:1632–1640
Huang J, Zhang Y, Hua L (2008) A least-squares approach to consistent information estimation in semiparametric models. Technical report 2008-3, University of Iowa, Department of Biostatistics
Huang JZ, Liu L (2006) Polynomial spline estimation and inference of proportional hazards regression models with flexible relative risk form. Biometrics 62:793–802
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econom 58:71–120
Koehler AB, Murphree ES (1988) A comparison of the Akaike and Schwarz criteria for selecting model order. J R Stat Soc Ser C 37:187–195
Kosorok MR (2008) Introduction to empirical processes and semiparametric inference. Springer, Dordrecht
Lu M, Loomis D (2014) Spline-based semiparametric estimation of partially linear Poisson regression with single-index model. J Nonparametr Stat 25:905–922
Lu M, Zhang Y, Huang J (2007) Estimation of the mean function with panel count data using monotone polynomial splines. Biometrika 94:705–718
Lu M, Zhang Y, Huang J (2009) Semiparametric estimation methods for panel count data using monotone \(B\)-splines. J Am Stat Assoc 104:1060–1070
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, London
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J R Stat Soc Ser A 135:370–384
Newey WK, Stoker TM (1993) Efficiency of weighted average derivative estimators and index models. Econometrica 61:1199–1223
Neyman J, Pearson ES (1933) On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc A: Math Phys Eng Sci 231:289–337
Powell JL, Stock JH, Stoker TM (1989) Semiparametric estimation of index coefficients. Econometrica 57:1403–1430
Rosenberg PS (1995) Hazard function estimation using \(B\)-splines. Biometrics 51:874–887
Schumaker L (1981) Spline functions: basic theory. Wiley, New York
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22:580–615
Stoker TM (1986) Consistent estimation of scaled coefficients. Econometrica 54:461–481
Stone CJ (1985) Additive regression and other nonparametric models. Ann Stat 13:689–705
Stone CJ (1986) The dimensionality reduction principle for generalized additive models. Ann Stat 14:590–606
Sun J, Kopciuk KA, Lu X (2008) Polynomial spline estimation of partially linear single-index proportional hazards regression models. Comput Stat Data Anal 53:176–188
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer-Verlag, New York
Xia Y (2009) Model checking in regression via dimension reduction. Biometrika 96:133–148
Yu Y, Ruppert D (2002) Penalized spline estimation for partially linear single-index models. J Am Stat Assoc 97:1042–1054
Zhou S, Shen X, Wolfe DA (1998) Local asymptotics for regression splines and confidence regions. Ann Stat 26:1760–1782
Acknowledgements
The authors express their thanks to an associate editor and two referees whose constructive comments improved the presentation.
A Appendix
The proofs of Theorems 1 and 2 are sketched in this Appendix. Given a probability measure P and a measurable function f, let \(\mathbb {G}_nf=\sqrt{n}({\mathbb {P}}_{n}-P)f\) and \(\Vert \mathbb {G}_n\Vert _{\mathcal {F}}=\sup _{f\in \mathcal {F}}|\mathbb {G}_nf|\). In what follows, C represents a positive constant that may change from place to place.
1.1 A.1 Technical lemmas
Lemma 1
Let \(\mathcal {L}_1=\{\ell (\varvec{\tau };Y,\varvec{w}): \varphi \in \mathcal {S}_ n,d_2(\varvec{\tau },\varvec{\tau }_0)\le \eta \}\) for any \(\eta >0\), where \(\mathcal {S}_n=\mathcal {S}_n(\mathcal {K}_n,l)\). Then \(J_{[~]}(\eta ,\mathcal {L}_1,\Vert \cdot \Vert _{P,B})\le {C}q_n^{1/2}\eta \), where \(\Vert \cdot \Vert _{P,B}\) is the Bernstein norm defined as \(\Vert f\Vert _{P,B}^2=2P(e^{|f|}-|f|-1)\) in van der Vaart and Wellner (1996) and \(q_n=m_n+l\) is the number of spline basis functions.
Proof
By the bracketing number calculation in Shen and Wong (1994), we have \(\log {N}_{[~]}(\varepsilon ,\mathcal {S}_n,L_2(P))\le {C}q_n\log (\eta /\varepsilon )\) for any \(\eta >0\) and \(0<\varepsilon \le \eta \). Therefore, for each \(\varphi \in \mathcal {S}_n\), there exists a bracket \([\varphi _i^L,\varphi _i^R]\) such that \(\varphi _i^L\le \varphi \le \varphi _i^R\) and \(P(\varphi _i^R-\varphi _i^L)^2\le \varepsilon ^2\) for \(1\le {i}\le (\eta /\varepsilon )^{Cq_n}\). The neighborhoods \(\mathcal {A}(\eta ) = \{\varvec{\alpha }_{\varvec{\zeta }}:\Vert \varvec{\alpha }_{\varvec{\zeta }}-\varvec{\alpha }_{\varvec{\zeta }_0}\Vert \le \eta \}\) and \(\mathcal {B}(\eta )=\{\varvec{\beta }:\Vert \varvec{\beta }-\varvec{\beta }_0\Vert \le \eta \}\) can be covered by \(C(\eta /\varepsilon )^p\) and \(C(\eta /\varepsilon )^d\) balls of radius \(\varepsilon \), respectively. Because \(\Vert \varvec{x}^T\varvec{\beta }_t-\varvec{x}^T\varvec{\beta }_s\Vert \le \Vert \varvec{x}\Vert \Vert \varvec{\beta }_t-\varvec{\beta }_s\Vert \) and \(\Vert \varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_t} - \varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_s}\Vert \le \Vert \varvec{z}\Vert \Vert \varvec{\alpha }_{\varvec{\zeta }_t}-\varvec{\alpha }_{\varvec{\zeta }_s}\Vert \) by the Cauchy–Schwarz inequality, it can be shown under the finite-moment assumptions on \(\varvec{x}\) and \(\varvec{z}\) that the bracketing numbers for the classes \(\{\varvec{x}^T\varvec{\beta }:\Vert \varvec{\beta }-\varvec{\beta }_0\Vert \le \eta \}\) and \(\{\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }}:\Vert \varvec{\alpha }_{\varvec{\zeta }}-\varvec{\alpha }_{\varvec{\zeta }_0}\Vert \le \eta \}\) are \((\eta /\varepsilon )^d\) and \((\eta /\varepsilon )^p\), up to constants, respectively, by Theorem 9.23 of Kosorok (2008).
Hence, for any \(\varvec{\alpha }_{\varvec{\zeta }}\in \mathcal {A}(\eta )\) and \(\varvec{\beta }\in \mathcal {B}(\eta )\), there exist \(\varvec{\alpha }_{\varvec{\zeta }_r}\) and \(\varvec{\beta }_s\), \(1\le {r}\le {C}(\eta /\varepsilon )^p\) and \(1\le {s}\le {C}(\eta /\varepsilon )^d\), such that \(\Vert \varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }} - \varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}\Vert <C\varepsilon \) and \(\Vert \varvec{x}^T\varvec{\beta }-\varvec{x}^T\varvec{\beta }_s\Vert < C\varepsilon \). Assume without loss of generality that \(\varphi ^\prime (u)> 0\) at \(u=\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }}\); then \(\varphi (u)\) is monotone increasing in a neighborhood of u by continuity of \(\varphi ^\prime (u)\) at u. Let
The class \(\mathcal {L}_1\) is clearly covered by the set \(\{[A_{i,s,r}^L,A_{i,s,r}^R]: i=1,\dots ,(\eta /\varepsilon )^{Cq_n},r=1,\dots ,C(\eta /\varepsilon )^p\), \(s = 1,\dots ,C(\eta /\varepsilon )^d\}\). We can then express \(A_{i,s,r}^R-A_{i,s,r}^L=I_{n1}+I_{n2}\), where
By Assumption A6, \(E(Y^2|\varvec{w})\) is uniformly bounded. Therefore, \(PI_{n1}^2\) is bounded by \(C\varepsilon ^2+CP\left[ \varphi _i^R(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}+C\varepsilon )-\varphi _i^L(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}-C\varepsilon )\right] ^2\) by the Cauchy–Schwarz inequality. By the mean value theorem, \(I_{n2}=b^\prime (\vartheta _{i\varepsilon }^*)\big [\varphi _i^R(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}+C\varepsilon ) - \varphi _i^L(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}-C\varepsilon )+2\varepsilon \big ]\), where \(\vartheta _{i\varepsilon }^*\) is between \(\varvec{x}^T\varvec{\beta }_s-C\varepsilon +\varphi _i^L(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}-C\varepsilon )\) and \(\varvec{x}^T\varvec{\beta }_s+C\varepsilon +\varphi _i^R(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}+C\varepsilon )\). For sufficiently small \(\varepsilon \), \(\varphi _i^R(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}+C\varepsilon )\) and \(\varphi _i^L(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}-C\varepsilon )\) are bounded because \(\varphi (\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r})\) is bounded on \(\mathfrak {I}\). Therefore, by the Cauchy–Schwarz inequality and Assumption A6, \(PI_{n2}^2\) is bounded by \(C\varepsilon ^2+CP\left[ \varphi _i^R(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r}+C\varepsilon )-\varphi _i^L(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_r} -C\varepsilon )\right] ^2\). Furthermore, it can be shown by the Cauchy–Schwarz inequality that
The last inequality holds due to smoothness of \(\varphi \) and \(\varphi _i^L\le \varphi \le \varphi _i^R\). It then follows that \(P(A_{i,s,r}^R-A_{i,s,r}^L)^2\le {C}\varepsilon ^2\). By the inequality \(2\left[ e^{|x|}-|x|-1\right] \le {x}^2\exp (|x|)\) and Assumptions A3–A6, we have \(\Vert A_{i,s,r}^R-A_{i,s,r}^L\Vert _{P,B}^2\le {C}P(A_{i,s,r}^R-A_{i,s,r}^L)^2\le {C}\varepsilon ^2\). This implies that the total number of brackets to cover \(\mathcal {L}_1\) is bounded by \(C(\eta /\varepsilon )^{p+d}(\eta /\varepsilon )^{Cq_n}\). Therefore, it can be shown that \(\log {N}_{[~]}(\varepsilon ,\mathcal {L}_1,\Vert \cdot \Vert _{P,B})\le {C}q_n\log (\eta /\varepsilon )\) and, hence, \(J_{[~]}(\eta ,\mathcal {L}_1,\Vert \cdot \Vert _{P,B})\le {C}q_n^{1/2}\eta \). \(\square \)
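The covering count used in this proof, on the order of \((\eta/\varepsilon)^d\) balls of radius \(\varepsilon\) for a \(d\)-dimensional ball of radius \(\eta\), can be checked numerically. The sketch below is only an illustration of that counting argument, not part of the proof: it builds a sup-norm grid net and recovers the dimension as the growth exponent of the covering number.

```python
# Covering a sup-norm ball [-eta, eta]^d with radius-eps sup-norm balls:
# a grid with spacing 2*eps needs ceil(eta/eps)^d centers, so the log
# covering number grows like d * log(eta/eps), matching C(eta/eps)^d.
import math

def grid_net_size(d, eta, eps):
    m = math.ceil(eta / eps)      # grid points per axis
    return m ** d

d, eta = 3, 1.0
exponents = [
    math.log(grid_net_size(d, eta, eps)) / math.log(eta / eps)
    for eps in (0.1, 0.01, 0.001)
]
```

As \(\varepsilon\) shrinks, each ratio \(\log N / \log(\eta/\varepsilon)\) approaches the dimension \(d\), mirroring how the exponent \(p+d+Cq_n\) in the lemma tracks the effective dimension of the parameter classes.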
1.2 A.2 Proof of Theorem 1(a) (Consistency)
Let \({\mathbb {M}}(\vartheta (\varvec{w}))=P[Y\vartheta (\varvec{w})-b(\vartheta (\varvec{w}))]\) and \({\mathbb {M}}_{n}(\vartheta (\varvec{w}))={\mathbb {P}}_{n}[Y\vartheta (\varvec{w})-b(\vartheta (\varvec{w}))]\). Because \(\varphi _0\) has a bounded rth derivative on \(\mathfrak {I}\), Jackson's theorem (de Boor 2001) provides a spline \(\varphi _n\in \mathcal {S}_n\) such that \(\Vert \varphi _n-\varphi _0\Vert _\infty =O(n^{-r\nu })\) for \(1/(2r+2)< \nu < 1/(2r)\). Let \(\vartheta _n(\varvec{w})=\varvec{x}^T\varvec{\beta }_0+\varphi _n(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_0})\). Recall that the estimator of \(\vartheta _0(\varvec{w})=\varvec{x}^T\varvec{\beta }_0+\varphi _0(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_0})\) is \(\hat{\vartheta }_n(\varvec{w})=\varvec{x}^T\hat{\varvec{\beta }}_n+\hat{\varphi }_n(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n}) =\varvec{x}^T\hat{\varvec{\beta }}_n+\sum _{j=1}^{q_n}\hat{\gamma }_{n,j}\mathcal {B}_j(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})\).
Let \(\mathbf a \in \mathbb {R}^p\) and \(\mathbf b \in \mathbb {R}^d\) be such that \(\Vert \varvec{x}^T\mathbf b +\varphi _n(\varvec{z}^T\mathbf a )\Vert _2^2=O(n^{-r\nu }+n^{-(1-\nu )/2})\). Denote \(h_n(\varvec{w})=\varvec{x}^T\mathbf b +\varphi _n(\varvec{z}^T\mathbf a )\) and \(H_n(\texttt {u}) = {\mathbb {M}}_{n}(\vartheta _n(\varvec{w})+\texttt {u}h_n(\varvec{w}))\). Because \(\Vert \vartheta _n-\vartheta _0\Vert _\infty =O(n^{-r\nu })\), it follows that \(\Vert \vartheta _n-\vartheta _0+\texttt {u}h_n\Vert _2^2=O(n^{-r\nu }+n^{-(1-\nu )/2})\) for any \(\texttt {u}>0\) and sufficiently large n. The first and second derivatives of \(H_n(\texttt {u})\) with respect to \(\texttt {u}\) are \(H_n^\prime (\texttt {u})=n{\mathbb {P}}_{n}\left[ Y-b^\prime (\vartheta _n(\varvec{w})+\texttt {u}h_n(\varvec{w}))\right] h_n(\varvec{w})\) and \(H_n^{\prime \prime }(\texttt {u})= -n{\mathbb {P}}_{n}[b^{\prime \prime }(\vartheta _n(\varvec{w})+\texttt {u}h_n(\varvec{w}))]h_n^2(\varvec{w})\), so \(H_n^\prime (\texttt {u})\) is non-increasing in \(\texttt {u}\). Therefore, to prove the consistency of \(\hat{\vartheta }_n\), it suffices to show that for any \(\texttt {u}_0>0\), \(H_n^\prime (\texttt {u}_0)<0\) and \(H_n^\prime (-\texttt {u}_0)>0\) except on an event with probability converging to zero as \(n\rightarrow \infty \). Then \(\hat{\vartheta }_n(\varvec{w})\) must lie between \(\vartheta _n(\varvec{w})-\texttt {u}_0{h}_n(\varvec{w})\) and \(\vartheta _n(\varvec{w})+\texttt {u}_0{h}_n(\varvec{w})\) and, hence, \(\Vert \hat{\vartheta }_n-\vartheta _n\Vert _2\le \texttt {u}_0\Vert h_n\Vert _2\). We can write \(H_n^\prime (\texttt {u}_0)=I_{n3}+I_{n4}\), where \(I_{n3}=({\mathbb {P}}_{n}-P)[Y-b^\prime (\vartheta _n(\varvec{w})+\texttt {u}_0h_n(\varvec{w}))]h_n(\varvec{w})\) and \(I_{n4}=P[Y-b^\prime (\vartheta _n(\varvec{w})+\texttt {u}_0h_n(\varvec{w}))]h_n(\varvec{w})\).
For any \(\eta > 0\), let \(\mathcal {L}_2=\{[Y-b^\prime (\vartheta (\varvec{w}))] [\vartheta (\varvec{w})-\vartheta _n(\varvec{w})]:\varphi _n\in \mathcal {S}_n, \Vert \vartheta _n-\vartheta \Vert _2\le \eta \}\). By the bracketing integral \(J_{[~]}(\eta ,\mathcal {S}_n,L_2(P))\le Cq_n^{1/2}\eta \) and Assumptions A3–A6, it can be shown that \(J_{[~]}(\eta ,\mathcal {L}_2,L_2(P))\le {C}q_n^{1/2}\eta \) by arguments similar to those in Lemma 1 and, hence, \(\mathcal {L}_2\) is a Donsker class. Furthermore, \(P[Y-b^\prime (\vartheta _n(\varvec{w})+\texttt {u}_0h_n(\varvec{w}))]^2h_n^2(\varvec{w})\rightarrow 0\) as \(n\rightarrow \infty \). Hence, \(I_{n3}=O_p(n^{-1/2})\). By the mean value theorem and Assumptions A3–A6, \(I_{n4}\) can be written as \(P\left\{ b^\prime \left( \vartheta ^*(\varvec{w})\right) [\vartheta _0(\varvec{w})-\vartheta _n(\varvec{w})-\texttt {u}_0h_n(\varvec{w})]h_n(\varvec{w})\right\} \le -CPh_n^2(\varvec{w})=-Cp_n^{-1}\) for sufficiently large n, where \(\vartheta ^*(\varvec{w})\) is between \(\vartheta _0(\varvec{w})\) and \(\vartheta _n(\varvec{w})+\texttt {u}_0h_n(\varvec{w})\) and \(p_n^{-1}=n^{-r\nu }+n^{-(1-\nu )/2}\). Because \(n^{-r\nu }+n^{-(1-\nu )/2}\ge {n}^{-r/(1+2r)}>n^{-1/2}\) for \(1/(2r+2)<\nu < 1/(2r)\), it then follows that \(H_n^\prime (\texttt {u}_0)\le {O_p}(n^{-1/2})-p_n^{-1}<0\) except on an event with probability converging to zero as \(n\rightarrow \infty \). The same arguments show \(H_n^\prime (-\texttt {u}_0)> 0\) with probability converging to 1 as \(n\rightarrow \infty \). Therefore, \(\Vert \hat{\vartheta }_n-\vartheta _n\Vert _2^2=O_p(n^{-r\nu } + n^{-(1-\nu )/2})\) and, hence, \(\Vert \hat{\vartheta }_n-\vartheta _0\Vert _2^2=O_p(n^{-r\nu } +n^{-(1-\nu )/2})\). This completes the proof of Theorem 1(a).
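The approximation step at the start of this proof, a spline \(\varphi_n\) with \(\Vert\varphi_n-\varphi_0\Vert_\infty=O(n^{-r\nu})\) supplied by Jackson's theorem, can be illustrated numerically. The sketch below is illustrative only: the smooth target function and the knot counts are assumptions chosen for the demonstration, and the least-squares fit stands in for the best spline approximant.

```python
# Least-squares cubic spline approximation of a smooth function: the
# sup-norm error decreases as interior knots are added, as Jackson's
# theorem (de Boor 2001) predicts for smooth targets.
import numpy as np
from scipy.interpolate import make_lsq_spline

grid = np.linspace(0.0, 1.0, 2001)
phi0 = np.sin(2 * np.pi * grid)               # illustrative smooth target

def sup_error(n_interior, k=3):
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    t = np.r_[[0.0] * (k + 1), interior, [1.0] * (k + 1)]  # clamped knot vector
    spl = make_lsq_spline(grid, phi0, t, k=k)
    return float(np.max(np.abs(spl(grid) - phi0)))

errors = [sup_error(m) for m in (4, 8, 16, 32)]
```

For an r-times differentiable target and cubic splines, the error should fall roughly like the fourth power of the knot spacing, which is what doubling the interior-knot count exhibits here.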
1.3 A.3 Proof of Theorem 1 (b) (Rate of Convergence)
Theorem 3.4.1 of van der Vaart and Wellner (1996) will be applied to prove the rate of convergence. Let \(\ell (\vartheta (\varvec{w});Y,\varvec{w})=Y\vartheta (\varvec{w})-b(\vartheta (\varvec{w}))\). To apply the theorem, we need to find \(\phi _n(\eta )\) such that \(\phi _n(\eta )/\eta \) is decreasing in \(\eta \) and \(P\sup _{\eta /2\le \Vert \vartheta -\vartheta _n\Vert _2\le \eta }|{\mathbb {G}}_{n}\ell (\vartheta (\varvec{w});Y,\varvec{w}) -{\mathbb {G}}_{n}\ell (\vartheta _n(\varvec{w});Y,\varvec{w})|\le {C}\phi _n(\eta )\). Let \(\mathcal {L}_3=\{\ell (\vartheta (\varvec{w});Y,\varvec{w})-\ell (\vartheta _n(\varvec{w});Y,\varvec{w}): \varphi _n\in \mathcal {S}_n,\Vert \vartheta -\vartheta _n\Vert _2\le \eta \}\). It can be shown with arguments similar to those in Lemma 1 that \(J_{[~]}(\eta ,\mathcal {L}_3,\Vert \cdot \Vert _{P,B})\le {C}q_n^{1/2}\eta \). Moreover, it can be shown that \(\Vert \ell (\vartheta (\varvec{w});Y,\varvec{w})-\ell (\vartheta _n(\varvec{w});Y,\varvec{w})\Vert _{P,B}^2\le {C}\eta ^2\) for any \(\ell (\vartheta (\varvec{w});Y,\varvec{w})-\ell (\vartheta _n(\varvec{w});Y,\varvec{w})\in \mathcal {L}_3\). Therefore, based on Lemma 3.4.3 of van der Vaart and Wellner (1996), we have
\[E_P\Vert {\mathbb {G}}_{n}\Vert _{\mathcal {L}_3}\le {C}J_{[~]}(\eta ,\mathcal {L}_3,\Vert \cdot \Vert _{P,B})\left[ 1+\frac{J_{[~]}(\eta ,\mathcal {L}_3,\Vert \cdot \Vert _{P,B})}{\eta ^2\sqrt{n}}\right] \le {C}\left( q_n^{1/2}\eta + n^{-1/2}q_n\right) .\]
When choosing \(\phi _n(\eta )=q_n^{1/2}\eta + n^{-1/2}q_n\), \(\phi _n(\eta )/\eta \) is clearly decreasing in \(\eta \). Hence, when we choose \(d_n\) defined in Theorem 3.4.1 of van der Vaart and Wellner (1996) to be \(d_n^2(\hat{\vartheta }_n,\vartheta _n)={\mathbb {M}}(\vartheta _n(\varvec{w}))-{\mathbb {M}}(\hat{\vartheta }_n(\varvec{w}))\), it can be shown that \(r_n^2[{\mathbb {M}}(\vartheta _n(\varvec{w}))-{\mathbb {M}}(\hat{\vartheta }_n(\varvec{w}))]=O_p(1)\), where \(r_n\) satisfies \(r_n^2\phi _n(1/r_n)\le n^{1/2}\) for every n. Note that \(n^{1-\nu }\phi _n(1/n^{(1-\nu )/2})=2n^{1/2}\) and that if \((1-\nu )/2\ge {r}\nu \), then \(n^{2r\nu }\phi _n(1/n^{r\nu })=n^{1/2}[n^{r\nu -(1-\nu )/2} + n^{2r\nu -(1-\nu )}]\le 2n^{1/2}\). It follows that \(r_n=n^{\min \{r\nu ,(1-\nu )/2\}}\). Consequently, \({\mathbb {M}}(\vartheta _n(\varvec{w}))-{\mathbb {M}}(\hat{\vartheta }_n(\varvec{w}))=O_p(n^{-2r\nu }+n^{-(1-\nu )})\). By using the inequality \(x^2/4\le \exp (x)-1-x\le {x^2}\) for x in a neighborhood of 0, we can show that \({\mathbb {M}}(\vartheta _0(\varvec{w}))-{\mathbb {M}}(\vartheta _n(\varvec{w}))\le {C}\Vert \vartheta _n-\vartheta _0\Vert _2^2=O_{p}(n^{-2r\nu } + n^{-(1-\nu )})\) and \({\mathbb {M}}(\vartheta _0(\varvec{w}))- {\mathbb {M}}(\hat{\vartheta }_n(\varvec{w}))\ge {C}\Vert \hat{\vartheta }_n-\vartheta _0\Vert _2^2\). By the equation \({\mathbb {M}}(\vartheta _0(\varvec{w}))-{\mathbb {M}}(\hat{\vartheta }_n(\varvec{w}))= {\mathbb {M}}(\vartheta _0(\varvec{w})) - {\mathbb {M}}(\vartheta _n(\varvec{w}))+{\mathbb {M}}(\vartheta _n(\varvec{w}))-{\mathbb {M}}(\hat{\vartheta }_n(\varvec{w}))\), we have \(\Vert \hat{\vartheta }_n-\vartheta _0\Vert _2^2=O_p(n^{-2r\nu }+n^{-(1-\nu )})\). 
Because \((\hat{\varvec{\beta }}_n-\varvec{\beta }_0)^T[\varvec{x}-E(\varvec{x}|\varvec{z})]\) is orthogonal to \((\hat{\varvec{\beta }}_n-\varvec{\beta }_0)^TE(\varvec{x}|\varvec{z})+ \hat{\varphi }_n(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n}) -\varphi _0(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_0})\), it follows that
\[\Vert \hat{\vartheta }_n-\vartheta _0\Vert _2^2=\Vert (\hat{\varvec{\beta }}_n-\varvec{\beta }_0)^T[\varvec{x}-E(\varvec{x}|\varvec{z})]\Vert _2^2+\Vert (\hat{\varvec{\beta }}_n-\varvec{\beta }_0)^TE(\varvec{x}|\varvec{z})+\hat{\varphi }_n(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})-\varphi _0(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_0})\Vert _2^2=O_p(n^{-2r\nu }+n^{-(1-\nu )}).\]
Because \(E[\varvec{x}-E(\varvec{x}|\varvec{z})]^{\otimes 2}\) is assumed to be nonsingular, it follows that \(\Vert \hat{\varvec{\beta }}_n-\varvec{\beta }_0\Vert _2^2 = O_p(n^{-2r\nu }+n^{-(1-\nu )})\). This in turn yields \(\Vert (\hat{\varvec{\beta }}_n-\varvec{\beta }_0)^TE(\varvec{x}|\varvec{z})+\hat{\varphi }_n(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})-\varphi _0(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_0})\Vert _2^2 =O_p(n^{-2r\nu }+n^{-(1-\nu )})\). By the triangle inequality and the bounded fourth moment of \(\varvec{x}\), it follows that \(\Vert \hat{\varphi }_n(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})-\varphi _0(\varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_0})\Vert _2^2 = O_p(n^{-2r\nu } + n^{-(1-\nu )})\). Thus, by Lemma 1 of Stone (1985) and Assumptions A3 and A4, we have \(\Vert \hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n} - \varvec{\alpha }_{\varvec{\zeta }_0}\Vert _2^2 + \Vert \hat{\varphi }_n-\varphi _0\Vert _2^2=O_p(n^{-2r\nu }+n^{-(1-\nu )})\). Consequently, by Lemma 7 of Stone (1986), \(\Vert \hat{\varphi }_n-\varphi _0\Vert _{\infty }=o_p(1)\). This completes the proof of Theorem 1(b).
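The exponent bookkeeping behind this rate can be verified directly: the two error terms \(n^{-2r\nu}\) and \(n^{-(1-\nu)}\) balance at \(\nu=1/(1+2r)\), which yields the familiar nonparametric rate \(n^{-r/(1+2r)}\) for the \(L_2\) distance. A short exact-arithmetic check of this algebra (an illustration, not part of the proof):

```python
# Exact check that r*nu = (1 - nu)/2 at nu = 1/(1 + 2r), so both error
# exponents in Theorem 1(b) equal 2r/(1 + 2r) and the L2 rate is n^{-r/(1+2r)}.
from fractions import Fraction

def balanced_nu(r):
    return Fraction(1, 1 + 2 * r)

checks = []
for r in range(1, 6):
    nu = balanced_nu(r)
    checks.append((r, nu, r * nu))
```

The balancing \(\nu\) also sits strictly inside the admissible window \(1/(2r+2)<\nu<1/(2r)\) assumed throughout the proofs.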
1.4 A.4 Proof of Theorem 1 (c) (Asymptotic Normality)
To derive the asymptotic normality of \(\hat{\varvec{\theta }}_{\hat{\varvec{\zeta }}_n}\), we verify the conditions of Theorem 2.1 in Ding and Nan (2011). Condition A1 holds because of the rate of convergence of \(\hat{\varvec{\tau }}_n\). Condition A2 holds by the properties of generalized linear models. For condition A3, observe that \(\ddot{S}_{\varvec{\theta }_{\varvec{\zeta }}\varphi }(\varvec{\tau }_0)[h] - \ddot{S}_{\varphi \varphi }(\varvec{\tau }_0)[\mathbf {h} ,h] = E\left\{ V(\mu _0)h(u_0)\begin{pmatrix} \mathbf {h} _1(u_0) - \varphi _0^\prime (u_0)\varvec{D}(\varvec{\zeta }_0)\varvec{z}\\ \mathbf {h} _2(u_0) - \varvec{x} \end{pmatrix} \right\} = 0\) for any \(h\in \mathbf {H}\), where \(u_0= \varvec{z}^T\varvec{\alpha }_{\varvec{\zeta }_0}\) and \(\mathbf {h} = (\mathbf {h} _1^T,\mathbf {h} _2^T)^T\). By the law of total expectation, \(\mathbf {h} _1^*(u_0) =\varphi _0^\prime (u_0){E[V(\mu _0)\varvec{D}(\varvec{\zeta }_0)\varvec{z}|u_0]}/{E[V(\mu _0)|u_0]}\) and \(\mathbf {h} _2^*(u_0) ={E[V(\mu _0)\varvec{x}|u_0]}/{E[V(\mu _0)|u_0]}\). For condition A4, \({\mathbb {P}}_{n}\dot{\ell }_{\varvec{\theta }_{\varvec{\zeta }}}(\hat{\varvec{\tau }}_n;Y,\varvec{w})=0\), so we only need to verify \({\mathbb {P}}_{n}\dot{\ell }_\varphi (\hat{\varvec{\tau }}_n;Y,\varvec{w})[\mathbf {h} ^*]=o_p(n^{-1/2})\). By Jackson's theorem (de Boor 2001), there exists \(\tilde{\varvec{\zeta }}_n^*=(\tilde{\zeta }_{1,n}^*,\dots ,\tilde{\zeta }_{p+d-1,n}^*)^T\in \mathcal {S}_n^{p+d-1}\) such that \(\Vert \textsf {h}_j^*-\tilde{\zeta }_{j,n}^*\Vert _\infty =O(n^{-r\nu })\), \(j=1,\dots ,p+d-1\), for \(1/(2r+2)< \nu < 1/(2r)\).
For any \(\textsf {h}\in \mathcal {S}_n\), because \((\hat{\varvec{\theta }}_{\hat{\varvec{\zeta }}_n},\hat{\varphi }_n)\) maximizes \({\mathbb {P}}_{n}\ell (\varvec{\theta }_{\varvec{\zeta }},\varphi ;Y,\varvec{w})\), the map \(\varepsilon \mapsto {\mathbb {P}}_{n}\ell (\hat{\varvec{\theta }}_{\hat{\varvec{\zeta }}_n},\hat{\varphi }_n+\varepsilon \textsf {h};Y,\varvec{w})\) is maximized at \(\varepsilon =0\), so its derivative there vanishes:
\[{\mathbb {P}}_{n}\big \{Y -b^\prime \big (\varvec{x}^T\hat{\varvec{\beta }}_n+\hat{\varphi }_n(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})\big )\big \}\textsf {h}(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})=0.\]
It follows that \({\mathbb {P}}_{n}\{Y -b^\prime (\varvec{x}^T\hat{\varvec{\beta }}_n+\hat{\varphi }_n(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n}))\} \tilde{\zeta }_{j,n}^*(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})=0\) for \(j= 1,\dots ,p+d-1\). Therefore, showing \({\mathbb {P}}_{n}\dot{\ell }_\varphi (\hat{\varvec{\tau }}_n;Y,\varvec{w})[\textsf {h}_j^*(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})]=o_p(n^{-1/2})\) is equivalent to showing \({\mathbb {P}}_{n}\dot{\ell }_\varphi (\hat{\varvec{\tau }}_n;Y,\varvec{w})[\textsf {h}_j^*(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n}) -\tilde{\zeta }_{j,n}^*(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})] = I_{n5}+I_{n6}=o_p(n^{-1/2})\), where \(I_{n5}=({\mathbb {P}}_{n}-P)\dot{\ell }_\varphi (\hat{\varvec{\tau }}_n;Y,\varvec{w})[\textsf {h}_j^*(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n}) -\tilde{\zeta }_{j,n}^*(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})]\) and \(I_{n6}=P\dot{\ell }_\varphi (\hat{\varvec{\tau }}_n;Y,\varvec{w})[\textsf {h}_j^*(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n}) -\tilde{\zeta }_{j,n}^*(\varvec{z}^T\hat{\varvec{\alpha }}_{\hat{\varvec{\zeta }}_n})]\). Let \(\mathcal {L}_{4,j}=\{\dot{\ell }_\varphi (\varvec{\tau };Y,\varvec{w})[\textsf {h}_j^*-\textsf {h}]: \textsf {h}\in \mathcal {S}_n,d_2(\varvec{\tau },\varvec{\tau }_0)\le \eta ,\Vert \textsf {h}_j^*-\textsf {h}\Vert _\infty \le \eta \}\) for \(j=1,\dots , p+d-1\). Using arguments similar to those in Lemma 1, we can show that \(J_{[~]}(\eta ,\mathcal {L}_{4,j},\Vert \cdot \Vert _{P,B})\le {C}q_n^{1/2}\eta \). Furthermore, by Assumptions A3–A6, we have \(\Vert \dot{\ell }_\varphi (\varvec{\tau };Y,\varvec{w})[\textsf {h}_j^*-\textsf {h}]\Vert _{P,B}^2\le {C}\Vert \textsf {h}_j^*-\textsf {h}\Vert _\infty ^2 \le {C}\eta ^2\).
Therefore, by Lemma 3.4.3 of van der Vaart and Wellner (1996), we have \(E_P\Vert {\mathbb {G}}_{n}\Vert _{\mathcal {L}_{4,j}}\le O\left( n^{-(1/2-\nu )}+n^{-(r-1/2)\nu }\right) =o(1)\) for \(\eta =O(n^{-r\nu })\) and \(0<\nu <1/2\). Hence, \(I_{n5}=o_p(n^{-1/2})\). Furthermore, the Cauchy–Schwarz inequality and the rate of convergence of \(\hat{\varvec{\tau }}_n\) yield \(I^2_{n6}\le O_p(n^{-2r\nu } + n^{-(1-\nu )})O_p(n^{-2r\nu })=o_p(n^{-1})\). Thus, condition A4 holds. Verifying condition A5 amounts to showing \({\mathbb {G}}_{n}[\dot{\ell }_{\varvec{\alpha }_{\varvec{\zeta }}}(\hat{\varvec{\tau }}_n;Y,\varvec{w})-\dot{\ell }_{\varvec{\alpha }_{\varvec{\zeta }}}(\varvec{\tau }_0;Y,\varvec{w})] = o_p(1)\) and \({\mathbb {G}}_{n}[\dot{\ell }_\varphi (\hat{\varvec{\tau }}_n;Y,\varvec{w})[\mathbf {h}^*] -\dot{\ell }_\varphi (\varvec{\tau }_0;Y,\varvec{w})[\mathbf {h}^*]] = o_p(1)\). We only show the second equation because the proof of the first is similar. The equation holds by showing that the bracketing integral of the class \(\mathcal {L}_{5}=\{\dot{\ell }_\varphi (\varvec{\tau };Y,\varvec{w})[\textsf {h}_s^*]-\dot{\ell }_\varphi (\varvec{\tau }_0;Y,\varvec{w})[\textsf {h}_s^*]: d_2(\varvec{\tau },\varvec{\tau }_0)\le \eta \}\), \(s = 1,\ldots ,p+d-1\), is bounded by \(Cq_n^{1/2}\eta \) and \(\Vert r(\varvec{\tau };Y,\varvec{w})\Vert _{P,B}^2\le {C}\eta ^2\) for any \(r(\varvec{\tau };Y,\varvec{w})\in \mathcal {L}_5\), and applying Lemma 3.4.3 of van der Vaart and Wellner (1996). Finally, condition A6 holds by a Taylor expansion, the Cauchy–Schwarz inequality, and Assumptions A3–A6. We conclude that Theorem 2.1 of Ding and Nan (2011) applies and yields the asymptotic normality of \(\hat{\varvec{\theta }}_{\hat{\varvec{\zeta }}_n}\). The proof of Theorem 1(c) is complete.
1.5 A.5 Proof of Theorem 2 (Variance Estimation)
Denote \(\mathcal {L}_{6,r}=\{\dot{\ell }_{\varvec{\theta }_{\varvec{\zeta }},r}(\varvec{\tau };Y,\varvec{w}) - \dot{\ell }_\varphi (\varvec{\tau };Y,\varvec{w})[\textsf {h}]:d_2(\varvec{\tau },\varvec{\tau }_0)\le \eta ,\textsf {h}\in \mathbf {H}\}\), \(r=1,\dots ,p+d-1\). By the bracketing number calculation in Shen and Wong (1994) and Assumptions A3–A6, we can show that \(\mathcal {L}_{6,r}\) is a Glivenko–Cantelli class. Based on the consistency of \(\hat{\varvec{\tau }}_n\) and Proposition 2.1 in Huang et al. (2008), it can then be shown that \({\mathbb {P}}_{n}[\dot{\ell }_{\varvec{\theta }_{\varvec{\zeta }}}(\hat{\varvec{\tau }}_n;Y,\varvec{w}) - \dot{\ell }_\varphi (\hat{\varvec{\tau }}_n;Y,\varvec{w})[\hat{\mathbf {h}}_n^*]]^{\otimes 2}\rightarrow \mathcal {I}(\varvec{\theta }_{\varvec{\zeta }_0})\) in probability. It can be shown by the law of large numbers and some entropy calculations that \(\hat{E}_{\varvec{\theta }_{\varvec{\zeta }}\varvec{\theta }_{\varvec{\zeta }}}=\hat{A}_{\varvec{\theta }_{\varvec{\zeta }}\varvec{\theta }_{\varvec{\zeta }}}+o_p(1)\), \(\hat{E}_{\varvec{\theta }_{\varvec{\zeta }}\varphi }=\hat{A}_{\varvec{\theta }_{\varvec{\zeta }}\varphi }+o_p(1)\), and \(\hat{E}_{\varphi \varphi }=\hat{A}_{\varphi \varphi } +o_p(1)\). Therefore, \(\hat{\mathcal {E}}_n\rightarrow \mathcal {I}(\varvec{\theta }_{\varvec{\zeta }_0})\) in probability. It follows by the consistency of \(\hat{\varvec{\tau }}_n\) that \(\varvec{J}(\hat{\varvec{\zeta }}_n)\hat{\mathcal {E}}_n^{-1}\varvec{J}^T(\hat{\varvec{\zeta }}_n)\) \(\rightarrow \varvec{J}(\varvec{\zeta }_0)\mathcal {I}^{-1}(\varvec{\theta }_{\varvec{\zeta }_0})\varvec{J}^T(\varvec{\zeta }_0)\) in probability. This completes the proof of Theorem 2.
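The idea behind this consistent variance estimator can be illustrated in a fully parametric special case. For a logistic GLM at the true parameter, the averaged outer product of the per-observation scores and the averaged negative Hessian both estimate the Fisher information. The sketch below is an assumption-laden toy example, not the paper's semiparametric estimator (which additionally projects out the nuisance direction \(\mathbf{h}^*\)); the model, sample size, and true parameter are illustrative choices.

```python
# Toy check of consistent information estimation for a logistic GLM:
# the outer-product-of-scores and negative-Hessian estimates of the
# Fisher information agree up to sampling error at the true parameter.
import numpy as np

rng = np.random.default_rng(1)
n, d = 20000, 3
X = rng.normal(size=(n, d))
theta0 = np.array([0.5, -0.25, 1.0])
p = 1.0 / (1.0 + np.exp(-X @ theta0))            # true success probabilities
y = rng.binomial(1, p)

scores = (y - p)[:, None] * X                    # per-observation score vectors
I_outer = scores.T @ scores / n                  # outer-product estimate
I_hess = X.T @ ((p * (1 - p))[:, None] * X) / n  # negative-Hessian estimate

rel_diff = np.linalg.norm(I_outer - I_hess) / np.linalg.norm(I_hess)
```

The two matrices differ only by Monte Carlo error of order \(n^{-1/2}\), which is the parametric analogue of the convergence \(\hat{\mathcal{E}}_n\rightarrow\mathcal{I}(\varvec\theta_{\varvec\zeta_0})\) established above.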
Li, CS., Lu, M. A lack-of-fit test for generalized linear models via single-index techniques. Comput Stat 33, 731–756 (2018). https://doi.org/10.1007/s00180-018-0802-2