Abstract
We propose a residual-marked empirical process test for checking the goodness of fit of generalized partially linear models. The proposed test achieves dimension reduction, is shown to be consistent, and can detect root-n local alternatives. We establish the asymptotic distribution of the test statistic under the null hypothesis, analyze its asymptotic properties under local and global alternatives, and suggest a bootstrap procedure for calculating the critical value. We investigate its numerical performance in simulation experiments and illustrate its use with two real data examples.
References
Bierens HJ (1982) Consistent model specification tests. J Econom 20:105–134
Boente G, He X, Zhou J (2006) Robust estimates in generalized partially linear models. Ann Stat 34(6):2856–2878
Chen B, Zhou X-H (2013) Generalized partially linear models for incomplete longitudinal data in the presence of population-level information. Biometrics 69(2):386–395
Dikta G, Kvesic M, Schmidt C (2006) Bootstrap approximations in model checks for binary data. J Am Stat Assoc 101:521–530
Escanciano JC (2006) A consistent diagnostic test for regression models using projections. Econom Theory 22:1030–1051
Flynn PM, Rudy BJ, Douglas SD, Lathey J, Spector SA, Martinez J, Silio M, Belzer M, Friedman L, D’Angelo L, McNamara J, Hodge J, Hughes MD, Lindsey JC (2004) Virologic and immunologic outcomes after 24 weeks in HIV type 1-infected adolescents receiving highly active antiretroviral therapy. J Infect Dis 179:271–279
Härdle W, Huet S, Mammen E, Sperlich S (2004) Bootstrap inference in semiparametric generalized additive models. Econom Theory 20(02):265–300
Härdle W, Mammen E, Müller M (1998) Testing parametric versus semiparametric modeling in generalized linear models. J Am Stat Assoc 93:1461–1474
Härdle W, Müller M, Sperlich S, Werwatz A (2004) Nonparametric and semiparametric models. Springer, New York
He X, Fung WK, Zhu Z (2005) Robust estimation in generalized partial linear models for clustered data. J Am Stat Assoc 100(472):1176–1184
Horowitz JL (2001) Nonparametric estimation of a generalized additive model with an unknown link function. Econometrica 69:499–513
Horowitz JL, Mammen E (2007) Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions. Ann Stat 35(6):2589–2619
Hunsberger S (1994) Semiparametric regression in likelihood-based models. J Am Stat Assoc 89:1354–1365
Hunsberger S, Albert PS, Follmann DA, Suh E (2002) Parametric and semiparametric approaches to testing for seasonal trend in serial count data. Biostatistics 3:289–298
Kosorok MR (2008) Introduction to empirical processes and semiparametric inference. Springer
Leng C, Liang H, Martinson N (2011) Efficient variable selection for semiparametric generalized partially linear models with applications in study of condom use for HIV patients. Stat Med 30:2015–2027
Liang H, Wu H, Carroll R (2003) The relationship between virologic and immunologic responses in AIDS clinical research using mixed-effect varying-coefficient semiparametric models with measurement error. Biostatistics 4:297–312
Severini TA, Staniswalis JG (1994) Quasi-likelihood estimation in semiparametric models. J Am Stat Assoc 89:501–511
Severini TA, Wong WH (1992) Generalized profile likelihood and conditionally parametric models. Ann Stat 20:1768–1802
Smith JW, Everhart J, Dickson W, Knowler W, Johannes R (1988) Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. IEEE Computer Society Press, pp 261–265
Stute W, Zhu L-X (2002) Model checks for generalized linear models. Scand J Stat Theory Appl 29:535–545
Sun Z, Chen F, Liang H, Ruppert D (2022) Efficient diagnosis for parametric regression models with distortion measurement errors incorporating dimension-reduction. Stat Sin 32:1661–1681
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer-Verlag
Wang L, Liu X, Liang H, Carroll R (2011) Estimation and variable selection for generalized additive partial linear models. Ann Stat 39:1827–1851
Wang X, Zhang J, Yu L, Yin G (2015) Generalized partially linear single-index model for zero-inflated count data. Stat Med 34(5):876–886
Weber G-W, Çavuşoğlu Z, Özmen A (2012) Predicting default probabilities in emerging markets by new conic generalized partial linear models and their optimization. Optimization 61(4):443–457
Wood SN (2017) Generalized additive models: an introduction with R. Texts in statistical science series. CRC Press, Boca Raton, FL
Wu H, Kuritzkes D, Clair M, Spear G, Connick E, Landay A, Lederman M (1999) Characterizing individual and population viral dynamics in HIV-1-infected patients with potent antiretroviral therapy: correlations with host-specific factors and virological endpoints. J Infect Dis 179:799–897
Xia Y, Li W, Tong H, Zhang D (2004) A goodness-of-fit test for single-index models. Stat Sin 14(1):1–28
Zheng X, Qin G, Tu D (2017) A generalized partially linear mean-covariance regression model for longitudinal proportional data, with applications to the analysis of quality of life data from cancer clinical trials. Stat Med 36(12):1884–1894
Zhu Z-Y, He X, Fung W-K (2003) Local influence analysis for penalized Gaussian likelihood estimators in partially linear models. Scand J Stat 30(4):767–780
Acknowledgements
The authors thank the editor and two referees for their helpful suggestions and constructive comments. Li’s research was partially supported by NNSFC grant 11871294. Härdle gratefully acknowledges the financial support of the European Union’s Horizon 2020 research and innovation program “FIN-TECH: A Financial supervision and Technology compliance training programme” under the grant agreement No 825215 (Topic: ICT-35-2018, Type of action: CSA), the European Cooperation in Science & Technology COST Action grant CA19130 - Fintech and Artificial Intelligence in Finance - Towards a transparent financial industry, and the Deutsche Forschungsgemeinschaft’s IRTG 1792 grant.
Ethics declarations
Conflict of interest
No potential competing interest was reported by the authors.
Appendix: Technical details
We need representations of \(\widehat{\varvec{\beta }}_n-\varvec{\beta }_0\) and \(\widehat{m}_n(Z_i) - m_0(Z_i)\), which play a critical role in the proofs of the main results. These representations require the following assumptions, modified from Wang et al. (2011).
Let \(\nu \) be a positive integer and \(\alpha \in (0,1] \) such that \(\zeta =\nu +\alpha >2\). Let \({\mathcal {H}}(\zeta )\) be the collection of functions G on [0, 1] whose \(\nu \)th derivative, \(G^{(\nu )}\), exists and satisfies a Lipschitz condition of order \(\alpha \), \(|G^{(\nu ) }(s^{*})-G^{(\nu )}(s) | \le C| {s^{*}}-s| ^{\alpha }\), for \(0\le {s^{*}}, s\le 1\), where C is a positive constant. Let \(\rho _{\ell }(s)=\{{dG(s)/ds}\}^{\ell }/ \sigma \{G(s)\} \) and \(q_{\ell }(s, y) =\partial ^{\ell }/\partial s^{\ell }Q\{G(s),y\}\), so that
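A worked version of this step can be sketched under one assumption that is not restated in this appendix: that \(Q\) is the usual quasi-likelihood, with score \(\partial Q(\mu ,y)/\partial \mu =(y-\mu )/\sigma (\mu )\). Applying the chain rule to \(Q\{G(s),y\}\) and using the definition of \(\rho _{\ell }\) would then give:

```latex
\begin{aligned}
q_{1}(s,y) &= \frac{\partial }{\partial s}Q\{G(s),y\}
            = \frac{y-G(s)}{\sigma \{G(s)\}}\,G'(s)
            = \{y-G(s)\}\,\rho _{1}(s), \\
q_{2}(s,y) &= \frac{\partial }{\partial s}\bigl[\{y-G(s)\}\,\rho _{1}(s)\bigr]
            = \{y-G(s)\}\,\rho _{1}'(s)-G'(s)\,\rho _{1}(s)
            = \{y-G(s)\}\,\rho _{1}'(s)-\rho _{2}(s).
\end{aligned}
```

The last equality uses \(G'(s)\rho _{1}(s)=\{G'(s)\}^{2}/\sigma \{G(s)\}=\rho _{2}(s)\).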
Write \(\textbf{A}^{\otimes 2}=\textbf{A} \textbf{A}^{\top }\) for any matrix or vector \(\textbf{A}\). We make the following assumptions.
- (C1) The function \(m^{(2)}(\cdot )\) is continuous and \(m(\cdot )\in {\mathcal {H}}(\zeta )\).
- (C2) The function \(q_{2}(s, y) <0\) and \(c_{q}<| q_{2}^{k}(s, y) | <C_{q}\) (\(k=0,1\)) for \(s\in R\) and y in the range of the response variable.
- (C3) The distribution of Z is absolutely continuous and its density f is bounded away from zero and infinity on [0, 1].
- (C4) The random vector X satisfies that
$$\begin{aligned} c\le E(X^{\otimes 2}|Z=z)\le C. \end{aligned}$$
- (C5) The number of knots \(N_n\) satisfies \(n^{1/(2\zeta ) }\ll N_{n}\ll n^{1/4}\).
- (C6) For \(\rho _{\ell }\), we have
$$\begin{aligned} |\rho _{\ell }(s_0) | \le C_{\rho }\text { and }| \rho _{\ell }(s) -\rho _{\ell }( s_0) | \le C_{\rho }^{*}|s-s_0| \text { for all }|s-s_0| \le C_{s},\ \ell =1,2, \end{aligned}$$
where and below s and \(s_0\) appearing in \(\rho _{\ell }(\cdot )\) correspond to \(X^{\top }\varvec{\beta }+m(Z)\) and \(X^{\top }\varvec{\beta }_0+m_0(Z)\), respectively.
- (C7) There exists a positive constant \(C_{0}\), such that \(E[\{Y-G(X^{\top }\varvec{\beta }+m(Z))\}^{2}|V] \le C_{0}\), almost surely.
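To get a feel for condition (C5), the following minimal sketch evaluates the two orders \(n^{1/(2\zeta )}\) and \(n^{1/4}\) that bracket the number of knots \(N_n\). The function name `knot_window` and the example values of `n` and `zeta` are hypothetical. Since \(\ll \) is an asymptotic statement, the two numbers only indicate orders of magnitude and are not exact bounds on \(N_n\).

```python
def knot_window(n: int, zeta: float) -> tuple[float, float]:
    """Return the orders n^{1/(2*zeta)} and n^{1/4} that bracket N_n in (C5)."""
    if zeta <= 2:
        # zeta > 2 is assumed in the appendix; it makes 1/(2*zeta) < 1/4,
        # so the window between the two orders is non-empty.
        raise ValueError("(C5) requires zeta > 2")
    lower = n ** (1.0 / (2.0 * zeta))
    upper = n ** 0.25
    return lower, upper


# Illustrative values only (not taken from the paper).
lower, upper = knot_window(n=10_000, zeta=3.0)
print(f"N_n should grow faster than ~{lower:.2f} and slower than ~{upper:.2f}")
```

The window widens as the smoothness index \(\zeta \) increases, because the lower order \(n^{1/(2\zeta )}\) shrinks while the upper order \(n^{1/4}\) does not depend on \(\zeta \).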
Let \(\alpha _n=n^{-1/4}\log n\).
and
For the proof of (A.1), see the proof of Theorem 1 of Wang et al. (2011); for the proof of (A.2), see the proof of the last line on page 1847 of Wang et al. (2011).
Proof of Theorem 1
By the definition of \(M_{n}(u, W)\), we have
It follows from model (2) that
Let us examine \(X_i^{\top } (\widehat{\varvec{\beta }}_n-\varvec{\beta }_0)+\widehat{m}_n(Z_i)-m_0(Z_i)\). This expression can be simplified as follows using (A.1) and (A.2).
The second term \(B_{n2}(u, W)\) in (A.3) can be simplified as follows:
Recall \(\Gamma (u)=E\left[ \widetilde{X}_1^{\top } \{E(S_{1,2} \widetilde{X}_1 \widetilde{X}_1^{\top })\}^{-1} I(V_1^{\top } W\le u) G'\{X_1^{\top }\varvec{\beta }_0+m_0(Z_1)\}\right] \). As a result, we have
So, we have the following expression for \(M_{n}(u, W)\).
It is easy to see that \(I(V^{\top } W\le u)\) is monotone with respect to u. By Lemma 9.10 of Kosorok (2008), the function class \(\{I(V^{\top } W\le u): u\in {\mathbb {R}}^1\}\) is a VC-class. Similarly, the function class \(\{\Gamma (u): u\in {\mathbb {R}}^1\}\) is a VC-class. By Theorem 2.6.8 of van der Vaart and Wellner (1996), the function classes \(\{\varepsilon I(V^{\top } W\le u): u\in {\mathbb {R}}^1\}\) and \(\{\Gamma (u)\frac{G'\{X^{\top }\varvec{\beta }_0+m_0(Z)\}}{\sigma (G\{X^{\top }\varvec{\beta }_0 +m_0(Z)\})}\widetilde{X}: u\in {\mathbb {R}}^1\}\) are both VC-classes. Then, by Lemma 9.17 of Kosorok (2008), the function class \(\{\Psi _{u}({u},{y},\varepsilon ,w): u\in {\mathbb {R}}^1\}\) is a VC-class. By Theorems 2.6.7 and 2.5.2 of van der Vaart and Wellner (1996), the estimated empirical process \(M_{n}(u, W)\) converges weakly to \(M(u)\) in the Skorokhod space \(S[\Pi ]\). The result for \(T_{n}\) then follows from the continuous mapping theorem. \(\square \)
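The process studied in this proof can be evaluated numerically. The sketch below assumes that \(M_{n}(u, W)=n^{-1/2}\sum _{i=1}^n \widehat{\varepsilon }_i I(V_i^{\top } W\le u)\) with residuals \(\widehat{\varepsilon }_i=Y_i-G\{X_i^{\top }\widehat{\varvec{\beta }}_n+\widehat{m}_n(Z_i)\}\), mirroring the bootstrap version written out in the proof of Theorem 3. The function name, the toy data, and the supremum functional are hypothetical, since the exact form of \(T_n\) is not restated in this appendix.

```python
import numpy as np


def marked_empirical_process(residuals, V, W):
    """Evaluate M_n(u, W) at the projected sample points u_j = V_j^T W."""
    residuals = np.asarray(residuals, dtype=float)
    proj = np.asarray(V, dtype=float) @ np.asarray(W, dtype=float)
    n = residuals.size
    # indicator[i, j] = I(V_i^T W <= u_j); ties are handled by the <= comparison.
    indicator = proj[:, None] <= proj[None, :]
    M = (residuals[:, None] * indicator).sum(axis=0) / np.sqrt(n)
    return proj, M


# Toy inputs (illustrative only): four observations, one-dimensional V.
V = np.array([[0.0], [1.0], [2.0], [3.0]])
W = np.array([1.0])
residuals = np.array([1.0, -1.0, 2.0, -2.0])

u, M = marked_empirical_process(residuals, V, W)
# One possible continuous functional of the process (an assumption, not
# necessarily the T_n of the paper): the supremum of |M_n(u, W)| over u.
sup_stat = float(np.max(np.abs(M)))
print(u, M, sup_stat)
```

Because the indicator is monotone in u, the process is a step function that only jumps at the projected sample points, so evaluating it there is enough to compute a supremum-type functional.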
Proof of Theorem 2
Under the local alternatives (8), we have
Following the same lines as in the proof of Theorem 1, we can verify that, under the alternatives (8),
When \(r_nn^{1/2}\rightarrow \infty \), the second term tends to infinity. So the first assertion holds. When \(r_nn^{1/2}\rightarrow C_r\), we have
\(\square \)
Proof of Theorem 3
Let \(\widehat{\varvec{\beta }}_n^*\) and \(\widehat{m}_n^*(\cdot )\) be the regression spline-based estimators of \(\varvec{\beta }\) and \(m(\cdot )\) based on the bootstrap samples \(\{(V_i, Y_i^*), i=1, \cdots , n\}\), where \(Y^*_i\) has the success probability \(p_i\). Arguing as in the derivation of (A.1) and (A.2), we can prove that
Write the bootstrap version of \(M_{n}(u, W)\) as \( M^*_{n}(u, W)=1/\sqrt{n}\sum _{i=1}^n [Y^*_i-G\{X_i^{\top }\widehat{\varvec{\beta }}_n^*+\widehat{m}_n^*(Z_i)\}]I(V_i^{\top } W\le u)\). Note that \(Y^*_i-G\{X_i^{\top }\widehat{\varvec{\beta }}_n^*+\widehat{m}_n^*(Z_i)\} =[Y_i^*-G\{X_i^{\top }\widehat{\varvec{\beta }}_n+\widehat{m}_n(Z_i)\}] -[G\{X_i^{\top }\widehat{\varvec{\beta }}_n^*+\widehat{m}_n^*(Z_i)\}-G\{X_i^{\top }\widehat{\varvec{\beta }}_n+\widehat{m}_n(Z_i)\}].\) Then, we have
Applying (A.7) and arguing as in the proof of (A.5) yields
It follows from (A.9)–(A.11) that
Note that \(E(Y_i^*|\textrm{data})=G\{X_i^{\top }\widehat{\varvec{\beta }}_n +\widehat{m}_n(Z_i)\}.\) Arguments similar to those in the proof of Theorem 1, combined with the proof of Theorem 2 in Dikta et al. (2006), show that the conditional distribution of \(T^*_{n}\) converges to the limiting null distribution of \(T_{n}\).
Note that the validity of (A.7) does not depend on whether \(D(V)=0\). We can similarly prove that the conditional distribution of \(T^*_{n}\) converges in distribution to the limiting alternative distribution of \(T_{n}\). Theorem 3 follows. \(\square \)
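The bootstrap scheme analyzed in this proof can be sketched as follows for a binary response. The sketch takes \(p_i=G\{X_i^{\top }\widehat{\varvec{\beta }}_n+\widehat{m}_n(Z_i)\}\), which matches the conditional mean \(E(Y_i^*|\textrm{data})\) stated above. Several parts are assumptions made only to keep the example self-contained: a plain logistic regression on a small polynomial design `D` stands in for the regression-spline estimator, a supremum functional stands in for \(T_n\), and all function names, the projection `W`, and the simulated data are hypothetical.

```python
import numpy as np


def logistic_fit_probs(D, y, n_iter=25):
    """Stand-in estimator (assumption): logistic regression by Newton steps."""
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(D @ beta, -30.0, 30.0)))
        w = np.clip(p * (1.0 - p), 1e-8, None)
        # Newton step: solve (D' W D) delta = D'(y - p); tiny ridge for stability.
        hessian = D.T @ (D * w[:, None]) + 1e-8 * np.eye(D.shape[1])
        beta = beta + np.linalg.solve(hessian, D.T @ (y - p))
    return 1.0 / (1.0 + np.exp(-np.clip(D @ beta, -30.0, 30.0)))


def sup_statistic(resid, proj):
    """sup_u |n^{-1/2} sum_i resid_i I(proj_i <= u)| over the sample points."""
    indicator = proj[:, None] <= proj[None, :]
    M = (resid[:, None] * indicator).sum(axis=0) / np.sqrt(resid.size)
    return float(np.max(np.abs(M)))


def bootstrap_test(D, V, W, y, fit_probs, B=200, level=0.05, seed=0):
    """Model-based bootstrap: draw Y*_i ~ Bernoulli(p_i), refit, recompute."""
    rng = np.random.default_rng(seed)
    proj = V @ W
    p_hat = fit_probs(D, y)                      # fitted success probabilities p_i
    t_obs = sup_statistic(y - p_hat, proj)       # statistic on the original data
    t_star = np.empty(B)
    for b in range(B):
        y_star = rng.binomial(1, p_hat).astype(float)
        p_star = fit_probs(D, y_star)            # re-estimate on the bootstrap sample
        t_star[b] = sup_statistic(y_star - p_star, proj)
    crit = float(np.quantile(t_star, 1.0 - level))
    p_value = float(np.mean(t_star >= t_obs))
    return t_obs, crit, p_value, t_star


# Simulated data under a correctly specified model (illustrative only).
rng = np.random.default_rng(1)
n = 200
X, Z = rng.normal(size=n), rng.uniform(size=n)
V = np.column_stack([X, Z])
D = np.column_stack([np.ones(n), X, Z, Z**2])    # stand-in for a spline basis in Z
prob = 1.0 / (1.0 + np.exp(-(0.5 * X + Z**2 - 0.5)))
y = rng.binomial(1, prob).astype(float)
W = np.array([1.0, 1.0]) / np.sqrt(2.0)          # hypothetical projection direction

t_obs, crit, p_value, t_star = bootstrap_test(D, V, W, y, logistic_fit_probs)
print(f"T_n = {t_obs:.3f}, bootstrap critical value = {crit:.3f}, p = {p_value:.3f}")
```

Each replicate re-estimates the model on the bootstrap sample before forming residuals, which corresponds to the use of \(\widehat{\varvec{\beta }}_n^*\) and \(\widehat{m}_n^*\) in \(M^*_{n}(u, W)\). The critical value is then the empirical \(1-\alpha \) quantile of the bootstrap statistics.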
About this article
Cite this article
Li, X., Liang, H., Härdle, W. et al. Model checking for generalized partially linear models. TEST (2023). https://doi.org/10.1007/s11749-023-00897-4
DOI: https://doi.org/10.1007/s11749-023-00897-4
Keywords
- Consistent test
- Curse of dimensionality
- Dimension reduction
- Model-based bootstrap
- Projection
- Residual-marked empirical processes