Abstract
This paper considers sparse additive models with covariates missing at random. The additive components are estimated via a two-stage method. In the first stage, a penalized weighted least squares criterion is minimized, where each weight is the inverse of the selection probability, that is, the probability that the covariates are observed. The adaptive group lasso is used as the penalty to distinguish the zero components from the nonzero ones; thus the penalty captures the sparse structure, while the weights account for the missingness. The estimator obtained from the penalized weighted least squares method is called the first stage estimator (FSE). We establish the sparsity and consistency of the FSE; however, the asymptotic distribution of the FSE of the nonzero components is difficult to derive. Therefore, for each nonzero component, we apply a penalized spline method for univariate regression to the residuals obtained from the FSEs of the other components. The asymptotic normality of the resulting second stage estimator is established. Simulation studies and a real data application confirm the performance of the proposed estimator.
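The first stage criterion described in the abstract can be sketched as follows. This display is an illustrative reconstruction, not the paper's exact formula: the symbols \(\delta _i\) (missingness indicator), \(\hat{\pi }\) (estimated selection probability), \(\mathbf {B}\) (B-spline basis vector), and \(w_j\) (adaptive group lasso weights) are our notational assumptions.

```latex
\hat{\mathbf{b}}
  = \operatorname*{argmin}_{\mathbf{b}_1,\ldots,\mathbf{b}_p}\;
    \sum_{i=1}^{n} \frac{\delta_i}{\hat{\pi}(\mathbf{v}_i)}
    \Bigl( y_i - \sum_{j=1}^{p} \mathbf{B}(x_{ij})^T \mathbf{b}_j \Bigr)^{2}
    + \lambda_n \sum_{j=1}^{p} w_j \,\lVert \mathbf{b}_j \rVert
```

The inverse-probability weight \(\delta _i/\hat{\pi }(\mathbf {v}_i)\) is the Horvitz–Thompson-type correction for covariates missing at random, and the unsquared group norm \(\lVert \mathbf {b}_j\rVert \) shrinks entire component coefficient vectors to zero, which is what allows the FSE to identify the zero components.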
References
Barrow DL, Smith PW (1978) Asymptotic properties of best \(L_2[0,1]\) approximation by splines with variable knots. Q Appl Math 36:293–304
Buja A, Hastie T, Tibshirani R (1989) Linear smoothers and additive models (with discussion). Ann Statist 17:453–555
Chen X, Wan A, Zhou Y (2015) Efficient quantile regression analysis with missing observations. J Am Statist Assoc 110:723–741
Claeskens G, Krivobokova T, Opsomer JD (2009) Asymptotic properties of penalized spline estimators. Biometrika 96:529–544
Fan JQ, Li RZ (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Hao M, Song X, Sun L (2014) Reweighting estimators for the additive hazards model with missing covariates. Can J Stat 42:285–307
Hastie T, Tibshirani RJ (1990) Generalized additive models. CRC Press, London
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York
Horowitz JL, Lee S (2005) Nonparametric estimation of an additive quantile regression model. J Am Stat Assoc 100:1238–1249
Horowitz JL, Mammen E (2004) Nonparametric estimation of an additive model with a link function. Ann Stat 32:2412–2443
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Huang J, Horowitz JL, Wei F (2010) Variable selection in nonparametric additive models. Ann Stat 38:2282–2313
Konishi S, Kitagawa G (2008) Information criteria and statistical modeling. Springer, New York
Lee YK, Mammen E, Park BU (2010) Backfitting and smooth backfitting for additive quantile models. Ann Stat 38:2857–2883
Li T, Yang H (2016) Inverse probability weighted estimators for single-index models with missing covariates. Commun Stat 45:1199–1214
Liang H, Wang S, Robins J, Carroll R (2004) Estimation in partially linear models with missing covariates. J Am Stat Assoc 99:357–367
Lian H, Liang H, Ruppert D (2015) Separation of covariates into nonparametric and parametric parts in high-dimensional partially linear additive models. Stat Sin 25:591–608
Marx BD, Eilers PHC (1998) Direct generalized additive modeling with penalized likelihood. Comp Stat Data Anal 28:193–209
Meier L, van de Geer S, Bühlmann P (2009) High-dimensional additive modeling. Ann Stat 37:3779–3821
O’Sullivan F (1986) A statistical perspective on ill-posed inverse problems (with discussion). Stat Sci 1:505–527
Ravikumar P, Lafferty J, Liu H, Wasserman L (2009) Sparse additive models. J R Stat Soc B 71:1009–1030
Robins J, Rotnitsky A, Zhao L (1994) Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 89:846–866
Ruppert D (2002) Selecting the number of knots for penalized splines. J Comput Graph Stat 11:735–757
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, Cambridge
Ruppert D, Sheather SJ, Wand MP (1995) An effective bandwidth selector for local least squares regression. J Am Stat Assoc 90:1257–1270
Sepanski J, Knickerbocker R, Carroll R (1994) A semiparametric correction for attenuation. J Am Stat Assoc 89:1366–1373
Sherwood B, Wang L, Zhou X (2013) Weighted quantile regression for analyzing health care cost data with missing covariates. Stat Med 32:4967–4979
Sherwood B (2016) Variable selection for additive partial linear quantile regression with missing covariates. J Multivar Anal 152:206–223
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
Wang CY, Wang S, Zhao LP, Ou ST (1997) Weighted semiparametric estimation in regression analysis with missing covariate data. J Am Stat Assoc 92:512–525
Wang CY, Wang S, Gutierrez RG, Carroll RJ (1998) Local linear regression for generalized linear models with missing data. Ann Stat 26:1028–1050
Yang H, Liu H (2016) Penalized weighted composite quantile estimators with missing covariates. Stat Pap 57:69–88
Yi GY, He W (2009) Median regression models for longitudinal data with dropouts. Biometrics 65:618–625
Yoshida T, Naito K (2014) Asymptotics for penalized splines in generalized additive models. J Nonparametric Stat 26:269–289
Zhang HH, Lin Y (2006) Component selection and smoothing for nonparametric regression in exponential families. Stat Sin 16:1021–1041
Zhou S, Shen X, Wolfe DA (1998) Local asymptotics for regression splines and confidence regions. Ann Stat 26:1760–1782
Acknowledgements
The authors wish to thank the Editor, Associate Editor and two anonymous referees for their valuable comments. The research of the author was partially supported by KAKENHI 26730019.
Appendix
We present the proofs of Theorems 1 and 2. To prove the main theorems of this paper, we first establish the approximation accuracy of the centered B-spline model.
Lemma 1
For \(j=1,\ldots ,p\), under Assumptions A and B, there exists \(\mathbf {b}_{0j}\in \mathbb {R}^{K+r}\) such that
Proof of Lemma 1
For \(j=1,\ldots ,p\), from Barrow and Smith (1978), there exists \(\mathbf {b}_{0j}\in \mathbb {R}^{K+r}\) such that \(\sup _{x_j\in [a,b]}|f_j(x_j)-\mathbf {\psi }(x_j)^T\mathbf {b}_{0j}|=O(K^{-d})\). For \(x_{ij}=x_{ij,2}\), we have
and
From Assumption A 2, we obtain
Therefore, Assumption A 5 yields that \(f_j(x_j)-\mathbf {B}(x_j)^T\mathbf {b}_j^*=O(K^{-d})\). For \(x_{ij}=x_{ij,1}\), since
we have
From Assumptions A 2 and B, we have
Thus, we obtain \(f_j(x_j)-\mathbf {B}(x_j)^T\mathbf {b}_j^*=O(K^{-d})\) and this completes the proof.
Define \(Z_A=(Z_j, j\in A)\) and \(\hat{\mathbf {b}}_A=(\hat{\mathbf {b}}_j, j\in A)\). From Lemma 1, there exists \(\mathbf {b}_{0,j}\in \mathbb {R}^{K+r}\) such that \(f_j(x_j)-\mathbf {B}(x_j)^T\mathbf {b}_{0,j}=O(K^{-d})\) for \(j\in A\). From Assumption A 3, for \(j\in A\) there exists \(c_j>0\) such that \(||\mathbf {b}_{0,j}||>c_j\), and for \(j\not \in A\), \(||\mathbf {b}_{0,j}||=0\). Define \(\mathbf {b}_A=(\mathbf {b}_{0,j},j\in A)\).
Proof of Theorem 1
From KKT conditions, a necessary and sufficient condition for \(\hat{\mathbf {b}}\) is
We first show
For this, it is sufficient to prove that as \(n\rightarrow \infty \),
Since for \(j\in A\), \(||\hat{\mathbf {b}}_j||\not =0\) is necessary and sufficient for \(||\hat{\mathbf {b}}_j-\mathbf {b}_{0,j}||<||\mathbf {b}_{0,j}||\), we obtain for \(j\in A\) that
if and only if \(||\hat{\mathbf {b}}_j-\mathbf {b}_{0,j}||<||\mathbf {b}_{0,j}||\). Thus, under (5), we show
The estimator \(\hat{\mathbf {b}}_A\) can be written as
where
Let \(\mathbf {f}_j=(f_j(x_{1j}),\ldots ,f_j(x_{nj}))^T\) and \(\mathbf {f}_A=\sum _{j\in A} \mathbf {f}_j\). Then from Lemma 1, we have \(\mathbf {\delta }_A=\mathbf {f}_A-Z_A\mathbf {b}_A=O_P(K^{-d})\). Since \(\mathbf {Y}=\mathbf {f}_A+\mathbf {\varepsilon }=Z_A\mathbf {b}_A+\mathbf {\delta }_A+\mathbf {\varepsilon }\), we obtain
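The displayed expression for \(\hat{\mathbf {b}}_A\) is not reproduced above. As a hedged sketch, assuming the inverse-probability-weighted least squares criterion of the abstract with weight matrix \(\hat{W}=\mathrm{diag}(\delta _i/\hat{\pi }_i)\) (our notation) and a penalty contribution that is asymptotically negligible on \(A\), the decomposition \(\mathbf {Y}=Z_A\mathbf {b}_A+\mathbf {\delta }_A+\mathbf {\varepsilon }\) would give

```latex
\hat{\mathbf{b}}_A - \mathbf{b}_A
  \approx \bigl( Z_A^T \hat{W} Z_A \bigr)^{-1} Z_A^T \hat{W}
          \bigl( \boldsymbol{\delta}_A + \boldsymbol{\varepsilon} \bigr)
```

so that \(||\hat{\mathbf {b}}_A-\mathbf {b}_A||=o_P(1)\) follows once the spline approximation error term and the noise term on the right are controlled, which is the purpose of the two bounds obtained via Huang et al. (2010).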
From Assumption B, \(\hat{\Pi }=\Pi +O_P(n^{-\alpha })\) and hence \(\hat{\Pi }=O_P(1)\). Therefore, similarly to the proof of Lemma 5 of Huang et al. (2010),
and
Thus, \(||\hat{\mathbf {b}}_A-\mathbf {b}_A||=o_P(1)\). Together with \(||\mathbf {b}_{0,j}||=O(1)\) for \(j\in A\), the first assertion of (4) holds. The second assertion of (4) can be proved in the same manner as Lemma 6 of Huang et al. (2010) and is hence omitted. Thus, the first assertion of Theorem 1 follows.
We next show the second assertion of Theorem 1. From the property of the B-spline function, for \(j\in A\), there exist \(\eta _1, \eta _2>0\) such that
Since
we obtain
This completes the proof.
Proof of Theorem 2
For simplicity, we rewrite \(Z_j\equiv Z_{j,S}, j\in \hat{A}\). For \(j\in A\cap \hat{A}\), the second stage estimator can be written as
where \(Z_{(-j),\hat{A}}=(Z_k, k\in \hat{A}-\{j\})\) and \(\hat{\mathbf {b}}_{(-j),\hat{A}}=(\hat{\mathbf {b}}_k, k\in \hat{A}-\{j\})\). Let \(\mathbf {\delta }_{(-j),\hat{A}}=\mathbf {f}_{(-j),\hat{A}}-Z_{(-j),\hat{A}}\mathbf {b}_{(-j),\hat{A}}\); by Lemma 1, \(\mathbf {\delta }_{(-j),\hat{A}}=O_P(K^{-d})\). Then we have
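As a hedged sketch of the display above, assuming the standard penalized spline criterion with penalty matrix \(D_m^TRD_m\) (the matrix appearing later in this proof) and smoothing parameter \(\lambda _j\), the second stage estimator would take the ridge-type form

```latex
\hat{\mathbf{b}}_{j,S}
  = \bigl( Z_j^T Z_j + \lambda_j D_m^T R D_m \bigr)^{-1}
    Z_j^T \bigl( \mathbf{Y} - Z_{(-j),\hat{A}} \hat{\mathbf{b}}_{(-j),\hat{A}} \bigr)
```

that is, a univariate penalized spline fit applied to the partial residuals formed by removing the first stage estimates of the other selected components.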
and so
Similar to Theorem 2(b) of Claeskens et al. (2009), we have
From Lemma A3 of Claeskens et al. (2009) and the fundamental property of B-splines, \(D_2=O_P(K_SK_{j,m}^{-1}K^{-d})\). Theorem 1 together with Lemma A3 of Claeskens et al. (2009) yields that
We note that \(K_{j,m}\) is the maximum eigenvalue of \((Z_j^TZ_j)^{-1/2}D_m^TRD_m(Z_j^TZ_j)^{-1/2}\), which has asymptotic order \(O(K_S^{2m}(\lambda _j/n))\) from the proof of Lemma A1 of Claeskens et al. (2009). If \(K_S=O(n^{\eta })\) with \(\eta >1/(2m+1)\) and \(\lambda _j=O(n^{m/(2m+1)})\), then \(K_{j,m}\rightarrow \infty \) as \(n\rightarrow \infty \). Thus, under Assumption C, we have \(E[D_2]=o((\lambda _j/n)^{1/2})\) and \(E[D_3]=o((\lambda _j/n)^{1/2})\). From
we obtain \(E[D_4]=o((\lambda _j/n)^{1/2})\). Thus,
Similarly, the variance of \(\hat{f}_{j,S}(x_j)\) is dominated by
which can be calculated in a manner similar to Theorem 2(b) of Claeskens et al. (2009). As a result, \(V[\hat{f}_{j,S}(x_j)]=v_j(x_j)(1+o(1))\), where \(v_j(x_j)=O(n^{-1}(\lambda _j/n)^{-1/(2m)})\). This completes the proof.
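Combining the bias and variance computations, the asymptotic normality asserted in Theorem 2 can be summarized in the following hedged form, where \(b_j(x_j)\) denotes the asymptotic bias term (our notation, analogous to Theorem 2 of Claeskens et al. 2009):

```latex
\frac{ \hat{f}_{j,S}(x_j) - f_j(x_j) - b_j(x_j) }{ \sqrt{ v_j(x_j) } }
  \;\xrightarrow{\;d\;}\; N(0,1),
\qquad
v_j(x_j) = O\!\bigl( n^{-1} (\lambda_j/n)^{-1/(2m)} \bigr)
```

This is the standard form of pointwise asymptotic normality for penalized spline estimators; the precise bias and variance constants are those derived in the proof above.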
Yoshida, T. Two stage smoothing in additive models with missing covariates. Stat Papers 60, 1803–1826 (2019). https://doi.org/10.1007/s00362-017-0896-6