Abstract
In this paper, we propose a new semiparametric method that simultaneously selects important variables, identifies the model structure, and estimates covariate effects in the additive accelerated failure time (AFT) model, where the dimension of the covariates is allowed to increase with the sample size. Instead of directly approximating the nonparametric effects as in most existing studies, we separate out a linear component from each effect to weaken the condition required for model identifiability. To compute the proposed estimates, we use an alternating direction method of multipliers (ADMM) algorithm, which is easy to implement and converges quickly. The proposed method is shown to be selection consistent and to possess an asymptotic oracle property. Its performance is illustrated through simulation studies and a real data analysis.
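The ADMM algorithm itself is not reproduced in this back matter. As a rough illustration of the general scheme, the following sketch applies ADMM to a lasso-type penalized least squares problem, a simplified stand-in for the paper's doubly penalized objective; the function `admm_lasso`, the parameter `rho`, and the numbers below are illustrative choices, not the authors' implementation.

```python
import numpy as np

def soft_threshold(v, k):
    # elementwise soft-thresholding operator S_k(v) = sign(v) * max(|v| - k, 0)
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(X, y, lam, rho=1.0, n_iter=500):
    """ADMM for min_b 0.5*||X b - y||^2 + lam*||b||_1, using the standard
    splitting b = z with augmented-Lagrangian parameter rho."""
    n, p = X.shape
    Xty = X.T @ y
    # factor (X'X + rho*I) once; the factor is reused at every iteration
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))
    z = np.zeros(p)
    u = np.zeros(p)  # scaled dual variable
    for _ in range(n_iter):
        b = np.linalg.solve(L.T, np.linalg.solve(L, Xty + rho * (z - u)))
        z = soft_threshold(b + u, lam / rho)
        u += b - z
    return z
```

With an orthonormal design the lasso solution is the soft-thresholded response, which gives a quick sanity check on the iteration.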
References
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Second international symposium on information theory, pp 267–281
Antoniadis A, Gijbels I, Lambert-Lacroix S (2014) Penalized estimation in additive varying coefficient models using grouped regularization. Stat Pap 55:727–750
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122
Buckley J, James I (1979) Linear regression with censored data. Biometrika 66:429–436
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann Stat 35:2313–2351
Cao Y, Huang J, Liu Y, Zhao X (2016) Sieve estimation of Cox models with latent structures. Biometrics 72:1086–1097
Chen K, Shen J, Ying Z (2005) Rank estimation in partial linear model with censored data. Stat Sin 15(3):767–779
Chen S, Zhou Y, Ji Y (2018) Nonparametric identification and estimation of sample selection models under symmetry. J Econom 202(2):148–160
Craven P, Wahba G (1979) Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31:377–403
de Boor C (1978) A practical guide to splines. Applied Mathematical Sciences, vol 27. Springer, New York
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Li R (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30:74–99
Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York
Huang J (1999) Efficient estimation of the partly linear additive Cox model. Ann Stat 27:1536–1563
Huang J, Ma S (2010) Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Anal 16:176–195
Huang J, Horowitz JL, Wei F (2010) Variable selection in nonparametric additive models. Ann Stat 38:2282–2313
Huang J, Wei F, Ma S (2012) Semiparametric regression pursuit. Stat Sin 22:1403–1426
Joseph A (2013) Variable selection in high-dimension with random designs and orthogonal matching pursuit. J Mach Learn Res 14:1771–1800
Kim J, Pollard DB (1990) Cube root asymptotics. Ann Stat 18:191–219
Lam C, Fan J (2009) Sparsistency and rates of convergence in large covariance matrix estimation. Ann Stat 37:4254–4278
Leng C, Ma S (2007) Accelerated failure time models with nonlinear covariates effects. Aust N Z J Stat 49:155–172
Lian H, Lai P, Liang H (2013) Partially linear structure selection in Cox models with varying coefficients. Biometrics 69:348–357
Liu Y, Zhang J, Zhao X (2018) A new nonparametric screening method for ultrahigh-dimensional survival data. Comput Stat Data Anal 119:74–85
Ma S, Du P (2012) Variable selection in partly linear regression model with diverging dimensions for right censored data. Stat Sin 22:1003–1020
Ma S, Kosorok MR, Fine JP (2006) Additive risk models for survival data with high-dimensional covariates. Biometrics 62:202–210
Newey WK (1994) The asymptotic variance of semiparametric estimators. Econometrica 62:1349–1382
Neykov NM, Filzmoser P, Neytchev PN (2014) Ultrahigh dimensional variable selection through the penalized maximum trimmed likelihood estimator. Stat Pap 55:187–207
Gray RJ (1992) Flexible methods for analyzing survival data using splines with applications to breast cancer prognosis. J Am Stat Assoc 87:942–951
Rosenwald A et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346:1937–1947
Schumaker L (1981) Spline functions: basic theory. Wiley, New York
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Stone C (1986) The dimensionality reduction principle for generalized additive models. Ann Stat 14:590–606
Stute W (1993) Consistent estimation under random censorship when covariables are available. J Multivar Anal 45:89–103
Stute W (1996) Distributional convergence under random censorship when covariables are present. Scand J Stat 23:461–471
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288
Tibshirani R (1997) The lasso method for variable selection in the Cox model. Stat Med 16:385–395
van der Vaart A, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Wang K, Lin L (2019) Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data. Stat Pap 60:1649–1676
Wang S, Nan B, Zhu J, Beer DG (2008) Doubly penalized Buckley-James method for survival data with high-dimensional covariates. Biometrics 64:132–140
Wei LJ (1992) The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med 11:1871–1879
Wu Y, Stefanski LA (2015) Automatic structure recovery for additive models. Biometrika 102:381–395
Zeng D, Lin D (2007) Efficient estimation for the accelerated failure time model. J Am Stat Assoc 102:1387–1396
Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942
Zhang HH, Lu W (2007) Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94:691–703
Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112
Zhang J, Yin G, Liu Y, Wu Y (2018) Censored cumulative residual independent screening for ultrahigh-dimensional survival data. Lifetime Data Anal 24:273–292
Zou H (2006) The adaptive Lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
The authors would like to thank the referees, the associate editor and the editor for their constructive and insightful comments and suggestions that greatly improved the paper. This research was partially supported by the National Natural Science Foundation of China (Nos. 11971362, 11571263 and 11771366). The work of J. Huang is supported in part by the NSF grant DMS-1916199.
Appendix: Proofs
Proof of Proposition 1
First, the fact that \(\phi _j(x)=\beta _j+\tilde{\phi }_j(x)\) with \(\beta _j=\int _{\alpha _1}^{\alpha _2} \phi _j(x)dx\) and \(\tilde{\phi }_j(x)=\phi _j(x)-\beta _j\) for \(j=1,\ldots , p\) implies that the decomposition (4) holds. To show the uniqueness of the decomposition, we assume that there exist \((\beta _1^{(l)},\ldots ,\beta _{d_n}^{(l)})'\in {\mathbb {R}}^{d_n}\) and \((\tilde{\phi }_1^{(l)},\ldots ,\tilde{\phi }_{d_n}^{(l)})'\in \tilde{{\mathcal {H}}}^{d_n}\), \(l=1,2\) such that
It suffices to prove that \(\beta _j^{(1)}=\beta _j^{(2)}\) and \(\tilde{\phi }_j^{(1)}(x)\equiv \tilde{\phi }_j^{(2)}(x)\) for each \(j=1,\ldots , d_n\). To this end, we note that (10) implies that
When the covariates are not linearly dependent, by Fubini’s theorem, there exists \((x_1^0,\ldots ,x_{j-1}^0,x_{j+1}^0,\ldots ,x_{d_n}^0)\in [\alpha _1,\alpha _2]^{d_n-1}\) such that
Writing \(-\sum \nolimits _{i\ne j}x_i^0\Big ([\beta _i^{(1)}-\beta _i^{(2)}]+[\tilde{\phi }_i^{(1)}(x_i^0)-\tilde{\phi }_i^{(2)}(x_i^0)]\Big )\) as \(C_j\) and using the condition that \(E(\beta _j^{(l)} X_j+X_j\tilde{\phi }_j^{(l)}(X_j))=E(X_j\phi (X_j))\) for \(l=1,2\), we have
for each \(j=1,\ldots , d_n\). Noting that \(\tilde{\phi }_j^{(l)}(x)\in \tilde{{\mathcal {H}}}\), integrating both sides of (11) with respect to \(x\) from \(\alpha _1\) to \(\alpha _2\) gives that
Combining this with (11), we get that
\(\square \)
Let \({\mathbb {P}}_n\) be the empirical measure of \(\{(Y_i,\delta _i, \varvec{X}_i):i=1,2,\ldots ,n\}\), and \({\mathbb {P}}\) be the probability measure of \((Y,\delta ,\varvec{X})\). Define \(g_{nj}^*(X_j)=g_{j}^*(\phi _{nj},X_j)\) and \(g_{0j}^*(X_j)=g_{j}^*(\phi _{0j},X_j)\) for \(\phi _{nj}\in {\Omega _n}\). Then denote \(g_n(\varvec{X})=\sum _{j=1}^{d_n} X_j\phi _{nj}(X_j)\), \(g_n^*(\varvec{X})=\sum _{j=1}^{d_n} g_{nj}^*(X_j)\) and \(g_0^*(\varvec{X})=\sum _{j=1}^{d_n} g_{0j}^*(X_j)\). Define
for \(u_0=\cdots =u_m=\xi _0,\ u_{m+1}=\xi _1,\ldots , u_{q_n-1}=\xi _{K_n-1},\ u_{q_n}=\cdots =u_{q_n+m}=\xi _{K_n}\). Let \(\xrightarrow {P}\) and \(\xrightarrow {d}\) represent convergence in probability and in distribution, respectively, as \(n\rightarrow \infty \) unless otherwise stated. Similar to Lemma A5 in Huang (1999), the following lemma can be established first.
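The knot sequence just defined, with \((m+1)\)-fold boundary knots \(\xi _0\) and \(\xi _{K_n}\), is the standard construction for a B-spline basis of degree \(m\) with interior knots \(\xi _1,\ldots ,\xi _{K_n-1}\), giving \(q_n=K_n+m\) basis functions. A minimal numerical sketch via the Cox-de Boor recursion (the function name `bspline_basis` is ours and purely illustrative, not the authors' code):

```python
import numpy as np

def bspline_basis(x, xi, m):
    """Evaluate the q_n = K_n + m B-spline basis functions of degree m on
    [xi[0], xi[-1]], using the knot sequence of the appendix:
    u_0 = ... = u_m = xi[0], interior knots xi[1], ..., xi[K_n - 1],
    u_{q_n} = ... = u_{q_n + m} = xi[-1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xi = np.asarray(xi, dtype=float)            # xi = (xi_0, ..., xi_{K_n})
    t = np.concatenate([np.repeat(xi[0], m), xi, np.repeat(xi[-1], m)])
    # degree-0 indicator splines, closing the last interval on the right
    B = np.array([(t[i] <= x) & (x < t[i + 1])
                  for i in range(len(t) - 1)], dtype=float).T
    last = np.nonzero(np.diff(t) > 0)[0][-1]
    B[x == t[-1], last] = 1.0
    for k in range(1, m + 1):                   # raise the degree step by step
        Bn = np.zeros((len(x), B.shape[1] - 1))
        for i in range(B.shape[1] - 1):
            d1, d2 = t[i + k] - t[i], t[i + k + 1] - t[i + 1]
            left = (x - t[i]) / d1 * B[:, i] if d1 > 0 else 0.0
            right = (t[i + k + 1] - x) / d2 * B[:, i + 1] if d2 > 0 else 0.0
            Bn[:, i] = left + right
        B = Bn
    return B                                    # shape (len(x), K_n + m)
```

The repeated boundary knots make the basis sum to one on the whole interval, which is a convenient check on the construction.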
Lemma 1
Assume that Conditions (C1)–(C4) hold for any \(1\le j\le d_n\). Then there exists a function \(\phi _{nj} \in {\Omega _n}\) such that
with \({\mathbb {P}}_n\delta g_{nj}=0\).
Proof
According to Corollary 6.21 of Schumaker (1981), for any \(1\le j\le d_n\), there exists \(\phi _{nj}\in \Omega _n\) such that \(\Vert \phi _{nj}-\phi _{0j}\Vert _\infty =O(n^{-\nu p})\). We define \(\widetilde{g}_{nj}(X_j)=X_j\phi _{nj}(X_j)\) and
where \(n_{\delta }=\sum _{i=1}^n\delta _i/n\). Then it is easy to see that \({\mathbb {P}}_n\delta g_{nj}=0\) for any \(1\le j\le d_n\). Furthermore, we note that
where
with c being a constant independent of n. By Lemma 3.4.2 in van der Vaart and Wellner (1996), we have \(({\mathbb {P}}_n-{\mathbb {P}})\delta \widetilde{g}_{nj}=O_p(n^{-1/2}n^{\nu /2})\). And the definition of \(\phi _{nj}\) shows that \(\Vert {\mathbb {P}}(\delta \widetilde{g}_{nj}-\delta g_{0j})\Vert _\infty \le E(\delta )\Vert \widetilde{g}_{nj}-g_{0j}\Vert _\infty =O(n^{-\nu p})\). Hence we have
In addition,
Plugging (13) and (14) into (12), we can get \(\Vert g_{nj}-g_{0j}\Vert _\infty =O_p(n^{-\nu p}+n^{-(1-\nu )/2})\). By using the property of Kaplan–Meier weights (Stute 1993) and Lemma 3.4.2 in van der Vaart and Wellner (1996), we have
where the \(c_i\), \(i=1,\ldots , 4\), are finite constants. Thus, we have
\(\square \)
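The Kaplan-Meier weights invoked in the proof above (Stute 1993) have a simple product form: with the sample ordered as \(Y_{(1)}\le \cdots \le Y_{(n)}\), \(\omega _i=\frac{\delta _{(i)}}{n-i+1}\prod _{j=1}^{i-1}\big (\frac{n-j}{n-j+1}\big )^{\delta _{(j)}}\), the jump of the Kaplan-Meier estimator at \(Y_{(i)}\). A hedged sketch of this formula (function name ours, ties broken in input order for illustration):

```python
import numpy as np

def km_weights(y, delta):
    """Kaplan-Meier (Stute 1993) weights attached to the ordered sample
    Y_(1) <= ... <= Y_(n):
    omega_i = delta_(i)/(n-i+1) * prod_{j<i} ((n-j)/(n-j+1))^{delta_(j)}.
    They reduce to 1/n for every observation when nothing is censored."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(y)
    order = np.argsort(y, kind="stable")        # indices sorted by Y
    d = delta[order]
    w = np.zeros(n)
    surv = 1.0  # running product over the smaller observations
    for i in range(n):                          # i + 1 is the rank
        w[i] = d[i] / (n - i) * surv
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    return w, order
```

Censored observations receive zero weight, and the mass they would have carried is redistributed to the larger uncensored observations, which is what drives the weighted least squares criteria in the appendix.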
Define \(\widehat{g}_{nj}^*(X_j)=g_j^*(\widehat{\phi }_{nj},X_j)\) and \(\widehat{g}_n^*(\varvec{X})=\sum _{j=1}^{d_n}\widehat{g}_{nj}^*(X_j)\), then we have the following lemma.
Lemma 2
Assume that Conditions (C1)–(C7) hold. If \(0.25/p<\nu <0.5\), then \(\Vert \widehat{g}_n^*-g_n^*\Vert ^2=o_p(d_n^2 q_n^{-1})\) and \(\displaystyle \left\| \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\right\| _{\infty }=o_p(1)\).
Proof
Let \(\eta _{nj}\in \Omega _n\) be such that \(\eta _{nj}(x)=\varvec{\theta }_{nj}^{*T}\varvec{\psi }_{q_n,m}(x)\) and \(\Vert \eta _{nj}(x)\Vert ^2=O(q_n^{-1})\). Denote \(h_n^*(\varvec{X})=\sum _{j=1}^{d_n} g_j^*(\eta _{nj},X_j)\); then \(\displaystyle \Vert \frac{1}{d_n}h_n^*(\varvec{X})\Vert ^2=O_p(q_n^{-1})\). Define \(H_n(\alpha )=Q_n(\varvec{\theta }_n+\alpha \varvec{\theta }_n^{*})\). To prove the lemma, it suffices to show that for any \(\alpha _0>0\), \(H'_n(\alpha _0)>0\) and \(H'_n(-\alpha _0)<0\) with probability tending to one.
Note that
Then
We consider the first part
where \(c_0>0\) is a constant and the first term
In \(J_{1n}\), \(\Vert Y^*-g_n^*\Vert _\infty =\Vert Y^*-g_0^*+g_0^*-g_n^*\Vert _\infty \le O_p(1)+O_p(d_n (n^{-\nu p}+n^{-(1-\nu )/2}))\). Since \(d_n^4/n\rightarrow 0\) and \(0.25/p<\nu <0.5\), we have \(\Vert \displaystyle \frac{1}{d_n^2}h_n^*(Y^*-g_n^*)\Vert _\infty \le M_0\) for some constant \(M_0\). Let
Then similar to Lemma A2 and Corollary A1 in Huang (1999), we have
for any \(\varepsilon <\eta \) with a constant \(c_0\) and
Here we can take \(\eta =q_n^{-1/2}\). Combining the results of Lemma 3.4.2 in van der Vaart and Wellner (1996) and Lemma A1 in Huang (1999), we get
We then consider \(J_{2n}\) as
which gives that
Therefore,
Next we focus on \(H_2\) and \(H_3\). Let \(\varvec{B}_j(X_j)=(\varvec{\psi }_{q_n,m}(X_{1j}),\ldots ,\varvec{\psi }_{q_n,m}(X_{nj}))^T\). By Lemma 3 of Huang and Ma (2010), it follows that there are constants \(0<c_3<c_4<\infty \) such that
with probability tending to one. Then we have \(\Vert \varvec{\theta }_{nj}^{*}\Vert =O_p(1)\) and \(\Vert \varvec{{C}_{\xi }}\varvec{\theta }_{nj}^{*}\Vert =O_p(1)\) using the fact that \(\Vert \varvec{\theta }_{nj}^{*T} \varvec{\psi }_{q_n,m}(X_j)\Vert =O(q_n^{-1/2})\). Observing that
and using Condition (C9) and \(\lambda _{1}=o(d_n n^{-\nu })\), we have
The same arguments as above give that \(|H_3|\le o_p(d_n^2 n^{-\nu })\) if \(\lambda _{2}=o(d_n n^{-\nu })\).
Consequently, \(H'_n(\alpha _0)\ge c_0\alpha _0 d_n^2 n^{-\nu }+o_p(d_n^2 n^{-\nu })>0\) with probability tending to one. Similarly, we can prove that \(H'_n(-\alpha _0)<0\) with probability tending to one. Therefore, the boundedness of the covariates \(\varvec{X}\) in Condition (C2) ensures that
Subsequently, Lemma 7 of Stone (1986) yields that \(\displaystyle \Vert \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\Vert _{\infty }=o_p(1)\). \(\square \)
To verify the consistency of parameter estimation, we need the following lemma.
Lemma 3
Define \(m_0(x,y^*;g^*)=(y^*-g^*(x))^2/d_n^2\). Denote \(M_0={\mathbb {P}} m_0\) and \(\displaystyle M_n={\mathbb {P}}_n m_0=\frac{1}{n}\Vert Y^*-g^*(\varvec{X})\Vert ^2/d_n^2\). Under the conditions of Lemma 1, for any function \(g(\cdot )\) satisfying \(E[\delta g(\varvec{X})]=0\), there exists a constant \(c>0\) such that
Proof
Let \(h^*=g^*-g_0^*\) and
Since \(L'(0)=0\) and \(L''(0)=2{\mathbb {P}}(h^{*2})/d_n^2\), there exists a constant \(c>0\) such that \(\displaystyle {\mathbb {P}} m_0(\cdot ;g^*)-{\mathbb {P}} m_0(\cdot ;g_0^*)=c\left\| \frac{1}{d_n}(g^*-g_0^*)\right\| ^2\). Similarly, we have
By Lemma 1, \({\mathbb {P}} m_0(\cdot ;g_n^*)-{\mathbb {P}} m_0(\cdot ;g_0^*)=O_p(n^{-2\nu p}+n^{-(1-\nu )})\). Combining the following equality
with the triangle inequality
we have
where \(c>0\) is a finite constant. \(\square \)
Proof of Theorem 1
Let
By Lemma 3.4.2 of van der Vaart and Wellner (1996),
Then by Theorem 3.4.1 of van der Vaart and Wellner (1996), choosing the distance \(d(\widehat{g}_n^*,g_n^*)=-[{\mathbb {P}} m_0(\cdot ;\widehat{g}_n^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)]\) there, we have
where \(r_{1n}=O(n^{1/2}q_n^{-1/2})=O(n^{(1-\nu )/2})\). Therefore, \({\mathbb {P}} m_0(\cdot ;\widehat{g}_n^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)=O_p(n^{-(1-\nu )})\). Thus Lemma 3 gives that \(\Vert \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\Vert ^2=O_p(n^{-2\nu p}+n^{-(1-\nu )})\). Combining the result in Lemma 1 that \(\Vert g_n^*-g_0^*\Vert _\infty ^2=O_p(d_n^2(n^{-2\nu p}+n^{-(1-\nu )}))\), we have
By Conditions (C2)–(C4), it follows that
Denoting the projection of \(\varvec{X}_{M_2}\) on \(\varvec{X}_{M_1}\) as \(W\), we have
By Condition (C6), we obtain
This in turn implies \(E\delta \big \Vert \varvec{X}_{M_1} (\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)\big \Vert ^2=O_p(d_n^2 (n^{-(1-\nu )}+n^{-2\nu p}))\). Therefore,
This completes the proof of Theorem 1. \(\square \)
Proof of Theorem 2
(i)
First, we prove the selection consistency of the variables. Let \(\widetilde{\varvec{\theta }}_n=(\widetilde{\varvec{\theta }}_{n1}^T,\ldots ,\widetilde{\varvec{\theta }}_{nd_n}^T)^T\) with
$$\begin{aligned} \widetilde{\varvec{\theta }}_{nj}= \left\{ \begin{array}{ll} \widehat{\varvec{\theta }}_{nj},&{}\text{ if } j\notin M_3\text{, }\\ 0,&{}\text{ if } j\in M_3\text{. } \end{array}\right. \end{aligned}$$Note that \(\widehat{\varvec{\theta }}_n\) satisfies \(\displaystyle \frac{\partial Q_n(\widehat{\varvec{\theta }}_n)}{\partial \varvec{\theta }}=\varvec{0}\). By the definition of \(\widehat{\varvec{\theta }}_n\) and \(\widetilde{\varvec{\theta }}_n\), we have
$$\begin{aligned}&Q_n(\widehat{\varvec{\theta }}_n)-Q_n(\widetilde{\varvec{\theta }}_n) \\= & {} \frac{\partial Q_n(\widehat{\varvec{\theta }}_{n})^T}{\partial \varvec{\theta }}(\widehat{\varvec{\theta }}_{n}-\widetilde{\varvec{\theta }}_n)-\frac{1}{2}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)^T\frac{\partial ^2 Q_n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }\partial \varvec{\theta }^T}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)\\= & {} -\frac{1}{2}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)^T\frac{\partial ^2 Q_n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }\partial \varvec{\theta }^T}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)\\= & {} -\frac{1}{2}\widehat{\varvec{\theta }}_{nM_3}^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }_{M_3}\partial \varvec{\theta }_{M_3}^T}\widehat{\varvec{\theta }}_{nM_3}-\frac{1}{2}\sum \limits _{j\in M_3}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}) \\&-\frac{1}{2}\sum \limits _{j\in M_3}\widehat{\varvec{\theta }}_{nj}^T\big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big )\widehat{\varvec{\theta }}_{nj}, \end{aligned}$$where \(\varvec{\theta }_n^*\) is between \(\widehat{\varvec{\theta }}_n\) and \(\widetilde{\varvec{\theta }}_n\).
Since \(\widehat{\varvec{\theta }}_n\) is the minimizer of \(Q_n(\varvec{\theta })\), we have \(Q_n(\widehat{\varvec{\theta }}_n)\le Q_n(\widetilde{\varvec{\theta }}_n)\), which implies that
$$\begin{aligned} \frac{1}{2}\widehat{\varvec{\theta }}_{nM_3}^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }_{M_3}\partial \varvec{\theta }_{M_3}^T}\widehat{\varvec{\theta }}_{nM_3}\ge & {} -\frac{1}{2}\sum \limits _{j\in M_3}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})\nonumber \\&-\frac{1}{2}\sum \limits _{j\in M_3}\widehat{\varvec{\theta }}_{nj}^T\big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big )\widehat{\varvec{\theta }}_{nj}. \end{aligned}$$(15)
Note that the left-hand side of (15) satisfies
$$\begin{aligned} I_1\le c \widehat{\varvec{\theta }}_{nM_3}^T E(X_{M_3}^TX_{M_3})\widehat{\varvec{\theta }}_{nM_3}\le c\rho _n^*\Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^2 \end{aligned}$$for some constant \(c\), by the continuity of the B-spline functions and the definition of \(\rho _n^*\). Using Condition (C9), there exist constants \(a\), \(b\) and \(c\) such that the right-hand side of (15) satisfies
$$\begin{aligned} I_2\ge c (\lambda _{1}^a+\lambda _{2}^a) \Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^{2-b}. \end{aligned}$$Thus, by the results of Theorem 1, we obtain that
$$\begin{aligned} O_p(1)(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}\ge \Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^{b}\ge O_p(1)\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*}. \end{aligned}$$This shows that, provided \(\displaystyle \frac{\lambda _{1}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\) and \(\displaystyle \frac{\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\) both go to infinity,
$$\begin{aligned} P(\Vert \widehat{\varvec{\theta }}_{nM_3}\Vert >0)\le P\Big (\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\le O_p(1)\Big )\rightarrow 0. \end{aligned}$$Next, we prove the structure selection consistency. Assume that \({\mathop {\varvec{\theta }}\limits _{\sim }}_{n}=({\mathop {\varvec{\theta }}\limits _{\sim }}_{n1}^{T},\ldots ,{\mathop {\varvec{\theta }}\limits _{\sim }}_{nd_{n}}^T)^T\) with
$$\begin{aligned} {\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_{2}}= \left\{ \begin{array}{ll} \widehat{\varvec{\theta }}_{nj},&{}\text{ if } j\notin M_2\text{, }\\ {\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}, s.t. \ \varvec{{C}_{\xi }}{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}=0,&{}\text{ if } j\in M_2\text{. } \end{array}\right. \end{aligned}$$Then we have
$$\begin{aligned}&\frac{1}{2}\Big (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2}\Big )^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\Big (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2}\Big )\nonumber \\\ge & {} -\frac{1}{2}\sum \limits _{j\in M_2}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})\nonumber \\&-\frac{1}{2}\sum \limits _{j\in M_2}\Big (\widehat{\varvec{\theta }}_{nj}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}\Big )^T \big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big ) \Big (\widehat{\varvec{\theta }}_{nj}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}\Big ), \end{aligned}$$(16)
where \(\varvec{\theta }_n^0\) is between \(\widehat{\varvec{\theta }}_n\) and \({\mathop {\varvec{\theta }}\limits _{\sim }}_n\). The left-hand side of (16) satisfies
$$\begin{aligned} II_1\le & {} O_p(1)\Big (\varvec{C}_\xi (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2})\Big )^T \cdot \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\cdot \Big (\varvec{C}_\xi (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2})\Big )\\= & {} O_p(1)(\varvec{C}_\xi \widehat{\varvec{\theta }}_{nM_2})^T \cdot \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\cdot (\varvec{C}_\xi \widehat{\varvec{\theta }}_{nM_2})\\\le & {} O_p(1)\rho _n^*\Vert \widehat{\varvec{\theta }}_{nM_2}\Vert ^2. \end{aligned}$$Similarly, the right-hand side of (16) satisfies
$$\begin{aligned} II_2\ge c (\lambda _{1}^a+\lambda _{2}^a) \Vert \widehat{\varvec{\theta }}_{nM_2}\Vert ^{2-b}. \end{aligned}$$Therefore,
$$\begin{aligned} P(\Vert \widehat{\varvec{\theta }}_{nM_2}\Vert >0)\le P\Big (\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\le O_p(1)\Big )\rightarrow 0. \end{aligned}$$This establishes the selection consistency of both the variables and the model structure.
(ii)
Let the column and row vectors of the covariate matrix \(\varvec{X}^*\) be \(X_1^*,\ldots , X_{d_n}^*\) and \(X_{(1)}^*,\ldots , X_{(n)}^*\), respectively. Define
$$\begin{aligned} \overline{X}_w=\frac{\sum \nolimits _{i=1}^n \omega _i X_{(i)}}{\sum \nolimits _{i=1}^n \omega _i}, \quad X_{(i)}^*=(n\omega _i)^{1/2}(X_{(i)}-\overline{X}_w), \end{aligned}$$$$\begin{aligned} U(\varvec{W};\varvec{\beta },\widehat{\varvec{\phi }}_n)\triangleq (-\varvec{X}_{M_2}^{*})\Big (Y^*-\sum \limits _{j\in M_1}\hat{g}_{nj}^*(X_{j})-\varvec{X}_{M_2}^* \varvec{\beta }\Big ), \end{aligned}$$$$\begin{aligned} \widehat{U}_n(\varvec{\beta })\triangleq \frac{1}{n}\sum \limits _{i=1}^{n}U(\varvec{W}_i;\varvec{\beta },\widehat{\varvec{\phi }}_n), \end{aligned}$$with \(\varvec{W}\triangleq (\omega ,\varvec{X},Y)\). Then \(\widehat{\varvec{\beta }}_n\) satisfies the estimating equation \(\widehat{U}_n(\widehat{\varvec{\beta }})=0\) by the definition of \(\widehat{\varvec{\beta }}_n\) and \(\widehat{\varvec{\phi }}_n\).
Let \(\displaystyle U_n(\varvec{\beta })\triangleq \frac{1}{n}\sum \nolimits _{i=1}^{n}U(\varvec{W}_i;\varvec{\beta },\tilde{\varvec{\phi }}_0)\) and \(\widetilde{\varvec{\beta }}_n\) be the root of \(U_n(\varvec{\beta })=0\). We then show that \(\widehat{\varvec{\beta }}_n\) has the same distribution as \(\widetilde{\varvec{\beta }}_n\). The Fréchet derivative of \(U(\varvec{W};\varvec{\beta }_0,\varvec{\phi })\) at \(\tilde{\varvec{\phi }}_0\) in the direction \(\varvec{h}\) is given by
$$\begin{aligned} D(\varvec{W},\varvec{h})= & {} \lim \limits _{\alpha \rightarrow 0}\frac{U(\varvec{W};\varvec{\beta }_0,\tilde{\varvec{\phi }}_0+\alpha \varvec{h})-U(\varvec{W};\varvec{\beta }_0,\tilde{\varvec{\phi }}_0)}{\alpha } \\= & {} \varvec{X}_{M_2}^{*T}\varvec{X}_{M_1}^{*} \varvec{h}, \end{aligned}$$with \(\varvec{h}\in \{h_1+\ldots +h_{|M_1|},h_j\in \tilde{{\mathcal {H}}},\ j\in M_1\}\).
The relation \(\Vert (\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)/d_n\Vert =O_p(n^{-(1-\nu )/2}+n^{-\nu p})=o_p(n^{-1/4})\) ensures that the linear assumption 5.1 in Newey (1994) is satisfied. Then by Lemma 3.4.2 of van der Vaart and Wellner (1996), we have
$$\begin{aligned} \sqrt{n}({\mathbb {P}}_n-{\mathbb {P}})\{D(\varvec{W};\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)\}\xrightarrow {P}0. \end{aligned}$$It follows that the stochastic equicontinuity assumption 5.2 holds. For \(\varvec{\phi }\) close enough to \(\tilde{\varvec{\phi }}_0\), a straightforward calculation yields that \(ED(\varvec{W};\varvec{\phi }-\tilde{\varvec{\phi }}_0)=0\) by using Condition (C8). Then the mean square continuity assumption 5.3 holds with \(\alpha (\varvec{W})=0\). By Lemma 5.1 of Newey (1994), \(\widehat{\varvec{\beta }}_n\) and \(\widetilde{\varvec{\beta }}_n\) have the same distribution.
Next, we derive the asymptotic distribution of \(\widetilde{\varvec{\beta }}_{nM_2}\). Let \(\iota _n=n^{-1/2},\ V_{1n}(\varvec{a})=Q_n(\varvec{\beta }_0+\iota _n(\varvec{a}^T,\varvec{0}^T)^T,\tilde{\varvec{\phi }}_0)-Q_n(\varvec{\beta }_0,\tilde{\varvec{\phi }}_0)\), where \(\varvec{a}=(a_1,\ldots ,a_{|M_2|})^T\) is a \(|M_2|\)-dimensional constant vector and \(\varvec{0}\) is a \(|M_3|\)-dimensional zero vector. By part (i) of Theorem 2, \(\widetilde{\varvec{\beta }}_n-\varvec{\beta }_0=\iota _n(\widehat{\varvec{a}}_n^T,\varvec{0}^T)^T\) with probability converging to one, where \(\widehat{\varvec{a}}_n=\text {argmin}\{V_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\). Let \(\widetilde{\varvec{\theta }}_n\) be the estimator corresponding to \(\widetilde{\varvec{\beta }}_n\); then, as in part (i), we also have \(\varvec{{C}_{\xi }}\widetilde{\varvec{\theta }}_{nj}=0\ (j\in M_2)\) with probability converging to one.
Note that
$$\begin{aligned} V_{1n}(\varvec{a})= & {} Q_n(\varvec{\beta }_0+\iota _n(\varvec{a}^T,\varvec{0}^T)^T,\tilde{\varvec{\phi }}_0)-Q_n(\varvec{\beta }_0,\tilde{\varvec{\phi }}_0) \\= & {} \Big (\iota _n \varvec{a}^T U_n(\varvec{\beta }_0)+ \frac{\iota _n^2}{2} \varvec{a}^T U_n'(\varvec{\beta }_0)\varvec{a}\Big ) \\&+\left( \sum _{j\in M_2}P_1(\Vert \varvec{{C}_{\xi }}\widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})-\sum _{j\in M_2}P_1(\Vert \varvec{{C}_{\xi }}\varvec{\theta }_{0j}\Vert ;\lambda _{1})\right) \\&+\left( \sum _{j\in M_2}P_2(\Vert \widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{2n}) -\sum _{j\in M_2}P_2(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n})\right) \\\triangleq & {} A_{1n}(\varvec{a})+A_{2n}(\varvec{a})+A_{3n}(\varvec{a}). \end{aligned}$$Since \(\widetilde{\varvec{\beta }}_{n M_2}-\varvec{\beta }_{0}=\iota _n\varvec{a}=\varvec{\psi }_{q_n,m}(\varvec{X}_{M_2})^T(\widetilde{\varvec{\theta }}_{n M_2}-\varvec{\theta }_{0 M_2})\), we have
$$\begin{aligned} \widetilde{\varvec{\theta }}_{nj}-\varvec{\theta }_{0j}=(\varvec{\psi }_{q_n,m}(X_j)\varvec{\psi }_{q_n,m}(X_j)^T)^{-1}\varvec{\psi }_{q_n,m}(X_j)\iota _na_j,\ j=s_1+1,\ldots ,s_2. \end{aligned}$$It follows that
$$\begin{aligned} A_{3n}(\varvec{a})= & {} \sum _{j\in M_2}P_2(\Vert \widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{2n})-\sum _{j\in M_2}P_2(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n}) \\= & {} \sum _{j\in M_2}\left[ P_2'(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n})\frac{\varvec{\theta }_{0j}^T}{\Vert \varvec{\theta }_{0j}\Vert }+o_p(1)\right] \\&\big [(\varvec{\psi }_{q_n,m}(X_j)\varvec{\psi }_{q_n,m}(X_j)^T)^{-1}\varvec{\psi }_{q_n,m}(X_j)\iota _na_j\big ]. \end{aligned}$$By Condition (C7), we have
$$\begin{aligned} |A_{3n}(\varvec{a})|\le & {} d_n P_2'(0+;\lambda _{2n}) \sqrt{\Vert ({\varvec{\psi }}_{q_{n,m}}(X_j){\varvec{\psi }}_{q_{n,m}}(X_j)^T)^{-1}\Vert }\iota _n a_j\\= & {} O_p(d_n^2 n^{-\nu }) O_p(n^{-(1-\nu )/2})=o_p(1). \end{aligned}$$Similarly, we can get that \(A_{2n}(\varvec{a})\xrightarrow {p}0\).
Hence, \(\widehat{\varvec{a}}_n=\text {argmin}\{V_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}=\text {argmin}\{A_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\), so it suffices to study the minimizer of \(nA_{1n}(\varvec{a})\). Similarly to Huang et al. (2010), we have
$$\begin{aligned} nA_{1n}(\varvec{a})= & {} \varvec{a}^T \big (\sqrt{n}U_n(\varvec{\beta }_0)\big )+ \frac{1}{2}\varvec{a}^T U_n'(\varvec{\beta }_0)\varvec{a}\\\triangleq & {} \varvec{a}^T T_1+ \varvec{a}^T T_2 \varvec{a}. \end{aligned}$$It can be seen that \(T_2 \xrightarrow {p} \Sigma _2\) and that \(\varvec{u}\Sigma _3^{-1/2} T_1\) is asymptotically distributed as \(N(0,1)\) for any \(\varvec{u}\in {\mathbb {R}}^{|M_2|}\) with \(\Vert \varvec{u}\Vert =1\), where \(\Sigma _3=Var(\delta \gamma _0(Y)(Y-g_0(\varvec{X}))\varvec{X}_{M_2}+(1-\delta )\gamma _1(Y)-\gamma _2(Y))\) with the following notation:
$$\begin{aligned}&\tilde{H}^{11}(\varvec{x},y)=P(\varvec{X}\le \varvec{x}, Y\le y, \delta =1), \quad \tilde{H}^0(y)=P(Y\le y,\delta =0),\\&\gamma _0(y)=\exp \Bigg (\int _0^{y-}\frac{\tilde{H}^0(dw)}{1-H(w)}\Bigg ),\\&\gamma _{1,j}(y)=\frac{1}{1-H(y)}\int I(z>y)e(z,\varvec{x})x_j\gamma _0(z)\tilde{H}^{11}(d\varvec{x},dz),\\&\gamma _{2,j}(y)=\iint \frac{I(v<y,v<z)e(z,\varvec{x})x_j\gamma _0(z)}{[1-H(v)]^2}\tilde{H}^0(dv)\tilde{H}^{11}(d\varvec{x},dz),\\&\gamma _l(y)=(\gamma _{l,j}; j\in M_2), ~~ l=1,2. \end{aligned}$$Let \(\hat{\varvec{a}}=\text {argmin}\{V_1(\varvec{a})=\varvec{a}^T T_1+\frac{1}{2}\varvec{a}^T \Sigma _2 \varvec{a}: \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\). According to the continuous mapping theorem of Kim and Pollard (1990), \(\sqrt{n} \varvec{u}\Sigma ^{-1/2}(\widehat{\varvec{\beta }}_{nM_2}-\varvec{\beta }_{0})\) has the same asymptotic distribution as \(\varvec{u}\Sigma ^{-1/2}\hat{\varvec{a}}\xrightarrow {d} N(0,1)\) for any \(\varvec{u}\in {\mathbb {R}}^{|M_2|}\) with \(\Vert \varvec{u}\Vert =1\), where \(\Sigma =\Sigma _2^{-1}\Sigma _3\Sigma _2^{-1}\). This completes the proof of Theorem 2.
\(\square \)
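The sandwich form \(\Sigma =\Sigma _2^{-1}\Sigma _3\Sigma _2^{-1}\) obtained above translates directly into a plug-in variance estimate for Wald-type inference on the linear component. A minimal numerical sketch, where `S2_hat` and `S3_hat` stand for consistent estimates of \(\Sigma _2\) and \(\Sigma _3\) and the numbers are purely hypothetical:

```python
import numpy as np

def sandwich_cov(S2_hat, S3_hat):
    """Plug-in sandwich covariance S2^{-1} S3 S2^{-1} for the asymptotic
    variance of the parametric component."""
    S2_inv = np.linalg.inv(S2_hat)
    return S2_inv @ S3_hat @ S2_inv

# illustrative 2x2 estimates (hypothetical values, not from the paper)
S2_hat = np.array([[2.0, 0.5], [0.5, 1.0]])
S3_hat = np.array([[1.0, 0.2], [0.2, 0.8]])
Sigma = sandwich_cov(S2_hat, S3_hat)

# Wald-type 95% confidence half-widths for the components of beta
n = 200
half_widths = 1.96 * np.sqrt(np.diag(Sigma) / n)
```

Since both factor matrices are symmetric positive definite here, the resulting \(\Sigma \) is symmetric with positive diagonal, as an asymptotic covariance must be.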
Cite this article
Liu, L., Wang, H., Liu, Y. et al. Model pursuit and variable selection in the additive accelerated failure time model. Stat Papers 62, 2627–2659 (2021). https://doi.org/10.1007/s00362-020-01205-0