
Low-dimensional confounder adjustment and high-dimensional penalized estimation for survival analysis


Abstract

High-throughput profiling is now common in biomedical research. In this paper we consider the layout of an etiology study composed of a failure time response and gene expression measurements. In current practice, a widely adopted approach is to select genes by a preliminary marginal screening followed by penalized regression for model building. Confounders, such as clinical risk factors and environmental exposures, usually exist and need to be properly accounted for. We propose covariate-adjusted screening and variable selection procedures under the accelerated failure time model. While penalizing the high-dimensional coefficients to achieve a parsimonious model form, our procedure also properly adjusts for the low-dimensional confounder effects to achieve more accurate estimation of the regression coefficients. We establish the asymptotic properties of the proposed methods and carry out simulation studies to assess their finite sample performance. Our methods are illustrated with a real gene expression data analysis, where proper adjustment for confounders produces more meaningful results.


References

  • Bradic J, Fan J, Jiang J (2011) Regularization for Cox's proportional hazards model with NP-dimensionality. Ann Stat 39:3092–3120

  • Cai T, Huang J, Tian L (2009) Regularized estimation for the accelerated failure time model. Biometrics 65:394–404

  • Chen HY, Yu SL et al (2007) A five-gene signature and clinical outcome in non-small-cell lung cancer. N Engl J Med 356:11–20

  • Cheng MY, Zhang W, Chen LH (2009) Statistical estimation in generalized multiparameter likelihood models. J Am Stat Assoc 104:1179–1191

  • Cheng MY, Honda T, Li J, Peng H (2014) Nonparametric independence screening and structure identification for ultra-high dimensional longitudinal/clustered data. Ann Stat 42:1819–1849

  • Cheng MY, Honda T, Zhang JT (2015) Forward variable selection for sparse ultra-high dimensional varying coefficient models. J Am Stat Assoc. arXiv:1410.6556

  • Fan J, Feng Y, Song R (2011) Nonparametric independence screening in sparse ultra-high dimensional additive models. J Am Stat Assoc 106:544–555

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Fan J, Li R (2002) Variable selection for Cox's proportional hazards model and frailty model. Ann Stat 30:74–99

  • Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B 70:849–911

  • Fan J, Samworth R, Wu Y (2009) Ultrahigh dimensional feature selection: beyond the linear model. J Mach Learn Res 10:2013–2038

  • Gordis L (2008) Epidemiology, 4th edn. Saunders, Philadelphia

  • Hu J, Chai H (2013) Adjusted regularized estimation in the accelerated failure time model with high dimensional covariates. J Multivar Anal 122:96–114

  • Huang J, Ma S (2010) Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Anal 16:176–195

  • Huang J, Ma S, Xie H (2006) Regularized estimation in the accelerated failure time model with high dimensional covariate. Biometrics 62:813–820

  • Huang JZ, Wu CO, Zhou L (2004) Polynomial spline estimation and inference for varying-coefficient models with longitudinal data. Statistica Sinica 14:763–788

  • Johnson BA, Lin DY, Zeng D (2008) Penalized estimating functions and variable selection in semiparametric regression models. J Am Stat Assoc 103:672–680

  • Li GR, Peng H, Zhang J, Zhu LX (2012) Robust rank correlation based screening. Ann Stat 40:1846–1877

  • Li J, Ma S (2010) Interval-censored data with repeated measurements and a cured subgroup. Appl Stat 59:693–705

  • Li J, Zhang W (2011) A semiparametric threshold model for censored longitudinal data analysis. J Am Stat Assoc 106:685–696

  • Lian H, Li J, Tang X (2014) SCAD-penalized regression in additive partially linear proportional hazards models with an ultra-high-dimensional linear part. J Multivar Anal 125:50–64

  • Liu X, Wang L, Liang H (2011) Estimation and variable selection for semiparametric additive partially linear models. Statistica Sinica 21:1225–1248

  • Lu Y, Lemon W et al (2006) A gene expression signature predicts survival of patients with stage I non-small cell lung cancer. PLoS Med 3:2229–2243

  • Petrov V (1975) Sums of independent random variables. Springer-Verlag, New York

  • Shao F, Li J, Ma S, Lee M-LT (2014) Semiparametric varying-coefficient model for interval censored data with a cured proportion. Stat Med 33:1700–1712

  • Shedden K, Taylor JM et al (2008) Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 14:822–827

  • Stute W (1993) Consistent estimation under random censorship when covariates are present. J Multivar Anal 45:89–103

  • Stute W (1996) Distributional convergence under random censorship when covariables are present. Scand J Stat 23:461–471

  • VanderWeele TJ, Shpitser I (2013) On the definition of a confounder. Ann Stat 41:196–220

  • Wang H, Li B, Leng C (2009) Shrinkage tuning parameter selection with a diverging number of parameters. J R Stat Soc Ser B 71:671–683

  • Xie Y, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37:673–696

  • Xie Y, Xiao G et al (2011) Robust gene expression signature from formalin-fixed paraffin-embedded samples predicts prognosis of non-small-cell lung cancer patients. Clin Cancer Res 17:5705–5714

Acknowledgments

We are grateful to the Editor, the Associate Editor and the Reviewers for helpful comments. Xia’s work is partially supported by the National Natural Science Foundation of China (Grant No. 11471058). Jiang’s research is partially supported by the Singapore National Research Foundation under its International Research Centre at Singapore Funding Initiative and administered by the IDM Programme Office, Media Development Authority (MDA). Li’s work is partially supported by AcRF R-155-000-152-112 and NMRC/CBRG/0014/2012. This work was initiated when Li took his sabbatical leave at Chongqing University. He would like to thank the university for its hospitality.

Author information

Corresponding author

Correspondence to Xiaochao Xia.

Appendix: Conditions and proofs

1.1 Proposition 1

To obtain the screening consistency, we need the following technical assumptions.

  1. (S1)

    \(p>n\) and \(\log (p)=O(n^\xi )\) for some \(\xi \in (0,1-2\kappa )\) where \(\kappa \) is given in condition (S3).

  2. (S2)

    Denote \(\varSigma =\text{ cov }(\mathbf{X})\). We assume \(\varSigma ^{-1/2}{} \mathbf{X}\) has a spherically symmetric distribution and for any \(n\times \tilde{p}\) submatrix \({\tilde{\mathbf{X}}}\) of \(\varSigma ^{-1/2}{} \mathbf{X}\) with \(cn<\tilde{p}\le p\),

    $$\begin{aligned} P(\lambda _{max}(\tilde{p}^{-1}\tilde{\mathbf{X}}\tilde{\mathbf{X}}^T)>c_1 \text{ or } \lambda _{min}(\tilde{p}^{-1}\tilde{\mathbf{X}}\tilde{\mathbf{X}}^T)<1/c_1) \le \exp (-C_1 n) \end{aligned}$$
    (15)

    for some \(c_1>1\) and \(C_1>0\), where \(\lambda _{max}\) and \(\lambda _{min}\) are the largest and smallest eigenvalues of a matrix.

  3. (S3)

    \(\text{ var }[T-\sum _{j=1}^d g_j(U_j)]=O(1)\) and for some \(\kappa \ge 0\) and \(c_2,c_3>0\),

    $$\begin{aligned} \min _{j\in {\mathcal {M}}_*} |\beta _j|\ge \frac{c_2}{n^{\kappa }}\qquad \text{ and }\qquad \min _{j\in {\mathcal {M}}_*} \text{ cov }[\beta _j^{-1}\{T-\sum _{k=1}^d g_k(U_k)\},X_j]\ge c_3. \end{aligned}$$
    (16)
  4. (S4)

    There are some \(\tau \ge 0\) and \(c_4>0\) such that \(\lambda _{max}(\varSigma )\le c_4 n^{\tau }\).

  5. (S5)

    The initial benchmark estimates satisfy \(\sup _j \Vert g_j^*-g_j\Vert =O_p(\omega _n)\) and \(\omega _n\rightarrow 0\) as \(n\rightarrow \infty \).

Proof

(Sketch of Proof of Proposition 1) The proof can be completed in two steps. In the first step, we define a submodel

$$\begin{aligned} {\mathcal {M}}_\delta =\{1\le j\le p: |\beta _j| \text{ is among the first } [\delta p] \text{ largest}\} \end{aligned}$$
(17)

and we may show that, if \(\delta \rightarrow 0\) in such a way that \(\delta n^{1-2\kappa -\tau }\rightarrow \infty \) as \(n\rightarrow \infty \), we have for some \(C>0\),

$$\begin{aligned} P({\mathcal {M}}_*\subset {{\mathcal {M}}_\delta })\ge 1- O\{\exp (-Cn^{1-2\kappa }/\log (n))\}. \end{aligned}$$
(18)

In the second step, we fix an \(r\in (0,1)\) and choose a shrinking factor \(\delta \) of the form \((n/p)^{1/(\kappa -r)}\). We successively carry out the argument in Step 1 to obtain submodels \({\mathcal {M}}_{\delta ^k}\), \(k=1,\cdots ,r\). The final result then follows after some lengthy algebra; a derivation similar to that in Fan and Lv (2008) can be adopted. \(\square \)
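
As a purely numerical illustration (not part of the paper's procedure or software), a minimal sketch of the ranking step behind (17) is given below. It assumes uncensored responses and benchmark estimates \(g_j^*\) of the confounder effects stored as an array; the function name and arguments are hypothetical.

```python
import numpy as np

def adjusted_marginal_screening(T, X, G_star, delta_frac):
    """Rank genes by confounder-adjusted marginal association (illustrative sketch).

    T          : (n,) failure-time responses (assumed uncensored in this sketch)
    X          : (n, p) gene expression measurements
    G_star     : (n, d) benchmark estimates g_j^*(U_ij) of the confounder effects
    delta_frac : shrinking factor delta; the submodel keeps the first [delta * p] genes
    """
    n, p = X.shape
    # remove the low-dimensional confounder effects from the response
    r = T - G_star.sum(axis=1)
    # marginal slope of the adjusted response on each gene
    Xc = X - X.mean(axis=0)
    num = Xc.T @ (r - r.mean())
    den = np.maximum((Xc ** 2).sum(axis=0), 1e-12)
    beta_marg = num / den
    keep = max(1, int(np.floor(delta_frac * p)))
    # indices of the [delta * p] largest |beta_j|, i.e. the submodel M_delta in (17)
    return np.argsort(-np.abs(beta_marg))[:keep]
```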

1.2 Main theorems

We write \({\varvec{\theta }}=({\varvec{\beta }}^T,{\varvec{\gamma }}^T)^T\). Define \(\tilde{\mathbf{Y}}=E(\mathbf{Y}|\mathbf{X}, \mathbf{U})\), \(\tilde{\varvec{\theta }}=[\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}}]^{-1}\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{Y}}=(\tilde{\varvec{\beta }}^T,\tilde{\varvec{\gamma }}^T)^T\), and \(\tilde{g}_k(u)=\mathbf{B}(u)^T\tilde{\varvec{\gamma }}_k\). To prove the main theorems in this paper, we need the following assumptions.

  1. (C1)

    \(E(\epsilon _i| \mathbf{X}_i, \mathbf{U}_i)=0\) and \(E(T_i^2)\) is finite.

  2. (C2)

\(T_i\) and \(C_i\) are independent and \(P(T_i\le C_i| \mathbf{X}_i, \mathbf{U}_i, T_i)=P(T_i\le C_i | T_i)\).

  3. (C3)

The eigenvalues of the matrix \(E(\mathbf{X}_i\mathbf{X}_i^T)\) are bounded away from zero and are finite.

  4. (C4)

    \(\tau _T<\tau _C\) or \(\tau _T=\tau _C=\infty \).

  5. (C5)

The true parameter \({\varvec{\beta }}_0\) lies in a compact space \(\varTheta \). Each true function \(g_{k0}\) belongs to a second-order Sobolev space.

  6. (C6)

    \(E\left\{ \epsilon ^2\delta \mathbf{X}^{(1)}(\mathbf{X}^{(1)})^T \right\} <\infty \) and \(E\left\{ |\epsilon \mathbf{X}^{(1)}| \sqrt{R(Y)}\right\} <\infty \) where \(R(y)=\int _0^{y-}\{(1-H(w))(1-G(w))\}^{-1}G(dw)\) and G is the distribution function of the censoring time C.

The estimator \(\hat{\varvec{\theta }}\) takes the form of a weighted least squares estimator. However, the Kaplan-Meier weights \(\{w_{ni}: i=1,\cdots ,n\}\) do not satisfy the assumptions usually required in weighted least squares estimation. We need Lemma 3 to ensure the validity of the convergence arguments used in the proofs of Theorems 1 and 2. The proof of the lemma may follow Stute (1993).
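
For concreteness, the Kaplan-Meier weights \(w_{ni}\) referred to here are the jumps of the Kaplan-Meier estimator of the distribution of \(Y\) (Stute 1993). A minimal sketch of one standard way to compute them is given below; the tie-breaking rule (uncensored observations placed before censored ones at tied times) is an assumption of this illustration.

```python
import numpy as np

def kaplan_meier_weights(Y, delta):
    """Stute-type Kaplan-Meier weights w_{ni} (jumps of the KM estimator of Y).

    Y     : (n,) observed times Y_i = min(T_i, C_i)
    delta : (n,) censoring indicators, 1 = failure observed, 0 = censored
    Returns the weights in the original ordering of the observations.
    """
    n = len(Y)
    # sort by time; the conventional tie-break puts uncensored before censored
    order = np.lexsort((1 - np.asarray(delta), np.asarray(Y)))
    d = np.asarray(delta, dtype=float)[order]
    w_sorted = np.zeros(n)
    surv = 1.0  # Kaplan-Meier survival just before the current observation
    for i in range(n):
        w_sorted[i] = d[i] * surv / (n - i)
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    # map the sorted weights back to the original indexing
    w = np.zeros(n)
    w[order] = w_sorted
    return w
```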

Lemma 3

For an integrable function \(\phi \), define a functional \({\mathcal {S}}_n\phi =\sum _{i=1}^n w_{ni} \phi (Y_i,\mathbf{X}_i,\mathbf{U}_i)\). Under (C1) and (C2), with probability one and in the mean we have

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathcal {S}}_n\phi =\int _{Y<\tau _H} \phi (Y,\mathbf{X},\mathbf{U}) dP + I(\tau _H\in A)\int _{Y=\tau _H}\phi (\tau _H,\mathbf{X},\mathbf{U})dP. \end{aligned}$$
(19)

We need the following lemma which summarizes necessary properties of the polynomial spline functions. The proof of the lemma may follow Lemma A.3 in Huang et al. (2004).

Lemma 4

Assume \(\lim _{n\rightarrow \infty } n^{-1}M_n\log (M_n)=0\). Except on an event whose probability tends to zero, all the eigenvalues of \(M_n/n \sum _{i=1}^n\mathbf{B}_i\mathbf{B}_i^T\) are bounded away from zero and infinity.
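
As a purely numerical illustration of Lemma 4, one may inspect the eigenvalues of \((M_n/n)\sum _{i=1}^n\mathbf{B}_i\mathbf{B}_i^T\) for a cubic B-spline basis. The knot placement, degree, and function below are assumptions of this sketch rather than choices made in the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

def spline_gram_eigenvalues(u, n_interior_knots, degree=3):
    """Eigenvalues of (M_n / n) * sum_i B_i B_i^T for a B-spline basis on [0, 1]."""
    # knot vector with repeated boundary knots and equally spaced interior knots
    interior = np.linspace(0.0, 1.0, n_interior_knots + 2)[1:-1]
    knots = np.concatenate(([0.0] * (degree + 1), interior, [1.0] * (degree + 1)))
    M_n = len(knots) - degree - 1          # number of basis functions
    n = len(u)
    # evaluate every basis function at the observed covariate values
    B = np.column_stack(
        [BSpline(knots, np.eye(M_n)[j], degree)(u) for j in range(M_n)]
    )
    gram = (M_n / n) * (B.T @ B)
    return np.linalg.eigvalsh(gram)

# Example: with 500 uniform design points and 6 interior knots, the returned
# eigenvalues should be bounded away from zero and infinity, as Lemma 4 requires.
# eig = spline_gram_eigenvalues(np.random.uniform(size=500), n_interior_knots=6)
```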

The next lemma establishes the consistency of the estimator.

Lemma 5

Assume the same conditions as in Theorem 1. Then \(\Vert \hat{\varvec{\theta }}-\tilde{\varvec{\theta }}\Vert =O_p(r_n +(\lambda _n\rho _n)^{1/2})\).

Proof

(Proof of Lemma 5) We note

$$\begin{aligned} Q(\hat{\varvec{\theta }})-Q(\tilde{\varvec{\theta }})= & {} \frac{1}{n}\left[ (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }})^T\mathbf{W} (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }}) - (\mathbf{Y}-\tilde{\mathbf{X}}\tilde{\varvec{\theta }})^T\mathbf{W} (\mathbf{Y}-\tilde{\mathbf{X}}\tilde{\varvec{\theta }})\right] \nonumber \\&+\sum _{k=1}^p\{p_{\lambda _n}(|\hat{\beta }_k|)-p_{\lambda _n}(|\tilde{\beta }_k|) \}\nonumber \\= & {} \frac{1}{n}\left[ -2(\mathbf{Y}-\tilde{\mathbf{X}}\tilde{\varvec{\theta }})^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\tilde{\varvec{\theta }}) +(\hat{\varvec{\theta }}-\tilde{\varvec{\theta }})^T\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\tilde{\varvec{\theta }})\right] \nonumber \\&+\sum _{k=1}^p\{p_{\lambda _n}(|\hat{\beta }_k|)-p_{\lambda _n}(|\tilde{\beta }_k|)\} \nonumber \\= & {} -2{\varvec{\epsilon }}^T\mathbf{W}\tilde{\mathbf{X}}\mathbf{v}M_n^{1/2}\delta _n/n+\delta _n^2/n M_n \mathbf{v}^T\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}} \mathbf{v}\nonumber \\&+\sum _{k=1}^p\{p_{\lambda _n}(|\hat{\beta }_k|)-p_{\lambda _n}(|\tilde{\beta }_k|) \}\le 0, \end{aligned}$$
(20)

where in the third equality we write \(\hat{\varvec{\theta }}-\tilde{\varvec{\theta }}=\delta _nM_n^{1/2}\mathbf{v}\), with \(\delta _n\) a scalar and \(\mathbf{v}\) a vector satisfying \(\Vert \mathbf{v}\Vert =1\), and use the fact that \(\tilde{\mathbf{X}}^T\mathbf{W}(\mathbf{Y}-{\varvec{\epsilon }}-\tilde{\mathbf{X}}\tilde{\varvec{\theta }})=0\). The inequality follows from the definition of \(\hat{\varvec{\theta }}\). We first show that \(\delta _n=O_p(r_n+\lambda _n)\). To this end, it is easy to show that

$$\begin{aligned} \frac{M_n^{1/2}}{n}{\varvec{\epsilon }}^T\mathbf{W}\tilde{\mathbf{X}}\mathbf{v}=\frac{M_n^{1/2}}{n}\sum _{i=1}^n \epsilon _i w_{ni}\tilde{\mathbf{X}}_{(i)}{} \mathbf{v}=O_p(r_n). \end{aligned}$$
(21)

By assumption (C3), Lemma 4 and following Lemma A.2 of Liu et al. (2011), there exists a positive \(c_1\) such that

$$\begin{aligned} \frac{M_n}{n} \mathbf{v}^T\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}} \mathbf{v}=\frac{M_n}{n} \sum _{i=1}^n w_{ni}{} \mathbf{v}^T \left( \begin{array}{cc} \mathbf{X}_{(i)}{} \mathbf{X}_{(i)}^T &{} \mathbf{X}_{(i)}{} \mathbf{B}_{(i)}^T \\ \mathbf{B}_{(i)}{} \mathbf{X}_{(i)}^T &{} \mathbf{B}_{(i)}{} \mathbf{B}_{(i)}^T \end{array} \right) \mathbf{v} \ge c_1, \end{aligned}$$
(22)

with probability approaching 1. Using the inequality \(| p_\lambda (a)-p_\lambda (b)|\le \lambda |a-b|\), we obtain

$$\begin{aligned} \sum _{k=1}^p\{p_{\lambda _n}(|\hat{\beta }_k|)-p_{\lambda _n}(|\tilde{\beta }_k|)\}\ge \sum _{k=1}^p -\lambda _n |\hat{\beta }_k-\tilde{\beta }_k| \asymp -\lambda _n \delta _n. \end{aligned}$$
(23)

Therefore, \(-O_p(r_n)\delta _n+c_1\delta _n^2-\lambda _n\delta _n\le 0\) with probability approaching 1, which implies that \(\delta _n=O_p(r_n+\lambda _n)\).

We now note that, for \(1\le k \le p\), \(|\hat{\beta }_k-\tilde{\beta }_k|=o_p(1)\). Next, we see that

$$\begin{aligned} | \Vert \tilde{\gamma }_k\Vert -\Vert g_k\Vert _{L_2}|\le & {} | \Vert \tilde{g}_k\Vert _{L_2}-\Vert g_k\Vert _{L_2}|\end{aligned}$$
(24)
$$\begin{aligned}\le & {} \Vert \tilde{g}_k -g_k\Vert _{L_2} =O_p(\rho _n)=o_p(1). \end{aligned}$$
(25)

It then follows that \(\hat{\beta }_k\rightarrow \beta _{k0}\), \(\tilde{\beta }_k\rightarrow \beta _{k0}\), \(\Vert \hat{g}_k\Vert _{L_2}\rightarrow \Vert g_{k0}\Vert _{L_2}\) and \(\Vert \tilde{g}_k\Vert _{L_2}\rightarrow \Vert g_{k0}\Vert _{L_2}\) in probability. Because \(|\beta _{k0}|>0\) for \(1\le k\le s\) and \(\lambda _n\rightarrow 0\), we have that, with probability approaching 1, \(|\hat{\beta }_k|>a\lambda _n\) and \(|\tilde{\beta }_k|>a\lambda _n\) for \(1\le k\le s\). On the other hand, \(\beta _{k0}=0\) for \(s+1\le k \le p\), so the previous results imply \(\tilde{\beta }_k=O_p(\rho _n)\). Since \(\lambda _n/\rho _n \rightarrow \infty \), we have \(|\tilde{\beta }_k| < \lambda _n\) for \(s+1\le k \le p\). Consequently, by the definition of \(p_\lambda (\cdot )\), we have \(P(p_{\lambda _n}(|\tilde{\beta }_k|)=p_{\lambda _n}(|\hat{\beta }_k|))\rightarrow 1\) when \(1\le k\le s\); and \(P(p_{\lambda _n}(|\tilde{\beta }_k|)=\lambda _n |\tilde{\beta }_k|) \rightarrow 1\) when \(s+1\le k \le p\). Therefore

$$\begin{aligned} \sum _{k=1}^p\{p_{\lambda _n}(|\hat{\beta }_k|)-p_{\lambda _n}(|\tilde{\beta }_k|)\}=\lambda _n \sum _{k=s+1}^p |\tilde{\beta }_k| \ge -O_p(\lambda _n\rho _n). \end{aligned}$$
(26)

Combining this with the previous results, we have

$$\begin{aligned} Q(\hat{\varvec{\theta }})-Q(\tilde{\varvec{\theta }}) \ge -O_p(r_n)\delta _n+c_1\delta _n^2-O(\lambda _n\rho _n), \end{aligned}$$
(27)

which implies that \(\delta _n=O_p(r_n+(\lambda _n\rho _n)^{1/2})\). \(\square \)
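
The proof above only uses two properties of the penalty: \(p_{\lambda }(t)=\lambda t\) for \(t\le \lambda \) (so that \(|p_\lambda (a)-p_\lambda (b)|\le \lambda |a-b|\)) and \(p_{\lambda }(t)\) constant for \(t>a\lambda \). One penalty with exactly these properties is the SCAD penalty of Fan and Li (2001); the sketch below, which is illustrative and not the paper's implementation, evaluates it together with the objective \(Q({\varvec{\theta }})\), assuming the combined design \(\tilde{\mathbf{X}}\) and the Kaplan-Meier weights have already been assembled as arrays.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise to |t|."""
    t = np.abs(t)
    small = lam * t                                            # |t| <= lam
    mid = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))   # lam < |t| <= a*lam
    large = (a + 1) * lam**2 / 2                               # |t| > a*lam (constant)
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))

def penalized_objective(theta, Y, X_tilde, w, lam, p):
    """Q(theta): Kaplan-Meier-weighted least squares plus SCAD penalty on beta.

    theta   : (p + d*M_n,) stacked (beta, gamma)
    X_tilde : (n, p + d*M_n) design combining the genes X and the spline basis B
    w       : (n,) Kaplan-Meier weights
    p       : number of penalized (gene) coefficients, i.e. beta = theta[:p]
    """
    resid = Y - X_tilde @ theta
    loss = np.sum(w * resid**2) / len(Y)
    return loss + scad_penalty(theta[:p], lam).sum()
```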

Proof

(Proof of Theorem 1) We prove part a by contradiction. Suppose that, for all sufficiently large \(n\), there exists a constant \(\eta >0\) such that, with probability at least \(\eta \), there is some \(k^*>s\) with \(\hat{\beta }_{k^*}\ne 0\). Let \(\hat{\varvec{\theta }}^*\) be the vector obtained by replacing \(\hat{\beta }_{k^*}\) with 0 in \(\hat{\varvec{\theta }}\). Then

$$\begin{aligned} Q(\hat{\varvec{\theta }})-Q(\hat{\varvec{\theta }}^*)= & {} \frac{1}{n}\left[ (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }})^T\mathbf{W} (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }}) - (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }}^*)^T\mathbf{W} (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }}^*)\right] \nonumber \\&+\,p_{\lambda _n}(|\hat{\beta }_{k^*}|). \end{aligned}$$
(28)

By Lemma 3 and the fact that \(\beta _{k^*0}=0\), \(\hat{\beta }_{k^*}=O_p(r_n+(\lambda _n\rho _n)^{1/2})\). Because \(\lambda _n/\max (r_n,\rho _n)\rightarrow \infty \), we have \(|\hat{\beta }_{k^*}| < \lambda _n\) and thus \(p_{\lambda _n}(|\hat{\beta }_{k^*}|)=\lambda _n |\hat{\beta }_{k^*}|\) with probability approaching 1. For the first term in (28), simple algebra leads to

$$\begin{aligned}&(\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }})^T\mathbf{W} (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }}) - (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }}^*)^T\mathbf{W} (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }}^*) \nonumber \\&\quad \ge - (\mathbf{Y}-\tilde{\mathbf{X}}\hat{\varvec{\theta }})^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*) \nonumber \\&\quad = -2 (\mathbf{Y}-\tilde{\mathbf{X}}\tilde{\varvec{\theta }})^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*) \nonumber \\&\qquad -2 (\tilde{\varvec{\theta }}-\hat{\varvec{\theta }}^*)\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*). \end{aligned}$$
(29)

By the Cauchy–Schwarz inequality,

$$\begin{aligned}&(\tilde{\varvec{\theta }}-\hat{\varvec{\theta }}^*)\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*)\\&\quad \le \{(\tilde{\varvec{\theta }}-\hat{\varvec{\theta }}^*)\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}}(\tilde{\varvec{\theta }}-\hat{\varvec{\theta }}^*)\}^{1/2}\\&\qquad \times \{(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*)\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*)\}^{1/2}\\&\quad \le c_2 \frac{n}{M_n}\Vert \tilde{\varvec{\theta }}-\hat{\varvec{\theta }}^*\Vert \Vert \hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*\Vert . \end{aligned}$$

From the triangle inequality and Lemma 3, it follows that

$$\begin{aligned} \Vert \tilde{\varvec{\theta }}-\hat{\varvec{\theta }}^*\Vert\le & {} \Vert \tilde{\varvec{\theta }}-\hat{\varvec{\theta }}\Vert +|\tilde{\beta }_{k^*}|\\= & {} O_p(M_n^{1/2}\{r_n+(\lambda _n\rho _n)^{1/2}+\rho _n \}), \end{aligned}$$

thus

$$\begin{aligned} \frac{1}{n}(\tilde{\varvec{\theta }}-\hat{\varvec{\theta }}^*)\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*)=O_p(M_n^{-1/2} (r_n+(\lambda _n\rho _n)^{1/2}+\rho _n))|\hat{\beta }_{k^*}|. \end{aligned}$$
(30)

We can also show that

$$\begin{aligned} |(\mathbf{Y}-\tilde{\mathbf{X}}\tilde{\varvec{\theta }})^T\mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*)|=|{\varvec{\epsilon }}^T \mathbf{W}\tilde{\mathbf{X}}(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}^*)| =O_p(\frac{nr_n}{M_n^{1/2}})|\hat{\beta }_{k^*}|. \end{aligned}$$
(31)

Combining (28) to (31), we arrive at

$$\begin{aligned} Q(\hat{\varvec{\theta }})-Q(\hat{\varvec{\theta }}^*)\ge & {} \frac{\lambda _n}{M_n^{1/2}}|\hat{\beta }_{k^*}| -O_p(\frac{r_n}{M_n^{1/2}})|\hat{\beta }_{k^*}| \nonumber \\&-O_p(\frac{r_n+(\lambda _n\rho _n)^{1/2}+\rho _n}{M_n^{1/2}})| \hat{\beta }_{k^*}|. \end{aligned}$$
(32)

We note that the first term on the right hand side of (32) dominates the other two terms since \(\lambda _n/\max (\rho _n,r_n)\rightarrow \infty \). This contradicts the fact that \(Q(\hat{\varvec{\theta }})-Q(\hat{\varvec{\theta }}^*)\le 0\). This completes the proof of part a.

To prove parts b and c, we define the oracle version of \(\tilde{\varvec{\theta }}\),

$$\begin{aligned} \tilde{\varvec{\theta }}_{\varOmega }=\text{ arg }\min _{{\varvec{\theta }} =({\varvec{\beta }}_1^T,\mathbf{0}^T,{\varvec{\gamma }}^T)}\frac{1}{n}(\mathbf{Y}-\tilde{\mathbf{X}}{\varvec{\theta }})^T\mathbf{W} (\mathbf{Y}-\tilde{\mathbf{X}}{\varvec{\theta }}) \end{aligned}$$
(33)

which is obtained as if the identity of the nonzero components were known. By the construction and Lemma 4, we have \(\Vert \tilde{\varvec{\theta }}_{\varOmega }-{\varvec{\theta }}_0\Vert =O_p(\rho _n)\) and \(\Vert \hat{\varvec{\theta }}-{\varvec{\theta }}_0\Vert =o_p(1)\). Thus, with probability approaching 1, \(\tilde{\beta }_{k,\varOmega }\rightarrow \beta _{k0}\), \(\hat{\beta }_{k}\rightarrow \beta _{k0}\) (\(1\le k\le s\)), \(\Vert \tilde{g}_{k,\varOmega }\Vert \rightarrow \Vert g_{k0}\Vert \), and \(\Vert \hat{g}_{k}\Vert \rightarrow \Vert g_{k0}\Vert \) (\(1\le k\le d\)). On the other hand, for \(s+1\le k\le p\), by definition \(\tilde{\beta }_{k,\varOmega }=0\), and by part a, with probability approaching 1, \(\hat{\beta }_{k}=0\). Consequently, we have

$$\begin{aligned} \sum _{k=1}^p p_{\lambda _n}(|\tilde{\beta }_{k,\varOmega }|)=\sum _{k=1}^p p_{\lambda _n}(|\hat{\beta }_k|) \end{aligned}$$
(34)

with probability approaching 1. Now write \(\hat{\varvec{\theta }}-\tilde{\varvec{\theta }}_{\varOmega }=\delta _nM_n^{1/2}\mathbf{v}\), with \(\Vert \mathbf{v}\Vert =1\). By (20) and (34),

$$\begin{aligned} 0\ge & {} Q(\hat{\varvec{\theta }})-Q(\tilde{\varvec{\theta }}_{\varOmega })\\= & {} -2{\varvec{\epsilon }}^T\mathbf{W}\tilde{\mathbf{X}}\mathbf{v}M_n^{1/2}\delta _n/n+\delta _n^2/n M_n \mathbf{v}^T\tilde{\mathbf{X}}^T\mathbf{W}\tilde{\mathbf{X}} \mathbf{v}\\\ge & {} -O_p(r_n)\delta _n+c_1 \delta _n^2. \end{aligned}$$

Thus \(\Vert \hat{\varvec{\theta }}-\tilde{\varvec{\theta }}_{\varOmega }\Vert \asymp \delta _n=O_p(r_n)\), which, together with \(\Vert \tilde{\varvec{\theta }}_{\varOmega }-{\varvec{\theta }}_0\Vert =O_p(\rho _n)\), implies that \(\Vert \hat{\varvec{\theta }}-{\varvec{\theta }}_0\Vert =O_p(\rho _n+r_n)\). Hence the claims in parts b and c follow. \(\square \)

To prove Theorem 2, we need the following lemma, which gives the asymptotic behavior of the AFT estimator under the Kaplan-Meier weights. Denote the right-hand side of (19) by \({\mathcal {S}}\phi \). Introduce the following sub-distribution functions:

$$\begin{aligned} \tilde{H}_{1}(\mathbf{x},\mathbf{u},y)= & {} P(\mathbf{X}\le \mathbf{x}, \mathbf{U}\le \mathbf{u}, Y\le y, \delta =1)\\ \tilde{H}_0(y)= & {} P(Y\le y, \delta =1). \end{aligned}$$

Put

$$\begin{aligned} \xi _0(y)= & {} \exp \left\{ \int _0^{y-} \frac{\tilde{H}_0(dz)}{1-H(z)} \right\} \\ \xi _1^\phi (y)= & {} \frac{1}{1-H(y)}\int _{w>y}\phi (\mathbf{x},\mathbf{u},w)\xi _0(w)\tilde{H}_1(d\mathbf{x}, d\mathbf{u}, dw)\\ \xi _2^\phi (y)= & {} \int \int \frac{I(v<y,v<w)\phi (\mathbf{x},\mathbf{u},w)\xi _0(w)}{(1-H(v))^2}\tilde{H}_0(dv)\tilde{H}_1(d\mathbf{x},d\mathbf{u}, dw). \end{aligned}$$

Let \(\{\phi _1,\cdots ,\phi _J\}\) be a set of measurable functions. Write

$$\begin{aligned} \underline{\mathcal {S}}_n=({\mathcal {S}}_n\phi _1,\cdots ,{\mathcal {S}}_n\phi _J)^T \end{aligned}$$

and

$$\begin{aligned} \underline{\mathcal {S}}=({\mathcal {S}}\phi _1,\cdots ,{\mathcal {S}}\phi _J)^T. \end{aligned}$$

Lemma 6

Assume that (C1) and (C2) hold. In addition, assume the following two integrability conditions hold for all \(\phi _j\), \(1\le j\le J\),

$$\begin{aligned} \int \phi _j(\mathbf{X},\mathbf{U},W)\xi _0(W)\delta ^2 d\mathbf{P}_{\mathbf{X},\mathbf{U},Y}< & {} \infty \end{aligned}$$
(35)
$$\begin{aligned} \int \phi _j(\mathbf{X},\mathbf{U},W)\sqrt{R(W)}d\mathbf{P}_{\mathbf{X},\mathbf{U},Y}< & {} \infty . \end{aligned}$$
(36)

Then in distribution

$$\begin{aligned} \sqrt{n}(\underline{\mathcal {S}}_n - \underline{\mathcal {S}}) \rightarrow N(0,\varSigma ), \end{aligned}$$
(37)

where \(\varSigma =(\sigma _{jj'})\), \(\sigma _{jj'}=\text{ cov }(\psi _j,\psi _{j'})\) and \(\psi _j=\phi _j(\mathbf{X},\mathbf{U},Y)\xi _0(Y)\delta + \xi _1^{\phi _j}(Y)(1-\delta )-\xi _2^{\phi _j}(Y)\).

The proof of this lemma may follow Stute (1996). We now proceed to prove Theorem 2.

Proof

(Proof of Theorem 2) According to the proof of Lemma 5, with probability approaching 1, \(|\tilde{\beta }_k|>a\lambda _n\), \(|\hat{\beta }_k|>a\lambda _n\) and thus \(p_{\lambda _n}(|\tilde{\beta }_k|)=p_{\lambda _n}(|\hat{\beta }_k|)\) for \(1\le k\le s\). By Theorem 1, with probability approaching 1, \(\hat{\varvec{\theta }}=(\hat{\varvec{\beta }}_{1}^T, \mathbf{0}^T,\hat{\varvec{\gamma }}^T)^T\) is a local minimizer of \(Q({\varvec{\theta }})\). We may note that \(Q({\varvec{\theta }})\) is quadratic in \(({\varvec{\beta }}_1^T,{\varvec{\gamma }}^T)^T\) when \(|\beta _k|>a\lambda _n\) for \(1\le k\le s\). Therefore \(\partial Q({\varvec{\theta }})/\partial {\varvec{\theta }}|_{{\varvec{\beta }}_1=\hat{\varvec{\beta }}_1,{\varvec{\beta }}_2=\mathbf{0}, {\varvec{\gamma }}=\hat{\varvec{\gamma }}}=\mathbf{0}\), which implies that

$$\begin{aligned} (\hat{\varvec{\beta }}_{1}^T, \hat{\varvec{\gamma }}^T)^T=\left( \sum _{i=1}^nw_{ni}\left[ \begin{array}{c@{\quad }c} \mathbf{X}_i^{(1)}(\mathbf{X}_i^{(1)})^T&{}\mathbf{X}_i^{(1)}\mathbf{B}_i^T\\ \mathbf{B}_i(\mathbf{X}_i^{(1)})^T&{}\mathbf{B}_i\mathbf{B}_i^T \end{array}\right] \right) ^{-1} \left( \sum _{i=1}^nw_{ni}\left[ \begin{array}{c} \mathbf{X}_i^{(1)}\\ \mathbf{B}_i \end{array}\right] {Y}_i\right) . \end{aligned}$$

Next we put

$$\begin{aligned} (\tilde{\varvec{\beta }}_{1}^T, \tilde{\varvec{\gamma }}^T)^T= & {} \left( \sum _{i=1}^nw_{ni}\left[ \begin{array}{c@{\quad }c} \mathbf{X}_i^{(1)}(\mathbf{X}_i^{(1)})^T&{}\mathbf{X}_i^{(1)}{} \mathbf{B}_i^T\\ \mathbf{B}_i(\mathbf{X}_i^{(1)})^T&{}\mathbf{B}_i\mathbf{B}_i^T \end{array}\right] \right) ^{-1}\nonumber \\&\times \left( \sum _{i=1}^n\left[ \begin{array}{c} (\mathbf{X}_i^{(1)})\\ \mathbf{B}_i \end{array}\right] E\{w_{ni}{Y}_i|\mathbf{X}_i,\mathbf{U}_i\}\right) . \end{aligned}$$

Invoking Lemmas 3 and 6 while applying a version of the Lindeberg central limit theorem (cf. Petrov 1975), we obtain that for any vector \(\mathbf{c}_n\) with dimension \(s+dM_n\) and components not all 0,

$$\begin{aligned} \{ \mathbf{c}_n^T\varUpsilon \mathbf{c}_n\}^{-1/2}{} \mathbf{c}_n^T\left( \left[ \begin{array}{c} \hat{\varvec{\beta }}_1\\ \hat{\varvec{\gamma }} \end{array}\right] -\left[ \begin{array}{c} \tilde{\varvec{\beta }}_1\\ \tilde{\varvec{\gamma }}\end{array}\right] \right) \rightarrow _d N(0,1), \end{aligned}$$
(38)

where \(\varUpsilon = \mathbf{H}^{-1} \varSigma ^* \mathbf{H}^{-1}\), \(\mathbf{H}=E\left[ \begin{array}{cc} (\mathbf{X}_i^{(1)})(\mathbf{X}_i^{(1)})^T&{}(\mathbf{X}_i^{(1)})\mathbf{B}_i^T\\ \mathbf{B}_i(\mathbf{X}_i^{(1)})^T&{}\mathbf{B}_i\mathbf{B}_i^T \end{array}\right] \) and \(\varSigma ^*=\text{ Var }[ \delta _i \xi _0^*(Y_i)(Y_i-E(Y_i|\mathbf{X}_i^{(1)}, \mathbf{U}_i))((\mathbf{X}_i^{(1)})^T, \mathbf{B}_i^T)^T+(1-\delta _i)\xi _1^*(Y_i)-\xi _2^*(Y_i)]\). Part a of Theorem 2 follows from (38) immediately. Further, if we choose \(\mathbf{c}_n=(\mathbf{0}^T,\mathbf{B}(\mathbf{u})^T\mathbf{a}_n)\) such that not all elements of \(\mathbf{a}_n\) are 0, we obtain

$$\begin{aligned} \{ \mathbf{a}_n^T\varGamma (\mathbf{u}) \mathbf{a}_n\}^{-1/2}\mathbf{a}_n^T\left\{ \left[ \begin{array}{c} \hat{g}_1(u_1)\\ \vdots \\ \hat{g}_d(u_d) \end{array}\right] -\left[ \begin{array}{c} \tilde{g}_1(u_1)\\ \vdots \\ \tilde{g}_d(u_d) \end{array}\right] \right\} \rightarrow _d N(0,1) \end{aligned}$$

which leads to part b. \(\square \)
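
The displayed closed-form expression for \((\hat{\varvec{\beta }}_{1}^T, \hat{\varvec{\gamma }}^T)^T\) in the proof above is a block weighted least-squares solve. A minimal numerical sketch, with hypothetical array names and purely for illustration, is:

```python
import numpy as np

def oracle_wls_estimate(Y, X1, B, w):
    """Weighted least-squares solve for (beta_1, gamma) given the selected genes.

    Y  : (n,) observed responses
    X1 : (n, s) genes with nonzero coefficients (the selected set X^(1))
    B  : (n, d*M_n) stacked spline basis evaluations B_i
    w  : (n,) Kaplan-Meier weights
    """
    Z = np.hstack([X1, B])                 # combined design [X^(1), B]
    A = Z.T @ (w[:, None] * Z)             # sum_i w_ni * Z_i Z_i^T
    b = Z.T @ (w * Y)                      # sum_i w_ni * Z_i Y_i
    coef = np.linalg.solve(A, b)
    s = X1.shape[1]
    return coef[:s], coef[s:]              # (beta_1 hat, gamma hat)
```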


Cite this article

Xia, X., Jiang, B., Li, J. et al. Low-dimensional confounder adjustment and high-dimensional penalized estimation for survival analysis. Lifetime Data Anal 22, 547–569 (2016). https://doi.org/10.1007/s10985-015-9350-z
