Abstract
The regularization approach to variable selection has been well developed for semiparametric accelerated failure time (AFT) models, in which the response variable is right censored. In the presence of missing data, this approach needs to be tailored to the missing data mechanism. In this paper, we propose a flexible and generally applicable missing data mechanism for AFT models that encompasses both ignorable and nonignorable missingness assumptions. Under this mechanism, we propose weighted rank (WR) estimators of the regression parameters, together with their penalized counterparts. An advantage of these estimators is that they do not require specifying a missing data model for the proposed missing data mechanism. We establish the theoretical properties of the WR and penalized WR estimators. Comprehensive simulation studies and a real data application further demonstrate the merits of our approach.
Data availability
Data will be made available on request.
References
Amemiya, T. (1985). Advanced econometrics. Harvard University Press.
Buckley, J., & James, I. (1979). Linear regression with censored data. Biometrika, 66(3), 429–436.
Cai, T., Huang, J., & Tian, L. (2009). Regularized estimation for the accelerated failure time model. Biometrics, 65(2), 394–404.
Ding, Y., & Nan, B. (2011). A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data. The Annals of Statistics, 39(6), 3032–3061.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, J., & Lv, J. (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, 57(8), 5467–5484.
Fleming, T., & Harrington, D. (1991). Counting processes and survival analysis. Wiley.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1–22.
Fu, W. J. (1998). Penalized regression: The bridge versus the LASSO. Journal of Computational and Graphical Statistics, 7, 397–416.
Fygenson, M., & Ritov, Y. (1994). Monotone estimating equations for censored data. The Annals of Statistics, 22, 732–746.
Huang, J., & Ma, S. (2010). Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis, 16(2), 176–195.
Huang, J., Ma, S., & Xie, H. (2006). Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics, 62(3), 813–820.
Jin, Z., Lin, D. Y., Wei, L. J., & Ying, Z. (2003). Rank-based inference for the accelerated failure time model. Biometrika, 90, 341–353.
Jin, Z., Lin, D. Y., & Ying, Z. (2006). Least-squares regression with censored data. Biometrika, 93, 147–161.
Jin, Z., Ying, Z., & Wei, L. J. (2001). A simple resampling method by perturbing the minimand. Biometrika, 88, 381–390.
Johnson, L. M., & Strawderman, R. L. (2009). Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika, 96, 577–590.
Jones, M. P. (1997). A class of semiparametric regressions for the accelerated failure time model. Biometrika, 84, 73–84.
Kalbfleisch, J. D. (1978). Likelihood methods and nonparametric tests. Journal of the American Statistical Association, 73, 167–170.
Kowalski, J., & Tu, X. M. (2008). Modern applied U-statistics. Wiley.
Lai, T. L., & Ying, Z. (1991). Rank regression methods for left-truncated and right-censored data. The Annals of Statistics, 19, 531–556.
Lai, T. L., & Ying, Z. (1991). Large-sample theory of a modified Buckley-James estimator for regression analysis with censored data. The Annals of Statistics, 19, 1370–1402.
Liang, K.-Y., & Qin, J. (2000). Regression analysis under non-standard situations: A pairwise pseudolikelihood approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4), 773–786.
Lin, Y., & Chen, K. (2013). Efficient estimation of the censored linear regression model. Biometrika, 100(2), 525–530.
Lin, D. Y., & Ying, Z. (1995). Semiparametric inference for the accelerated life model with time-dependent covariates. Journal of Statistical Planning and Inference, 44, 47–63.
Liu, T., Yuan, X., & Sun, J. (2021). Weighted rank estimation for nonparametric transformation models with nonignorable missing data. Computational Statistics and Data Analysis, 153, 107061.
Miller, R. G. (1976). Least squares regression with censored data. Biometrika, 63(3), 449–464.
Nan, B., Kalbfleisch, J. D., & Yu, M. (2009). Asymptotic theory for the semiparametric accelerated failure time model with missing data. The Annals of Statistics, 37, 2351–2376.
Nolan, D., & Pollard, D. (1987). U-processes: Rates of convergence. The Annals of Statistics, 15, 780–799.
Prentice, R. L. (1978). Linear rank tests with right-censored data. Biometrika, 65, 167–179.
Ritov, Y. (1990). Estimation in a linear regression model with censored data. The Annals of Statistics, 18(1), 303–328.
Sherman, R. (1993). The limiting distribution of the maximum rank correlation estimator. Econometrica, 61, 123–137.
Sherman, R. (1994). Maximal inequalities for degenerate U-processes with applications to optimization estimators. The Annals of Statistics, 22, 439–459.
Steingrimsson, J. A., & Strawderman, R. L. (2017). Estimation in the semiparametric accelerated failure time model with missing covariates: Improving efficiency through augmentation. Journal of the American Statistical Association, 112, 1221–1235.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58, 267–288.
Tsiatis, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics, 18(1), 354–372.
Wang, H., & Leng, C. (2007). Unified Lasso estimation via least squares approximation. Journal of the American Statistical Association, 102(479), 1039–1048.
Wang, H., Li, R., & Tsai, C.-L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94(3), 553–568.
Wang, X., & Song, L. (2011). Adaptive lasso variable selection for the accelerated failure models. Communications in Statistics - Theory and Methods, 40(24), 4372–4386.
Wei, L. J., Ying, Z., & Lin, D. Y. (1990). Linear regression analysis of censored survival data based on rank tests. Biometrika, 77, 845–851.
Xu, J., Leng, C., & Ying, Z. (2010). Rank-based variable selection with censored data. Statistics and Computing, 20, 165–176.
Yang, S. (1997). Extended weighted log-rank estimating functions in censored regression. Journal of the American Statistical Association, 92, 977–984.
Ying, Z. (1993). A large sample study of rank estimation for censored regression data. The Annals of Statistics, 21(1), 76–99.
Yuan, X., Wang, Y., & Liu, T. (2020). Variable selection for semiparametric random-effects conditional density models with longitudinal data. Communications in Statistics - Theory and Methods, 49(4), 977–996.
Zeng, D., & Lin, D. (2007). Efficient estimation for the accelerated failure time model. Journal of the American Statistical Association, 102(480), 1387–1396.
Zhao, J., Yang, Y., & Ning, Y. (2018). Penalized pairwise pseudo likelihood for variable selection with nonignorable missing data. Statistica Sinica, 28, 2125–2148.
Zhou, M. (2005). Empirical likelihood analysis of the rank estimator for the censored accelerated failure time model. Biometrika, 92, 492–498.
Zou, H. (2006). The adaptive LASSO and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
Zou, H., & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509–1566.
Ethics declarations
Conflict of interest
There are no conflicts of interest.
Appendices
Appendix A
The proofs of Theorems 3 and 4 are very similar to those of Theorems 3.1 and 3.2 in Yuan et al. (2020), respectively. However, our problem has some distinct features. In Yuan et al. (2020), the unpenalized objective function is a smooth function of the regression parameters. In this paper, the unpenalized objective function and its score function are not smooth, which makes a direct application of the standard penalized approach difficult.
Henceforth, we suppress \(\lambda\) from the quantities \(Q_{\varpi ,\lambda }({\varvec{\beta }})\) and \({\hat{{\varvec{\beta }}}}_{\varpi ,\lambda }\) for \(\varpi \in \{A,S\}\).
Proof of Theorem 1
Recall that \(\ell ({\varvec{\beta }})=E\{\ell _{N}({\varvec{\beta }})\}\). Our consistency proof follows the approach developed in Amemiya (1985, pp. 106–107), which requires verifying three conditions: (i) \(\ell ({\varvec{\beta }})\) is uniquely maximized at \({\varvec{\beta }}^*\); (ii) \(\sup _{{{\varvec{\beta }}}\in \Theta }|\ell _N({\varvec{\beta }})-\ell ({\varvec{\beta }})|=o_p(1)\); (iii) \(\ell ({\varvec{\beta }})\) is continuous on \(\Theta\).
Proof of (i) Note that \(\varepsilon _i\) \((i=1, \cdots , N)\) are independent error terms with a common density function \(f_\varepsilon (u)\). For convenience, let
then \(\ell ({\varvec{\beta }})=E\{{\bar{\ell }}_{N}({\varvec{\beta }})\}\). We first show that
Let \(O_i=(X_i^{\textsf {T}},C_i,\varepsilon _i)^{\textsf {T}}\), \(i=1,\ldots ,N\). On the one hand,
where
On the other hand,
It follows that \(\partial {\bar{\ell }}_N({\varvec{\beta }}^*)/\partial {\varvec{\beta }}=0\) and thus \(\partial \ell ({\varvec{\beta }}^*)/\partial {\varvec{\beta }}=E\{\partial {\bar{\ell }}_N({\varvec{\beta }}^*)/\partial {\varvec{\beta }}\}=0.\) In the following, we show that \(\ell ({\varvec{\beta }})=E\{{\bar{\ell }}_{N}({\varvec{\beta }})\}\) is a concave function of \({\varvec{\beta }}\). For two matrices A and B, we write \(A \le B\) if \(B - A\) is a nonnegative-definite matrix. Direct calculations give
Define \(g(v)= m_2(v)f_{\varepsilon }(v)\) and \(G(v)=\int _{v}^{+\infty }g(s)ds\). From C2, we have \(g(v)>0\) and \(G(v)>0\) for all \(v\in {\mathbb {R}}\). It follows that
Let
Utilizing the identity (A1), we can write
and
Recall that \(S(u|X)=P(\log (C)>u|X)\). Then, for \(i\ne j\), we can write
Using conditions C2–C3, for \(i\ne j\), we have
with probability one. By conditions C2, C4 and the inequality (A2), we obtain
which is a positive definite matrix for large enough N. Notice that
This fact implies that the matrix \(D({\varvec{\beta }})\) is positive definite for all \({\varvec{\beta }}\in \Theta\). Therefore, \(\ell ({\varvec{\beta }})\) is a concave function on \(\Theta\). Combining previous results, it follows that \({\varvec{\beta }}^*\) is the unique maximizer of \(\ell ({\varvec{\beta }})\) and thus (i) is proved.
Proof of (ii) For each \({\varvec{\beta }}\in \Theta\) and each \((z_1,z_2)\) in \({\mathcal {Z}}\times {\mathcal {Z}}\), we can write \(\ell _N({\varvec{\beta }})-\ell ({\varvec{\beta }})={\mathbb {U}}_Nf(\cdot ,\cdot ,{\varvec{\beta }})\), where
and \({\mathbb {U}}_N\) denotes the random measure putting mass \(1/(N^2-N)\) on each pair \((Z_i,Z_j)\), \(i\ne j\). Applying the arguments in Sherman (1993, Section 5) to the class \(\{f(\cdot ,\cdot ,{\varvec{\beta }}):{\varvec{\beta }}\in \Theta \}\) shows that it is Euclidean for the envelope \(|\Psi (Z_1,Z_2)|+E\{|\Psi (Z_1,Z_2)|\}\), where \(\Psi (Z_1,Z_2)=|\log (Y_1)-\log (Y_2)|+\Vert X_{1}-X_{2}\Vert \sup _{{{\varvec{\beta }}}\in \Theta }\Vert {\varvec{\beta }}\Vert\) and \(E\{|\Psi (Z_1,Z_2)|\}<+\infty\) under condition C4. An application of Corollary 7 in Sherman (1994, Section 6) then shows that \(\sup _{{{\varvec{\beta }}}\in \Theta }|{\mathbb {U}}_Nf(\cdot ,\cdot ,{\varvec{\beta }})|=O_p(N^{-1/2}).\) This establishes (ii).
Proof of (iii) Note that \(\ell _N({\varvec{\beta }})\) is continuous. It follows that \(\ell ({\varvec{\beta }})\) is continuous. Hence, consistency is proved. \(\square\)
Lemma 1
Under conditions C0–C4, for every sequence \(\kappa _N>0\) with \(\kappa _N\rightarrow 0\), we have
holds uniformly in \(\{{\varvec{\beta }}\in \Theta : \Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert \le \kappa _N\}\).
Proof of Lemma 1
Let \(\epsilon _N({\varvec{\beta }})=\ell _{N}({\varvec{\beta }})-{\ell }({\varvec{\beta }})\). A standard decomposition of U-statistics gives
where
Note that \(E\{b_i({\varvec{\beta }})\} = 0\) for \({\varvec{\beta }}\in \Theta\) and \(b_i({\varvec{\beta }}^*)=0\). A Taylor expansion gives
where \({\dot{b}}_{i}({\varvec{\beta }})=\partial b_{i}({\varvec{\beta }})/\partial {\varvec{\beta }}\).
Combining the identical subgraph set and Vapnik-Chervonenkis class set arguments of Sherman (1993, Section 5) with Corollaries 17 and 21 in Nolan and Pollard (1987) shows that the class of functions \(d_{ij}({\varvec{\beta }})\) is Euclidean. The Euclidean property, together with Corollary 8 of Sherman (1994), guarantees that, for any sequence \(\kappa _N\) of order o(1),
For \({\varvec{\beta }}\) in a neighbourhood of \({\varvec{\beta }}^*\), a Taylor expansion gives
where \(u({\varvec{\beta }})=\partial \ell ({\varvec{\beta }})/\partial {\varvec{\beta }}\), \(D({\varvec{\beta }})=-\partial ^2 \ell ({\varvec{\beta }})/\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{\textsf {T}}\) and \(u({\varvec{\beta }}^*)=0\). Under conditions C2–C4, the matrix \(D({\varvec{\beta }})\) is invertible for \(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert \le \kappa _N\).
By (A3), (A4), (A5), and (A6), we have
\(\square\)
Lemma 2
Under conditions C0–C4, for every sequence \(\kappa _N>0\) with \(\kappa _N\rightarrow 0\), we have
Proof of Lemma 2
We only need to show that, for \(l=1,\ldots ,p\),
where \(U_{Nl}({\varvec{\beta }})=\frac{1}{N(N-1)}\sum _{i\ne j}^Nh_l(Z_i,Z_j;{\varvec{\beta }})\), \(h_l(Z_i,Z_j;{\varvec{\beta }})=-R_{i}R_{j}\Delta _iI\{e_i({\varvec{\beta }})<e_j({\varvec{\beta }})\}(X_{il}-X_{jl})\) and \(\Gamma _{l}=\partial E\{h_l(Z_1,Z_2;{\varvec{\beta }})\}/\partial {\varvec{\beta }}^{\textsf {T}}\bigr |_{{{\varvec{\beta }}}={{\varvec{\beta }}}^*}.\) It is easy to verify that \(D=-(\Gamma _{1}^{\textsf {T}},\ldots ,\Gamma _{p}^{\textsf {T}})^{\textsf {T}}\). Let \(U_l({\varvec{\beta }})=E\{U_{Nl}({\varvec{\beta }})\}\) and \(\epsilon _{Nl}({\varvec{\beta }})=U_{Nl}({\varvec{\beta }})-U_{l}({\varvec{\beta }})\). A standard decomposition of U-statistics gives
where
Note that \(E\{b_{i,l}({\varvec{\beta }})\} = 0\) for \({\varvec{\beta }}\in \Theta\) and \(b_{i,l}({\varvec{\beta }}^*)=0\). A Taylor expansion gives
where \({\dot{b}}_{i,l}({\varvec{\beta }})=\partial b_{i,l}({\varvec{\beta }})/\partial {\varvec{\beta }}\) and \(N^{-1}\sum _{i=1}^N{\dot{b}}_{i,l}({\varvec{\beta }}^*)=O_p(N^{-1/2}).\) Arguments similar to those used to prove (A5) guarantee that, for any sequence \(\kappa _N\) of order o(1),
It is easy to see that
Combining (A7), (A8), (A9) and (A10), the desired result follows. \(\square\)
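For concreteness, the score \(U_N({\varvec{\beta }})=(U_{N1}({\varvec{\beta }}),\ldots ,U_{Np}({\varvec{\beta }}))^{\textsf {T}}\) analyzed above is straightforward to evaluate numerically. The following is a minimal Python sketch; the array names `logY`, `X`, `delta` and `R` are hypothetical stand-ins for \(\log Y_i\), \(X_i\), the censoring indicators \(\Delta _i\) and the missingness indicators \(R_i\):

```python
import numpy as np

def rank_score(beta, logY, X, delta, R):
    """Evaluate U_N(beta) with components
    U_{Nl}(beta) = (1/(N(N-1))) * sum_{i != j} h_l(Z_i, Z_j; beta),
    h_l = -R_i R_j Delta_i I{e_i(beta) < e_j(beta)} (X_{il} - X_{jl}),
    where delta and R are 0/1 indicator arrays."""
    N, p = X.shape
    e = logY - X @ beta                       # residuals e_i(beta)
    U = np.zeros(p)
    for i in range(N):
        for j in range(N):
            if i != j and R[i] * R[j] * delta[i] and e[i] < e[j]:
                U -= X[i] - X[j]              # accumulate -(X_i - X_j)
    return U / (N * (N - 1))
```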
Proof of Theorem 2
From the proof of Theorem 1, we know that the matrix \(D({\varvec{\beta }})\) is positive definite for all \({\varvec{\beta }}\in \Theta\). By Lemma 1, we obtain
where
Hence, the maximizer of \(\varrho _N({\varvec{\beta }})\) is \({\hat{{\varvec{\gamma }}}}={\varvec{\beta }}^*+D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*).\) By Theorem 1, \({\hat{{\varvec{\beta }}}}_W={\varvec{\beta }}^*+o_p(1)\). Since \({\hat{{\varvec{\beta }}}}_W\) is the maximizer of \(\ell _{N}({\varvec{\beta }})\), we have
On the other hand, in view of the expression for \(\varrho _N\),
Combining (A11) and (A12), we obtain
Obviously, \({\hat{{\varvec{\gamma }}}}-{\varvec{\beta }}^*=D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)=O_p(N^{-1/2})\). It follows that \({\hat{{\varvec{\beta }}}}_W-{\varvec{\beta }}^*=O_p(N^{-1/2})\) and \({\hat{{\varvec{\beta }}}}_W={\varvec{\beta }}^*+D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)+o_p(N^{-1/2}).\) By the multivariate central limit theorem, the proof of Theorem 2 is complete. \(\square\)
Proof of Theorem 3
Proof of part (1). Let \(\alpha _N=N^{-1/2}+\max \{P'_{S,\lambda _N}(|\beta _j^*|):\beta _j^*\ne 0\}\). We first prove that for any \(\varepsilon >0\), there exists a large constant C such that
where \({\varvec{u}}=(u_1,\ldots ,u_p)^\textsf {T}\). Observing that \(P_{S,\lambda _N}(0)=0\), we have
where s is the number of components of \({\varvec{\beta }}_1^*\). For any \({\varvec{\beta }}\in \{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert =C\}\), by Lemma 1, we get
Combining (A13) with (A14), we obtain
Note that the third term in (A15) is bounded by
From Fan and Li (2001), the SCAD penalty function \(P_{S,\lambda _N}(\cdot )\) satisfies
where \(\lambda _N \rightarrow 0\) and \(N^{1/2} \lambda _N \rightarrow \infty\) as \(N\rightarrow \infty\).
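The display (A17) is not reproduced here; for the reader's convenience, we record the standard form of the SCAD first derivative from Fan and Li (2001), on which conditions of this type rest (with \(a>2\); Fan and Li suggest \(a=3.7\)):
$$\begin{aligned} P'_{S,\lambda }(\theta )=\lambda \Bigl \{I(\theta \le \lambda )+\frac{(a\lambda -\theta )_{+}}{(a-1)\lambda }I(\theta >\lambda )\Bigr \},\qquad \theta >0. \end{aligned}$$
In particular, \(P'_{S,\lambda }(\theta )=0\) whenever \(\theta >a\lambda\); since \(\lambda _N\rightarrow 0\) while the nonzero \(|\beta _j^*|\) are fixed, \(\max \{P'_{S,\lambda _N}(|\beta _j^*|):\beta _j^*\ne 0\}=0\) for all large N, so that \(\alpha _N=O(N^{-1/2})\) eventually.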
(A17) implies that the term in (A16) is further dominated by the second term of (A15), which, for a sufficiently large C, also dominates the first term. Hence, by choosing a sufficiently large C, (A12) is proved.
(A12) guarantees that, with probability at least \(1-\varepsilon\), there exists a local maximum in the ball \(\{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert \le C\}\). Let \({\hat{{\varvec{\beta }}}}_{S}\) denote this local maximizer; it follows that \(\Vert {\hat{{\varvec{\beta }}}}_{S}-{\varvec{\beta }}^*\Vert =O_p(\alpha _N)\).
Next, we show that
for any given \({\varvec{\beta }}_1\) satisfying \(\Vert {\varvec{\beta }}_1-{\varvec{\beta }}_1^*\Vert =O_p(N^{-1/2})\) and any constant C. Note that (A18) implies that \({\hat{{\varvec{\beta }}}}_{2,S}=0\). To prove (A18), we only need to show that, for some \(\varepsilon _N=CN^{-1/2}\) and \(j=s+1,\ldots , p\), \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j<0\) for \(0< \beta _j< \varepsilon _N\), and \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j>0\) for \(-\varepsilon _N< \beta _j<0\).
Using Lemma 2, we obtain
Then, we can write
From (A17), \(\lim \inf _{N\rightarrow \infty } \lim \inf _{\theta \rightarrow 0+} P'_{S,\lambda _N}(\theta )/\lambda _N >0\) and \(N^{-1/2}/\lambda _N\rightarrow 0\). These facts indicate that the sign of the derivative \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j\) is completely determined by that of \(\beta _j\). The desired result follows.
Proof of part (2). Consider \(Q_{S}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\) as a function of \({\varvec{\beta }}_1\). Then there exists an \(N^{1/2}\)-consistent local maximizer of \(Q_{S}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\). Let \({\hat{{\varvec{\beta }}}}_{1,S}\) denote this local maximizer, which solves the equations \(\frac{\partial Q_{S}(({{\varvec{\beta }}}_1^\textsf {T},0^\textsf {T})^\textsf {T})}{\partial {{\varvec{\beta }}}_1}=0.\) By Lemma 2 and a Taylor expansion, we get
where \(U_{1,N}({\varvec{\beta }})=\partial \ell _{N}({\varvec{\beta }})/\partial {\varvec{\beta }}_1\), and \(E_1\) and \(F_1\) are defined in Theorem 3. An application of Theorem 1 of Kowalski and Tu (2008, p. 158) yields
By Slutsky’s theorem and the central limit theorem, it follows that
This completes the proof. \(\square\)
Proof of Theorem 4
Proof of part (1). Let \(\alpha _N=N^{-1/2}\). We first prove that for any given \(\varepsilon >0\), there exists a large constant C such that
where \({\varvec{u}}=(u_1,\ldots ,u_p)^\textsf {T}\). By the definition of \(Q_{A}({\varvec{\beta }})\), we have
Thus, it is sufficient to show that \(\{\ell _N({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-\ell _N({\varvec{\beta }}^*)\}+\lambda _N \alpha _N \sum _{j=1}^{s}w_j |u_j|\le 0\) for a sufficiently large C with probability approaching one.
From (A14), we get
for any \({\varvec{\beta }}\in \{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert =C\}\). Note that D is a positive definite matrix. Then \(\frac{1}{2}\alpha _N^2 {\varvec{u}}^\textsf {T}D{\varvec{u}}=O_p(\alpha _N^2C^2)\). Therefore, for a sufficiently large C, the first term dominates the other terms on the right-hand side of (A21).
Moreover, direct applications of Taylor’s expansion lead to
Based on this fact, we obtain
Combining (A21) and (A22), it follows that
for a sufficiently large C. Thus (A20) is proved and there exists a local maximizer \({\hat{{\varvec{\beta }}}}_{A}\) such that \(\Vert {\hat{{\varvec{\beta }}}}_{A}-{\varvec{\beta }}^*\Vert =O_p(\alpha _N)\).
Next, we show that
for any given \({\varvec{\beta }}_1\) satisfying \(\Vert {\varvec{\beta }}_1-{\varvec{\beta }}_1^*\Vert =O_p(N^{-1/2})\) and any constant C. By (A19), we can write
From the condition of Theorem 4, \(N^{\frac{\gamma +1}{2}}\lambda _N \rightarrow \infty\) as \(N\rightarrow \infty\). Then, \(-\beta _j\partial Q_{A}({\varvec{\beta }})/\partial \beta _j>0\) with probability approaching one as \(N\rightarrow \infty\). Consequently, (A23) is proved and thus \({\hat{{\varvec{\beta }}}}_{2,A}=0\).
Proof of part (2). Let \({\hat{{\varvec{\beta }}}}_{1,A}\) denote the local maximizer of \(Q_{A}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\), which is a function of \({\varvec{\beta }}_1\). Then, \({\hat{{\varvec{\beta }}}}_{1,A}\) is \(N^{1/2}\)-consistent and we have the following asymptotic expression
By Theorem 1 of Kowalski and Tu (2008, p. 158), we obtain
Since \(N^{1/2}\lambda _N=o_p(1)\), the desired result follows. \(\square\)
Appendix B
In this appendix, we give algorithms for maximizing \(Q_{\varpi ,\lambda }({\varvec{\beta }})\) in (5). Efficient algorithms for maximizing penalized likelihoods include the local quadratic approximation (LQA) algorithm (Fan & Li, 2001), the local linear approximation (LLA) algorithm (Zou & Li, 2008), and the coordinate optimization algorithm (Fu, 1998; Fan & Lv, 2011). Here, we adopt the LLA algorithm. Define
where \({\varvec{\alpha }}=(\alpha _1,\ldots ,\alpha _p)^\textsf {T}\). The LLA algorithm can be summarized as follows:
1. Initialize \({\hat{{\varvec{\beta }}}}^{(0)}=({\hat{\beta }}_1^{(0)},\ldots ,{\hat{\beta }}_p^{(0)})\) and compute the adaptive weight
$$\begin{aligned} {\hat{{\varvec{\alpha }}}}^{(0)}= & {} ({\hat{\alpha }}_1^{(0)},\ldots ,{\hat{\alpha }}_p^{(0)})^\textsf {T}\\= & {} (P_\lambda '(|{\hat{\beta }}_1^{(0)}|),\ldots ,P_\lambda '(|{\hat{\beta }}_p^{(0)}|))^\textsf {T}; \end{aligned}$$
2. Compute
$$\begin{aligned} {\hat{{\varvec{\beta }}}}^{(m)}=({\hat{\beta }}_1^{(m)},\ldots ,{\hat{\beta }}_p^{(m)})^\textsf {T} =\arg \max _{{{\varvec{\beta }}}} Q ({\varvec{\beta }}, {\hat{{\varvec{\alpha }}}}^{(m-1)}), \end{aligned}$$
where \(Q({\varvec{\beta }}, {\varvec{\alpha }})\) is defined in (B.1);
3. Update the adaptive weight vector
$$\begin{aligned} {\hat{{\varvec{\alpha }}}}^{(m)}=({\hat{\alpha }}_1^{(m)},\ldots ,{\hat{\alpha }}_p^{(m)})^\textsf {T}, \end{aligned}$$
where \({\hat{\alpha }}_j^{(m)}=P_\lambda '(|{\hat{\beta }}_j^{(m)}|)\), \(j=1,\ldots ,p\);
4. Repeat steps 2–3 until convergence.
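As a concrete illustration, a minimal Python sketch of this loop follows. It presumes that (B.1) has the standard LLA form \(Q({\varvec{\beta }},{\varvec{\alpha }})=\ell _N({\varvec{\beta }})-\sum _{j=1}^p\alpha _j|\beta _j|\) of Zou and Li (2008); the inner solver `maximize_Q` for step 2 and all other names are hypothetical:

```python
import numpy as np

def scad_prime(theta, lam, a=3.7):
    """First derivative of the SCAD penalty (Fan & Li, 2001)."""
    theta = np.abs(theta)
    return lam * ((theta <= lam)
                  + np.maximum(a * lam - theta, 0.0) / ((a - 1) * lam)
                  * (theta > lam))

def lla(beta0, lam, maximize_Q, p_prime=scad_prime, tol=1e-6, max_iter=100):
    """LLA: alternate adaptive weights (steps 1 and 3) with the weighted-L1
    maximization of Q(beta, alpha) (step 2) until the iterates stabilize
    (step 4)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        alpha = p_prime(beta, lam)          # adaptive weights alpha_j
        beta_new = maximize_Q(alpha, beta)  # inner weighted-L1 problem
        if np.linalg.norm(beta_new - beta) < tol:
            break
        beta = beta_new
    return beta
```

The same loop covers the adaptive lasso penalty by taking the weights \(\alpha _j=\lambda w_j\) fixed across iterations.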
For fixed \({\varvec{\alpha }}\), to maximize \(Q ({\varvec{\beta }}, {\varvec{\alpha }})\) with respect to \({\varvec{\beta }}\) in step 2, we employ the idea of the coordinate optimization algorithm, which maximizes over one coordinate at a time, cycling through the coordinates successively. Let \(X_{i,-k}=(X_{i,1},\ldots ,X_{i,k-1},X_{i,k+1},\ldots ,X_{i,p})^\textsf {T}\) and \(X_{i}=(X_{i,1},\ldots ,X_{i,p})^\textsf {T}\). Define
where \(u_{ij,k}=\log (Y_i)-\log (Y_j)-({\varvec{\zeta }}^\textsf {T},{\varvec{\eta }}^\textsf {T})(X_{i,-k}-X_{j,-k}),\) \(v_{ij,k}= X_{i,k}-X_{j,k}\), \({\varvec{\zeta }}^\textsf {T}=(\zeta _1,\ldots ,\zeta _{k-1})\), and \({\varvec{\eta }}^\textsf {T}=(\eta _1,\ldots ,\eta _{p-k})\). Note that we set \({\varvec{\zeta }}=\emptyset\) if \(k=1\) and \({\varvec{\eta }}=\emptyset\) if \(k=p\). Differentiating \(\ell _{k}(\gamma |{\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})\) with respect to \(\gamma\), we obtain the following estimating equation
which is equivalent to
Let \(b_{00,k}=0\), \(b_{ij,k}=u_{ij,k}/v_{ij,k}\), \(w_{00,k}=2(N^2-N)\alpha _k\), \(w_{ij,k}=R_{i}R_{j}\Delta _i |v_{ij,k}|\), \(i,j=1,\ldots ,N\). Define
where \({\mathscr {S}}=\{(0,0)\}\cup \{(i,j):v_{ij,k}\ne 0,\ i,j=1,\ldots ,N\}.\) Then, we can write
where
Putting these pieces together, the proposed coordinate optimization algorithm for maximizing \(Q ({\varvec{\beta }}, {\varvec{\alpha }})\) with respect to \({\varvec{\beta }}\) in step 2 of the LLA algorithm is as follows:
(a) Set the initial value \({\varvec{\beta }}^{(0)}=(\beta _1^{(0)},\ldots ,\beta _p^{(0)})^\textsf {T}\);
(b) Given \({\varvec{\beta }}^{(m)}=(\beta _1^{(m)},\ldots ,\beta _p^{(m)})^\textsf {T}\), for \(k=1,\ldots ,p\), compute
$$\begin{aligned} \beta _k^{(m+1)}={\hat{\gamma }}_k((\beta _1^{(m+1)},\ldots ,\beta _{k-1}^{(m+1)})^\textsf {T},(\beta _{k+1}^{(m)},\ldots ,\beta _p^{(m)})^\textsf {T},{\varvec{\alpha }}), \end{aligned}$$
where \({\hat{\gamma }}_k({\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})\) is defined in (B.2). Then set \({\varvec{\beta }}^{(m+1)}=(\beta _1^{(m+1)},\ldots ,\beta _p^{(m+1)})^\textsf {T}\);
(c) Repeat step (b) until \(\Vert {\varvec{\beta }}^{(m+1)}-{\varvec{\beta }}^{(m)}\Vert <10^{-6}\).
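To make step (b) concrete: the construction of \(b_{ij,k}\) and \(w_{ij,k}\) above indicates that the update reduces to a weighted median, with the pseudo-point \(b_{00,k}=0\) and its weight \(2(N^2-N)\alpha _k\) producing the shrinkage. A minimal Python sketch under this reading, with the same hypothetical inputs as in the earlier snippets:

```python
import numpy as np

def weighted_median(b, w):
    """Return a minimizer of sum_i w_i |b_i - gamma|."""
    order = np.argsort(b)
    b, w = b[order], w[order]
    cum = np.cumsum(w)
    # first point where the cumulative weight reaches half the total
    return b[np.searchsorted(cum, 0.5 * cum[-1])]

def coordinate_update(k, beta, logY, X, delta, R, alpha):
    """Step (b): update beta_k with the other coordinates held fixed."""
    N = X.shape[0]
    # residuals excluding the k-th covariate: logY_i - sum_{l != k} beta_l X_il
    e = logY - X @ beta + X[:, k] * beta[k]
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    u = e[i] - e[j]                       # u_{ij,k}
    v = X[i, k] - X[j, k]                 # v_{ij,k}
    keep = (i != j) & (v != 0)
    b = u[keep] / v[keep]                 # b_{ij,k} = u_{ij,k} / v_{ij,k}
    w = (R[i] * R[j] * delta[i] * np.abs(v))[keep]
    b = np.append(b, 0.0)                 # pseudo-point b_{00,k} = 0
    w = np.append(w, 2 * (N**2 - N) * alpha[k])
    return weighted_median(b, w)
```

One full pass of step (b) then cycles `coordinate_update` over \(k=1,\ldots ,p\).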
About this article
Cite this article
Liu, T., Yuan, X. & Sun, L. Variable selection for semiparametric accelerated failure time models with nonignorable missing data. J. Korean Stat. Soc. 53, 100–131 (2024). https://doi.org/10.1007/s42952-023-00238-z