Joint feature screening for ultra-high-dimensional sparse additive hazards model by the sparsity-restricted pseudo-score estimator

Annals of the Institute of Statistical Mathematics

Abstract

Due to the coexistence of ultra-high dimensionality and right censoring, it is very challenging to develop feature screening procedures for ultra-high-dimensional survival data. In this paper, we propose a joint screening approach for the sparse additive hazards model with ultra-high-dimensional features. The proposed screening is based on a sparsity-restricted pseudo-score estimator, which can be computed efficiently by an iterative hard-thresholding algorithm. We establish the sure screening property of the proposed procedure under rather mild assumptions. Extensive simulation studies verify its improvements over the main existing screening approaches for ultra-high-dimensional survival data. Finally, the proposed screening method is illustrated with a dataset from a breast cancer study.


References

  • Annest, A., Bumgarner, R., Raftery, A., Yeung, K. (2009). Iterative Bayesian model averaging: A method for the application of survival analysis to high-dimensional microarray data. BMC Bioinformatics, 10, 72.

  • Bertsekas, D. (2016). Nonlinear programming (3rd ed.). Nashua: Athena Scientific.


  • Bickel, P., Ritov, Y., Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, 37, 1705–1732.

  • Bradic, J., Fan, J., Jiang, J. (2011). Regularization for Cox’s proportional hazards model with NP-dimensionality. The Annals of Statistics, 39, 3092–3120.

  • Cai, J., Fan, J., Li, R., Zhou, H. (2005). Variable selection for multivariate failure time data. Biometrika, 92, 303–316.

  • Chang, J., Tang, C., Wu, Y. (2013). Marginal empirical likelihood and sure independence feature screening. The Annals of Statistics, 41, 2123–2148.

  • Chen, X. (2018). Model-free conditional feature screening for ultra-high dimensional right censored data. Journal of Statistical Computation and Simulation. https://doi.org/10.1080/00949655.2018.1466142.

  • Chen, X., Chen, X., Liu, Y. (2017). A note on quantile feature screening via distance correlation. Statistical Papers. https://doi.org/10.1007/s00362-017-0894-8.

  • Chen, X., Chen, X., Wang, H. (2018). Robust feature screening for ultra-high dimensional right censored data via distance correlation. Computational Statistics and Data Analysis, 119, 118–138.

  • Fan, J., Li, R. (2002). Variable selection for Cox’s proportional hazards model and frailty model. The Annals of Statistics, 30, 74–99.

  • Fan, J., Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B, 70, 849–911.

  • Fan, J., Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics, 38, 3567–3604.

  • Fan, J., Feng, Y., Wu, Y. (2010). Ultrahigh dimensional variable selection for Cox’s proportional hazards model. Institute of Mathematical Statistics Collections, 6, 70–86.

  • Fan, J., Samworth, R., Wu, Y. (2009). Ultrahigh dimensional variable selection: Beyond the linear model. Journal of Machine Learning Research, 10, 1829–1853.

  • Fan, J., Ma, Y., Dai, W. (2014). Nonparametric independent screening in sparse ultra-high dimensional varying coefficient models. Journal of the American Statistical Association, 109, 1270–1284.

  • Gorst-Rasmussen, A., Scheike, T. (2013). Independent screening for single-index hazard rate models with ultrahigh dimensional features. Journal of the Royal Statistical Society, Series B, 75, 217–245.

  • He, X., Wang, L., Hong, H. (2013). Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. The Annals of Statistics, 41, 342–369.

  • Huang, J., Sun, T., Ying, Z., Yu, Y., Zhang, C. (2013). Oracle inequalities for the lasso in the Cox model. The Annals of Statistics, 41, 1142–1165.

  • Leng, C., Ma, S. (2007). Path consistent model selection in additive risk model via lasso. Statistics in Medicine, 26, 3753–3770.

  • Li, G., Peng, H., Zhang, J., Zhu, L. (2012a). Robust rank correlation based screening. The Annals of Statistics, 40, 1846–1877.

  • Li, R., Zhong, W., Zhu, L. (2012b). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129–1139.

  • Lin, D., Ying, Z. (1994). Semiparametric analysis of the additive risk model. Biometrika, 81, 61–71.

  • Lin, W., Lv, J. (2013). High-dimensional sparse additive hazards regression. Journal of the American Statistical Association, 108, 247–264.

  • Liu, Y., Chen, X. (2018). Quantile screening for ultra-high-dimensional heterogeneous data conditional on some variables. Journal of Statistical Computation and Simulation, 88, 329–342.

  • Martinussen, T., Scheike, T. (2009). The additive hazards model with high-dimensional regressors. Lifetime Data Analysis, 15, 330–342.

  • Song, R., Lu, W., Ma, S., Jessie Jeng, X. (2014). Censored rank independence screening for high-dimensional survival data. Biometrika, 101, 799–814.

  • Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.


  • Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., et al. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520–525.


  • van’t Veer, L., Dai, H., van de Vijver, M., He, Y., Hart, A., Mao, M., et al. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.


  • Wu, Y., Yin, G. (2015). Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika, 102, 65–76.

  • Xu, C., Chen, J. (2014). The sparse MLE for ultra-high-dimensional feature screening. Journal of the American Statistical Association, 109, 1257–1269.

  • Yang, G., Yu, Y., Li, R., Buu, A. (2016). Feature screening in ultrahigh dimensional Cox’s model. Statistica Sinica, 26, 881–901.

  • Yang, G., Hou, S., Wang, L., Sun, Y. (2018). Feature screening in ultrahigh-dimensional additive Cox model. Journal of Statistical Computation and Simulation, 88, 1117–1133.

  • Zhang, C., Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27, 576–593.

  • Zhang, J., Liu, Y., Wu, Y. (2017). Correlation rank screening for ultrahigh-dimensional survival data. Computational Statistics & Data Analysis, 108, 121–132.

  • Zhao, S., Li, Y. (2012). Principled sure independence screening for Cox model with ultra-high-dimensional covariates. Journal of Multivariate Analysis, 105, 397–411.

  • Zhao, S., Li, Y. (2014). Score test variable screening. Biometrics, 70, 862–871.

  • Zhou, T., Zhu, L. (2017). Model-free features screening for ultrahigh dimensional censored regression. Statistics and Computing, 27, 947–961.

  • Zhu, L., Li, L., Li, R., Zhu, L. (2011). Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association, 106, 1464–1475.


Acknowledgements

Chen’s research was supported by the National Natural Science Foundation of China (11501573, 11326184 and 11771250) and the National Social Science Foundation of China (17BTJ019). Liu’s research was supported by the Fundamental Research Funds for the Central Universities (17CX02035A). Wang’s research was supported by the National Natural Science Foundation of China (General program 11171331, Key program 11331011 and program for Creative Research Group in China 61621003), a grant from the Key Lab of Random Complex Structure and Data Science, CAS, and a grant from Zhejiang Gongshang University.


Corresponding author

Correspondence to Qihua Wang.

Appendix: Proofs of the theorems


To prove Theorem 1, we first give a large deviation result for martingales under the additive hazards model. It can be proved along the same lines as Theorem 3.1 in Bradic et al. (2011), so we omit the proof.

Lemma 1

Under Assumptions 1–3, for any positive sequence \(\{u_{n}\}\) bounded away from zero, if \(\max \limits _{1 \le j\le p}\sigma _{j}^{2}=O(u_{n})\), there exist positive constants \(c_{7}\) and \(c_{8}\) such that

$$\begin{aligned} \mathrm {pr}(|\sqrt{n}U_{n,j}(\varvec{\beta }^{*})|>u_{n})\le c_{7}\mathrm {exp}(-c_{8}u_{n}) \end{aligned}$$

uniformly over j, where \(U_{n,j}(\varvec{\beta }^{*})\) is the jth component of \(\varvec{U}_{n}(\varvec{\beta }^{*})\).

This large deviation result is a uniform, nonasymptotic exponential inequality for martingales under the additive hazards model, and the bound does not depend on the dimensionality p. This makes it particularly useful for the high-dimensional additive hazards model.

Proof of Theorem 1

Denote by \(\hat{\varvec{\beta }}_{M}\) the (unrestricted) pseudo-score estimator of \(\varvec{\beta }\) based on model M. To establish the sure screening property, we need only prove

$$\begin{aligned} \mathrm {pr}\big ({\hat{M}}\in {\mathbf {M}}_{+}^{k}\big )\longrightarrow 1, \end{aligned}$$

as n goes to \(\infty \). It suffices to show

$$\begin{aligned} \mathrm {pr} \left( \max _{M\in {\mathbf {M}}_{-}^{k}}L_{n}(\hat{\varvec{\beta }}_{M}) \ge \min _{M\in {\mathbf {M}}_{+}^{k}}L_{n}(\hat{\varvec{\beta }}_{M})\right) \longrightarrow 0, \end{aligned}$$

as n goes to \(\infty \).

For any \(M\in {\mathbf {M}}_{-}^{k}\), let \(M^{\prime }=M\cup M_{0}\in {\mathbf {M}}_{+}^{2k}\).

First, consider \(\varvec{\beta }_{M^{\prime }}\) close to \(\varvec{\beta }_{M^{\prime }}^{*}\) with \(\Vert \varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\Vert _{2}=c_{2}n^{-\tau _{1}}\). After some algebraic manipulations, we have

$$\begin{aligned}&L_{n}\big (\varvec{\beta }_{M^{\prime }}\big )-L_{n}\big (\varvec{\beta }_{M^{\prime }}^{*}\big ) \nonumber \\&\quad =\big (\varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\big )^{T} \varvec{U}_{n,M^{\prime }}\big (\varvec{\beta }_{M^{\prime }}^{*}\big ) -\frac{1}{2}\big (\varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\big )^{T} \varvec{V}_{n,M^{\prime }} \big (\varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\big ). \end{aligned}$$

Then, by the Cauchy–Schwarz inequality and Assumption 5, we conclude that

$$\begin{aligned}&L_{n}\big (\varvec{\beta }_{M^{\prime }}\big )-L_{n}\big (\varvec{\beta }_{M^{\prime }}^{*}\big ) \nonumber \\&\quad \le \Vert \varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\Vert _{2} \Vert \varvec{U}_{n,M^{\prime }}\big (\varvec{\beta }_{M^{\prime }}^{*}\big )\Vert _{2} -\frac{c_{4}}{2}\Vert \varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\Vert _{2}^{2} \nonumber \\&\quad \le \,c_{2}n^{-\tau _{1}} \Vert \varvec{U}_{n,M^{\prime }}\big (\varvec{\beta }_{M^{\prime }}^{*}\big )\Vert _{2} -\frac{1}{2}c_{4}c_{2}^{2}n^{-2\tau _{1}}. \end{aligned}$$

Thus, we have

$$\begin{aligned}&\mathrm {pr}\left( L_{n}\left( \varvec{\beta }_{M^{\prime }}\right) -L_{n}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) \ge 0\right) \nonumber \\&\quad \le \,\mathrm {pr}\left( \Vert \varvec{U}_{n,M^{\prime }}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) \Vert _{2}\ge \frac{1}{2}c_{2}c_{4}n^{-\tau _{1}}\right) \nonumber \\&\quad \le \,\sum \limits _{j\in M^{\prime }} \mathrm {pr}\left( U_{n,j}^{2}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) \ge \frac{1}{2k}\left( \frac{1}{2}c_{2}c_{4}n^{-\tau _{1}}\right) ^{2}\right) \nonumber \\&\quad =\sum \limits _{j\in M^{\prime }} \mathrm {pr}\left( |U_{n,j}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) |\ge \left( \frac{1}{2k}\right) ^{1/2}\frac{1}{2}c_{2}c_{4}n^{-\tau _{1}}\right) , \end{aligned}$$

where the second inequality follows from the Bonferroni inequality.

Because \(M_{0}\subset M^{\prime }\), we have \(U_{n,j}(\varvec{\beta }_{M^{\prime }}^{*})=U_{n,j}(\varvec{\beta }^{*})\). Then, under the conditions of Theorem 1 and by Lemma 1, we have

$$\begin{aligned}&\mathrm {pr}\left( |U_{n,j}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) |\ge \left( \frac{1}{2k}\right) ^{1/2}\frac{1}{2}c_{2}c_{4}n^{-\tau _{1}}\right) \nonumber \\&\quad =\mathrm {pr}\left( |\sqrt{n}U_{n,j}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) |\ge \left( \frac{n}{2k}\right) ^{1/2}\frac{1}{2}c_{2}c_{4}n^{-\tau _{1}}\right) \nonumber \\&\quad =\mathrm {pr}\left( |\sqrt{n}U_{n,j}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) |\ge \frac{1}{2\sqrt{2}}c_{2}c_{4}c_{3}^{-\frac{1}{2}}n^{\frac{1}{2}-\tau _{1}-\frac{\tau _{2}}{2}}\right) \nonumber \\&\quad \le \,c_{7}\mathrm {exp}(-c_{8}n^{\frac{1}{2}-\tau _{1}-\frac{\tau _{2}}{2}}). \end{aligned}$$

Then

$$\begin{aligned} \mathrm {pr}(L_{n}\big (\varvec{\beta }_{M^{\prime }}\big )-L_{n}\big (\varvec{\beta }_{M^{\prime }}^{*}\big )\ge 0) \le 2kc_{7}\mathrm {exp}\left( -c_{8}n^{\frac{1}{2}-\tau _{1}-\frac{\tau _{2}}{2}}\right) . \end{aligned}$$

Then, by the Bonferroni inequality and assumptions in Theorem 1, we can arrive at

$$\begin{aligned}&\mathrm {pr} \left( \max _{M\in {\mathbf {M}}_{-}^{k}}L_{n}\left( \varvec{\beta }_{M^{\prime }}\right) \ge L_{n}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) \right) \nonumber \\&\quad \le \,\sum \limits _{M\in {\mathbf {M}}_{-}^{k}} \mathrm {pr}\left( L_{n}\left( \varvec{\beta }_{M^{\prime }}\right) \ge L_{n}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) \right) \nonumber \\&\quad \le \,p^{k}2kc_{7}\mathrm {exp}\left( -c_{8}n^{\frac{1}{2}-\tau _{1}-\frac{\tau _{2}}{2}}\right) \nonumber \\&\quad \le \,2c_{7} \mathrm {exp}\left( \mathrm {log}(c_{3})+\tau _{2}\mathrm {log}(n)+c_{9}n^{m+\tau _{2}} -c_{8}n^{\frac{1}{2}-\tau _{1}-\frac{\tau _{2}}{2}}\right) \nonumber \\&\quad =o(1), \end{aligned}$$

where \(c_{9}\) is a positive constant.

By the concavity of \(L_{n}(\varvec{\beta }_{M^{\prime }})\), we conclude that the above result holds for any \(\varvec{\beta }_{M^{\prime }}\) with \(\Vert \varvec{\beta }_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\Vert _{2}\ge c_{2}n^{-\tau _{1}}\).

For any \(M\in {\mathbf {M}}_{-}^{k}\), let \(\check{\varvec{\beta }}_{M^{\prime }}\) be \(\hat{\varvec{\beta }}_{M}\) augmented with zeros at the positions in \(M^{\prime }\setminus M\). It is easy to see that \(\Vert \check{\varvec{\beta }}_{M^{\prime }}-\varvec{\beta }_{M^{\prime }}^{*}\Vert _{2}\ge \Vert \varvec{\beta }_{M_{0}\setminus M}^{*}\Vert _{2}\ge c_{2}n^{-\tau _{1}}\). So we have

$$\begin{aligned}&\mathrm {pr} \left( \max _{M\in {\mathbf {M}}_{-}^{k}}L_{n}(\hat{\varvec{\beta }}_{M}) \ge \min _{M\in {\mathbf {M}}_{+}^{k}}L_{n}(\hat{\varvec{\beta }}_{M})\right) \nonumber \\&\quad =\mathrm {pr} \left( \max _{M\in {\mathbf {M}}_{-}^{k}}L_{n}\left( \check{\varvec{\beta }}_{M^{\prime }}\right) \ge \min _{M\in {\mathbf {M}}_{+}^{k}}L_{n}(\hat{\varvec{\beta }}_{M})\right) \nonumber \\&\quad \le \,\mathrm {pr} \left( \max _{M\in {\mathbf {M}}_{-}^{k}}L_{n}\left( \check{\varvec{\beta }}_{M^{\prime }}\right) \ge L_{n}\left( \varvec{\beta }_{M^{\prime }}^{*}\right) \right) \nonumber \\&\quad =o(1). \end{aligned}$$

This completes the proof. \(\square \)
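The comparison of maximised pseudo-scores across candidate models that drives this proof can be observed numerically. The sketch below is illustrative only and not from the paper: it replaces the additive-hazards quantities \(\varvec{b}_{n}\) and \(\varvec{V}_{n}\) with least-squares stand-ins on synthetic data, enumerates all size-k models for a tiny p, and checks that every model containing the true support attains a larger maximised quadratic score \(L_{n}(\hat{\varvec{\beta }}_{M})=\frac{1}{2}\varvec{b}_{M}^{T}\varvec{V}_{M}^{-1}\varvec{b}_{M}\) than every model missing part of it.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

# Synthetic stand-ins for b_n and V_n: a small linear model with a sparse truth.
p, k, n = 8, 3, 400
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[[0, 1]] = [1.5, -1.2]            # true support M_0 = {0, 1}
y = X @ beta_star + 0.1 * rng.standard_normal(n)

V = X.T @ X / n                             # stand-in for V_n
b = X.T @ y / n                             # stand-in for b_n

def L_max(M):
    # maximised quadratic pseudo-score over model M:
    # L_n(beta_hat_M) = 0.5 * b_M^T V_M^{-1} b_M
    M = list(M)
    bM = b[M]
    VM = V[np.ix_(M, M)]
    return 0.5 * bM @ np.linalg.solve(VM, bM)

scores = {M: L_max(M) for M in combinations(range(p), k)}
plus = [s for M, s in scores.items() if {0, 1} <= set(M)]    # models in M_+^k
minus = [s for M, s in scores.items() if not {0, 1} <= set(M)]  # models in M_-^k

# The separation that Theorem 1 establishes with probability tending to one
assert max(minus) < min(plus)
```

With this much signal relative to noise the separation between the two groups of models is large; Theorem 1 is the corresponding probabilistic statement under the paper's assumptions.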

Proof of Theorem 2

Denote \(Q_{n}(\varvec{\beta }\mid \varvec{{\hat{\beta }}}^{(t)})=L_{n}(\varvec{{\hat{\beta }}}^{(t)})+ (\varvec{\beta }-\varvec{{\hat{\beta }}}^{(t)})^{T}\varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t)}) -\frac{u}{2}\Vert \varvec{\beta }-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2}\). Then \( \varvec{{\hat{\beta }}}^{(t+1)}=\mathop {\mathrm {argmin}}_{\varvec{\beta }\in {\mathcal {B}}(k)} \{-Q_{n}(\varvec{\beta }\mid \varvec{{\hat{\beta }}}^{(t)})\}. \)

After some algebraic manipulations, it is easy to see that

$$\begin{aligned}&L_{n}\left( \varvec{{\hat{\beta }}}^{(t)}\right) \nonumber \\&\quad = Q_{n}\left( \varvec{{\hat{\beta }}}^{(t)}|\varvec{{\hat{\beta }}}^{(t)}\right) \nonumber \\&\quad \le \,Q_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}|\varvec{{\hat{\beta }}}^{(t)}\right) \nonumber \\&\quad =L_{n}\left( \varvec{{\hat{\beta }}}^{(t)}\right) +\left( \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\right) ^{T}\varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t)}) -\frac{u}{2}\Vert \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2} \nonumber \\&\quad =L_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}\right) -\frac{u}{2}\Vert \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2} +\left( \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\right) ^{T}\varvec{U}_{n}\left( \varvec{{\hat{\beta }}}^{(t)}\right) \nonumber \\&\qquad +\,L_{n}\left( \varvec{{\hat{\beta }}}^{(t)}\right) -L_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}\right) \nonumber \\&\quad =L_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}\right) -\frac{u}{2}\Vert \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2}+ \frac{1}{2}\left( \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\right) ^{T}\varvec{V}_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\right) . \end{aligned}$$

It is easy to see that

$$\begin{aligned} \left( \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\right) ^{T}\varvec{V}_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\right) \le \rho _{\mathrm {max}}(\varvec{V}_{n})\Vert \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2}. \end{aligned}$$

So under the assumptions in Theorem 2, we have

$$\begin{aligned}&L_{n}\left( \varvec{{\hat{\beta }}}^{(t)}\right) \nonumber \\&\quad \le \,L_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}\right) -\frac{u}{2}\Vert \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2}+\frac{1}{2}\rho _{\mathrm {max}}(\varvec{V}_{n})\Vert \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2}\nonumber \\&\quad = L_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}\right) -\frac{1}{2}(u-\rho _{\mathrm {max}}(\varvec{V}_{n}))\Vert \varvec{{\hat{\beta }}}^{(t+1)}-\varvec{{\hat{\beta }}}^{(t)}\Vert _{2}^{2}\nonumber \\&\quad \le \,L_{n}\left( \varvec{{\hat{\beta }}}^{(t+1)}\right) . \end{aligned}$$

This completes the proof. \(\square \)
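The ascent property just proved is easy to observe numerically. The sketch below is illustrative only: \(\varvec{b}_{n}\) and \(\varvec{V}_{n}\) are replaced by a synthetic vector and positive definite matrix, \({\mathbf {H}}(\cdot ;k)\) keeps the k largest entries in absolute value, and the step scale is set to \(u=\rho _{\mathrm {max}}(\varvec{V}_{n})\), so \(L_{n}\) is nondecreasing along the iterates exactly as Theorem 2 asserts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic quadratic pseudo-score L_n(beta) = b^T beta - 0.5 beta^T V beta
p, k = 20, 3
A = rng.standard_normal((p, p))
V = A @ A.T / p + np.eye(p)          # positive definite stand-in for V_n
b = rng.standard_normal(p)           # stand-in for b_n

def L(beta):
    return b @ beta - 0.5 * beta @ V @ beta

def U(beta):
    # pseudo-score U_n(beta) = b - V beta, the gradient of L_n
    return b - V @ beta

def hard_threshold(v, k):
    # H(v; k): keep the k largest entries in absolute value, zero the rest
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

u = np.linalg.eigvalsh(V).max()      # step scale u >= rho_max(V_n)
beta = np.zeros(p)
values = [L(beta)]
for _ in range(50):
    # beta^(t+1) maximises Q_n(. | beta^(t)) over the k-sparse set B(k)
    beta = hard_threshold(beta + U(beta) / u, k)
    values.append(L(beta))

# Monotone ascent: L_n(beta^(t+1)) >= L_n(beta^(t)) for every t
assert all(v2 >= v1 - 1e-10 for v1, v2 in zip(values, values[1:]))
```

The hard-thresholding step is exactly the maximiser of \(Q_{n}(\cdot \mid \varvec{{\hat{\beta }}}^{(t)})\) over \({\mathcal {B}}(k)\), since \(Q_{n}\) is, up to a constant, \(-\frac{u}{2}\Vert \varvec{\beta }-(\varvec{{\hat{\beta }}}^{(t)}+u^{-1}\varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t)}))\Vert _{2}^{2}\).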

Before presenting the proof of Theorem 3, we first introduce a lemma.

Lemma 2

Define \(\varvec{{\hat{\beta }}}^{(0)}=\mathrm {argmax}_{\varvec{\beta }}\{L_{n}(\varvec{\beta })-\lambda \Vert \varvec{\beta }\Vert _{1}\}\), where \(\lambda \) satisfies \(\lambda n^{\frac{1}{2}-m}\rightarrow \infty \), \(\lambda n^{\tau _{1}+\tau _{2}}\rightarrow 0\). Under Assumptions 1–3 and 6, if \(\max \limits _{1 \le j\le p}\sigma _j^2=O(\lambda n^{\frac{1}{2}})\), we have

$$\begin{aligned} \mathrm {pr}\left( \Vert \varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}\Vert _{1}\le 8c_{5}^{-1}\lambda q\right) \rightarrow 1, \end{aligned}$$

where \(c_{5}\) is defined in Assumption 6.

Proof

It is easy to see that

$$\begin{aligned} L_{n}\left( \varvec{{\hat{\beta }}}^{(0)}\right) -\lambda \Vert \varvec{{\hat{\beta }}}^{(0)}\Vert _{1}- \left( L_{n}(\varvec{\beta }^{*})-\lambda \Vert \varvec{\beta }^{*}\Vert _{1}\right) \ge 0, \end{aligned}$$

or equivalently

$$\begin{aligned} L_{n}(\varvec{\beta }^{*})-L_{n}(\varvec{{\hat{\beta }}}^{(0)})\le \lambda \Vert \varvec{\beta }^{*}\Vert _{1}-\lambda \Vert \varvec{{\hat{\beta }}}^{(0)}\Vert _{1}. \end{aligned}$$

Define \(\varvec{\delta }=(\varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*})=(\delta _{1},\ldots ,\delta _{p})^{T}\). By some algebraic manipulations, we have

$$\begin{aligned}&L_{n}(\varvec{{\hat{\beta }}}^{(0)})-L_{n}(\varvec{\beta }^{*}) \nonumber \\&\quad =\varvec{b}_{n}^{T}\varvec{{\hat{\beta }}}^{(0)}-\frac{1}{2}\varvec{{\hat{\beta }}}^{(0) T}\varvec{V}_{n}\varvec{{\hat{\beta }}}^{(0)} -\left\{ \varvec{b}_{n}^{T}\varvec{\beta }^{*}-\frac{1}{2}\varvec{\beta }^{*T}\varvec{V}_{n}\varvec{\beta }^{*}\right\} \nonumber \\&\quad =(\varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*})^{T}\left\{ \varvec{b}_{n}-\varvec{V}_{n}\varvec{\beta }^{*}\right\} -\frac{1}{2}(\varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*})^{T}\varvec{V}_{n}(\varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}) \nonumber \\&\quad =\varvec{\delta }^{T}\varvec{U}_{n}(\varvec{\beta }^{*})-\frac{1}{2}\varvec{\delta }^{T}\varvec{V}_{n}\varvec{\delta }. \end{aligned}$$

Then we have

$$\begin{aligned}&\varvec{\delta }^{T}\varvec{V}_{n}\varvec{\delta } \nonumber \\&\quad =2\varvec{\delta }^{T}\varvec{U}_{n}(\varvec{\beta }^{*}) +L_{n}(\varvec{\beta }^{*})-L_{n}(\varvec{{\hat{\beta }}}^{(0)}) \nonumber \\&\quad \le \,2\varvec{\delta }^{T}\varvec{U}_{n}(\varvec{\beta }^{*}) +\lambda \Vert \varvec{\beta }^{*}\Vert _{1}-\lambda \Vert \varvec{{\hat{\beta }}}^{(0)}\Vert _{1}. \end{aligned}$$

Denote \({\mathcal {A}}=\{\max \nolimits _{1 \le j\le p}|U_{n,j}(\varvec{\beta }^{*})|\le \frac{\lambda }{4}\}\). Because \(\max \limits _{1 \le j\le p}\sigma _{j}^{2}=O(\lambda n^{\frac{1}{2}})\), by Lemma 1 we have

$$\begin{aligned}&\mathrm {pr}({\mathcal {A}}^{c})\nonumber \\&\quad \le \,\sum _{j=1}^{p}\mathrm {pr}\left( |U_{n,j}(\varvec{\beta }^{*})|>\frac{\lambda }{4}\right) \nonumber \\&\quad = \sum _{j=1}^{p} \mathrm {pr}\left( |\sqrt{n}U_{n,j}(\varvec{\beta }^{*})|>\frac{\sqrt{n}\lambda }{4}\right) \nonumber \\&\quad \le \,p c_{7}\mathrm {exp}\left( -c_{8}\frac{\sqrt{n}\lambda }{4}\right) \nonumber \\&\quad \le \,c_{7}\mathrm {exp}\left( c_{10}n^{m}-c_{8}\frac{\sqrt{n}\lambda }{4}\right) \nonumber \\&\quad \rightarrow 0, \end{aligned}$$

where \(c_{10}\) is a positive constant. So we obtain \(\mathrm {pr}({\mathcal {A}})\rightarrow 1\) and \(\Vert \varvec{U}_{n}(\varvec{\beta }^{*})\Vert _{\infty }=O_{p}(\lambda )\). On the event \({\mathcal {A}}\), it is easy to see that

$$\begin{aligned}&\varvec{\delta }^{T}\varvec{V}_{n}\varvec{\delta } \nonumber \\&\quad \le \frac{1}{2}\lambda \Vert \varvec{\delta }\Vert _{1}+\lambda \Vert \varvec{\beta }^{*}\Vert _{1}-\lambda \Vert \varvec{{\hat{\beta }}}^{(0)}\Vert _{1}. \end{aligned}$$

Thus

$$\begin{aligned}&\varvec{\delta }^{T}\varvec{V}_{n}\varvec{\delta }+\frac{1}{2}\lambda \Vert \varvec{\delta }\Vert _{1} \nonumber \\&\quad \le \,\lambda \Vert \varvec{\delta }\Vert _{1}+\lambda \Vert \varvec{\beta }^{*}\Vert _{1}-\lambda \Vert \varvec{{\hat{\beta }}}^{(0)}\Vert _{1} \nonumber \\&\quad \le \,\lambda \sum _{j=1}^{p} \left( |{\hat{\beta }}_{j}^{(0)}-\beta _{j}^{*}|+|\beta _{j}^{*}|-|{\hat{\beta }}_{j}^{(0)}|\right) \nonumber \\&\quad = \lambda \sum _{j\in M_{0}} \left( |{\hat{\beta }}_{j}^{(0)}-\beta _{j}^{*}|+|\beta _{j}^{*}|-|{\hat{\beta }}_{j}^{(0)}|\right) \nonumber \\&\quad \le \,2\lambda \sum _{j\in M_{0}}|\delta _{j}| \nonumber \\&\quad \le \,2\lambda \Vert \varvec{\delta }_{M_{0}}\Vert _{1}. \end{aligned}$$

It is easy to see that \(\varvec{V}_{n}\) is positive semidefinite. Thus \(\Vert \varvec{\delta }\Vert _{1}\le 4\Vert \varvec{\delta }_{M_{0}}\Vert _{1}\), and furthermore \(\Vert \varvec{\delta }_{M_{0}^{c}}\Vert _{1}\le 3\Vert \varvec{\delta }_{M_{0}}\Vert _{1}\). By the Cauchy–Schwarz inequality and Assumption 6,

$$\begin{aligned} \Vert \varvec{\delta }_{M_{0}}\Vert _{1}^{2}\le q \Vert \varvec{\delta }_{M_{0}}\Vert _{2}^{2} \le q c_{5}^{-1}\varvec{\delta }^{T}\varvec{V}_{n}\varvec{\delta } \le 2 c_{5}^{-1}\lambda q \Vert \varvec{\delta }_{M_{0}}\Vert _{1}. \end{aligned}$$

So \(\Vert \varvec{\delta }_{M_{0}}\Vert _{1}\le 2 c_{5}^{-1}\lambda q\). Then finally we arrive at

$$\begin{aligned} \Vert \varvec{\delta }\Vert _{1}=\Vert \varvec{\delta }_{M_{0}^{c}}\Vert _{1}+\Vert \varvec{\delta }_{M_{0}}\Vert _{1} \le 4 \Vert \varvec{\delta }_{M_{0}}\Vert _{1} \le 8 c_{5}^{-1}\lambda q. \end{aligned}$$

This finishes the proof. \(\square \)
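Lemma 2 concerns the lasso-type initial estimator \(\varvec{{\hat{\beta }}}^{(0)}\). As a hedged illustration (synthetic \(\varvec{b}_{n}\) and \(\varvec{V}_{n}\); the cyclic coordinate-descent solver and its sweep count are our choices, not the paper's), the sketch below computes \(\varvec{{\hat{\beta }}}^{(0)}=\mathrm {argmax}_{\varvec{\beta }}\{L_{n}(\varvec{\beta })-\lambda \Vert \varvec{\beta }\Vert _{1}\}\) by soft-thresholding updates and verifies the KKT condition \(\Vert \varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(0)})\Vert _{\infty }\le \lambda \).

```python
import numpy as np

rng = np.random.default_rng(1)

# Lasso-type initialiser: argmax_beta { b^T beta - 0.5 beta^T V beta
# - lam * ||beta||_1 }, solved by cyclic coordinate descent.
p = 15
A = rng.standard_normal((10 * p, p))
V = A.T @ A / (10 * p)                  # synthetic stand-in for V_n
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]        # sparse truth, support M_0 = {0, 1, 2}
b = V @ beta_true                       # noiseless stand-in for b_n

lam = 0.05

def soft(z, t):
    # soft-thresholding operator, the coordinatewise lasso update
    return np.sign(z) * max(abs(z) - t, 0.0)

beta = np.zeros(p)
for _ in range(500):                    # cyclic coordinate-descent sweeps
    for j in range(p):
        r = b[j] - V[j] @ beta + V[j, j] * beta[j]   # partial residual
        beta[j] = soft(r, lam) / V[j, j]

# KKT optimality at beta^(0): |U_n,j(beta^(0))| <= lam on every coordinate
assert np.all(np.abs(b - V @ beta) <= lam + 1e-6)
```

The resulting \(\ell _{1}\) error is of order \(\lambda q\) here, in line with the \(8c_{5}^{-1}\lambda q\) bound of the lemma.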

Proof of Theorem 3

Recall that \(w=\min _{j\in M_{0}}|\beta _{j}^{*}|\). We need only show \(\mathrm {pr}(\Vert \varvec{{\hat{\beta }}}^{(t)}-\varvec{\beta }^{*}\Vert _{\infty }<\frac{w}{2})\rightarrow 1\), for which it suffices to prove \(\Vert \varvec{{\hat{\beta }}}^{(t)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\). As in Xu and Chen (2014), we prove this by mathematical induction.

When \(t=0\), by Lemma 2, we have

$$\begin{aligned} \mathrm {pr}(\Vert \varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}\Vert _{1}\le 8c_{5}^{-1}\lambda q)\rightarrow 1. \end{aligned}$$

Because \(\lambda =o(n^{-(\tau _{1}+\tau _{2})})\), \(q=O(n^{\tau _{2}})\) and \(w^{-1}=O(n^{\tau _{1}})\), we have \(\lambda qw^{-1}=o(n^{-(\tau _{1}+\tau _{2})})O(n^{\tau _{2}})O(n^{\tau _{1}})=o(1)\), that is, \(\lambda q=o(w)\). So \(\Vert \varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}\Vert _{1}=o_{p}(w)\). Noting that \(\Vert \varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}\Vert _{\infty }\le \Vert \varvec{{\hat{\beta }}}^{(0)}-\varvec{\beta }^{*}\Vert _{1}\), the desired result holds for \(t=0\).

Suppose that \(\Vert \varvec{{\hat{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\). We now show that \(\Vert \varvec{{\hat{\beta }}}^{(t)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\) also holds. From the adaptive iterative hard-thresholding algorithm, \(\varvec{{\hat{\beta }}}^{(t)}={\mathbf {H}}(\varvec{{\tilde{\beta }}}^{(t-1)};k)\), where \(\varvec{{\tilde{\beta }}}^{(t-1)}=\varvec{{\hat{\beta }}}^{(t-1)}+u^{-1}\varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t-1)})\). If \(\Vert \varvec{{\tilde{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\) holds, then the elements of \(\varvec{{\tilde{\beta }}}^{(t-1)}_{M_{0}}\) are among the k largest in absolute value with probability tending to one, so the hard-thresholding step retains them and \(\Vert \varvec{{\hat{\beta }}}^{(t)}-\varvec{\beta }^{*}\Vert _{\infty }\le \Vert \varvec{{\tilde{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\). So it remains to prove \(\Vert \varvec{{\tilde{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }=o_{p}(w)\). Note that \(\Vert \varvec{{\tilde{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }\le \Vert \varvec{{\hat{\beta }}}^{(t-1)}-\varvec{\beta }^{*}\Vert _{\infty }+ u^{-1}\Vert \varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t-1)})\Vert _{\infty }\). By some algebraic manipulations, we obtain

$$\begin{aligned}&\Vert \varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t-1)})\Vert _{\infty } \nonumber \\&\quad =\Vert \varvec{U}_{n}(\varvec{\beta }^{*})+\varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t-1)})-\varvec{U}_{n}(\varvec{\beta }^{*})\Vert _{\infty } \nonumber \\&\quad \le \,\Vert \varvec{U}_{n}(\varvec{\beta }^{*})\Vert _{\infty }+ \Vert \varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t-1)})-\varvec{U}_{n}(\varvec{\beta }^{*})\Vert _{\infty } \nonumber \\&\quad = \Vert \varvec{U}_{n}(\varvec{\beta }^{*})\Vert _{\infty }+ \Vert \varvec{V}_{n}(\varvec{\beta }^{*}-\varvec{{\hat{\beta }}}^{(t-1)})\Vert _{\infty } \nonumber \\&\quad \le \,\Vert \varvec{U}_{n}(\varvec{\beta }^{*})\Vert _{\infty }+ \Vert \varvec{V}_{n}\Vert _{\infty }\Vert \varvec{\beta }^{*}-\varvec{{\hat{\beta }}}^{(t-1)}\Vert _{\infty }. \end{aligned}$$

Thus

$$\begin{aligned}&u^{-1}\Vert \varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t-1)})\Vert _{\infty } \nonumber \\&\quad \le \,u^{-1}O_{p}(\lambda )+u^{-1}\Vert \varvec{V}_{n}\Vert _{\infty }o_{p}(w) \nonumber \\&\quad \le \,(c_{6}r)^{-1}\lambda O_{p}(1)+(c_{6}r)^{-1}n^{\tau _{3}}o_{p}(w)O_{p}(1)\nonumber \\&\quad = c_{6}^{-1}O(n^{-\tau _{3}})o(n^{-(\tau _{1}+\tau _{2})}) O_{p}(1) +c_{6}^{-1}O_{p}(n^{-\tau _{3}})n^{\tau _{3}}o_{p}(w)\nonumber \\&\quad =o_{p}(w). \end{aligned}$$

This completes the proof. \(\square \)
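Putting the pieces together, the screening procedure analysed in Theorems 2–3 amounts to iterating \(\varvec{{\hat{\beta }}}^{(t)}={\mathbf {H}}(\varvec{{\hat{\beta }}}^{(t-1)}+u^{-1}\varvec{U}_{n}(\varvec{{\hat{\beta }}}^{(t-1)});k)\) and taking \({\hat{M}}\) to be the support of the final iterate. The sketch below is a toy, noiseless stand-in (synthetic well-conditioned \(\varvec{V}_{n}\), and a zero initialiser in place of the lasso estimate of Lemma 2), under which the true model is retained in \({\hat{M}}\).

```python
import numpy as np

rng = np.random.default_rng(2)

# End-to-end sketch of the screening recursion analysed in Theorems 2-3.
p, k = 30, 5
W = rng.standard_normal((10 * p, p))
V = W.T @ W / (10 * p)                   # well-conditioned stand-in for V_n
beta_star = np.zeros(p)
beta_star[[0, 1, 2]] = [1.5, -2.0, 1.0]  # true model M_0 = {0, 1, 2}
b = V @ beta_star                        # noiseless stand-in for b_n

def hard_threshold(v, k):
    # H(v; k): keep the k largest entries in absolute value, zero the rest
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

u = np.linalg.eigvalsh(V).max()          # step scale u >= rho_max(V_n)
beta = np.zeros(p)                       # stand-in for the lasso initialiser
for _ in range(300):
    beta = hard_threshold(beta + (b - V @ beta) / u, k)

M_hat = set(np.flatnonzero(beta))
assert {0, 1, 2} <= M_hat                # M_0 is retained: sure screening
assert len(M_hat) <= k
```

Note that \(\varvec{\beta }^{*}\) is a fixed point of this recursion, since \(\varvec{U}_{n}(\varvec{\beta }^{*})=0\) in the noiseless setting and \(\varvec{\beta }^{*}\) has at most k nonzero entries.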

About this article


Cite this article

Chen, X., Liu, Y. & Wang, Q. Joint feature screening for ultra-high-dimensional sparse additive hazards model by the sparsity-restricted pseudo-score estimator. Ann Inst Stat Math 71, 1007–1031 (2019). https://doi.org/10.1007/s10463-018-0675-8
