Broken adaptive ridge regression for right-censored survival data

Sun, Zhihua; Liu, Yi; Chen, Kani; Li, Gang

doi:10.1007/s10463-021-00794-3

Broken adaptive ridge regression for right-censored survival data

Published: 05 April 2021

Volume 74, pages 69–91, (2022)
Cite this article

Annals of the Institute of Statistical Mathematics Aims and scope Submit manuscript

Zhihua Sun¹,
Yi Liu¹,
Kani Chen² &
…
Gang Li³

566 Accesses
5 Citations
Explore all metrics

Abstract

Broken adaptive ridge (BAR) is a computationally scalable surrogate to $L_0$-penalized regression, which involves iteratively performing reweighted $L_2$ penalized regressions and enjoys some appealing properties of both $L_0$ and $L_2$ penalized regressions while avoiding some of their limitations. In this paper, we extend the BAR method to the semi-parametric accelerated failure time (AFT) model for right-censored survival data. Specifically, we propose a censored BAR (CBAR) estimator by applying the BAR algorithm to the Leurgan’s synthetic data and show that the resulting CBAR estimator is consistent for variable selection, possesses an oracle property for parameter estimation and enjoys a grouping property for highly correlation covariates. Both low- and high-dimensional covariates are considered. The effectiveness of our method is demonstrated and compared with some popular penalization methods using simulations. Real data illustrations are provided on a diffuse large-B-cell lymphoma data and a glioblastoma multiforme data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Censored broken adaptive ridge regression in high-dimension

Article 17 January 2024

Variable selection in proportional odds model with informatively interval-censored data

Article 29 September 2023

Bayesian penalized Buckley-James method for high dimensional bivariate censored regression models

Article 03 March 2022

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Article MathSciNet MATH Google Scholar
Box, J. K., Paquet, N., Adams, M. N., Boucher, D., Bolderson, E., Obyrne, K. J., Richard, D. J. (2016). Nucleophosmin: From structure and function to disease development. BMC Molecular Biology, 17(19), 1–12.
Google Scholar
Breheny, P., Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. The Annals of Applied Statistics, 5(1), 232–253.
Article MathSciNet MATH Google Scholar
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Annals of Statistics, 24, 2350–2383.
Article MathSciNet MATH Google Scholar
Buckley, J., James, I. (1979). Linear regression with censored data. Biometrika, 66(3), 429–436.
Article MATH Google Scholar
Cai, T., Huang, J., Tian, L. (2009). Regularized estimation for the accelerated failure time model. Biometrics, 65(2), 394–404.
Article MathSciNet MATH Google Scholar
Chen, J., Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.
Article MathSciNet MATH Google Scholar
Cox, B. D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–220.
MathSciNet MATH Google Scholar
Cui, H., Li, R., Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630–641.
Article MathSciNet MATH Google Scholar
Dai, L., Chen, K., Sun, Z., Liu, Z., Li, G. (2018). Broken adaptive ridge regression and its asymptotic properties. Journal of Multivariate Analysis, 168, 334–351.
Article MathSciNet MATH Google Scholar
Dai, L., Chen, K., Li, G. (2020). The broken adaptive ridge procedure and its applications. Statistica Sinica, 30(2), 1069–1094.
MathSciNet MATH Google Scholar
Datta, S., Le-Rademacher, J., Datta, S. (2007). Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and lasso. Biometrics, 63(1), 259–271.
Article MathSciNet Google Scholar
Eirín-López, J. M., Frehlick, L. J., Ausió, J. (2006). Long-term evolution and functional diversification in the members of the nucleophosmin/nucleoplasmin family of nuclear chaperones. Genetics, 173(4), 1835–1850.
Article Google Scholar
Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.
Article MathSciNet MATH Google Scholar
Fan, J., Li, R. (2002). Variable selection for cox’s proportional hazards model and frailty model. Annals of Statistics, 30(1), 74–99.
Article MathSciNet MATH Google Scholar
Fan, J., Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Methodological), 70(5), 849–911.
Article MathSciNet MATH Google Scholar
Foster, D., George, E. (1994). The risk inflation criterion for multiple regression. Annals of Statistics, 22, 1947–1975.
Article MathSciNet MATH Google Scholar
Friedman, J., Hastie, T., Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Article Google Scholar
Huang, J., Ma, S. (2010). Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis, 16(2), 176–95.
Article MathSciNet MATH Google Scholar
Huang, J., Ma, S., Xie, H. (2006). Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics, 62(3), 813–820.
Article MathSciNet MATH Google Scholar
Johnson, B. A. (2009). On lasso for censored data. Electronic Journal of Statistics, 3(2009), 485–506.
MathSciNet MATH Google Scholar
Johnson, B. A., Lin, D. Y., Zeng, D. (2008). Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association, 103(482), 672–680.
Article MathSciNet MATH Google Scholar
Johnson, K. D., Lin, D., Ungar, L. H., Foster, D., Stine, R. (2015). A risk ratio comparison of $l_0$ and $l_1$ penalized regression. arXiv:1510.06319 [math.ST].
Kalbfleisch, J. D., Prentice, R. L. (2002). The statistical analysis of failure time data (2nd ed.). Hoboken: Wiley.
Book MATH Google Scholar
Kawaguchi, E. S., Suchard, M. A., Liu, Z., Li, G. (2020). A surrogate $l0$ sparse cox’s regression with applications to sparse high-dimensional massive sample size time-to-event data. Statistics in Medicine, 39(6), 675–686.
Article MathSciNet Google Scholar
Koul, H., Susarla, V., Ryzin, J. V. (1981). Regression analysis with randomly right-censored data. Annals of Statistics, 9(6), 1276–1288.
Article MathSciNet MATH Google Scholar
Leurgans, S. (1987). Linear models, random censoring and synthetic data. Biometrika, 74(2), 301–309.
Article MathSciNet MATH Google Scholar
Li, Y., Dicker, L., Zhao, S. D. (2014). The dantzig selector for censored linear regression models. Statistica Sinica, 24(1), 251–2568.
MathSciNet MATH Google Scholar
Liu, Y., Chen, X., Li, G. (2020). A new joint screening method for right-censored time-to-event data with ultra-high dimensional covariates. Statistical Methods in Medical Research, 29(6), 1499–1513.
Article MathSciNet Google Scholar
Mallows, C. (1973). Some comments on $c_p$. Technometrics, 15, 661–675.
MATH Google Scholar
Mummenhoff, J., Houweling, A. C., Peters, T., Christoffels, V. M., Rther, U. (2001). Expression of Irx6 during mouse morphogenesis. Mechanisms of Development, 103(1–2), 193–195.
Article Google Scholar
Nachmani, D., Bothmer, A. H., Grisendi, S., Mele, A., Pandolfi, P. P. (2019). Germline NPM1 mutations lead to altered rRNA 2-O-methylation and cause dyskeratosis congenita. Nature Genetics, 51(10), 1518–1529.
Article Google Scholar
Nardi, Y., Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electronic Journal of Statistics, 2, 605–633.
Article MathSciNet MATH Google Scholar
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Article MathSciNet MATH Google Scholar
Shen, X., Pan, W., Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association, 107, 223–232.
Article MathSciNet MATH Google Scholar
Stute, W. (1993). Consistent estimation under random censorship when covariables are present. Journal of Multivariate Analysis, 45(1), 89–103.
Article MathSciNet MATH Google Scholar
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
MathSciNet MATH Google Scholar
Tibshirani, R. (1997). The lasso method for variable selection in the cox model. Statistics in Medicine, 16(4), 385–395.
Article Google Scholar
Wang, S., Nan, B., Zhu, J., Beer, D. G. (2008). Doubly penalized Buckley–James method for survival data with high-dimensional covariates. Biometrics, 64(1), 132–140.
Article MathSciNet MATH Google Scholar
Yuan, M., Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67.
Article MathSciNet MATH Google Scholar
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38(2), 894–942.
Article MathSciNet MATH Google Scholar
Zhao, H., Wu, Q., Li, G., Sun, J. (2019). Simultaneous estimation and variable selection for interval-censored data with broken adaptive ridge regression. Journal of the American Statistical Association, 115(529), 204–216.
Article MathSciNet MATH Google Scholar
Zhou, M. (1992). Asymptotic normality of the synthetic data regression estimator for censored survival data. Annals of Statistics, 20(2), 1002–1021.
Article MathSciNet MATH Google Scholar
Zhu, L., Li, L., Li, R., Zhu, L. (2011). Model-free feature screening for ultrahigh dimensional data. Journal of the American Statistical Association, 106(496), 1464–1475.
Article MathSciNet MATH Google Scholar
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.
Article MathSciNet MATH Google Scholar
Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

We are grateful to the referees, the associate editor and the editor for their helpful comments. The Glioblastoma multiforme data used in Sect. 4.2 are generated by the TCGA Research Network: https://www.cancer.gov/tcga.

Author information

Authors and Affiliations

Department of Mathematics, Ocean University of China, Qingdao, 266000, China
Zhihua Sun & Yi Liu
Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Kani Chen
Biostatistics and Computational Medicine, University of California, Los Angeles, CA, 90095-1772, USA
Gang Li

Authors

Zhihua Sun
View author publications
You can also search for this author in PubMed Google Scholar
Yi Liu
View author publications
You can also search for this author in PubMed Google Scholar
Kani Chen
View author publications
You can also search for this author in PubMed Google Scholar
Gang Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research of Gang Li was partly supported by National Institute of Health Grants P30 CA-16042, P50 CA211015, and UL1TR000124-02. The research of Zhihua Sun was partly supported by Natural Science Foundation of China 11871444. The research of Yi Liu was partly supported by Natural Science Foundation of China 11801567.

Appendix: Proofs of the theorems

We first introduce notations and lemmas used to prove Theorem 1.

Using Leurgans (1987) method, we transform ${\mathbf{Y}}$ into synthetic data ${\mathbf{Y}}^*$. Let ${\varvec{\beta }}=( {\varvec{\alpha }}^{\top }, {\varvec{\gamma }}^{\top })^{\top }$, where ${\varvec{\alpha }}$ and ${\varvec{\gamma }}$ are $q_n\times 1$ and $(p_n-q_n) \times 1$ vector, respectively, ${\varvec{\Sigma }}_n={\mathbf{x}}^{\top } {\mathbf{x}}/n$.

$$\begin{aligned} g( {\varvec{\beta }})=\{ {\mathbf{x}}^{\top } {\mathbf{x}}+\lambda _n {\mathbf{D}}( {\varvec{\beta }})\}^{-1}{} {\mathbf{x}}^{\top } {\mathbf{Y}}^* =( {\varvec{\alpha }}^{*}( {\varvec{\beta }})^{\top }, {\varvec{\gamma }}^*( {\varvec{\beta }})^{\top })^{\top }. \end{aligned}$$

(10)

For simplicity, we write ${\varvec{\alpha }}^*( {\varvec{\beta }})$ and ${\varvec{\gamma }}^*( {\varvec{\beta }})$ as ${\varvec{\alpha }}^*$ and ${\varvec{\gamma }}^*$ hereafter. ${\varvec{\Sigma }}_n^{-1}$ can be partitioned as

$$\begin{aligned} {\varvec{\Sigma }}_n^{-1}=\begin{pmatrix} \mathbf{A}_{11} &{} \mathbf{A}_{12}\\ \mathbf{A}^{\top }_{12} &{} \mathbf{A}_{22} \end{pmatrix} \end{aligned}$$

where the $A_{11}$ is a $q\times q$ matrix. Multiplying $( {\mathbf{x}}^{\top } {\mathbf{x}})^{-1}( {\mathbf{x}}^{\top } {\mathbf{x}}+\lambda _n {\mathbf{D}}( {\varvec{\beta }}))$ to equation (10)

$$\begin{aligned} \begin{pmatrix} {{\varvec{\alpha }}}^*-{{\varvec{\beta }}}_{01}\\ {{\varvec{\gamma }}}^* \end{pmatrix} +\frac{\lambda _n}{n}\begin{pmatrix} \mathbf{A}_{11}{} {\mathbf{D}}_1({\varvec{\alpha }}){\varvec{\alpha }}^*+\mathbf{A}_{12}{} {\mathbf{D}}_2({\varvec{\gamma }}){\varvec{\gamma }}^*\\ \mathbf{A}_{12}^{\top }{} {\mathbf{D}}_1({\varvec{\alpha }}){\varvec{\alpha }}^*+\mathbf{A}_{22}{} {\mathbf{D}}_2({\varvec{\gamma }}){\varvec{\gamma }}^* \end{pmatrix}=({\mathbf{x}}^{\top }{} {\mathbf{x}})^{-1}{} {\mathbf{x}}^{\top } {\varvec{\varepsilon }}^* {={\hat{{\varvec{\beta }}}}_{\mathrm{Z}}-{\varvec{\beta }}_0}, \end{aligned}$$

(11)

where ${\varvec{\varepsilon }}^*={\mathbf{Y}}^*-{\mathbf{x}}{{\varvec{\beta }}_0}$, ${\hat{{\varvec{\beta }}}}_\mathrm{Z}=({\mathbf{x}}^{\top }{} {\mathbf{x}})^{-1}{} {\mathbf{x}}^{\top }{} {\mathbf{Y}}^*$, ${\mathbf{D}}_1( {\varvec{\alpha }})={\text{ diag }} (\alpha _1^{-2},...,\alpha _{{q}}^{-2})$ and ${\mathbf{D}}_2( {\varvec{\gamma }})={\text{ diag }} (\gamma _1^{-2},...,\gamma _{p_n-{q}}^{-2})$.

Lemma 1

Let $\delta _n$ be a sequence of positive real numbers satisfying $\delta _n \rightarrow \infty$ and $p_n\delta _n^2/\lambda _n \rightarrow 0$. Define $\mathbf{H}_n = \{ {\varvec{\beta }}\in {\mathbb {R}}^{p_n}: \Vert {\varvec{\beta }}-{\varvec{\beta }}_0\Vert \le \delta _n\sqrt{p_n/n}\}$ and $\mathbf{H}_{n1} = \{ {\varvec{\alpha }}\in {\mathbb {R}}^{{q}}: \Vert {\varvec{\alpha }}-{\varvec{\beta }}_{01}\Vert \le \delta _n\sqrt{p_n/n}\}$. Assume conditions (C1)–(C5) hold. Then, with probability tending to 1, we have

(a)
$\sup _{ {\varvec{\beta }} \in \mathbf{H}_n} {\Vert {\varvec{\gamma }}^*\Vert }/{\Vert {\varvec{\gamma }}\Vert }< {1}/{C_0}, {\text{ for some constant }} C_0>1$;
(b)
g is a mapping from $\mathbf{H}_n$ to itself.

Proof

We first prove part (a).

First, under $\lambda _n/\sqrt{n} \rightarrow 0$ and $p_n\delta _n^2/\lambda _n \rightarrow 0$, we have $\delta _n\sqrt{p_n/n} \rightarrow 0$.

Let ${\hat{{\varvec{\beta }}}}_\mathrm{Z}=({\mathbf{x}}^{\top }{} {\mathbf{x}})^{-1}{} {\mathbf{x}}^{\top }{} {\mathbf{Y}}^*$, $\omega _{ji}=(({\mathbf{x}}^{\top } {\mathbf{x}})^{-1}{} {\mathbf{x}}^{\top } )_{ji}$, $\mu _j^*=\sum _i \omega _{ji}\int _{0}^{T_n}{F_i \mathrm{d}t}$ and ${\varvec{\mu }} =(\mu _1^*, \mu _2^*, ..., \mu _{pn}^*)$. For any $p_n$-vector ${\mathbf{b}}_n$ which $\Vert {\mathbf{b}}_n\Vert \le 1$, define $t_n^2= {\mathbf{b}}_n^{\top } {\varvec{\Omega }}(\infty )) {\mathbf{b}}_n$. Then, we have $\sqrt{n} \, t_n^{-1} {\mathbf{b}}_n^{\top } ({\hat{{\varvec{\beta }}}}_\mathrm{Z}-{\varvec{\omega }}) \rightarrow _D N(0,1).$ This result can be proved using similar techniques to those used in the proof of Theorem 3.1 of Zhou (1992) along the same lines as outlined below: First, we separate ${\mathbf{b}}_n^{\top } ({\hat{{\varvec{\beta }}}}_\mathrm{Z}-{\varvec{\omega }})$ like (3.6) in Zhou (1992) with a main term $S_{{\varvec{\beta }}}(T^n)$ and a remainder term $SS_{{\varvec{\beta }}}(T^n)$, i.e., ${\mathbf{b}}_n^{\top } ({\hat{{\varvec{\beta }}}}_\mathrm{Z}-{\varvec{\omega }})=S_{{\varvec{\beta }}}(T^n)+SS_{{\varvec{\beta }}}(T^n)$, where $S_{{\varvec{\beta }}}(T^n)$ is a weighted sum of ${{\hat{H}}}(t)-H(t)$ and ${{\hat{G}}}(t)-G(t)$; and $SS_{{\varvec{\beta }}}(T^n)$ is a weighted sum of $({{\hat{H}}}(t)-H(t))({{\hat{G}}}(t)-G(t))$ and $({{\hat{H}}}(t)-H(t))({{\hat{H}}}(t)-H(t))$. Second, under conditions (C2) and (C3), one can show that $\sqrt{n}SS_{{\varvec{\beta }}}(T^n)$ is negligible. Finally, by applying the martingale central limit theorem and conditions (C1) and (C4), we establish the asymptotic normality of $\sqrt{n}S_{{\varvec{\beta }}}(T^n)$. By conditions (C1) and (C2), we have $\sqrt{n}t_n^{-1}{} {\mathbf{b}}_n^{\top }({\varvec{\beta }}_0 - {\varvec{\omega }}) = o_p(1)$, for ${\mathbf{b}}_n={\mathbf{e}}_i=(0,...,1,0,...,0)$. Hence, we have $\Vert {\hat{{\varvec{\beta }}}}_{\mathrm{Z}}-{\varvec{\beta }}_0\Vert ^2=O_p(p_n/n)$.

It then follows from (11) that

$$\begin{aligned} \sup _{ \beta \in \mathbf{H}_n}\big \Vert {\varvec{\gamma }}^*+ \lambda _n \mathbf{A}_{12}^{\top } {\mathbf{D}}_1( {\varvec{\alpha }}) {\varvec{\alpha }}^*/n + \lambda _n \mathbf{A}_{22} {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*/n \big \Vert =O_p(\sqrt{{p_n}/{n}}). \end{aligned}$$

(12)

Note that $\Vert {\varvec{\alpha }} - {\varvec{\beta }}_{01}\Vert \le \delta _n(p_n/n)^{1/2}$ and $\Vert {\varvec{\alpha }}^*\Vert \le \Vert g( {\varvec{\beta }})\Vert \le \Vert {\hat{{\varvec{\beta }}}}_\mathrm{Z}\Vert =O_p({\sqrt{p_n}})$. By assumptions (C4) and (C5), we have

$$\begin{aligned} \sup _{ {\varvec{\beta }}\in \mathbf{H}_n}\left\| \lambda _n \mathbf{A}_{12}^{\top } {\mathbf{D}}_1( {\varvec{\alpha }}) {\varvec{\alpha }}^*/n \right\|&\le \frac{\lambda _n}{n} \, \Vert \mathbf{A}_{12}^{\top } \Vert \sup _{ {\varvec{\beta }}\in \mathbf{H}_n}\Vert {\mathbf{D}}_1( {\varvec{\alpha }}) {\varvec{\alpha }}^*\Vert \\&\le { \sqrt{2}\, {\tilde{C}}\, \frac{\lambda _n}{n} \, {\frac{a_{1}}{a_{0}^2}}\sup _{ {\varvec{\beta }}\in \mathbf{H}_n}\Vert {\varvec{\alpha }}^*\Vert }= o_p(\sqrt{{p_n}/{n}}), \end{aligned}$$

(13)

where the second inequality uses the fact $\Vert \mathbf{A}_{12}^{\top } \Vert \le \sqrt{2}\, {\tilde{C}}$, which follows from the inequality $\Vert \mathbf{A}_{12}{} \mathbf{A}_{12}^{\top } \Vert -\Vert \mathbf{A}_{11}^2\Vert \le \Vert \mathbf{A}_{11}^2+\mathbf{A}_{12}{} \mathbf{A}_{21}\Vert \le \Vert {\varvec{\Sigma }}_n^{-2}\Vert <{\tilde{C}}^2.$ Combining (12) and (13) gives

$$\begin{aligned} \sup _{ {\varvec{\beta }}\in \mathbf{H}_n}\left\| {\varvec{\gamma }}^*+ \lambda _n \mathbf{A}_{22} {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*/n \right\| =O_p(\sqrt{{p_n}/{n}}). \end{aligned}$$

(14)

Note that $\mathbf{A}_{22}=\sum _{i=1}^{p_n-{q}}\tau _{2i} {\mathbf{u}}_{2i} {\mathbf{u}}_{2i}^{\top }$ is positive definite and by the singular value decomposition, , where $\tau _{2i}$ and ${\mathbf{u}}_{2i}$ are eigenvalues and eigenvectors of $\mathbf{A}_{22}$. Then, since $1/{\tilde{C}}<\tau _{2i}< {\tilde{C}}$, we have

$$\begin{aligned} \frac{\lambda _n}{n} \, \Vert \mathbf{A}_{22} {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*\Vert&=\frac{\lambda _n}{n}\left\| \sum _{i=1}^{p_n-{q}}\tau _{2i} {\mathbf{u}}_{2i} {\mathbf{u}}_{2i}^{\top } {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*\right\| = \frac{\lambda _n}{n}\left\{ \sum _{i=1}^{p_n-{q}}\tau _{2i}^2\Vert {\mathbf{u}}_{2i}^{\top } {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*\Vert ^2\right\} ^{1/2}\\&\ge \frac{\lambda _n}{n}\frac{1}{{\tilde{C}}}\left\{ \sum _{i=1}^{p_n-{q}}\Vert {\mathbf{u}}_{2i}^{\top } {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*\Vert ^2\right\} ^{1/2} = \frac{1}{{\tilde{C}}} \left\| \lambda _n {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^* /n \right\| . \end{aligned}$$

This, together with (14) and (C4), implies that with probability tending to 1,

$$\begin{aligned} \frac{1}{{\tilde{C}}}\left\| \lambda _n {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*/n \right\| -\Vert {\varvec{\gamma }}^*\Vert \le \delta _n\sqrt{{p_n}/{n}}. \end{aligned}$$

(15)

Let ${\mathbf{D}}_{\gamma */\gamma }=(\gamma ^*_1/\gamma _1, \ldots ,\gamma ^*_{p_n-{q}}/\gamma _{p_n-{q}})^{\top }$. Because $\Vert {\varvec{\gamma }}\Vert \le \delta _n\sqrt{p_n/n}$, we have

$$\begin{aligned} \frac{1}{{\tilde{C}}}\left\| \frac{\lambda _n}{n}\, {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*\right\| =\frac{1}{{\tilde{C}}}\frac{\lambda _n}{n}\left\| { \{{\mathbf{D}}_2( {\varvec{\gamma }})}\}^{1/2} {\mathbf{D}}_{{\varvec{\gamma }}*/{\varvec{\gamma }}}\right\| \ge \frac{1}{{\tilde{C}}}\frac{\lambda _n}{n}\frac{\sqrt{n}}{\delta _n\sqrt{p_n}} \, \Vert {\mathbf{D}}_{{\varvec{\gamma }}*/{\varvec{\gamma }}}\Vert \end{aligned}$$

(16)

and

$$\begin{aligned} \Vert {\varvec{\gamma }}^*\Vert =\Vert { {\mathbf{D}}_2( {\varvec{\gamma }})}^{-1/2} {\mathbf{D}}_{{\varvec{\gamma }}*/{\varvec{\gamma }}}\Vert \le \frac{\delta _n\sqrt{p_n}}{\sqrt{n}}\, \Vert {\mathbf{D}}_{{\varvec{\gamma }}*/{\varvec{\gamma }}}\Vert . \end{aligned}$$

(17)

Combining (15), (16) and (17), we have that with probability tending to 1,

$$\begin{aligned} \Vert {\mathbf{D}}_{{\varvec{\gamma }}*/{\varvec{\gamma }}}\Vert \le \frac{1}{{\lambda _n}/({p_n}\delta _n^2 {\tilde{C}})-1}<{1}/{C_0} \end{aligned}$$

(18)

for some constant $C_0 > 1$ provided that $\lambda _n/({p_n}\delta _n^2) \rightarrow \infty$.

It is worth noting that $\Pr (\Vert {\mathbf{D}}_{{\varvec{\gamma }}*/{\varvec{\gamma }}}\Vert \rightarrow 0) \rightarrow 1$, as $n \rightarrow \infty$. Furthermore, with probability tending to 1,

$$\begin{aligned} \Vert {\varvec{\gamma }}^*\Vert \le \Vert {\mathbf{D}}_{{\varvec{\gamma }}^*/{\varvec{\gamma }}}\Vert \max _{1\le j\le (p_n-{q})}|{\varvec{\gamma }}_j|\le \Vert {\mathbf{D}}_{{\varvec{\gamma }}^*/{\varvec{\gamma }}}\Vert \times \Vert {\varvec{\gamma }}\Vert \le \Vert {\varvec{\gamma }}\Vert /C_0. \end{aligned}$$

This proves part (a).

Next we prove part (b). First, it is easy to see from (17) and (18) that, as $n \rightarrow \infty$,

$$\begin{aligned} \Pr \Big (\Vert {\varvec{\gamma }}^*\Vert \le \delta _n\sqrt{p_n/n} \Big ) \rightarrow 1. \end{aligned}$$

(19)

Then, by (11), we have

$$\begin{aligned} \sup _{ {\varvec{\beta }} \in \mathbf{H}_n}\left\| {\varvec{\alpha }}^*- {\varvec{\beta }}_{01}+ \lambda _n \mathbf{A}_{11} {\mathbf{D}}_1( {\varvec{\alpha }}) {\varvec{\alpha }}^*/n +\lambda _n \mathbf{A}_{12} {\mathbf{D}}_2( {\varvec{\gamma }}){\varvec{\gamma }}^* /n \right\| =O_p(\sqrt{{p_n}/{n}}). \end{aligned}$$

(20)

Similar to (13), it is easily to verify that

$$\begin{aligned} \sup _{ {\varvec{\beta }}\in \mathbf{H}_n}\left\| \lambda _n \mathbf{A}_{11} {\mathbf{D}}_1( {\varvec{\alpha }}){\varvec{\alpha }}^*/n \right\| = o_p(\sqrt{{p_n}/{n}}). \end{aligned}$$

(21)

Moreover, with probability tending to 1,

$$\begin{aligned} \sup _{ {\varvec{\beta }}\in \mathbf{H}_n} \left\| \lambda _n \mathbf{A}_{12} {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*/n \right\| \le \frac{\lambda _n}{n} \sup _{ {\varvec{\beta }}\in \mathbf{H}_n} \left\| {\mathbf{D}}_2( {\varvec{\gamma }}) {\varvec{\gamma }}^*\right\| \times \Vert \mathbf{A}_{12}\Vert \le 2\sqrt{2}{\tilde{C}}^2\delta _n\sqrt{{p_n}/{n}}, \end{aligned}$$

(22)

where the last step follows from (15), (19), and the fact that $\Vert \mathbf{A}_{12}\Vert \le \sqrt{2}{\tilde{C}}$. It follows from (20), (21) and (22) that with probability tending to 1,

$$\begin{aligned} \sup _{ {\varvec{\beta }}\in \mathbf{H}_n} \Vert {\varvec{\alpha }}^*- {\varvec{\beta }}_{01}\Vert \le { \big (2\sqrt{2}{\tilde{C}}^2+1 \big )\delta _n n^{-1/2}\sqrt{p_n}}. \end{aligned}$$

(23)

Because $\delta _n\sqrt{p_n}/\sqrt{n} \rightarrow 0$, we have, as $n \rightarrow \infty$,

$$\begin{aligned} \Pr ( {\varvec{\alpha }}^*\in \mathbf{H}_{n1}) \rightarrow 1. \end{aligned}$$

(24)

Combining (19) and (24) completes the proof of part (b).$\square$

Lemma 2

Assume that (C1)–(C5) hold. For any q-vector ${\mathbf{c}}$ satisfying $\Vert {\mathbf{c}}\Vert \le 1$, define ${z^2}= {\mathbf{c}}^{\top } {\varvec{\Omega }}_{1} \mathbf{c}$ as in Theorem 1. Define

$$\begin{aligned} f( {\varvec{\alpha }})=\{ {\mathbf{x}}_{1} ^{\top } {\mathbf{x}}_1+\lambda _n {\mathbf{D}}_1( {\varvec{\alpha }})\}^{-1} {\mathbf{x}}_1^{\top } {\mathbf{Y}}^*. \end{aligned}$$

(25)

Then, with probability tending to 1,

(a) $f( {\varvec{\alpha }})$ is a contraction mapping from ${\mathbf{b}}_{n} \equiv \{ {\varvec{\alpha }}\in {\mathbb {R}}^{{q}}: \Vert {\varvec{\alpha }}-{\varvec{\beta }}_{01}\Vert \le \delta _n\sqrt{p_n/n}\}$ to itself;

(b) $\sqrt{n} \, {z^{-1} \mathbf{c}^{\top }}(\hat{ {\varvec{\alpha }}}^{\circ }- {\varvec{\beta }}_{01}) \rightsquigarrow {\mathcal {N}}(0,1),$ where $\hat{ {\varvec{\alpha }}}^{\circ }$ is the unique fixed point of $f({\varvec{\alpha }})$ defined by

$$\begin{aligned} \hat{ {\varvec{\alpha }}}^{\circ }= \{ {\mathbf{x}}_{1}^{\top } { {\mathbf{x}}_1}+\lambda _n {\mathbf{D}}_1( \hat{ {\varvec{\alpha }}}^{\circ })\}^{-1}{ {\mathbf{x}}}_1^{\top } {\mathbf{Y}}^*. \end{aligned}$$

Proof

We first prove part (a). Note that (25) can be rewritten as

$$\begin{aligned} f( {\varvec{\alpha }})- {\varvec{\beta }}_{01}+\frac{\lambda _n}{n} {\varvec{\Sigma }}_{n1}^{-1} {\mathbf{D}}_1( {\varvec{\alpha }})f( {\varvec{\alpha }}) = {{\hat{{\varvec{\beta }}}}_{1\mathrm Z}- {\varvec{\beta }}_{01}}, \end{aligned}$$

where ${\hat{{\varvec{\beta }}}}_{1\mathrm Z}=( {\mathbf{x}}_1^{\top } {\mathbf{x}}_1)^{-1} {\mathbf{x}}_1^{\top } {\mathbf{Y}}^*$. Then,

$$\begin{aligned}&\sup _{ {\varvec{\alpha }}\in {{\mathbf{b}}}_n}\left\| f( {\varvec{\alpha }})- {\varvec{\beta }}_{01}+(\lambda _n/n) {\varvec{\Sigma }}_{n1}^{-1} {\mathbf{D}}_1( {\varvec{\alpha }})f( {\varvec{\alpha }}) \right\| = O_p({1/\sqrt{n}}). \end{aligned}$$

(26)

$$\begin{aligned}&\sup _{ {\varvec{\alpha }}\in {\mathbf{b}}_n}\left\| (\lambda _n/n) {\varvec{\Sigma }}_{n1}^{-1} {\mathbf{D}}_1( {\varvec{\alpha }})f( {\varvec{\alpha }}) \right\| =o_p({1/\sqrt{n}}). \end{aligned}$$

(27)

It follows from (26) and (27) that

$$\begin{aligned} \sup _{ {\varvec{\alpha }}\in {\mathbf{b}}_n}\left\| f( {\varvec{\alpha }})- {\varvec{\beta }}_{01} \right\| \le \delta _n{/\sqrt{n}}, \end{aligned}$$

(28)

where $\delta _n \rightarrow \infty$ and $\delta _n{/\sqrt{n}}\rightarrow 0$. Then we can get

$$\begin{aligned} {\text{ Pr }}(f( {\varvec{\alpha }})\in {\mathbf{b}}_n)\rightarrow 1, {\text{ as }} n\rightarrow \infty . \end{aligned}$$

(29)

This means that f is a mapping from the region ${\mathbf{b}}_n$ to itself.

Rewrite (25) as $\{ {\mathbf{x}}_{1}^{\top }{ {\mathbf{x}}_1}+\lambda _n {\mathbf{D}}_1( {\varvec{\alpha }})\}f( {\varvec{\alpha }})= {\mathbf{x}}_{1}^{\top } {\mathbf{Y}}^*$, then, we have

$$\begin{aligned} ( {\varvec{\Sigma }}_{n1}+(\lambda _n/n){ {\mathbf{D}}_1}({\varvec{\alpha }})){\dot{f}}( {\varvec{\alpha }})+(\lambda _n/n) {\text{ diag }} \{-2f_j( {\varvec{\alpha }})/{\alpha _j^3}\} ={ 0}, \end{aligned}$$

(30)

where ${\dot{f}}( {\varvec{\alpha }})={\partial f( {\varvec{\alpha }})}/{\partial { {\varvec{\alpha }}^{\top }}}$ and ${\text{ diag }} \{\frac{-2f_j( {\varvec{\alpha }})}{\alpha _j^3}\}= {\text{ diag }} \{\frac{-2f_1( {\varvec{\alpha }})}{\alpha _1^3},...,\frac{-2f_{{q}}( {\varvec{\alpha }})}{\alpha _{{q}}^3}\}.$ With the assumption $\lambda _n/\sqrt{n}\rightarrow 0$,

$$\begin{aligned} \sup _{ {\varvec{\alpha }}\in {\mathbf{b}}_n }\big\Vert \left\{ {\varvec{\Sigma }}_{n1}+\frac{\lambda _n}{n}{ {\mathbf{D}}}_1( {\varvec{\alpha }})\right\}{\dot{f}}( {\varvec{\alpha }})\big\Vert = \sup _{ {\varvec{\alpha }}\in {\mathbf{b}}_n} \frac{2\lambda _n}{n}\big\Vert {\text{ diag }} \left \{\frac{f_j( {\varvec{\alpha }})}{\alpha _j^3}\right\}\big\Vert =o_p(1). \end{aligned}$$

(31)

Write ${\varvec{\Sigma }}_{n1} = \sum _{i=1}^{{q}}\tau _{1i} {\mathbf{u}}_{1i} {\mathbf{u}}_{1i}^{\top }$, where $\tau _{1i}$ and ${\mathbf{u}}_{1i}$ are eigenvalues and eigenvectors of ${\varvec{\Sigma }}_{n1}$. Then, by (C4), $1/{\tilde{C}}<\tau _{1i}< {\tilde{C}}$ for all i and

$$\begin{aligned} \Vert {\varvec{\Sigma }}_{n1}{\dot{f}}( {\varvec{\alpha }})\Vert&=\sup _{\Vert {\mathbf{x}}\Vert =1, {\mathbf{x}}\in R^{{q}}}\Vert {\varvec{\Sigma }}_{n1}{\dot{f}}( {\varvec{\alpha }}) {\mathbf{x}}\Vert =\sup _{\Vert {\mathbf{x}}\Vert =1, {\mathbf{x}}\in R^{{q}}}\left\| \sum _{i=1}^{{q}}\lambda _{1i} {\mathbf{u}}_{1i} {\mathbf{u}}_{1i}^{\top }{\dot{f}}( {\varvec{\alpha }}) {\mathbf{x}}\right\| \\&= \sup _{\Vert {\mathbf{x}}\Vert =1, {\mathbf{x}}\in R^{{q}}}\left( \sum _{i=1}^{{q}}\lambda _{1i}^2\Vert {\mathbf{u}}_{1i}^{\top }{\dot{f}}( {\varvec{\alpha }}) {\mathbf{x}}\Vert ^2\right) ^{1/2}\\&\ge \sup _{\Vert {\mathbf{x}}\Vert =1, {\mathbf{x}}\in R^{{q}}}\frac{1}{{\tilde{C}}}\left( \sum _{i=1}^{{q}}\Vert {\mathbf{u}}_{1i}^{\top }{\dot{f}}({\varvec{\alpha }}) {\mathbf{x}}\Vert ^2\right) ^{1/2} \\&= \sup _{\Vert {\mathbf{x}}\Vert =1, {\mathbf{x}}\in R^{{q}}}\frac{1}{{\tilde{C}}} \Vert {\dot{f}}( {\varvec{\alpha }}) {\mathbf{x}}\Vert =\frac{1}{{\tilde{C}}}\Vert {\dot{f}}( {\varvec{\alpha }})\Vert . \end{aligned}$$

(32)

Therefore, it follows from ${\varvec{\alpha }}\in {\mathbf{b}}_n$, (32) and (C4) that

$$\begin{aligned} \left\| \left\{ {\varvec{\Sigma }}_{n1}+(\lambda _n/n){\mathbf{D}}_1( {\varvec{\alpha }})\right\} {\dot{f}}( {\varvec{\alpha }})\right\|&\ge \left\| {\varvec{\Sigma }}_{n1}{\dot{f}}( {\varvec{\alpha }})\right\| -\left\| (\lambda _n/n){ {\mathbf{D}}}_1( {\varvec{\alpha }}){\dot{f}}( {\varvec{\alpha }})\right\| \\&\ge (1/{\tilde{C}})\Vert {\dot{f}}( {\varvec{\alpha }})\Vert -(\lambda _n/n)\cdot {a_{0}^{-2}}\Vert {\dot{f}}( {\varvec{\alpha }})\Vert . \end{aligned}$$

This, together with (31) and the fact $\lambda _n/n\rightarrow 0$, implies that

$$\begin{aligned} \sup _{ {\varvec{\alpha }}\in {\mathbf{b}}_n}\Vert {\dot{f}}( {\varvec{\alpha }})\Vert =o_p(1). \end{aligned}$$

(33)

Finally, we can get the conclusion in part (a) from (29) and (33).

Next we prove part (b). Write

$$\begin{aligned} n^{1/2} \, {z^{-1} \mathbf{c}^{\top }} (\hat{ {\varvec{\alpha }}}^\circ - {\varvec{\beta }}_{01})&=n^{1/2} \, {z^{-1} \mathbf{c}^{\top }} \left[ \left\{ {\varvec{\Sigma }}_{n1}+\frac{\lambda _n}{n} {\mathbf{D}}_1(\hat{ {\varvec{\alpha }}}^\circ )\right\} ^{-1} {\varvec{\Sigma }}_{n1}- \mathbf{I}_{q_n}\right] {\varvec{\beta }}_{01}\\&\quad + n^{-1/2}\, {z^{-1} \mathbf{c}^{\top }} \left\{ {\varvec{\Sigma }}_{n1} +\frac{\lambda _n}{n} {\mathbf{D}}_1(\hat{ {\varvec{\alpha }}}^\circ )\right\} ^{-1} {\mathbf{x}}_{1}^{\top } {{\varvec{\varepsilon }}}^* \equiv I_1 +I_2. \end{aligned}$$

(34)

By the first order resolvent expansion formula

$$\begin{aligned}( \mathbf{H}+ {\varvec{\Delta }})^{-1}= \mathbf{H}^{-1}- \mathbf{H}^{-1}{\varvec{\Delta }}( \mathbf{H}+{\varvec{\Delta }})^{-1},\end{aligned}$$

the first term on the right-hand side of equation (34) can be rewritten as

$$\begin{aligned} I_1 = - {z^{-1} \mathbf{c}^{\top }} {\varvec{\Sigma }}_{n1}^{-1}\frac{\lambda _n}{\sqrt{n}} {\mathbf{D}}_1(\hat{ {\varvec{\alpha }}}^\circ )\left\{ {\varvec{\Sigma }}_{n1} +\frac{\lambda _n}{n} {\mathbf{D}}_1(\hat{ {\varvec{\alpha }}}^\circ )\right\} ^{-1}{\varvec{\Sigma }}_{n1}{\varvec{\beta }}_{01}. \end{aligned}$$

Hence, by the assumption (C4) and (C5), we have

$$\begin{aligned} \Vert I_1\Vert \le { \frac{\lambda _n}{\sqrt{n}}{z^{-1}a_{0}^{-2}}\Vert {\varvec{\Sigma }}_{n1}^{-1}{\varvec{\beta }}_{01}\Vert =O_p\bigg ({\lambda _n/\sqrt{n}}\bigg ) \rightarrow 0.} \end{aligned}$$

(35)

Furthermore, applying the first order resolvent expansion formula, it can be shown that

$$\begin{aligned} I_2&={\frac{z^{-1} }{\sqrt{n}}{} \mathbf{c}^{{\rm {T} }} {\varvec{\Sigma }}_{n1}^{-1}{} {\mathbf{x}}_1^{{\rm {T} }}{{\varvec{\varepsilon }}}^*+o_p(1)}\\&={\frac{z^{-1} }{\sqrt{n}}{} \mathbf{c}^{{\rm {T} }} {\varvec{\Sigma }}_{n1}^{-1}{} {\mathbf{x}}_1^{{\rm {T} }}({\mathbf{Y}}^*-{\mathbf{x}}_1 {\varvec{\omega }}+{\mathbf{x}}_1 {\varvec{\omega }}-{\mathbf{x}}_1{\varvec{\beta }}_{01})+o_p(1)}\\&={\sqrt{n}z^{-1}{} \mathbf{c}^{{\rm {T} }}({\hat{{\varvec{\beta }}}}_{1\mathrm Z}-{\varvec{\omega }}_1+{\varvec{\omega }}_1- {\varvec{\beta }}_{01}) +o_p(1) }\\ \end{aligned}$$

(36)

where ${\varvec{\mu }}_1 =(\mu _1^*, \mu _2^*, ..., \mu _{q}^*)$. $I_2$ converges in distribution to N(0, 1) by the Lindeberg-Feller central limit theorem. Finally, combining (34), (35), and (36) proves part (b). $\Box$ $\square$

Proof of Theorem 1

Given the initial ridge estimator ${\hat{{\varvec{\beta }}}}^{(0)}$ in (4), we have

$$\begin{aligned} \hat{{\varvec{\beta }}}^{(0)}-{\varvec{\beta }}_0&=\left[ \left( {\varvec{\Sigma }}_{n}+\frac{\xi _n}{n} \mathbf{I}_{p_n}\right) ^{-1} {\varvec{\Sigma }}_{n}- \mathbf{I}_{p_n}\right] {\varvec{\beta }}_{0}+\left( {\varvec{\Sigma }}_{n} +\frac{\xi _n}{n} \mathbf{I}_{p_n}\right) ^{-1} {\mathbf{x}}^{\top } {\varvec{\varepsilon }}^*/n\\&\equiv {\mathbf{T}}_1+{\mathbf{T}}_2. \end{aligned}$$

(37)

By the first-order resolvent expansion formula and $\xi _n/\sqrt{n}\rightarrow 0$,

$$\begin{aligned} \Vert {\mathbf{T}}_1\Vert =\left\| - {\varvec{\Sigma }}_{n}^{-1}\frac{\xi _n}{{n}} \left( {\varvec{\Sigma }}_{n} +\frac{\xi _n}{n}{} \mathbf{I}_{p_n}\right) ^{-1}{\varvec{\Sigma }}_{n}{\varvec{\beta }}_{0}\right\| \le {\tilde{C}}^3\frac{\xi _n a_{1}\sqrt{p_n}}{n} =o_p\left( \sqrt{\frac{p_n}{n}}\right) . \end{aligned}$$

(38)

It is easy to see that $\Vert {\mathbf{T}}_2\Vert =O_p(\sqrt{{p_n}/{n}}).$ Thus $\Vert {\hat{{\varvec{\beta }}}}^{(0)}- {\varvec{\beta }}_0\Vert =O_p((p_n/n)^{1/2})$. This, combined with part (a) of Lemma 1, implies that

$$\begin{aligned} {\text{ Pr }}(\lim _{k\rightarrow \infty }{\hat{ {\varvec{\gamma }}}^{(k)}}= 0)\rightarrow 1. \end{aligned}$$

(39)

Hence, to prove part (i) of Theorem 1, it is sufficient to show that

$$\begin{aligned} {\text{ Pr }}(\lim _{k\rightarrow \infty }\Vert {\hat{ {\varvec{\alpha }}}^{(k)}}-\hat{ {\varvec{\alpha }}}^\circ \Vert =0)\rightarrow 1, \end{aligned}$$

(40)

where $\hat{ {\varvec{\alpha }}}^\circ$ is the fixed point of $f({\varvec{\alpha }})$ defined in part (b) of Lemma 2.

Define ${\varvec{\gamma }}^*= 0$ if ${\varvec{\gamma }} = 0$, for any ${\varvec{\alpha }}\in {\mathbf{b}}_n$,

$$\begin{aligned} \lim _{ {\varvec{\gamma }}\rightarrow 0} {\varvec{\gamma }}^*( {\varvec{\alpha }},{\varvec{\gamma }})= 0. \end{aligned}$$

(41)

Combining (41) with the fact

$$\begin{aligned} \begin{pmatrix} {\mathbf{x}}_1^{\top }{ {\mathbf{x}}_1}+\lambda _n {\mathbf{D}}_1( {\varvec{\alpha }}) &{} {\mathbf{x}}_1^{\top }{} \mathbf{X_2}\\ {\mathbf{x}}_2^{\top }{ {\mathbf{x}}_1}&{} {\mathbf{x}}_2^{\top }{ {\mathbf{x}}_2}+\lambda _n {\mathbf{D}}_2( {\varvec{\gamma }}) \end{pmatrix} \begin{pmatrix} {\varvec{\alpha }}^*\\ {\varvec{\gamma }}^* \end{pmatrix} =\begin{pmatrix} {\mathbf{x}}_1^{\top } {\mathbf{Y}}^*\\ {\mathbf{x}}_2^{\top } {\mathbf{Y}}^* \end{pmatrix}, \end{aligned}$$

implies that for any ${\varvec{\alpha }}\in {\mathbf{b}}_n$,

$$\begin{aligned} \lim _{ {\varvec{\gamma }}\rightarrow 0} {\varvec{\alpha }}^*({\varvec{\alpha }}, {\varvec{\gamma }})=\{ {\mathbf{x}}_1^{{\rm {T} }}{ {\mathbf{x}}_1}+\lambda _n {\mathbf{D}}_1( {\varvec{\alpha }})\}^{-1}{} {\mathbf{x}}_1 {\mathbf{Y}}^*{=f( {\varvec{\alpha }})}. \end{aligned}$$

(42)

Therefore, $g(\cdot )$ is continuous and thus uniformly continuous on the compact set ${\varvec{\beta }}\in \mathbf{H}_n$. This, together with (39) and (42), implies that as $k\rightarrow \infty$,

$$\begin{aligned} \eta _k\equiv \sup _{{\varvec{\alpha }}\in {\mathbf{b}}_n}\left\| f( {\varvec{\alpha }})- {\varvec{\alpha }}^*( {\varvec{\alpha }},{\hat{ {\varvec{\gamma }}}^{(k)}})\right\| \longrightarrow 0, \end{aligned}$$

(43)

with probability tending to 1.

Note that

$$\begin{aligned} \Vert {\hat{ {\varvec{\alpha }}}^{(k+1)}}-\hat{{\varvec{\alpha }}}^\circ \Vert= & {} \left\| {\varvec{\alpha }}^*({\hat{ {\varvec{\beta }}}^{(k)}})-\hat{ {\varvec{\alpha }}}^\circ \right\| \le \left\| {\varvec{\alpha }}^*({\hat{{\varvec{\beta }}}^{(k)}})-f({\hat{{\varvec{\alpha }}}^{(k)}})\right\| +\Vert f({\hat{ {\varvec{\alpha }}}^{(k)}})-\hat{ {\varvec{\alpha }}}^\circ \Vert \nonumber \\\le & {} \eta _k + \frac{1}{{\tilde{C}}}\Vert {\hat{ {\varvec{\alpha }}}^{(k)}}-\hat{ {\varvec{\alpha }}}^\circ \Vert , \end{aligned}$$

(44)

where the last step follows from $\Vert f({\hat{ {\varvec{\alpha }}}^{(k)}})-\hat{{\varvec{\alpha }}}^\circ \Vert =\Vert f({\hat{ {\varvec{\alpha }}}^{(k)}})-f(\hat{ {\varvec{\alpha }}}^\circ )\Vert \le (1/{\tilde{C}})\Vert {\hat{ {\varvec{\alpha }}}^{(k)}}-\hat{ {\varvec{\alpha }}}^\circ \Vert$. Let $a_k=\Vert {\hat{ {\varvec{\alpha }}}^{(k)}}-\hat{{\varvec{\alpha }}}^\circ \Vert$, for all $k\ge 0$. From (43), we can induce that with probability tending to 1, for any $\epsilon >0$, there exists an positive integer N such that for all $k> N$, $|\eta _k|<\epsilon$ and

$$\begin{aligned} a_{k+1}&\le \frac{a_{k-1}}{{\tilde{C}}^2} + \frac{\eta _{k-1}}{{\tilde{C}}}+\eta _k\\&\le \frac{a_1}{{\tilde{C}}^k}+\frac{\eta _1}{{\tilde{C}}^{k-1}}+ \cdots +\frac{\eta _N}{{\tilde{C}}^{k-N}}+ (\frac{\eta _{N+1}}{{\tilde{C}}^{k-N-1}}+\cdots +\frac{\eta _{k-1}}{{\tilde{C}}}+\eta _k)\\&\le (a_1+\eta _1+...+\eta _N) \frac{1}{{\tilde{C}}^{k-N}} + \frac{1-(1/{\tilde{C}})^{k-N}}{1-1/{\tilde{C}}} \epsilon \rightarrow 0, {\text{ as }} k\rightarrow \infty . \end{aligned}$$

This proves (40).

Therefore, it immediately follows from (39) and (40) that the with probability tending to 1, $\lim _{k\rightarrow \infty } {\varvec{\beta }}^{(k)}= \lim _{k\rightarrow \infty } (\hat{ {\varvec{\alpha }}}^{(k)\top } , \hat{ {\varvec{\gamma }}}^{(k)\top })^{\top }=(\hat{ {\varvec{\alpha }}}^{\circ \top } , 0)^{{\rm {T} }}$, which completes the proof of part (i). This, in addition to part (b) of Lemma 2, proves part (ii) of Theorem 1. $\Box$

Proof of Theorem 2

Recall that ${\hat{{\varvec{\beta }}}}^* =\lim _{k\rightarrow \infty }\hat{ {\varvec{\beta }}}^{(k+1)}$ and $\hat{ {\varvec{\beta }}}^{(k+1)}=\arg \min _{ {\varvec{\beta }}} \{ Q({\varvec{\beta }}| \hat{ {\varvec{\beta }}}^{(k)})\}$, where

$$\begin{aligned} Q({\varvec{\beta }}| \hat{ {\varvec{\beta }}}^{(k)})= \Vert {\mathbf{Y}}^*-{\mathbf{x}} {\varvec{\beta }}\Vert ^2+\lambda _n \sum _{\ell =1}^{p_n} {\beta _\ell ^2}/{\{{\hat{\beta }}^{(k)}_\ell \}^2}. \end{aligned}$$

If $\beta _\ell ^*\ne 0$ for $\ell \in \{ i,j\}$, then ${\hat{{\varvec{\beta }}}}^*$ must satisfy the following normal equations for $\ell \in \{ i,j\}$:

$$\begin{aligned} -2 {\mathbf{x}}_\ell ^{\top } \{{\mathbf{Y}}^*- {\mathbf{x}}{\hat{{\varvec{\beta }}}}^{(k+1)}\}+2\lambda _n {{\hat{\beta }}_\ell ^{(k+1)}}/{\{{\hat{\beta }}^{(k)}_\ell \}^2} =0. \end{aligned}$$

Thus, for $\ell \in \{ i, j \}$,

$$\begin{aligned} {{\hat{\beta }}_\ell ^{(k+1)}}/{\{{\hat{\beta }}^{(k)}_\ell \}^2} = {{\mathbf{x}}_\ell ^{\top } \hat{{\varvec{\varepsilon }}}^{*(k+1)}}/{\lambda _n}, \end{aligned}$$

(45)

where $\hat{{\varvec{\varepsilon }}}^{*(k+1)}= {\mathbf{Y}}^*- {\mathbf{x}}\hat{ {\varvec{\beta }}}^{(k+1)}$. Moreover, because

$$\begin{aligned} \Vert \hat{{\varvec{\varepsilon }}}^{*(k+1)}\Vert ^2+ \lambda _n\sum _{i=1}^{p_n}\frac{{\hat{\beta }}_i^{2}}{{\tilde{\beta }}_i^2}=Q(\hat{ {\varvec{\beta }}}^{(k+1)}| \hat{ {\varvec{\beta }}}^{(k)})\le Q(0|\hat{ {\varvec{\beta }}}^{(k)}) =\Vert {\mathbf{Y}}^*\Vert ^2, \end{aligned}$$

we have

$$\begin{aligned} \Vert \hat{{\varvec{\varepsilon }}}^{*(k+1)}\Vert \le \Vert {\mathbf{Y}}^*\Vert . \end{aligned}$$

(46)

Letting $k\rightarrow \infty$ in (45) and (46), we have, for $\ell \in \{i, j\}$ and $\Vert \hat{{\varvec{\varepsilon }}}^{*}\Vert \le \Vert {\mathbf{Y}}^*\Vert$, ${\hat{\beta }}_\ell ^{*-1}= {\mathbf{x}}_\ell ^{\top } \hat{{\varvec{\varepsilon }}}^{*}{\lambda _n}$, where $\hat{{\varvec{\varepsilon }}}^{*} = {\mathbf{Y}}^*- {\mathbf{x}}{\hat{{\varvec{\beta }}}}^*$. Therefore,

$$\begin{aligned}\big |{\hat{\beta }}_i^{*-1}-{\hat{\beta }}_j^{*-1}\big |\le \frac{1}{\lambda _n} \, \Vert {\mathbf{Y}}^*\Vert \times \Vert {\mathbf{x}}_i - {\mathbf{x}}_j\Vert = \frac{1}{\lambda _n} \, \Vert {\mathbf{Y}}^*\Vert \sqrt{2(1-\rho _{ij})}. \end{aligned}$$

$\square$

About this article

Cite this article

Sun, Z., Liu, Y., Chen, K. et al. Broken adaptive ridge regression for right-censored survival data. Ann Inst Stat Math 74, 69–91 (2022). https://doi.org/10.1007/s10463-021-00794-3

Download citation

Received: 01 February 2020
Revised: 29 December 2020
Accepted: 18 January 2021
Published: 05 April 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10463-021-00794-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Broken adaptive ridge regression for right-censored survival data

Abstract

Access this article

Similar content being viewed by others

Censored broken adaptive ridge regression in high-dimension

Variable selection in proportional odds model with informatively interval-censored data

Bayesian penalized Buckley-James method for high dimensional bivariate censored regression models

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix: Proofs of the theorems

Lemma 1

Proof

Lemma 2

Proof

Proof of Theorem 1

Proof of Theorem 2

About this article

Cite this article

Keywords

Navigation

Broken adaptive ridge regression for right-censored survival data

Abstract

Access this article

Similar content being viewed by others

Censored broken adaptive ridge regression in high-dimension

Variable selection in proportional odds model with informatively interval-censored data

Bayesian penalized Buckley-James method for high dimensional bivariate censored regression models

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix: Proofs of the theorems

Appendix: Proofs of the theorems

Lemma 1

Proof

Lemma 2

Proof

Proof of Theorem 1

Proof of Theorem 2

About this article

Cite this article

Share this article

Keywords

Search

Navigation