Penalized Cox’s proportional hazards model for high-dimensional survival data with grouped predictors

Dang, Xuan; Huang, Shuai; Qian, Xiaoning

doi:10.1007/s11222-021-10052-4

Published: 30 September 2021

Volume 31, article number 77, (2021)

Statistics and Computing

Abstract

The rapid development of next-generation sequencing technologies has made it possible to measure the expression profiles of thousands of genes simultaneously. Often, there exist group structures among genes that reflect biological pathways and functional relationships. Analyzing such high-dimensional, structured datasets can be computationally expensive and can result in complicated models that are hard to interpret. To address this, variable selection methods such as penalized methods are often used. Here, we focus on Cox’s proportional hazards model to deal with censored data. Most of the existing penalized methods for Cox’s model are group lasso methods, which show deficiencies including the over-shrinkage problem. In addition, contemporary algorithms either exhibit a loss of efficiency or require the group-wise orthonormality assumption. Hence, efficient algorithms for general design matrices are needed to enable practical applications. In this paper, we investigate and comprehensively evaluate three group penalized methods for Cox’s model: the group lasso and two nonconvex penalization methods, group SCAD and group MCP, that have several advantages over the group lasso. These methods are able to perform group selection in both non-overlapping and overlapping cases. We have developed fast and stable algorithms and a new package, grpCox, to fit these models without the initial orthonormalization step. The runtime of grpCox is improved significantly over the existing packages, such as grpsurv (for the non-overlapping case), grpregOverlap (overlapping), and SGL. In addition, grpCox is better than grpsurv and comparable with SGL in terms of variable selection performance. Comprehensive studies on both simulated and real-world cancer datasets demonstrate the statistical properties of our grpCox implementations with the group lasso, SCAD, and MCP regularization terms.

References

Ahmed, M., Rahman, N.: Atm and breast cancer susceptibility. Oncogene 25(43), 5906–11 (2006)
Alsina-Sanchis, E., Figueras, A., Lahiguera Vidal, A., Casanovas, O., Graupera, M., Villanueva, A., Viñals, F.: The tgf pathway stimulates ovarian cancer cell proliferation by increasing igf1r levels. Int. J. Cancer 139(8), 1894–903 (2016)
Alsina-Sanchis, E., Figueras, A., Gil-Martín, M., Pardo, B., Piulats, J.M., Martí, L., Ponce, J., Matias-Guiu, X., Vidal, A., Villanueva, A., Viñals, F.: Tgf controls ovarian cancer cell proliferation. Int. J. Mol. Sci. 18(8) (2017)
Andersen, P.K., Gill, R.D.: Cox’s regression model for counting processes: a large sample study. Ann. Stat. 10(4), 1100–1120 (1982)
Assefnia, S., Dakshanamurthy, S., Guidry-Auvil, J.M., Hampel, C., Anastasiadis, P.Z., Kallakury, B., Uren, A., Foley, D.W., Brown, M.L., Shapiro, L., Brenner, M., Haigh, D., Byers, S.: Cadherin-11 in poor prognosis malignancies and rheumatoid arthritis: common target, common therapies. Oncotarget 5(6), 1458–74 (2014)
Belhechmi, S., De Bin, R., Rotolo, F., Michiels, S.: Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models. BMC Bioinf. 21(277) (2020)
Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995)
Benjamini, Y., Yekutieli, D.: The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188 (2001)
Bertucci, F., Nasser, V., Granjeaud, S., Eisinger, F., Adelaïde, J., Tagett, R., Loriod, B., Giaconia, A., Benziane, A., Devilard, E., Jacquemier, J., Viens, P., Nguyen, C., Birnbaum, D., Houlgatte, R.: Gene expression profiles of poor-prognosis primary breast cancer correlate with survival. Hum. Mol. Genet. 11(8), 863–72 (2002)
Blighe, K., Lasky-Su, J.: Regparallel: Standard regression functions in r enabled for parallel processing over large data-frames (2021)
Breheny, P., Huang, J.: Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat. Comput. 25, 173–187 (2015)
Brisson, B.K., Mauldin, E.A., Lei, W., Vogel, L.K., Power, A.M., Lo, A., Dopkin, D., Khanna, C., Wells, R.G., Pure, E.: Estimation of mean sojourn time in breast cancer screening using a Markov chain model of entry to and exit from preclinical detectable phase. Am. J. Pathol. 185(5), 1471–86 (2015)
Cox, D.R.: Regression models and life-tables. J. R. Stat. Soc. B 34(1), 187–220 (1972)
Dang, X.: grpCox: Penalized Cox model for high-dimensional data with grouped predictors. (2020) https://CRAN.R-project.org/package=grpCox, R package version 1.0-1
Etemadmoghadam, D., deFazio, A., Beroukhim, R., Mermel, C.: Integrated genome-wide dna copy number and expression analysis identifies distinct mechanisms of primary chemoresistance in ovarian carcinomas. Clin. Cancer Res. 15(4), 1417–27 (2009)
Fan, J., Li, R.: Variable selection for cox’s proportional hazards model and frailty model. Ann. Stat. 6, 74–99 (2002)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Gatcliffe, T.A., Monk, B.J., Planutis, K., Holcombe, R.F.: Wnt signaling in ovarian tumorigenesis. Int. J. Gynecol. Cancer 18(5), 954–62 (2008)
Gee, M.E., Faraahi, Z., McCormick, A., Edmondson, R.: Dna damage repair in ovarian cancer: unlocking the heterogeneity. J. Ovarian Res. 11(50) (2018)
Goldgar, D.E., Healey, S., Dowty, J.G., Da-Silva, L., Chen, X., Spurdle, A.B., Terry, M.B., Daly, M.J., Buys, S.M., Southey, M.C., Andrulis, I., John, E.M., Khanna, K.K., Hopper, J.L., Oefner, P.J., Lakhani, S., Chenevix-Trench, G.: Rare variants in the atm gene and risk of breast cancer. Breast Cancer Res. 13(4) (2011)
Gui, J., Li, H.: Penalized cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21(13), 3001–3008 (2005)
Hänzelmann, S., Castelo, R., Guinney, J.: GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinf. 14(7) (2013)
Hochberg, Y.: A sharper bonferroni procedure for multiple tests of significance. Biometrika 75, 800–802 (1988)
Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979)
Hommel, G.: A stagewise rejective multiple test procedure based on a modified bonferroni test. Biometrika 75, 383–386 (1988)
Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Stat. Sci. 27(4), 481–499 (2012)
Hunter, D., Lange, K.: A tutorial on mm algorithms. Am. Stat. 58(1), 30–37 (2004)
Jacob, L., Obozinski, G., Vert, J.: Group lasso with overlap and graph lasso. In International Conference on Machine Learning, Montreal, Canada, Proceedings of the 26th annual international conference on machine learning, pp. 433–440, (2009)
Jenatton, R., Mairal, G., Obozinski, G., Bach, F.: Proximal methods for hierarchical sparse coding. J. Mach. Learn. Res. 12, 2297–2334 (2011)
Jones, S., Zhang, X., Parsons, D.W., Lin, J.C., Leary, R.J., Angenendt, P., Mankoo, P., Carter, H., Kamiyama, H., Jimeno, A., Hong, S.M., Fu, B., Lin, M.T., Calhoun, E.S., Kamiyama, M., Walter, K., Nikolskaya, T., Nikolsky, Y., Hartigan, J., Smith, D.R., Hidalgo, M., Leach, S.D., Klein, A.P., Jaffee, E.M., Goggins, M., Maitra, A., IacobuzioDonahue, C., Eshleman, J.R., Kern, S.E., Hruban, R.H., Karchin, R., Papadopoulos, N., Parmigiani, G., Vogelstein, B., Velculescu, V.E., Kinzler, K.W.: Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806 (2008)
Kim, Y., Kim, J., Kim, Y.: Blockwise sparse regression. Stat. Sin. 16, 375–390 (2006)
Lange, K., Hunter, D., Yang, I.: Optimization transfer using surrogate objective functions (with discussion). J. Comput. Graph. Stat. 9(1), 1–20 (2000)
Li, Y., Chao, F., Huang, B.: Hoxc8 promotes breast tumorigenesis by transcriptionally facilitating cadherin-11 expression. Oncotarget 5(9), 2596–607 (2014)
Lin, Z., Zhu, G., Tang, D., Bu, J., Zou, J.: High expression of col6a1 correlates with poor prognosis in patients with breast cancer. Int. J. Clin. Exp. Med. 11(11), 12157–12164 (2018)
Loss, L.A., Sadanandam, A., Durinck, S., Nautiyal, S., Flaucher, D., Carlton, V.E., Moorhead, M., Lu, Y., Gray, J.W., Faham, M., Spellman, P., Parvin, B.: Prediction of epigenetically regulated genes in breast cancer cell lines. BMC Bioinf. 11(305) (2010)
Ma, S., Song, X., Huang, J.: Supervised group lasso with applications to microarray data analysis. BMC Bioinf. 8, 60–76 (2007)
Mairal, J., Yu, B.: Complexity analysis of the lasso regularization path (2012)
McCormick, A., Donoghue, P., Dixon, M., O’Sullivan, R., O’Donnell, R., Murray, J., Kaufmann, A., Curtin, N., Edmondson, R.: Ovarian cancers harbour defects in non-homologous end joining resulting in resistance to rucaparib. Clin. Cancer Res. 23(8), 2050–2060 (2017)
Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. Ser. B (Methodol.) 70(1), 53–71 (2008)
Miller, L.D., Smeds, J., George, J., Vega, V.B., Vergara, L., Ploner, A., Pawitan, Y., Hall, P., Klaar, S., Liu, E.T., Bergh, J.: An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc. Natl. Acad. Sci. U.S.A. 102(38), 13550–13555 (2005)
Molecular signatures database v7.4. (2021) https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp
Obozinski, G., Jacob, L., Vert, J.: Group lasso with overlaps: the latent group lasso approach. arXiv (2011)
Otsuka, A., de Paolis, A., Tocchini-Valentini, G.P.: Ribonuclease “xlai,” an activity from xenopus laevis oocytes that excises intervening sequences from yeast transfer ribonucleic acid precursors. Mol. Cell. Biol. 1(3), 269–280 (1981)
Park, M.Y., Hastie, T.: Penalized logistic regression for detecting gene interactions. Tech report, Stanford University, United States, Tech. Rep (2006)
Puig, A., Wiesel, A., Fleury, G., Hero, A.: Multidimensional shrinkage-thresholding operator and group lasso penalties. IEEE Signal Process. Lett. 18, 363–366 (2011)
Sarrio, D., Rodriguez-Pinilla, S.M., Hardisson, D., Cano, A., Moreno-Bueno, G., Palacios, J.: Epithelial-mesenchymal transition in breast cancer relates to the basal-like phenotype. Cancer Res. 68(4), 989–997 (2008)
Sengupta, P.K., Smith, E.M., Kim, K., Murnane, M.J., Smith, B.D.: Dna hypermethylation near the transcription start site of collagen alpha2(i) gene occurs in both cancer cell lines and primary colorectal cancers. Can. Res. 63, 1789–1797 (2003)
Simon, N.: Regularization paths for Cox’s proportional hazards model via coordinate descent. J. Stat. Softw. 39(5), 53–66 (2012)
Simon, N., Tibshirani, R.: Standardization and the group lasso penalty. Stat. Sin. 22, 983–1001 (2011)
Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse-group lasso. J. Comput. Graph. Stat. 22(2), 231–245 (2013)
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102(43), 15545–50 (2005)
Szkandera, J., Kiesslich, T., Haybaeck, J., Gerger, A., Pichler, M.: Hedgehog signaling pathway in ovarian cancer. Int. J. Mol. Sci. 14(1), 1179–1196 (2013)
Ternes, N., Rotolo, F., Michiels, S.: Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models. Stat. Med. 35(15), 2561–73 (2016)
Therneau, T.M.: A package for survival analysis in R. https://CRAN.R-project.org/package=survival, R package version 3.2-11 (2021)
Tibshirani, R.: The lasso method for variable selection in the cox model. Stat. Med. 16(4), 385–395 (1996)
van de Vijver, M.J., He, Y.D., van’t Veer, L.J., Dai, H., Hart, A.A., Voskuil, D., Schreiber, G.J., Peterse, J.L., Roberts, C., Marton, M.J., Parrish, M., Atsma, D., Witteveen, A., Glas, A., Delahaye, L., van der Velde, T.W., Bartelink, H., Rodenhuis, S., Rutgers, E.T., Friend, S.H., Bernards, R.: A gene-expression signature as a predictor of survival in breast cancer. New Engl. J. Med. 347(25), 1999–2009 (2002)
Verweij, P.J., Houwelingen, H.C.: Cross-validation in survival analysis. Stat. Med. 12(24), 385–395 (1993)
Wang, L., Chen, G., Li, H.: Group scad regression analysis for microarray time course gene expression data. Bioinformatics 23(12), 1486–1494 (2007)
Wang, L., Li, H., Huang, J.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 103(484), 1556–1569 (2008)
Wu, T., Wang, S.: Doubly regularized cox regression for high-dimensional survival data with group structures. Stat. Interface 6, 175–186 (2013)
Xiong, G., Deng, L., Zhu, J., Xu, R.: Prolyl-4-hydroxylase subunit 2 promotes breast cancer progression and metastasis by regulating collagen deposition. BMC Cancer 14(1) (2014)
Yang, Y., Zou, H.: A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput. 25, 1129–1141 (2015)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Methodol.) 68(1), 49–67 (2006)
Zeng, Y., Breheny, P.: Overlapping group logistic regression with applications to genetic pathway selection. Cancer Inf. 15, 179–187 (2016)
Zhang, H., Lu, W.: Adaptive lasso for cox’s proportional hazards model. Biometrika 94(3), 691–703 (2007)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)
Zhao, P., Rocha, G., Yu, B.: The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Stat. 37(6A), 3468–3497 (2009)
Zou, H.: A note on path-based variable selection in the penalized proportional hazards model. Biometrika 95(1), 241–247 (2008)

Acknowledgements

We are grateful to three anonymous reviewers for their helpful comments and constructive feedback, which helped to significantly improve the preliminary version of this paper. We thank Texas A&M High Performance Research Computing for providing computational resources to perform experiments in this work. This work was supported in part by the National Science Foundation (NSF)–Division of Communication & Computing Foundations (CCF) awards #1553281, #1718513, #1715027, NSF–Division of Information & Intelligent Systems (IIS) award #1812641, and the JDRF award #2-SRA-2018-513-S-B.

Author information

Authors and Affiliations

Texas A&M University, College Station, TX, 77840, USA
Xuan Dang & Xiaoning Qian
University of Washington, Seattle, WA, 98195, USA
Shuai Huang

Corresponding author

Correspondence to Xuan Dang.

Appendices

Appendix 1

We study two statistical properties of the estimators, consistency and convergence rate, as follows.

The negative log partial likelihood, scaled by the sample size, is

$$\begin{aligned} \ell _n(\beta ) = -\frac{1}{n} \sum _{i=1}^{D} \Bigg [ \big (\sum _{j=1}^J X_{j}^{(i)}\mathbf {\beta }_j \big ) - \log \bigg ( \sum _{l \in R_i} \exp \big (\sum _{j=1}^J X_{j}^{(l)}\mathbf {\beta }_j \big ) \bigg ) \Bigg ], \end{aligned}$$

where $D$ is the number of observed events and $R_i$ is the risk set at the $i$th event time. The penalty term $P_{\lambda , \gamma }(\beta )$ can be denoted as $P_{\lambda _n}(\beta )$ since $\gamma $ is fixed for group SCAD and group MCP. Here, $\ell _n(\beta )$ and $\lambda _n$ denote the partial likelihood and the tuning parameter, respectively, both changing with the sample size $n$.
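As a rough illustration of the formula above, the following Python sketch evaluates $\ell _n(\beta )$ for grouped predictors. It is not the grpCox implementation. The function name and the data layout are hypothetical, and it assumes that the risk set $R_i$ contains every subject whose observed time is at least the $i$th event time (a Breslow-style treatment of ties).

```python
import math

def neg_log_partial_likelihood(X_groups, beta_groups, time, event):
    """Scaled negative log partial likelihood for grouped predictors.

    X_groups[j][i]   : covariates of subject i in group j (list of floats)
    beta_groups[j]   : coefficient vector of group j
    time[i], event[i]: observed time and event indicator (1 = event, 0 = censored)
    """
    n = len(time)
    # Linear predictor eta_i = sum_j X_j^(i) beta_j
    eta = [
        sum(x * b
            for Xj, bj in zip(X_groups, beta_groups)
            for x, b in zip(Xj[i], bj))
        for i in range(n)
    ]
    total = 0.0
    for i in range(n):
        if event[i] != 1:
            continue  # only observed events contribute to the sum
        # Assumed risk set: subjects still under observation at time[i]
        risk = [eta[l] for l in range(n) if time[l] >= time[i]]
        m = max(risk)  # log-sum-exp trick for numerical stability
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk))
        total += eta[i] - log_sum
    return -total / n
```

With all coefficients equal to zero, each event contributes the logarithm of its risk-set size, which gives a quick sanity check of the implementation.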

Let the true parameter be $\beta _0 = \big ( \beta _{01}^T, \beta _{02}^T \big )^T$ where $\beta _{01}$ consists of all nonzero groups and $\beta _{02}$ consists of all remaining zero groups. The objective function is

$$\begin{aligned} {\mathcal {Q}}_n(\beta , \lambda _n)&= \ell _n(\beta _0) + \ell _n^{'}(\beta _0)^T (\beta -\beta _0) \\&\quad + \frac{\tau }{2}(\beta -\beta _0)^T(\beta -\beta _0) + P_{\lambda _n}(\beta ). \end{aligned}$$

Correspondingly, the minimizer of ${\mathcal {Q}}_n(\beta , \lambda _n)$ is $\beta _n = \big ( \beta _{n1}^T, \beta _{n2}^T \big )^T$ where ${\beta _n} = \underset{\beta }{\text {argmin }}{\mathcal {Q}}_n(\beta , \lambda _n)$.
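To make the minimizer concrete, consider the special case of a plain group lasso penalty $P_{\lambda _n}(\beta ) = \lambda _n \sum _j \Vert \beta _j\Vert $ (an assumption made here for illustration; group-size weights are omitted). The quadratic surrogate then separates across groups, and each group is minimized by multivariate soft-thresholding of $z_j = \beta _{0j} - \ell _n^{'}(\beta _0)_j/\tau $ with threshold $\lambda _n/\tau $. The function names below are hypothetical, and the sketch is not taken from grpCox.

```python
import math

def group_soft_threshold(z, t):
    """Return (1 - t/||z||)_+ * z, the multivariate soft-thresholding operator."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= t:
        return [0.0] * len(z)  # the whole group is set to zero
    scale = 1.0 - t / norm
    return [scale * v for v in z]

def surrogate_minimizer(beta0_groups, grad_groups, tau, lam):
    """Minimize the quadratic surrogate group by group (group lasso penalty assumed).

    beta0_groups[j]: expansion point for group j
    grad_groups[j] : gradient of the scaled negative log partial likelihood for group j
    """
    out = []
    for b0, g in zip(beta0_groups, grad_groups):
        z = [b - gi / tau for b, gi in zip(b0, g)]  # gradient step of size 1/tau
        out.append(group_soft_threshold(z, lam / tau))
    return out
```

For group SCAD and group MCP the group-wise update has a different thresholding rule, which this sketch does not cover.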

Define $a_n = \text {max} \{ P^{'}_{\lambda _n}(\Vert \beta _{j0}\Vert ): \Vert \beta _{j0}\Vert \ne 0 \}$ and $b_n = \text {max} \{ P^{''}_{\lambda _n}(\Vert \beta _{j0}\Vert ): \Vert \beta _{j0}\Vert \ne 0 \}$.

Theorem 1

(Consistency and convergence rate) If $ P_{\lambda _n}(\Vert \beta \Vert )$ satisfies both $a_n = O_p(n^{-1/2})$ and $b_n \rightarrow 0$, then $\beta _n$ is a root-$n$ consistent estimator of $\beta _0$, i.e., $\Vert \beta _n - \beta _0\Vert = O_p(n^{-1/2})$.
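The condition on $a_n$ can be illustrated numerically. The sketch below assumes the standard penalty derivatives of the lasso, SCAD (Fan and Li 2001) and MCP (Zhang 2010), applied to the group norm without group-size weights; these forms are not restated in this appendix, so they are assumptions, as are the function names and the default values of $\gamma $.

```python
def d_lasso(theta, lam):
    """Derivative of the lasso penalty: constant in theta."""
    return lam

def d_scad(theta, lam, gamma=3.7):
    """Derivative of the SCAD penalty for theta >= 0 (assumed standard form)."""
    if theta <= lam:
        return lam
    return max(gamma * lam - theta, 0.0) / (gamma - 1.0)

def d_mcp(theta, lam, gamma=3.0):
    """Derivative of the MCP penalty for theta >= 0 (assumed standard form)."""
    return max(lam - theta / gamma, 0.0)

def a_n(deriv, group_norms, lam):
    """Largest penalty derivative over the non-zero groups."""
    return max(deriv(t, lam) for t in group_norms if t != 0)

# Norms of two hypothetical non-zero true groups, with lambda_n = 0.1
norms = [1.5, 2.0]
print(a_n(d_lasso, norms, 0.1), a_n(d_scad, norms, 0.1), a_n(d_mcp, norms, 0.1))
```

Under these assumed forms, $a_n$ vanishes for SCAD and MCP once every non-zero group norm exceeds $\gamma \lambda _n$, whereas for the group lasso $a_n = \lambda _n$, so the first condition requires $\lambda _n = O_p(n^{-1/2})$.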

Table 11 Results for group lasso using different cross-validation methods to select hyperparameters over 100 replications

Table 12 Results for group SCAD using different cross-validation methods to select hyperparameters over 100 replications

Table 13 Results for group MCP using different cross-validation methods to select hyperparameters over 100 replications

Table 14 Results for misspecified group structures over 100 replications

Proof

According to Theorem 3.2 in Andersen and Gill (1982), two results hold:

$$\begin{aligned}&-\ell ^{'}(\beta _0) \overset{d}{\rightarrow } n^{-1/2} {\mathcal {N}}(0,\Sigma ), \\&\ell ^{\prime \prime }(\beta ^*) \overset{p}{\rightarrow } n\Sigma \text{ for any random } \beta ^* \overset{p}{\rightarrow } \beta _0. \\ \text{Then, }&\ell ^{\prime \prime }(\beta ^*) = n(\Sigma + o_p(1)), \end{aligned}$$

where $\Sigma $ is the positive definite Fisher information matrix.

Consider the ball $B(C) = \{ \beta _0 + \alpha _n\mathbf{u }: \Vert \mathbf{u }\Vert \le C \}$ and its boundary $\partial B(C)$, where $C>0$ is a constant and $\alpha _n = n^{-1/2} + a_n$. Because $a_n = O_p(n^{-1/2})$, we have $\alpha _n = O_p(n^{-1/2})$. To prove $\Vert \beta _n - \beta _0\Vert = O_p(n^{-1/2})$, it is sufficient to prove that, for any $\epsilon >0$, there exists a sufficiently large constant $C$ such that

$$\begin{aligned} \text {P}\bigg ( \underset{\beta \in \partial B(C)}{\inf } Q_n(\beta , \lambda _n) > Q_n(\beta _0, \lambda _n) \bigg ) \ge 1-\epsilon . \end{aligned} \tag{15}$$

This implies that, with probability at least $1-\epsilon $, $Q_n(\beta , \lambda _n)$ has a local minimum in the ball $B(C)$ for a given $\lambda _n$, so that the minimizer satisfies $\Vert \beta _n - \beta _0\Vert = O_p(\alpha _n)$.

Let $\beta = \beta _0 + \alpha _n\mathbf{u }$ with $\Vert \mathbf{u }\Vert = C$, and denote $D_n(\mathbf{u }) = Q_n(\beta , \lambda _n) - Q_n(\beta _0, \lambda _n)$. We have

$$\begin{aligned} D_n(\mathbf{u })&= \ell ^{'}(\beta _0)^T (\beta -\beta _0) + \frac{\tau }{2}(\beta -\beta _0)^T(\beta -\beta _0) \\&\quad + P_{\lambda _n}(\beta ) - P_{\lambda _n}(\beta _0) = D_1 + D_2. \end{aligned}$$

Consider that

$$\begin{aligned} D_1&= \ell ^{'}(\beta _0)^T (\beta -\beta _0) + \frac{\tau }{2}(\beta -\beta _0)^T(\beta -\beta _0) \\&= O_p(n^{-1/2})\alpha _n\mathbf{u } + \frac{\tau }{2} \alpha _n^2 \mathbf{u }^T\mathbf{u } \\&= O_p(C\alpha _n^2) + O_p(C^2\alpha _n^2). \end{aligned}$$

Applying a Taylor expansion to $D_2$, we have

$$\begin{aligned} D_2&= P_{\lambda _n}(\beta ) - P_{\lambda _n}(\beta _0) \\&= \sum _j \Big [ P^{'}_{\lambda _n}(\Vert \beta _{j0}\Vert )\big (\Vert \beta _{j0}+\alpha _n\mathbf{u }_j\Vert -\Vert \beta _{j0}\Vert \big ) \\&\quad + \frac{1}{2} P^{\prime \prime }_{\lambda _n}(\Vert \beta _{j0}\Vert )\big (\Vert \beta _{j0}+\alpha _n\mathbf{u }_j\Vert -\Vert \beta _{j0}\Vert \big )^2 \Big ] \\&\le \sum _j \big ( a_n\alpha _n \Vert \mathbf{u }_j\Vert + b_n \alpha _n^2 \Vert \mathbf{u }_j\Vert ^2 \big ) \\&\le \sum _j \big ( \alpha _n^2 C + b_n \alpha _n^2 C^2 \big ) = J(\alpha _n^2 C + b_n \alpha _n^2 C^2). \end{aligned}$$

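Because $b_n \rightarrow 0$, $D_2 = O_p(C\alpha _n^2)$. For a sufficiently large $C$, the term of order $C^2\alpha _n^2$ in $D_1$ dominates the remaining terms, which are all of order $C\alpha _n^2$, and therefore determines the sign of $D_n(\mathbf{u })$ on $\partial B(C)$. Thus, inequality (15) holds.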

Appendix 2

We present the simulation studies of the second cross-validation approach described in Section 2.7 to select the tuning parameters $\lambda $ and evaluate its variable selection performance.

In Fig. 8, each dot represents the logarithm of a $\lambda $ value along the solution path, and the error bars provide the confidence intervals for the cross-validation log-partial-likelihood. The left vertical bar indicates the maximum cross-validation log-partial-likelihood using the first method (Verweij and Houwelingen 1993), while the right one shows the maximum using the second method (Ternes et al. 2016).

We continue to consider $N=100$ observations and $P=400$ covariates in 40 groups of 10 elements each. There are two non-zero groups. The coefficient magnitude is $|\beta | = 0.5$, the values of the population correlation $\rho $ are 0, 0.2 and 0.5, and the censoring rates are 0% and 20%. The results are summarized in Tables 11, 12, and 13. The second cross-validation method always results in smaller models than the first. For group lasso, it produces better variable selection results with much smaller FPR values. For group SCAD and MCP, it often gives better results but sometimes shrinks the model too much, e.g., in the group MCP case with 20% censoring and $\rho =0.5$. Therefore, the second cross-validation method should be used with caution.
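A data set with these dimensions can be generated along the following lines. This appendix does not describe the generating mechanism, so several choices in this Python sketch are assumptions made for illustration only: an exchangeable correlation structure with pairwise correlation $\rho $, exponential survival times with hazard $\exp (X\beta )$, all coefficients in the two active groups equal to $+0.5$, and censoring applied to each subject independently with probability equal to the target censoring rate. The function names are hypothetical, and the FPR is taken to be the share of truly zero coefficients that are estimated as non-zero.

```python
import math
import random

def simulate(n=100, n_groups=40, group_size=10, n_active=2,
             beta_mag=0.5, rho=0.2, censor_rate=0.2, seed=0):
    """Simulate grouped survival data under the assumptions stated above."""
    rng = random.Random(seed)
    p = n_groups * group_size
    group = [j // group_size for j in range(p)]  # group label of each covariate
    beta = [beta_mag if group[j] < n_active else 0.0 for j in range(p)]
    X, time, event = [], [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor giving pairwise correlation rho
        x = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(p)]
        eta = sum(b * v for b, v in zip(beta, x))
        t = rng.expovariate(math.exp(eta))  # exponential time with hazard exp(eta)
        if rng.random() < censor_rate:
            t, d = rng.uniform(0.0, t), 0   # censored before the event
        else:
            d = 1
        X.append(x)
        time.append(t)
        event.append(d)
    return X, time, event, beta, group

def false_positive_rate(beta_true, beta_hat):
    """Share of truly zero coefficients that are estimated as non-zero."""
    zeros = [j for j, b in enumerate(beta_true) if b == 0.0]
    return sum(1 for j in zeros if beta_hat[j] != 0.0) / len(zeros)
```

The realized censoring rate in a single replication varies around the target value under this scheme.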

Appendix 3

We present additional settings based on the reviewer’s suggestions: settings with a large number of overlapping covariates and with more zero groups than non-zero groups. More specifically, we have performed an additional experiment using simulated data with $N=100$ and $P=55$, in which there are 10 groups of size 10 and 50% of the covariates overlap between two successive groups. The “correct” underlying group structure is given by

$$\begin{aligned}&\underset{group 1}{\underbrace{1,\dots ,10}}\text { } \underset{group 2}{\underbrace{6,\dots ,15}}\text { } \underset{group 3}{\underbrace{11,\dots ,20}}\text { } \underset{group 4}{\underbrace{16,\dots ,25}}\text { } \underset{group 5}{\underbrace{21,\dots ,30}}\text { } \\&\underset{group 6}{\underbrace{26,\dots ,35}}\text { }\underset{group 7}{\underbrace{31,\dots ,40}}\text { }\underset{group 8}{\underbrace{36,\dots ,45}}\text { }\underset{group 9}{\underbrace{41,\dots ,50}}\text { }\underset{group 10}{\underbrace{46,\dots ,55}}. \end{aligned}$$

We set the population correlation $\rho =0.5$ with 30% censoring rate. The corresponding coefficients are

$$\begin{aligned}&\underset{group 1-2}{\underbrace{0,\dots ,0}}\text { } \underset{group 3}{\underbrace{0,0,0,0,0,1.5,0,0,-2,0}}\text { } \underset{group 4}{\underbrace{1.5,0,0,-2,0,0,0,0,0,0}}\\&\text { }\underset{group 5-6}{\underbrace{0,\dots ,0}} \text { }\underset{group 7}{\underbrace{0,0,0,0,0,1.4,0,0,0, 1.8}}\\&\text { }\underset{group 8}{\underbrace{1.4,0,0,0, 1.8,0,0,0,0,0}}\text { }\underset{group 9-10}{\underbrace{0,\dots ,0}}. \end{aligned}$$
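The group structure and the per-group coefficient lists above can be cross-checked with a few lines of Python. The sketch rebuilds the ten overlapping groups, maps each listed coefficient to its covariate index, and verifies that covariates shared by two successive groups receive the same value in both lists. Variable names are illustrative.

```python
# True structure: 10 groups of size 10; successive groups share 5 covariates (1-based ids).
groups = [list(range(5 * g + 1, 5 * g + 11)) for g in range(10)]

# Coefficients as listed per group; groups not listed here are all zero.
group_coefs = {
    3: [0, 0, 0, 0, 0, 1.5, 0, 0, -2, 0],
    4: [1.5, 0, 0, -2, 0, 0, 0, 0, 0, 0],
    7: [0, 0, 0, 0, 0, 1.4, 0, 0, 0, 1.8],
    8: [1.4, 0, 0, 0, 1.8, 0, 0, 0, 0, 0],
}

beta = {cov: 0.0 for grp in groups for cov in grp}  # one coefficient per covariate
assigned = {}
for g, coefs in group_coefs.items():
    for cov, value in zip(groups[g - 1], coefs):
        # A covariate shared by two groups must get the same value from both lists.
        if cov in assigned and assigned[cov] != value:
            raise ValueError(f"inconsistent coefficient for covariate {cov}")
        assigned[cov] = value
        beta[cov] = float(value)

nonzero = {cov: b for cov, b in beta.items() if b != 0.0}
print(len(beta), nonzero)
```

The check passes: the only non-zero coefficients belong to covariates 16, 19, 36 and 40, each of which lies in the overlap of two active groups.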

We then consider four setups with misspecified group structures for inference. In the first setup, the number of groups is incorrect because some overlapping groups are collapsed as follows:

$$\begin{aligned}&\underset{group 1}{\underbrace{1,\dots ,10}}\text { } \underset{group 2}{\underbrace{6,\dots ,15}}\text { } \underset{group 3}{\underbrace{11,\dots ,25}}\text { } \underset{group 4}{\underbrace{21,\dots ,30}}\text { } \underset{group 5}{\underbrace{26,\dots ,35}} \\&\underset{group 6}{\underbrace{31,\dots ,45}}\text { }\underset{group 7}{\underbrace{41,\dots ,50}}\text { }\underset{group 8}{\underbrace{46,\dots ,55}}. \end{aligned}$$

In the second setup, the misspecified group structure deviates from the ground truth more significantly, with all the overlapping covariates put into one group:

$$\begin{aligned}&\underset{group 1}{\underbrace{1,3,5,7,9,11,13,15}}\text { } \underset{group 2}{\underbrace{2,4,\dots ,12,14,16,17,18,19,20,21,22}} \\&\text { } \underset{group 3}{\underbrace{16,\dots ,25}}\text { } \underset{group 4}{\underbrace{21,\dots ,30}} \text { } \underset{group 5}{\underbrace{26,\dots ,35}}\text { }\underset{group 6}{\underbrace{31,\dots ,45}}\text { }\underset{group 7}{\underbrace{41,\dots ,50}} \\&\text { }\underset{group 8}{\underbrace{46,\dots ,55}}. \end{aligned}$$

As in the first setup, the third and fourth setups are defined as follows:

$$\begin{aligned}&\underset{group 1}{\underbrace{1,\dots ,20}}\text { } \underset{group 2}{\underbrace{16,\dots ,25}}\text { } \underset{group 3}{\underbrace{21,\dots ,30}}\text { } \underset{group 4}{\underbrace{26,\dots ,35}}\text { }\underset{group 5}{\underbrace{31,\dots ,45}}\\&\underset{group 6}{\underbrace{41,\dots ,50}}\text { }\underset{group 7}{\underbrace{46,\dots ,55}} \end{aligned}$$

and

$$\begin{aligned}&\underset{group 1}{\underbrace{1,\dots ,10}}\text { } \underset{group 2}{\underbrace{6,\dots ,20}}\text { } \underset{group 3}{\underbrace{16,\dots ,25}}\text { } \underset{group 4}{\underbrace{21,\dots ,30}}\text { } \underset{group 5}{\underbrace{26,\dots ,40}}\\&\underset{group 6}{\underbrace{36,\dots ,45}}\text { }\underset{group 7}{\underbrace{41,\dots ,50}}\text { }\underset{group 8}{\underbrace{46,\dots ,55}} \end{aligned}$$

The results shown in Table 14 confirm our expectation: a setup in which collapsed groups include several non-zero (active) groups produces worse results than setups in which the collapsed groups include none or only one non-zero group. More specifically, the first setup in the table, which includes two collapsed groups (group3 and group5) that each consist of two non-zero groups, has the worst variable selection performance. The second and third misspecification setups, which include only one group (group5) that is collapsed from two non-zero groups, have almost the same performance, better than the first misspecification setup. The fourth misspecification setup, with no misspecified group collapsed from two non-zero groups, has the best performance. We hypothesize that the probability of variables being incorrectly selected increases because the overlap of active elements in the collapsed groups is ignored and because these collapsed groups are larger. In other words, the FPR increases, and the corresponding RMSE increases with it.
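The number of collapsed groups that absorb two active groups in each setup can be counted directly from the group definitions listed above. The following Python sketch does this, treating true groups 3, 4, 7 and 8 as active and counting, for each setup, the specified groups that fully contain at least two of them. The helper names are illustrative.

```python
def span(a, b):
    """Set of covariate ids a, a+1, ..., b."""
    return set(range(a, b + 1))

true_groups = [span(5 * g + 1, 5 * g + 10) for g in range(10)]
active = [3, 4, 7, 8]  # non-zero true groups (1-based)

setups = {
    1: [span(1, 10), span(6, 15), span(11, 25), span(21, 30),
        span(26, 35), span(31, 45), span(41, 50), span(46, 55)],
    2: [set(range(1, 16, 2)), set(range(2, 15, 2)) | span(16, 22), span(16, 25),
        span(21, 30), span(26, 35), span(31, 45), span(41, 50), span(46, 55)],
    3: [span(1, 20), span(16, 25), span(21, 30), span(26, 35),
        span(31, 45), span(41, 50), span(46, 55)],
    4: [span(1, 10), span(6, 20), span(16, 25), span(21, 30),
        span(26, 40), span(36, 45), span(41, 50), span(46, 55)],
}

def merged_active(groups):
    """Number of specified groups that fully contain two or more active true groups."""
    return sum(
        1 for g in groups
        if sum(true_groups[a - 1] <= g for a in active) >= 2
    )

counts = {k: merged_active(v) for k, v in setups.items()}
print(counts)  # {1: 2, 2: 1, 3: 1, 4: 0}
```

The counts of 2, 1, 1 and 0 match the ordering of the four setups described in the text.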

About this article

Cite this article

Dang, X., Huang, S. & Qian, X. Penalized Cox’s proportional hazards model for high-dimensional survival data with grouped predictors. Stat Comput 31, 77 (2021). https://doi.org/10.1007/s11222-021-10052-4

Received: 08 July 2020
Accepted: 13 September 2021
Published: 30 September 2021
DOI: https://doi.org/10.1007/s11222-021-10052-4
