
A Bayesian Approach to Sparse Cox Regression in High-Dimensional Survival Analysis

  • Conference paper
  • First Online:
Machine Learning and Data Mining in Pattern Recognition (MLDM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9166)

Abstract

Survival prediction and prognostic factor identification play an important role in machine learning research. This paper employs machine learning regression algorithms to construct a survival model and proposes a new Bayesian framework for feature selection in high-dimensional Cox regression problems. The proposed approach gives a rigorous probabilistic statement of the shrinkage criterion for feature selection. The resulting regularization yields estimates that are unbiased and possess the grouping and oracle properties, while their maximal risk converges to a finite value. Experimental results show that the proposed framework is competitive on both simulated data and publicly available real data sets.
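The paper's own Bayesian shrinkage penalty is not available in off-the-shelf survival libraries, so as a rough baseline the sketch below fits a lasso-penalized Cox model in the spirit of Tibshirani [13] with the Python package lifelines; the dataset and the penalizer value are illustrative assumptions, not taken from the paper.

```python
# Baseline sketch: L1-penalized Cox regression (cf. Tibshirani [13]), not the
# paper's Bayesian penalty. Dataset and penalizer value are illustrative only.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()  # recidivism data: 'week' = follow-up time, 'arrest' = event flag

# l1_ratio=1.0 turns lifelines' elastic-net penalty into a pure lasso (L1) penalty.
cph = CoxPHFitter(penalizer=0.5, l1_ratio=1.0)
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()  # shrunken coefficients; some are driven to ~0 (selected out)
```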

This work was supported by Russian Foundation for Basic Research grants No. 11-07-00409 and No. 11-07-00634.


References

  1. Aalen, O., Borgan, O., Gjessing, H.: Survival and Event History Analysis: A Process Point of View. Statistics for Biology and Health. Springer-Verlag, New York (2008)

  2. Klein, J.P., Moeschberger, M.L.: Survival Analysis, 2nd edn. Springer, New York (2005)

  3. Cox, D.R.: Regression models and life-tables (with discussion). J. Roy. Stat. Soc. B 34, 187–220 (1972)

  4. Gui, J., Li, H.: Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21, 3001–3008 (2005)

  5. Fan, J., Li, R.: Variable selection for Cox's proportional hazards model and frailty model. Ann. Stat. 30, 74–99 (2002)

  6. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  7. Van Houwelingen, H.C., et al.: Cross-validated Cox regression on microarray gene expression data. Stat. Med. 25, 3201–3216 (2006)

  8. Lin, D.W., Porter, M., Montgomery, B.: Treatment and survival outcomes in young men diagnosed with prostate cancer: a population-based cohort study. Cancer 115(13), 2863–2871 (2009)

  9. Ying, Z.L.: A large sample study of rank estimation for censored regression data. Ann. Stat. 21, 76–99 (1993)

  10. Jin, Z., Lin, D.Y., Wei, L.J., Ying, Z.L.: Rank-based inference for the accelerated failure time model. Biometrika 90, 341–353 (2003)

  11. Sauerbrei, W.: The use of resampling methods to simplify regression models in medical statistics. Appl. Stat. 48, 313–339 (1999)

  12. Sauerbrei, W., Schumacher, M.: A bootstrap resampling procedure for model building: application to the Cox regression model. Stat. Med. 11, 2093–2109 (1992)

  13. Tibshirani, R.: The lasso method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997)

  14. Zhang, H.H., Lu, W.: Adaptive lasso for Cox's proportional hazards model. Biometrika 94, 691–703 (2007)

  15. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B 67, 301–320 (2005)

  16. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

  17. Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models (with discussion). Ann. Stat. 36(4), 1509–1566 (2008)

  18. Fan, J., Samworth, R., Wu, Y.: Ultrahigh dimensional feature selection: beyond the linear model. J. Mach. Learn. Res. 10, 2013–2038 (2009)

  19. Leeb, H., Pötscher, B.M.: Sparse estimators and the oracle property, or the return of Hodges' estimator. J. Econometrics 142(1), 201–211 (2008)

  20. Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman and Hall, London (1990)

  21. Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for Cox's proportional hazards model via coordinate descent. J. Stat. Softw. 39(5), 1–13 (2011)

  22. Rosenwald, A., et al.: The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346(25), 1937–1947 (2002)

Author information

Correspondence to Vadim Mottl.

Appendix: Proofs

Proof of Theorem 1. The function \(l(\varvec{\beta })\) (11) is strictly convex [5]. The penalty term (15) is strictly convex on the region \(\beta _l \in [-1/\sqrt{\mu },1/\sqrt{\mu }],\ l = 1,\ldots ,p\) (the bound \(1/\sqrt{\mu }\) follows from the sign of the second derivative below). Indeed, the second derivatives of \(p(\varvec{\beta })\) are, up to a positive factor,

$$\begin{aligned} \frac{\partial ^2 p(\varvec{\beta })}{\partial \beta _i\,\partial \beta _j}= {\left\{ \begin{array}{ll} \dfrac{1-\mu \beta _i^{2}}{(\mu \beta _i^{2}+1)^{2}}, &{} i=j,\\ 0, &{} i\ne j. \end{array}\right. } \end{aligned}$$
(18)

So the Hessian of \(p(\varvec{\beta })\) is non-negative definite on the region \(\beta _l \in [-1/\sqrt{\mu },1/\sqrt{\mu }],\ l = 1,\ldots ,p\). Suppose, for contradiction, that \(\hat{\beta }_i \ne \hat{\beta }_j\). Define the estimator \(\hat{\varvec{\beta }}^*\) by \(\hat{\beta }^*_k=\hat{\beta }_k\) for all \(k \ne i,j\), and \(\hat{\beta }_i^*=\hat{\beta }_j^*= a\hat{\beta }_i+(1-a)\hat{\beta }_j\) with \(a=1/2\). Since \(\varvec{x}^{(i)}=\varvec{x}^{(j)}\), we have \(\tilde{\varvec{X}}\hat{\varvec{\beta }}^*= \tilde{\varvec{X}}\hat{\varvec{\beta }}\), so \(|\tilde{\varvec{z}} -\tilde{\varvec{X}}\hat{\varvec{\beta }}^* |=|\tilde{\varvec{z}} -\tilde{\varvec{X}}\hat{\varvec{\beta }} |\) and the loss term is unchanged. However, the penalization function is additive, \(p(\varvec{\beta })=\sum _{l=1}^{p}\varphi (\beta _l)\), and \(\varphi \) is strictly convex on \([-1/\sqrt{\mu },1/\sqrt{\mu }]\), so that

$$\begin{aligned} p(\hat{\varvec{\beta }}^*) - p(\hat{\varvec{\beta }}) = 2\,\varphi \bigl (a\hat{\beta }_i+(1-a)\hat{\beta }_j\bigr ) - \varphi (\hat{\beta }_i) - \varphi (\hat{\beta }_j) < 0 . \end{aligned}$$

Hence \(p(\hat{\varvec{\beta }}^*) < p(\hat{\varvec{\beta }})\) while the loss term is equal, so \(J(\hat{\varvec{\beta }}^*|\mu ) < J(\hat{\varvec{\beta }}|\mu )\), and \(\hat{\varvec{\beta }}\) cannot be a minimizer. This contradiction proves \(\hat{\beta }_i = \hat{\beta }_j\).
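The convexity claim can be checked symbolically. The sketch below assumes, as inferred from the gradient terms in (20) and (21) and not quoted from the paper, that the per-coordinate penalty term behaves as \(\varphi (\beta ) = \log (\mu \beta ^2+1)\) up to the positive factor \((1+1/\mu )\):

```python
# Symbolic sanity check of (18). Assumption (inferred from (20)-(21), not quoted
# from the paper): per-coordinate penalty phi(b) = log(mu*b**2 + 1).
import sympy as sp

b = sp.Symbol('beta', real=True)
mu = sp.Symbol('mu', positive=True)

phi = sp.log(mu * b**2 + 1)
d2 = sp.diff(phi, b, 2)

# The second derivative equals 2*mu*(1 - mu*beta^2) / (mu*beta^2 + 1)^2,
# which is non-negative exactly when |beta| <= 1/sqrt(mu).
expected = 2 * mu * (1 - mu * b**2) / (mu * b**2 + 1)**2
print(sp.simplify(d2 - expected))  # prints 0
```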

Proof of Theorem 2. By definition,

$$\begin{aligned} \frac{\partial J(\varvec{\beta }|\mu )}{\partial \beta _k}\bigg |_{\varvec{\beta }=\hat{\varvec{\beta }}}=0. \end{aligned}$$
(19)

By (19), for non-zero \(\hat{\beta }_i\) and \(\hat{\beta }_j\),

$$\begin{aligned} -2\tilde{\varvec{x}}_i^T(\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }})+(1+1/\mu )\frac{2\mu \hat{\beta }_i}{\mu \hat{\beta }_i^{2}+1}=0 \end{aligned}$$
(20)

and

$$\begin{aligned} -2\tilde{\varvec{x}}_j^T(\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }})+(1+1/\mu )\frac{2\mu \hat{\beta }_j}{\mu \hat{\beta }_j^{2}+1}=0 . \end{aligned}$$
(21)

Hence

$$\begin{aligned} \left| \frac{\hat{\beta }_i}{\mu \hat{\beta }_i^{2}+1} - \frac{\hat{\beta }_j}{\mu \hat{\beta }_j^{2}+1} \right| = \frac{1}{1+\mu }\left| (\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j)^T(\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }})\right| \le \frac{1}{1+\mu }\,|\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j|\,|\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }}|. \end{aligned}$$
(22)

Also note that \(J(\hat{\varvec{\beta }}|\mu )\le J(\varvec{0}|\mu )\), so \(|\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }}|\le |\tilde{\varvec{z}}| = 1\), since \(\tilde{\varvec{z}}\) is centered and standardized. Moreover, since the standardized columns have unit norm and correlation \(\rho =\tilde{\varvec{x}}_i^T\tilde{\varvec{x}}_j\), we get \(|\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j| =\sqrt{2-2\rho }\). Hence,

$$\begin{aligned} \left| \frac{\hat{\beta }_i}{\mu \hat{\beta }_i^{2}+1} - \frac{\hat{\beta }_j}{\mu \hat{\beta }_j^{2}+1} \right| \le \frac{1}{1+\mu }\,|\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j| =\frac{\sqrt{2(1-\rho )}}{\mu +1} . \end{aligned}$$
(23)
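As a quick numerical illustration (not part of the paper) of the identity used in the last step, unit-norm columns with inner product \(\rho \) indeed satisfy \(|\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j| = \sqrt{2(1-\rho )}\):

```python
# Numerical check: for unit-norm vectors with inner product rho,
# ||x_i - x_j|| equals sqrt(2*(1 - rho)). Data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
x_i = rng.normal(size=200)
x_i /= np.linalg.norm(x_i)                 # normalize to unit length
x_j = 0.7 * x_i + 0.3 * rng.normal(size=200)
x_j /= np.linalg.norm(x_j)

rho = x_i @ x_j                            # inner product of the unit columns
lhs = np.linalg.norm(x_i - x_j)
rhs = np.sqrt(2 * (1 - rho))
print(np.isclose(lhs, rhs))                # True
```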

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Krasotkina, O., Mottl, V. (2015). A Bayesian Approach to Sparse Cox Regression in High-Dimensional Survival Analysis. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2015. Lecture Notes in Computer Science, vol 9166. Springer, Cham. https://doi.org/10.1007/978-3-319-21024-7_30

  • DOI: https://doi.org/10.1007/978-3-319-21024-7_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21023-0

  • Online ISBN: 978-3-319-21024-7

  • eBook Packages: Computer Science (R0)
