Gradient-based kernel variable selection for support vector hazards machine

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

This study aims to simultaneously improve the prediction of event times with a machine learning model and identify informative variables in time-to-event data. To this end, we recast the time-to-event data as dichotomized counting process data and consider a time-dependent support vector machine (SVM) framework for these data, in which the decision function consists of a time-independent risk score and a time-dependent intercept. We then use the empirical partial derivative of the risk score function with respect to each marginal predictor as an indicator of that predictor's importance. This approach makes it possible to predict survival time and identify the variables that affect it at the same time. Simulation studies were conducted to evaluate the performance of the model, and a real data analysis was carried out by predicting the survival time of lung cancer patients after diagnosis and selecting genes associated with lung cancer from human gene expression data.
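The two ingredients described above, the dichotomized counting process representation and the gradient-based importance score, can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' estimator: it assumes a Gaussian kernel K(a, b) = exp(-γ‖a − b‖²), a risk score of the form g(x) = Σᵢ cᵢ K(xᵢ, x) whose coefficients cᵢ have already been obtained by some fitting procedure (not shown), and it scores predictor j by the mean squared partial derivative ∂g/∂xⱼ over a set of evaluation points. Because the decision function is f(t, x) = α(t) + g(x) and the intercept α(t) does not involve x, ∂f/∂xⱼ = ∂g/∂xⱼ, so only the risk score is needed. The function names `dichotomize`, `gaussian_kernel` and `gradient_importance` are hypothetical.

```python
import numpy as np


def dichotomize(time, event, X):
    """Recast right-censored data as dichotomized counting-process rows.

    At each distinct observed event time t_k, every subject still at risk
    (time_i >= t_k) contributes one row labelled +1 if it fails at t_k
    and -1 otherwise. Returns (covariate rows, row times, labels).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    X = np.asarray(X, dtype=float)
    rows, times, labels = [], [], []
    for t_k in np.unique(time[event]):          # sorted distinct event times
        for i in np.flatnonzero(time >= t_k):    # indices in the risk set at t_k
            rows.append(X[i])
            times.append(t_k)
            labels.append(1 if (event[i] and time[i] == t_k) else -1)
    return np.array(rows), np.array(times), np.array(labels)


def gaussian_kernel(A, B, gamma):
    """Gram matrix K[a, b] = exp(-gamma * ||A[a] - B[b]||^2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def gradient_importance(X_train, coef, X_eval, gamma):
    """Mean squared partial derivative of g(x) = sum_i coef_i K(x_i, x).

    Uses dK(x_i, x)/dx_j = -2 * gamma * (x_j - x_ij) * K(x_i, x).
    The time-dependent intercept alpha(t) does not involve x, so it drops out.
    Returns one non-negative score per predictor.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_eval = np.asarray(X_eval, dtype=float)
    coef = np.asarray(coef, dtype=float)
    K = gaussian_kernel(X_eval, X_train, gamma)              # (m, n)
    diff = X_eval[:, None, :] - X_train[None, :, :]           # (m, n, p)
    grad = -2.0 * gamma * np.einsum("mn,mnp,n->mp", K, diff, coef)  # (m, p)
    return (grad ** 2).mean(axis=0)                           # (p,)
```

In this sketch, subjects tied at an event time all receive the label +1, and censored subjects only ever appear as -1 rows while they remain at risk. Squaring the partial derivatives before averaging keeps positive and negative slopes from cancelling, so a predictor on which g does not depend at all receives a score of exactly zero; predictors can then be ranked by the returned scores.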

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Data availability

We used Beer’s microarray data, which are available through the LungCancer3 function in the R package “GSCA”.

References

  • Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68, 337–404.

  • Beer, D. G., Kardia, S. L., Huang, C.-C., Giordano, T. J., Levin, A. M., Misek, D. E., Lin, L., Chen, G., Gharib, T. G., Thomas, D. G., et al. (2002). Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine, 8, 816–824.

  • Carleo, A., Landi, C., Prasse, A., Bergantini, L., d’Alessandro, M., Cameli, P., Janciauskiene, S., Rottoli, P., Bini, L., & Bargagli, E. (2020). Proteomic characterization of idiopathic pulmonary fibrosis patients: Stable versus acute exacerbation. Monaldi Archives for Chest Disease, 90, 180–190.

  • Clarke, B. S., Fokoué, E., & Zhang, H. H. (2009). Principles and Theory for Data Mining and Machine Learning. Springer.

  • Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B, 34, 187–202.

  • Fan, J., & Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20(1), 101–148.

  • Fleming, T. R., & Harrington, D. P. (2011). Counting Processes and Survival Analysis. Wiley.

  • Fukumizu, K., & Leng, C. (2014). Gradient-based kernel dimension reduction for regression. Journal of the American Statistical Association, 109, 359–370.

  • Gustafsson, P. M., Oxelius, V.-A., Nilsson, S., & Kjellman, B. (2008). Association between gm allotypes and asthma severity from childhood to young middle age. Respiratory Medicine, 102, 266–272.

  • He, X., Wang, J., & Lv, S. (2021). Efficient kernel-based variable selection with sparsistency. Statistica Sinica, 31, 2123–2151.

  • Ibrahim, J. G., Chen, M.-H., & Sinha, D. (2001). Bayesian Survival Analysis. Springer.

  • Jeong, S., Kim, C., & Yang, H. (2023). Wasserstein filter for variable screening in binary classification in the reproducing kernel Hilbert space. Journal of Nonparametric Statistics, 1–20 (in press).

  • Kalbfleisch, J. D., & Prentice, R. L. (2011). The Statistical Analysis of Failure Time Data (2nd ed.). Wiley.

  • Khan, F. M., & Zubek, V. B. (2008). Support Vector Regression for Censored Data (SVRc): A Novel Tool for Survival Analysis. In IEEE International Conference on Data Mining (pp. 863–868). IEEE.

  • Lawless, J. F. (2002). Statistical Models and Methods for Lifetime Data. Wiley.

  • Ma, Y., Chen, Y., & Petersen, I. (2017). Expression and epigenetic regulation of cystatin b in lung cancer and colorectal cancer. Pathology-Research and Practice, 213, 1568–1574.

  • Park, B., & Park, C. (2021). Kernel variable selection for multicategory support vector machines. Journal of Multivariate Analysis, 186, 104800.

  • Peng, J., Li, W., Tan, N., Lai, X., Jiang, W., & Chen, G. (2022). Usp47 stabilizes bach1 to promote the Warburg effect and non-small cell lung cancer development via stimulating hk2 and gapdh transcription. American Journal of Cancer Research, 12, 91–107.

  • Tibshirani, R., et al. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.

  • Van Belle, V., Pelckmans, K., Van Huffel, S., & Suykens, J. A. (2011). Support vector methods for survival analysis: A comparison between ranking and regression approaches. Artificial Intelligence in Medicine, 53, 107–118.

  • Wang, Q. (2012). Kernel principal component analysis and its applications in face recognition and active shape models. arXiv:1207.3538.

  • Wang, Y., Chen, T., & Zeng, D. (2016). Support vector hazards machine: A counting process framework for learning risk scores for censored outcomes. The Journal of Machine Learning Research, 17, 5825–5861.

  • Wei, L.-J. (1992). The accelerated failure time model: A useful alternative to the Cox regression model in survival analysis. Statistics in Medicine, 11, 1871–1879.

  • Xia, Y. (2007). A constructive approach to the estimation of dimension reduction directions. The Annals of Statistics, 35, 2654–2690.

  • Xia, Y., Tong, H., Li, W., & Zhu, L.-X. (2002). An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 363–410.

  • Yang, H., Zhu, H., Ahn, M., & Ibrahim, J. G. (2021). Weighted functional linear Cox regression model. Statistical Methods in Medical Research, 30, 1917–1931.

Acknowledgements

We thank the Editor, Associate Editor and two reviewers, whose questions and insightful comments have led to a much improved paper.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF2021R1C1C1007023).

Author information

Correspondence to Hojin Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jeong, S., Kang, K. & Yang, H. Gradient-based kernel variable selection for support vector hazards machine. J. Korean Stat. Soc. 53, 509–536 (2024). https://doi.org/10.1007/s42952-024-00256-5

  • DOI: https://doi.org/10.1007/s42952-024-00256-5
