Abstract
Purpose
To explore interpretable machine learning (ML) methods for predicting survival in patients with oropharyngeal cancer (OPC), with the aim of adding prognostic value.
Methods
A cohort of 427 OPC patients (training: 341; test: 86) from The Cancer Imaging Archive (TCIA) database was analyzed. Radiomic features of the gross tumor volume (GTV), extracted from planning CT using Pyradiomics, together with HPV p16 status and other patient characteristics, were considered as potential predictors. A multi-level dimension reduction algorithm consisting of the least absolute shrinkage and selection operator (Lasso) and sequential floating backward selection (SFBS) was proposed to remove redundant and irrelevant features. The interpretable model was constructed by quantifying the contribution of each feature to the extreme gradient boosting (XGBoost) decision using the Shapley Additive exPlanations (SHAP) algorithm.
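The floating backward search named above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the Lasso pre-filtering stage is omitted, and the function `sfbs`, the weights in `TOY_WEIGHTS`, and the scoring function `toy_score` are hypothetical names introduced only for illustration. In practice the score would more plausibly be a cross-validated performance measure of a classifier trained on the candidate subset.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Subset = FrozenSet[str]


def sfbs(features: Sequence[str], score: Callable[[Subset], float], k: int) -> Tuple[Subset, float]:
    """Sequential floating backward selection down to k features (higher score is better)."""
    all_features = frozenset(features)
    if not 1 <= k <= len(all_features):
        raise ValueError("k must be between 1 and the number of distinct features")
    current = all_features
    # best[size] holds the best (score, subset) seen so far for that subset size.
    best: Dict[int, Tuple[float, Subset]] = {len(current): (score(current), current)}

    def best_move(candidates: List[Subset]) -> Tuple[float, Subset]:
        scored = [(score(c), c) for c in candidates]
        return max(scored, key=lambda pair: pair[0])

    while len(current) > k:
        # Backward step: drop the feature whose removal leaves the highest score.
        s, current = best_move([current - {f} for f in sorted(current)])
        if len(current) not in best or s > best[len(current)][0]:
            best[len(current)] = (s, current)
        # Floating step: re-add a dropped feature only while doing so beats
        # the best subset already recorded for the larger size.
        while True:
            excluded = sorted(all_features - current)
            if not excluded:
                break
            s_add, candidate = best_move([current | {f} for f in excluded])
            if s_add <= best[len(candidate)][0]:
                break
            current = candidate
            best[len(candidate)] = (s_add, candidate)

    score_k, subset_k = best[k]
    return subset_k, score_k


# Hypothetical toy scoring: each feature has a fixed usefulness, and every
# retained feature costs 0.5, so uninformative features lower the score.
TOY_WEIGHTS = {"ecog": 3.0, "hpv": 2.0, "age": 1.0, "noise1": 0.0, "noise2": 0.0}


def toy_score(subset: Subset) -> float:
    return sum(TOY_WEIGHTS[f] for f in subset) - 0.5 * len(subset)


selected, selected_score = sfbs(list(TOY_WEIGHTS), toy_score, 3)
print(sorted(selected), selected_score)
```

The conditional re-inclusion step is what distinguishes a floating search from plain backward elimination: a feature discarded early can return if a later subset shows it was useful after all.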
Results
The proposed Lasso-SFBS algorithm selected 14 features, and the prediction model built on this feature set achieved an area under the ROC curve (AUC) of 0.85 on the test dataset. Ranking the contribution values calculated by SHAP showed that the top predictors most correlated with survival were ECOG performance status, wavelet-LLH_firstorder_Mean, chemotherapy, wavelet-LHL_glcm_InverseVariance, and tumor size. Patients who received chemotherapy, had positive HPV p16 status, and had lower ECOG performance status tended to have higher SHAP scores and longer survival; those who were older at diagnosis or had a history of heavy drinking and heavy smoking (in pack-years) tended to have lower SHAP scores and shorter survival.
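To make the notion of a per-feature SHAP score concrete, the following minimal sketch computes exact Shapley values by brute-force enumeration for a small hypothetical model. It is an illustration of the underlying attribution principle only: the study applies SHAP to an XGBoost model, whereas `toy_model`, its coefficients, the patient record, and the all-zero baseline here are assumptions made up for the example, and the enumeration is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    names = sorted(x)
    n = len(names)

    def value(subset: FrozenSet[str]) -> float:
        # Features in `subset` take the patient's values; the rest stay at baseline.
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in names}
        return model(mixed)

    phi: Dict[str, float] = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude f.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_model(p: Dict[str, float]) -> float:
    # Hypothetical survival score with one interaction term (not the paper's model).
    return (0.5 + 0.2 * p["chemo"] + 0.1 * p["hpv_pos"]
            - 0.1 * p["ecog"] + 0.05 * p["chemo"] * p["hpv_pos"])


patient = {"chemo": 1.0, "hpv_pos": 1.0, "ecog": 2.0}
reference = {"chemo": 0.0, "hpv_pos": 0.0, "ecog": 0.0}
phi = shapley_values(toy_model, patient, reference)
for name, contribution in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {contribution:+.3f}")
```

In this toy setting the contributions add up to the difference between the patient's score and the baseline score, and the interaction between chemotherapy and HPV status is split evenly between the two features. A positive contribution raises the predicted score and a negative one lowers it, which is the sense in which the results above relate feature values to higher or lower SHAP scores.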
Conclusion
We demonstrated the predictive value of combining patient characteristics and imaging features for the overall survival of OPC patients. The multi-level dimension reduction algorithm can reliably identify the most plausible predictors associated with overall survival. The interpretable patient-specific survival prediction model, which captures the correlation between each predictor and the clinical outcome, was developed to facilitate clinical decision-making for personalized treatment.
Data availability
Source data used during the current study are available from The Cancer Imaging Archive website (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=33948764). All datasets generated during the current study are available from the corresponding author on reasonable request.
Funding
This research was funded by the National Natural Science Foundation of China (No. 62001380) and the General Special Scientific Research Program of Shaanxi Provincial Education Department (20JK0910).
Author information
Contributions
All authors contributed to the study’s conception and design. Material preparation, data collection and analysis were performed by TF, CL and XSQ. The first draft of the manuscript was written by XP and all authors commented on previous versions of the manuscript. RRS and RKC gave valuable comments when the paper was revised, which greatly improved the quality of the article. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Appendix A
See Appendix Table 5.
About this article
Cite this article
Pan, X., Feng, T., Liu, C. et al. A survival prediction model via interpretable machine learning for patients with oropharyngeal cancer following radiotherapy. J Cancer Res Clin Oncol 149, 6813–6825 (2023). https://doi.org/10.1007/s00432-023-04644-y
DOI: https://doi.org/10.1007/s00432-023-04644-y