A survival prediction model via interpretable machine learning for patients with oropharyngeal cancer following radiotherapy

Pan, Xiaoying; Feng, Tianhao; Liu, Chen; Savjani, Ricky R.; Chin, Robert K.; Sharon Qi, X.

doi:10.1007/s00432-023-04644-y


Research
Published: 18 February 2023

Volume 149, pages 6813–6825, (2023)

Journal of Cancer Research and Clinical Oncology

Xiaoying Pan (ORCID: orcid.org/0000-0002-8899-7540)^1,2,
Tianhao Feng^1,2,
Chen Liu^1,2,
Ricky R. Savjani^3,
Robert K. Chin^3 &
X. Sharon Qi^3


Abstract

Purpose

To explore interpretable machine learning (ML) methods for predicting survival in patients with oropharyngeal cancer (OPC), with the aim of adding prognostic value.

Methods

A cohort of 427 OPC patients from The Cancer Imaging Archive (TCIA) database (training set, 341; test set, 86) was analyzed. Radiomic features of the gross tumor volume (GTV), extracted from planning CT images using Pyradiomics, together with patient characteristics such as HPV p16 status, were considered as potential predictors. A multi-level dimension reduction algorithm combining the Least Absolute Shrinkage and Selection Operator (Lasso) and Sequential Floating Backward Selection (SFBS) was proposed to remove redundant and irrelevant features. The interpretable model was constructed by using the SHapley Additive exPlanations (SHAP) algorithm to quantify the contribution of each feature to the decisions of an Extreme Gradient Boosting (XGBoost) model.
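The two-level selection described above can be sketched in Python with scikit-learn. This is a minimal illustration under our own assumptions, not the authors' code: the survival endpoint is treated as a 0/1 label, level 1 keeps the columns with nonzero `LassoCV` coefficients, and level 2 is a hand-written sequential floating backward selection (after Pudil et al., 1994) that scores each candidate subset by the 5-fold cross-validated AUC of a logistic regression (the scoring model used in the paper is not stated in the abstract). The helper names `lasso_prefilter`, `cv_auc` and `sfbs`, and the synthetic data from `make_classification`, are hypothetical stand-ins for the radiomic and clinical feature matrix.

```python
# Hypothetical two-level feature selection sketch: Lasso prefilter, then SFBS.
from sklearn.datasets import make_classification
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.model_selection import cross_val_score


def lasso_prefilter(X, y):
    """Level 1: keep the columns whose Lasso coefficient is nonzero.

    Assumes X is already standardized and y is a 0/1 survival label.
    """
    lasso = LassoCV(cv=5).fit(X, y)  # penalty strength chosen by 5-fold CV
    return [j for j in range(X.shape[1]) if lasso.coef_[j] != 0.0]


def cv_auc(X, y, cols, model):
    """Mean 5-fold cross-validated ROC AUC using only the given columns."""
    return cross_val_score(model, X[:, cols], y, cv=5, scoring="roc_auc").mean()


def sfbs(X, y, cols, k, model):
    """Level 2: sequential floating backward selection from cols down to k columns."""
    full = sorted(cols)
    if not 1 <= k <= len(full):
        raise ValueError("k must be between 1 and the number of candidate columns")
    current = list(full)
    # best[size] = (score, subset): best subset seen so far for each subset size
    best = {len(current): (cv_auc(X, y, current, model), list(current))}
    while len(current) > k:
        # Exclusion: drop the column whose removal leaves the highest score.
        score, drop = max(
            (cv_auc(X, y, [c for c in current if c != d], model), d) for d in current
        )
        current = [c for c in current if c != drop]
        if len(current) not in best or score > best[len(current)][0]:
            best[len(current)] = (score, list(current))
        # Conditional inclusion: re-add a column only while doing so beats the
        # best subset already recorded for the larger size.
        while len(current) < len(full):
            score, add = max(
                (cv_auc(X, y, sorted(current + [a]), model), a)
                for a in full
                if a not in current
            )
            if score <= best[len(current) + 1][0]:
                break
            current = sorted(current + [add])
            best[len(current)] = (score, list(current))
    return best[k][1]


# Synthetic stand-in for a standardized radiomic + clinical feature matrix.
X, y = make_classification(
    n_samples=150, n_features=15, n_informative=4, n_redundant=2, random_state=0
)
kept = lasso_prefilter(X, y)
selected = sfbs(X, y, kept, k=min(4, len(kept)), model=LogisticRegression(max_iter=1000))
print(f"Lasso kept {len(kept)} columns; SFBS selected columns {selected}")
```

The floating step is what distinguishes SFBS from plain backward elimination: a column dropped early can return later if re-adding it beats the best subset already recorded at that size, and the strict improvement test guarantees the search terminates. Swapping the logistic regression for an XGBoost classifier as the scoring model would leave the search itself unchanged.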

Results

The proposed Lasso-SFBS algorithm selected 14 features, and the prediction model built on this feature set achieved an area under the ROC curve (AUC) of 0.85 on the test dataset. Ranked by their SHAP contribution values, the top predictors most correlated with survival were ECOG performance status, wavelet-LLH_firstorder_Mean, chemotherapy, wavelet-LHL_glcm_InverseVariance, and tumor size. Receiving chemotherapy, positive HPV p16 status, and a lower ECOG performance status tended to be associated with higher SHAP scores and longer survival; older age at diagnosis and a heavy drinking and smoking (pack-year) history tended to be associated with lower SHAP scores and shorter survival.
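For reference, a test-set AUC such as the 0.85 reported above has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The standard-library sketch below computes it that way. The function name `auc_mann_whitney` and the toy scores and labels are made up for illustration, and which survival outcome counts as the positive class in the paper is not stated in the abstract.

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 (positive) or 0 (negative); tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 8 of the 9 positive/negative pairs are ordered correctly.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))  # 8/9, about 0.889
```

The pairwise loop costs O(P·N) comparisons; for larger test sets a sort-based rank-sum gives the same value, and scikit-learn's `roc_auc_score` is the usual library choice.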

Conclusion

We demonstrated the predictive value of combining patient characteristics and imaging features for the overall survival of OPC patients. The multi-level dimension reduction algorithm can reliably identify the predictors most strongly associated with overall survival. The interpretable, patient-specific survival prediction model, which captures the correlation between each predictor and the clinical outcome, was developed to facilitate clinical decision-making for personalized treatment.
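The patient-specific attributions referred to above are Shapley values: each feature's contribution is its marginal effect on the model output, averaged over all orders in which the features could be added. In practice the `shap` package's `TreeExplainer` computes such attributions efficiently for tree ensembles like XGBoost; the standard-library sketch below instead enumerates every feature coalition exactly, which is feasible only for a handful of features. The interventional value function (features outside a coalition taken from background rows), the toy model, and the generic feature positions f0 to f2 are illustrative assumptions, not the paper's model or data.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one instance x (exponential in len(x)).

    phi_i = sum over S not containing i of
            |S|! (n - |S| - 1)! / n! * (v(S | {i}) - v(S)),
    where v(S) averages the model output over the background rows with the
    features in S fixed to their values in x.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            z = [x[j] if j in coalition else row[j] for j in range(n)]
            total += model(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi


def toy_model(z):
    """Hypothetical scoring function: additive in f0, with an f1 x f2 interaction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
phi = shapley_values(toy_model, x, background)
print([round(p, 6) for p in phi])  # [1.0, 2.5, 3.0]
```

The attributions sum to f(x) minus the average background prediction (8 - 1.5 = 6.5 here), the additivity property that lets one patient's prediction be decomposed feature by feature. A global ranking such as the one in the Results is commonly formed by averaging |phi_i| over patients.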


Data availability

Source data used during the current study are available from The Cancer Imaging Archive website (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=33948764). All datasets generated during the current study are available from the corresponding author on reasonable request.


Funding

This research was funded by the National Natural Science Foundation of China (No. 62001380) and the General Special Scientific Research Program of Shaanxi Provincial Education Department (20JK0910).

Author information

Authors and Affiliations

School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an, 710121, China
Xiaoying Pan, Tianhao Feng & Chen Liu
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi’an University of Posts and Telecommunications, Xi’an, 710121, China
Xiaoying Pan, Tianhao Feng & Chen Liu
Department of Radiation Oncology, University of California Los Angeles, Los Angeles, CA, 90095, USA
Ricky R. Savjani, Robert K. Chin & X. Sharon Qi


Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection and analysis were performed by TF, CL and XSQ. The first draft of the manuscript was written by XP and all authors commented on previous versions of the manuscript. RRS and RKC gave valuable comments when the paper was revised, which greatly improved the quality of the article. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaoying Pan.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

See Appendix Table 5.

Table 5 Definition of clinical features used for modeling


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Pan, X., Feng, T., Liu, C. et al. A survival prediction model via interpretable machine learning for patients with oropharyngeal cancer following radiotherapy. J Cancer Res Clin Oncol 149, 6813–6825 (2023). https://doi.org/10.1007/s00432-023-04644-y


Received: 11 November 2022
Accepted: 08 February 2023
Published: 18 February 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00432-023-04644-y

