Abstract
Polymorphisms in estrogen-related metabolic enzyme genes have been linked to breast cancer. In this paper, a novel noninvasive breast cancer prediction model is developed using machine learning algorithms that incorporate single nucleotide polymorphisms (SNPs) of estrogen metabolic enzyme genes. To predict susceptibility to breast cancer, the coded data of 14 SNPs from enrolled breast cancer patients and normal women were randomly shuffled, with 80% of the data designated as the training set and the remaining 20% reserved as the test set for validation. Single factor analysis was performed to screen for independent risk factors, and the Breast Cancer with Single Nucleotide Polymorphisms - Machine Learning (BCSNP-ML) prediction model was then built using the Light Gradient Boosting Machine (LightGBM) algorithm. A total of 14 SNP variables from 280 subjects were used in this study. Single factor analysis indicated a meaningful association between the incidence of breast cancer and four SNPs: SULT1A1 rs1042028, CYP1A1 rs1048943, CYP1B1 rs1056827, and CYP1A1 rs1056836. The model built on all 14 variables achieved an area under the receiver operating characteristic curve (AUROC) of 0.809, while the BCSNP-ML model built on the four selected variables achieved an AUROC of 0.831. Additionally, BCSNP-ML is visualized and interpreted using SHapley Additive exPlanations (SHAP) analysis, which further supports the model's potential as a robust tool for clinical prediction of breast cancer.
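The evaluation steps named in the abstract (random shuffling, an 80/20 split, single-factor screening, and AUROC) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The abstract does not say which single-factor test was used, so the Pearson chi-square statistic here is an assumption, as are the 0/1/2 genotype coding, the function names, and the synthetic data. The LightGBM fitting and SHAP steps are deliberately left out.

```python
import random


def shuffle_split(rows, train_frac=0.8, seed=0):
    """Shuffle rows reproducibly, then split into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


def chi_square(genotypes, labels):
    """Pearson chi-square statistic of the genotype x case/control table.

    Assumed stand-in for the paper's unspecified single factor analysis.
    """
    total = len(labels)
    stat = 0.0
    for g in sorted(set(genotypes)):
        row = sum(1 for gi in genotypes if gi == g)
        for y in (0, 1):
            col = sum(1 for yi in labels if yi == y)
            observed = sum(
                1 for gi, yi in zip(genotypes, labels) if gi == g and yi == y
            )
            expected = row * col / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def auroc(labels, scores):
    """AUROC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case score and a control score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for the cohort: 280 subjects, 14 coded SNPs each.
    subjects = [
        ([rng.choice((0, 1, 2)) for _ in range(14)], rng.choice((0, 1)))
        for _ in range(280)
    ]
    train, test = shuffle_split(subjects)
    print(len(train), len(test))  # 224 training rows, 56 test rows

    # Screen each SNP on the training set only, to avoid test-set leakage.
    train_labels = [y for _, y in train]
    for snp_index in range(14):
        column = [x[snp_index] for x, _ in train]
        print(snp_index, round(chi_square(column, train_labels), 3))
```

In a fuller pipeline, the screened SNP columns would feed a gradient-boosting classifier (for example `lightgbm.LGBMClassifier`), and `auroc` would be applied to its predicted probabilities on the held-out 20%.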
T. Zheng and S. Geng contributed equally to this work.
Acknowledgments
This research was funded by the Opening Project of Jiangsu Key Laboratory of Xuzhou Medical University (XZSYSKF2021030), the Affiliated Hospital of Xuzhou Medical University Faculty Research Project (2022ZL26), the Science and Technology Plan Social Development Key Project of Xuzhou (KC21172) and the National Natural Science Foundation of China (81402765). The authors thank Zhao Feng from the Jiangsu College of Nursing for providing support and guidance during the experiment. The authors would also like to thank Professor Zhang Xiaoqiang from the China University of Mining and Technology and Professor Gong Ping from the Xuzhou Medical University for their detailed revisions and suggestions on our paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zheng, T. et al. (2024). BCSNP-ML: A Novel Breast Cancer Prediction Model Base on LightGBM and Estrogen Metabolic Enzyme Genes. In: Dong, J., Zhang, L., Cheng, D. (eds) Proceedings of the 2nd International Conference on Internet of Things, Communication and Intelligent Technology. IoTCIT 2023. Lecture Notes in Electrical Engineering, vol 1197. Springer, Singapore. https://doi.org/10.1007/978-981-97-2757-5_66
DOI: https://doi.org/10.1007/978-981-97-2757-5_66
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2756-8
Online ISBN: 978-981-97-2757-5
eBook Packages: Computer Science, Computer Science (R0)