
BCSNP-ML: A Novel Breast Cancer Prediction Model Base on LightGBM and Estrogen Metabolic Enzyme Genes

  • Conference paper
  • First Online:
Proceedings of the 2nd International Conference on Internet of Things, Communication and Intelligent Technology (IoTCIT 2023)

Abstract

Polymorphisms in estrogen-related metabolic enzyme genes have been shown to be linked to breast cancer. In this paper, a novel noninvasive breast cancer prediction model was developed using machine learning algorithms that incorporate single nucleotide polymorphisms (SNPs) in estrogen metabolic enzyme genes. To predict breast cancer susceptibility accurately, the coded data of 14 SNPs from enrolled breast cancer patients and healthy women were randomly shuffled; 80% of the data were used for training and the remaining 20% were held out as the test set for validation. Single-factor analysis was performed to screen for independent risk factors, and the Breast Cancer with Single Nucleotide Polymorphisms - Machine Learning (BCSNP-ML) prediction model was then built with the Light Gradient Boosting Machine (LightGBM) algorithm. A total of 14 SNP variables from 280 subjects were used in this study. Single-factor analysis indicated a significant association between SULT1A1 rs1042028, CYP1A1 rs1048943, CYP1B1 rs1056827, CYP1A1 rs1056836 and the incidence of breast cancer. A model using all 14 variables achieved an area under the receiver operating characteristic curve (AUROC) of 0.809, while the BCSNP-ML model constructed from these four variables achieved an AUROC of 0.831. In addition, BCSNP-ML is visualized and interpreted using SHapley Additive exPlanations (SHAP) analysis, further validating that the model has great potential as a robust tool for clinical breast cancer prediction.
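The abstract names the pipeline steps (a random 80/20 split, single-factor screening of SNPs, model fitting, and AUROC on the held-out 20%) but gives no code. The sketch below illustrates the split, screening and evaluation steps in plain Python under assumptions the abstract does not state: genotypes are coded 0/1/2, the single-factor analysis is a Pearson chi-square test on a 2 x 3 case/control by genotype table, the screening cutoff is p < 0.05, and each subject is a dict with a hypothetical `case` key plus one key per SNP. It is not the authors' implementation, and the LightGBM fit is left out so the example runs without third-party packages.

```python
import math
import random


def split_train_test(rows, train_fraction=0.8, seed=0):
    """Randomly shuffle subjects, then split them into training and test sets.

    The 80/20 ratio follows the abstract; the seed is an illustrative assumption.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(train_fraction * len(rows)))
    return rows[:cut], rows[cut:]


def chi2_genotype_test(genotypes, labels):
    """Single-factor test for one SNP (assumed method: Pearson chi-square).

    Builds a 2 x 3 table of case status (0 = control, 1 = case) by genotype
    code (0/1/2) and returns (statistic, p_value). Empty rows or columns are
    excluded from the degrees of freedom.
    """
    table = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genotypes, labels):
        table[y][g] += 1
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(3)]
    n = sum(row_tot)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    df = (sum(t > 0 for t in row_tot) - 1) * (sum(t > 0 for t in col_tot) - 1)
    if df == 2:
        p = math.exp(-stat / 2)  # exact chi-square survival function for df = 2
    elif df == 1:
        p = math.erfc(math.sqrt(stat / 2))  # exact chi-square survival function for df = 1
    else:
        p = 1.0  # only one genotype or one class observed: nothing to test
    return stat, p


def screen_snps(train, snp_names, alpha=0.05):
    """Keep SNPs whose single-factor p-value is below alpha (cutoff assumed)."""
    labels = [row["case"] for row in train]
    return [
        name
        for name in snp_names
        if chi2_genotype_test([row[name] for row in train], labels)[1] < alpha
    ]


def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a random case outscores a
    random control; tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With 280 subjects, `split_train_test` yields 224 training and 56 test rows. In a full pipeline, the SNPs kept by `screen_snps` on the training rows would be the features for the gradient-boosting model; with the `lightgbm` package, `LGBMClassifier().fit(X_train, y_train).predict_proba(X_test)[:, 1]` would supply the test-set scores passed to `auroc`, and `shap.TreeExplainer(model).shap_values(X_test)` would give per-feature SHAP contributions. The pairwise AUROC is O(cases x controls), which is negligible at this sample size.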

T. Zheng and S. Geng—The authors contributed equally to this work.



Acknowledgments

This research was funded by the Opening Project of Jiangsu Key Laboratory of Xuzhou Medical University (XZSYSKF2021030), the Affiliated Hospital of Xuzhou Medical University Faculty Research Project (2022ZL26), the Science and Technology Plan Social Development Key Project of Xuzhou (KC21172) and the National Natural Science Foundation of China (81402765). The authors thank Zhao Feng from the Jiangsu College of Nursing for providing support and guidance during the experiment. The authors would also like to thank Professor Zhang Xiaoqiang from the China University of Mining and Technology and Professor Gong Ping from the Xuzhou Medical University for their detailed revisions and suggestions on our paper.

Author information

Corresponding author

Correspondence to Deqiang Cheng.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zheng, T. et al. (2024). BCSNP-ML: A Novel Breast Cancer Prediction Model Base on LightGBM and Estrogen Metabolic Enzyme Genes. In: Dong, J., Zhang, L., Cheng, D. (eds) Proceedings of the 2nd International Conference on Internet of Things, Communication and Intelligent Technology. IoTCIT 2023. Lecture Notes in Electrical Engineering, vol 1197. Springer, Singapore. https://doi.org/10.1007/978-981-97-2757-5_66


  • DOI: https://doi.org/10.1007/978-981-97-2757-5_66

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2756-8

  • Online ISBN: 978-981-97-2757-5

  • eBook Packages: Computer Science, Computer Science (R0)
