
Iterative principal component analysis method for improvised classification of breast cancer disease using blood sample analysis

  • Original Article
  • Medical & Biological Engineering & Computing

A Correction to this article was published on 23 August 2021

This article has been updated

Abstract

Breast cancer is the most common cancer in women worldwide. Procedures used to diagnose it include mammography, breast ultrasound, biopsy, breast magnetic resonance imaging, and blood tests such as a complete blood count. Detecting breast cancer at an early stage plays an important role in diagnosis and treatment. This paper aims to develop a predictive model for detecting breast cancer from blood sample data containing age, body mass index (BMI), glucose, insulin, homeostasis model assessment (HOMA), leptin, adiponectin, resistin, and the chemokine monocyte chemoattractant protein 1 (MCP-1). The two main challenges in this process are identifying biomarkers and achieving accurate disease prediction. The proposed methodology applies principal component analysis in a distinctive, iterative manner, followed by a random forest prediction model, to discriminate between healthy subjects and breast cancer patients. This approach systematically extracts linear combinations of input attributes with high communalities as principal axis elements. The iteratively extracted principal axis elements, combined with a minimal number of input attributes, predict the disease with higher classification accuracy and with increased sensitivity and specificity. The results show that the proposed approach achieves better predictive performance than previously reported results by selecting the relevant extracted principal axis elements and attributes that improve the classifier's performance measures.
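The accuracy, sensitivity, and specificity referred to in the abstract follow the standard confusion-matrix definitions. The sketch below is a minimal illustration in Python, not the authors' evaluation code (the study reports using Tanagra). It assumes binary labels where 1 marks a breast cancer patient and 0 a healthy subject, an encoding chosen here for illustration; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # patients predicted as patients
    fn: int  # patients predicted as healthy
    tn: int  # healthy predicted as healthy
    fp: int  # healthy predicted as patients


def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells from paired label lists."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fn=fn, tn=tn, fp=fp)


def sensitivity(c):
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


if __name__ == "__main__":
    # Illustrative labels only, not results from the paper.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    counts = confusion_counts(truth, preds)
    print(counts, sensitivity(counts), specificity(counts), accuracy(counts))
```

Reporting sensitivity and specificity alongside accuracy matters for a diagnostic task because accuracy alone can hide a classifier that misses patients while labeling most healthy subjects correctly.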

Availability of data and material

The datasets analyzed during the current study are available in the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra.
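As a rough illustration of reading this dataset, the sketch below parses a CSV stream into feature rows and binary labels using only the Python standard library. The column names and the label coding (1 for healthy controls, 2 for patients) are assumptions about the UCI distribution of the data and should be checked against the downloaded file. The sample rows are made-up placeholders, not values from the dataset, and the function name is hypothetical.

```python
import csv
import io

# Assumed column names; verify against the header of the downloaded file.
FEATURES = ["Age", "BMI", "Glucose", "Insulin", "HOMA",
            "Leptin", "Adiponectin", "Resistin", "MCP.1"]
LABEL = "Classification"  # assumed coding: 1 = healthy control, 2 = patient


def load_records(stream):
    """Read CSV rows into (X, y): float feature lists and 0/1 labels."""
    X, y = [], []
    for row in csv.DictReader(stream):
        X.append([float(row[name]) for name in FEATURES])
        # Map the assumed 1/2 coding onto 0 (healthy) / 1 (patient).
        y.append(1 if int(row[LABEL]) == 2 else 0)
    return X, y


# Placeholder rows for demonstration only (not real measurements).
SAMPLE = """Age,BMI,Glucose,Insulin,HOMA,Leptin,Adiponectin,Resistin,MCP.1,Classification
50,24.0,90,4.0,0.9,10.0,8.0,7.0,400.0,1
60,28.0,110,12.0,3.3,30.0,6.0,15.0,550.0,2
"""

if __name__ == "__main__":
    features, labels = load_records(io.StringIO(SAMPLE))
    print(len(features), "rows;", len(features[0]), "attributes; labels:", labels)
```

To read a downloaded file instead of the inline sample, pass an open file object, for example `load_records(open(path, newline=""))`, where `path` is wherever the CSV was saved.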

Code availability

Tanagra: Machine Learning software for research and academic purposes.


Author information

Contributions

R. Geetharamani: conceptualization, methodology, validation, resources, supervision, writing-review and editing.

G. Sivagami: software, formal analysis, investigation, data curation, writing-original draft, project administration.

Corresponding author

Correspondence to Sivagami G.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: Duplicate image of Figure 1 was removed and Algorithm 1 was inserted.

About this article

Cite this article

Geetharamani, R., Sivagami, G. Iterative principal component analysis method for improvised classification of breast cancer disease using blood sample analysis. Med Biol Eng Comput 59, 1973–1989 (2021). https://doi.org/10.1007/s11517-021-02405-y

  • DOI: https://doi.org/10.1007/s11517-021-02405-y
