Abstract
Breast cancer is the most common cancer in women worldwide. Procedures used to diagnose it include mammography, breast ultrasound, biopsy, breast magnetic resonance imaging, and blood tests such as the complete blood count. Detecting breast cancer at an early stage plays an important role in diagnosis and treatment. This paper aims to develop a predictive model for detecting breast cancer from blood sample data comprising age, body mass index (BMI), glucose, insulin, homeostasis model assessment (HOMA), leptin, adiponectin, resistin, and the chemokine monocyte chemoattractant protein 1 (MCP-1). The two main challenges in this process are identifying biomarkers and achieving accurate disease prediction. The proposed methodology applies principal component analysis in a distinctive, iterative way, followed by a random forest prediction model, to discriminate between healthy subjects and breast cancer patients. The approach systematically extracts principal axis elements, which are linear combinations of the input attributes with high communalities. The iteratively extracted principal axis elements, combined with a minimum number of input attributes, predict the disease with higher classification accuracy and with increased sensitivity and specificity. The results show that the proposed approach outperforms previously reported results by selecting the relevant extracted principal axis elements and attributes that improve the classifier's performance measures.
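The abstract reports its results as accuracy, sensitivity, and specificity. As a minimal sketch of how these three measures are conventionally computed from binary predictions, the Python function below may help. The name `classification_metrics` and the `positive` parameter are hypothetical and chosen only for illustration; nothing here is taken from the authors' implementation.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Assumes both classes occur in y_true; otherwise one of the ratios
    below would divide by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of patients correctly flagged
        "specificity": tn / (tn + fp),  # share of healthy subjects correctly cleared
    }
```

Sensitivity and specificity are reported alongside accuracy because a screening model can score well on accuracy while still missing patients. Looking at both rates separates missed cases from false alarms.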
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11517-021-02405-y/MediaObjects/11517_2021_2405_Fig1_HTML.png)
Availability of data and material
The datasets analyzed during the current study are available in the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra.
Code availability
Tanagra: Machine Learning software for research and academic purposes.
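The authors worked in Tanagra. For readers who prefer a scripted workflow, the general idea in the abstract (principal components combined with input attributes, then a random forest) can be loosely approximated in Python. The sketch below assumes scikit-learn and NumPy are installed. It is a plain, non-iterative stand-in: it does not reproduce the authors' iterative extraction or their attribute selection. The helper `build_model`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def build_model(n_components=2, seed=0):
    """Hypothetical helper: standardize, append principal components, fit a forest."""
    features = FeatureUnion([
        # FunctionTransformer() with no function passes the attributes through.
        ("attributes", FunctionTransformer()),
        # The leading principal components are appended as extra columns.
        ("components", PCA(n_components=n_components)),
    ])
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    return make_pipeline(StandardScaler(), features, forest)


if __name__ == "__main__":
    # Synthetic stand-in with 9 columns, one per attribute named in the
    # abstract. These are NOT the Coimbra measurements.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 9))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    scores = cross_val_score(build_model(), X, y, cv=5)
    print("mean cross-validated accuracy:", scores.mean())
```

To try it on the real data, replace the synthetic `X` and `y` with the nine attributes and the class label from the UCI file. Scaling happens inside the pipeline so that each cross-validation fold is standardized using its own training split only.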
Change history
23 August 2021
A Correction to this paper has been published: https://doi.org/10.1007/s11517-021-02426-7
Author information
Contributions
R. Geetharamani: conceptualization, methodology, validation, resources, supervision, writing-review and editing.
G. Sivagami: software, formal analysis, investigation, data curation, writing-original draft, project administration.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: Duplicate image of Figure 1 was removed and Algorithm 1 was inserted.
About this article
Cite this article
R, G., G, S. Iterative principal component analysis method for improvised classification of breast cancer disease using blood sample analysis. Med Biol Eng Comput 59, 1973–1989 (2021). https://doi.org/10.1007/s11517-021-02405-y
DOI: https://doi.org/10.1007/s11517-021-02405-y