Abstract
Breast cancer is one of the leading causes of cancer death among women worldwide, and early detection can lead to longer survival or even full recovery. With advances in clinical technology, large amounts of tumor feature data can now be collected, and many machine learning techniques have been introduced to support doctors in diagnostic decision-making. In this paper, we develop a support vector machine (SVM) based diagnostic system consisting of three main stages. In the first stage, principal component analysis (PCA) is applied to eliminate redundant information and extract representative patterns from the original data; reducing the dimension of the feature space in this way significantly cuts the computational cost. In the second stage, we search for optimal SVM parameter values using the differential evolution (DE) algorithm. Finally, a classifier is trained to classify incoming tumors. To evaluate the classifier's performance objectively and comprehensively, several indices are considered together, including classification accuracy, sensitivity, specificity and the area under the receiver operating characteristic (ROC) curve. Compared with K-nearest neighbor, random forest, bagging, naive Bayes, decision tree and other classification approaches, the proposed method shows superior performance when tested with fivefold cross-validation on the Wisconsin Diagnostic Breast Cancer (WDBC) data set from the University of California, Irvine (UCI) Machine Learning Repository.
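The first stage, PCA-based dimension reduction, can be illustrated with a minimal pure-Python sketch (no external packages). It centers the data, builds the sample covariance matrix and extracts the leading eigenvectors by power iteration with deflation, keeping just enough components to explain a chosen fraction of the total variance. The 95% threshold (`var_ratio`) and the function name `pca` are illustrative assumptions; the abstract does not say how many components are retained or whether features are standardized first. In practice a library eigensolver such as NumPy's `numpy.linalg.eigh` would replace the power iteration.

```python
import math

def pca(X, var_ratio=0.95, iters=500):
    """Project rows of X onto the leading principal components that together
    explain at least `var_ratio` of the total variance (illustrative sketch)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centered data.
    C = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
         for i in range(d)]
    total = sum(C[i][i] for i in range(d))  # total variance = trace of C
    if total == 0.0:
        raise ValueError("data has zero variance")
    components, explained = [], 0.0
    while len(components) < d and explained / total < var_ratio:
        # Power iteration: repeatedly apply C to converge on its top eigenvector.
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v gives the variance along v (the eigenvalue).
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        if lam <= 0.0:
            break  # no variance left to explain
        components.append(v)
        explained += lam
        # Deflation: subtract the found component so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Scores: coordinates of each centered sample in the reduced space.
    scores = [[sum(r[j] * v[j] for j in range(d)) for v in components]
              for r in centered]
    return scores, components, explained / total
```

For points lying exactly on a line, such as `(t, 2t)`, a single component explains all of the variance and its direction is proportional to `(1, 2)`.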
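The second stage tunes the SVM parameters with differential evolution. The sketch below implements the classic DE/rand/1/bin scheme of Storn and Price over box bounds. Several details are assumptions and are not taken from the abstract. The search is assumed to run over `log2 C` and `log2 gamma` of an RBF-kernel SVM. The bounds, population size, `F`, `CR` and generation count are common textbook values. `surrogate_cv_error` is a hypothetical stand-in objective. In a real pipeline the objective would train an SVM with `C = 2**p[0]` and `gamma = 2**p[1]` on the PCA features and return its cross-validated error.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """Minimize f over a box using DE/rand/1/bin. Returns (best_x, best_f)."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    d = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct individuals other than the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(d)  # guarantees at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                # Binomial crossover between the target and the mutant vector.
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            # Greedy selection: keep the trial vector if it is no worse.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def surrogate_cv_error(p):
    """Hypothetical smooth stand-in for cross-validated SVM error,
    with its minimum at log2 C = 3, log2 gamma = -5."""
    return (p[0] - 3.0) ** 2 + (p[1] + 5.0) ** 2


# Assumed search box: log2 C in [-5, 15], log2 gamma in [-15, 3].
best_params, best_err = differential_evolution(
    surrogate_cv_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
```

Searching in log2 space is a common choice because useful values of `C` and `gamma` span several orders of magnitude. With a real cross-validated objective, each evaluation of `f` requires training several SVMs, so the population size and generation count dominate the cost.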
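The evaluation indices named in the abstract can be computed directly from a classifier's outputs. The following small sketch assumes labels are coded 1 for malignant (the positive class) and 0 for benign, and that both classes appear in `y_true`. The function names are illustrative. Sensitivity is TP/(TP+FN) and specificity is TN/(TN+FP). The ROC AUC is computed here through its equivalence with the Mann-Whitney statistic: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

For example, `classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])` gives accuracy 0.75, sensitivity 0.5 and specificity 1.0. The pairwise AUC is quadratic in the number of cases, which is acceptable for a data set the size of WDBC (569 samples). For larger data, a rank-based formula is preferable.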
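Fivefold cross-validation partitions the data into five disjoint folds and uses each fold in turn as the test set while training on the other four. The sketch below builds stratified folds, so that each fold keeps roughly the class proportions of the whole data set. This matters for WDBC, which contains 357 benign and 212 malignant cases. The abstract does not state whether the folds were stratified, so the stratification, the seed and the function name `stratified_kfold` are assumptions.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs with class-balanced folds."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    counter = 0
    for cls in sorted(by_class):
        idxs = by_class[cls]
        rng.shuffle(idxs)
        # Deal indices round-robin so every fold gets a similar share of each class.
        for idx in idxs:
            folds[counter % k].append(idx)
            counter += 1
    splits = []
    for fold in folds:
        test_set = set(fold)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, sorted(fold)))
    return splits
```

Each reported index (accuracy, sensitivity, specificity, AUC) would then be computed on the five test folds and averaged. Any parameter search, such as the DE tuning stage, should use only the training part of each split to avoid optimistic estimates.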
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Nos. 71571123, 71532007).
About this article
Cite this article
Yang, L., Xu, Z. Feature extraction by PCA and diagnosis of breast tumors using SVM with DE-based parameter tuning. Int. J. Mach. Learn. & Cyber. 10, 591–601 (2019). https://doi.org/10.1007/s13042-017-0741-1
DOI: https://doi.org/10.1007/s13042-017-0741-1