Abstract
Breast cancer is one of the leading causes of death among women worldwide, and accurate diagnosis is one of the most important steps in its treatment. Data mining techniques can support doctors in the diagnostic decision-making process. In this paper, we present different data mining techniques for the diagnosis of breast cancer. Two different Wisconsin Breast Cancer datasets were used to evaluate the proposed system, which has two stages. In the first stage, genetic algorithms (GA) are used to eliminate insignificant features and retain the informative ones. This reduces the computational complexity and speeds up the data mining process. In the second stage, several data mining techniques are employed to classify subjects into two categories: with or without breast cancer. Both individual and multiple classifier systems were used in the second stage in order to construct an accurate system for breast cancer classification. The performance of the methods is evaluated using classification accuracy, area under the receiver operating characteristic curve, and F-measure. The Rotation Forest model with 14 GA-selected features achieved the highest classification accuracy (99.48%), and compared with previous work, the proposed approach shows improved performance. The results obtained in this study have the potential to open new opportunities in the diagnosis of breast cancer.
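The two-stage idea described in the abstract (GA-based feature selection followed by a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes synthetic data instead of the Wisconsin datasets, swaps in a simple nearest-centroid classifier for Rotation Forest, and uses hypothetical names and GA settings (`make_data`, `centroid_accuracy`, `ga_select`, population size, mutation rate, the 0.01 per-feature penalty) chosen only for illustration.

```python
import random

def make_data(n=120, n_informative=3, n_noise=7, seed=0):
    """Synthetic two-class data (an assumption): only the first
    n_informative features carry class signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_fit, y_fit, X_val, y_val, mask):
    """Accuracy of a nearest-centroid classifier restricted to the
    features whose mask bit is 1 (stand-in for the stage-two classifier)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_fit, y_fit) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        dist = {label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for label, cen in centroids.items()}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y_val)

def ga_select(fitness, n_features, pop_size=20, generations=15,
              p_mut=0.1, seed=1):
    """Tiny GA over binary feature masks: keep the fitter half (elitism),
    refill with one-point crossover plus bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Stage 1: search for a feature mask on a fit/validation split.
X, y = make_data()
X_fit, y_fit, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

def fitness(mask):
    # Validation accuracy minus a small penalty per selected feature
    # (the 0.01 weight is an arbitrary illustrative choice).
    acc = centroid_accuracy(X_fit, y_fit, X_val, y_val, mask)
    return acc - 0.01 * sum(mask)

best_mask = ga_select(fitness, n_features=len(X[0]))

# Stage 2: evaluate the classifier on the selected features.
best_acc = centroid_accuracy(X_fit, y_fit, X_val, y_val, best_mask)
print("selected mask:", best_mask, "validation accuracy:", best_acc)
```

Because the mask is tuned on the validation split, the reported accuracy is optimistic; a realistic evaluation would score the final feature subset on separate held-out data or with cross-validation.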
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Aličković, E., Subasi, A. Breast cancer diagnosis using GA feature selection and Rotation Forest. Neural Comput & Applic 28, 753–763 (2017). https://doi.org/10.1007/s00521-015-2103-9
DOI: https://doi.org/10.1007/s00521-015-2103-9