Abstract
This paper compares the performance of six data mining methods, namely Bagging, SVM (SMO), Decorate, C4.5 (J48), Naïve Bayes and IBK, in analyzing the Wisconsin Breast Cancer (WBC) dataset. The dataset was obtained from the UCI Machine Learning Repository and comprises 699 instances and 11 attributes. A confusion matrix based on 10-fold cross-validation was used in our experiment as the basis for measuring the accuracy of each algorithm. We introduce the idea of combining the algorithms at the classification level to obtain the best multi-classifier approach for the WBC dataset. The Waikato Environment for Knowledge Analysis (WEKA), an open-source data mining software package, was used for the experimental analysis. The experimental results show that SMO offers the best accuracy (97%) among the six algorithms, while combining SMO, Naïve Bayes, J48 and IBK offers the best overall accuracy (97.3%) on the dataset.
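The evaluation idea in the abstract (10-fold splits, a confusion matrix as the basis for accuracy, and combining classifiers at the classification level) can be sketched in plain Python. This is an illustrative sketch only, not the WEKA workflow used in the paper: the function names, the toy labels and the three hypothetical classifiers are assumptions, and majority voting is one possible combination rule, which the abstract does not specify.

```python
from collections import Counter


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def majority_vote(per_classifier_preds):
    """Combine predictions instance by instance; assumes an odd number of classifiers."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_classifier_preds)]


# Toy data (hypothetical, not taken from the WBC dataset).
LABELS = ["benign", "malignant"]
y_true = ["benign", "benign", "malignant", "malignant", "benign", "malignant"]
clf_a = ["benign", "benign", "malignant", "benign", "benign", "malignant"]
clf_b = ["benign", "malignant", "malignant", "malignant", "benign", "malignant"]
clf_c = ["benign", "benign", "benign", "malignant", "benign", "malignant"]

combined = majority_vote([clf_a, clf_b, clf_c])
acc_a = accuracy(confusion_matrix(y_true, clf_a, LABELS))
acc_combined = accuracy(confusion_matrix(y_true, combined, LABELS))

# Fold sizes for a dataset of 699 instances under 10-fold cross-validation.
fold_sizes = [len(fold) for fold in k_fold_indices(699, 10)]
```

In this toy case each classifier errs on a different instance, so the vote corrects every individual mistake. On real data the gain from combining classifiers depends on how correlated their errors are.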
Acknowledgements
The authors would like to thank the Chinese Scholarship Council, Harbin Engineering University and the Kenyan Government for their support in these efforts.
We also thank Dr. William H. Wolberg of the University of Wisconsin for making available the breast cancer dataset used in our analysis.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gatuha, G., Jiang, T. (2015). Evaluating Diagnostic Performance of Machine Learning Algorithms on Breast Cancer. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_25
DOI: https://doi.org/10.1007/978-3-319-23862-3_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23861-6
Online ISBN: 978-3-319-23862-3
eBook Packages: Computer Science (R0)