Evaluating Diagnostic Performance of Machine Learning Algorithms on Breast Cancer

  • Conference paper
Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques (IScIDE 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9243)

Abstract

This paper compares the performance of six data mining methods, namely Bagging, SVM (SMO), Decorate, C4.5 (J48), Naïve Bayes and IBK, in analyzing the Wisconsin Breast Cancer (WBC) dataset. The dataset was obtained from the UCI Machine Learning Repository and comprises 699 instances and 11 attributes. A confusion matrix based on 10-fold cross-validation was used in our experiment as the basis for measuring the accuracy of each algorithm. We introduce the idea of combining the algorithms at the classification level to obtain the best multi-classifier approach for the WBC dataset. The Waikato Environment for Knowledge Analysis (WEKA), an open-source data mining package, was used for the experimental analysis. The experimental results show that SMO offers the best accuracy (97%) among the six individual algorithms, while a combination of SMO, Naïve Bayes, J48 and IBK offers the best overall accuracy (97.3%) on the dataset.
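The evaluation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WEKA workflow: the combination rule (a simple majority vote), the unstratified contiguous fold split, the function names, and the toy labels are all assumptions made for illustration, and none of the numbers below come from the paper.

```python
from collections import Counter


def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous test folds of near-equal size.

    Assumption: plain unstratified folds; a real experiment may stratify.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def confusion_matrix(actual, predicted, positive="malignant"):
    """Return (tp, fp, fn, tn) counts for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of instances on the diagonal of the confusion matrix."""
    return (tp + tn) / (tp + fp + fn + tn)


def majority_vote(per_classifier_predictions):
    """Combine one label list per classifier by per-instance majority vote.

    Ties go to the label that appears first among the votes for that instance.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_classifier_predictions)]


# Hypothetical toy labels, invented for this sketch (not WBC data).
actual = ["benign", "benign", "malignant", "malignant", "benign", "malignant"]
preds_a = ["benign", "benign", "malignant", "benign", "benign", "malignant"]
preds_b = ["benign", "malignant", "malignant", "malignant", "benign", "malignant"]
preds_c = ["benign", "benign", "benign", "malignant", "benign", "malignant"]

combined = majority_vote([preds_a, preds_b, preds_c])
single_acc = accuracy(*confusion_matrix(actual, preds_a))
combined_acc = accuracy(*confusion_matrix(actual, combined))

# With 699 instances and k=10, nine folds hold 70 instances and one holds 69.
fold_sizes = [len(fold) for fold in k_fold_indices(699, 10)]
```

In this toy case each classifier errs on a different instance, so the vote corrects every individual mistake. That is the intuition behind combining classifiers, although the size of any gain on real data depends on how correlated their errors are.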



Acknowledgements

The authors would like to thank the Chinese Scholarship Council, Harbin Engineering University and the Kenyan Government for their support in these efforts.

We also thank Dr. William H. Wolberg of the University of Wisconsin for making available the breast cancer dataset used in our analysis.

Author information

Corresponding author

Correspondence to Tao Jiang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Gatuha, G., Jiang, T. (2015). Evaluating Diagnostic Performance of Machine Learning Algorithms on Breast Cancer. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_25

  • DOI: https://doi.org/10.1007/978-3-319-23862-3_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23861-6

  • Online ISBN: 978-3-319-23862-3

  • eBook Packages: Computer Science, Computer Science (R0)
