On the Performance of Ensemble Learning for Automated Diagnosis of Breast Cancer

  • Aytuğ Onan
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 347)


The automated diagnosis of diseases with a high accuracy rate is one of the most crucial problems in medical informatics, and machine learning algorithms are widely utilized for the automatic detection of illnesses. Breast cancer is one of the most common cancer types in females and the second most common cause of cancer death in females. Hence, developing an efficient classifier for the automated diagnosis of breast cancer is essential to improve the chance of diagnosing the disease at an earlier stage and treating it more properly. Ensemble learning is a branch of machine learning that combines multiple learning algorithms to acquire better predictive performance than any of the constituent classifiers alone, and it is a promising field for improving the performance of base classifiers. This paper is concerned with a comparative assessment of the performance of six popular ensemble methods (Bagging, Dagging, AdaBoost, MultiBoost, Decorate, and Random Subspace) applied to fourteen base learners (Bayes Net, FURIA, K-nearest Neighbors, C4.5, RIPPER, Kernel Logistic Regression, K-star, Logistic Regression, Multilayer Perceptron, Naïve Bayes, Random Forest, Simple CART, Support Vector Machine, and LMT) for the automatic detection of breast cancer. The empirical results indicate that ensemble learning can improve the predictive performance of base learners in the medical domain, with the best results in the comparative experiments acquired by the Random Subspace ensemble method. The experiments show that ensemble learning methods are appropriate for improving the performance of classifiers for medical diagnosis.


Keywords: Ensemble learning, Breast cancer diagnosis, Classification
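The paper's experiments were run with Weka implementations of these methods; as a rough illustration only, the Random Subspace idea (each ensemble member trained on a random subset of the features) can be sketched in scikit-learn, here with a decision tree as a stand-in base learner and scikit-learn's bundled Wisconsin breast cancer data standing in for the study's dataset. This is a minimal sketch under those assumptions, not the paper's setup.

```python
# Hedged sketch: Random Subspace ensemble vs. a single base learner on the
# Wisconsin breast cancer data. Not the paper's Weka configuration; the base
# learner, subspace size, and ensemble size here are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

base = DecisionTreeClassifier(random_state=0)  # stand-in for C4.5

# Random Subspace (Ho, 1998): every member sees all training rows
# (bootstrap=False, max_samples=1.0) but only a random half of the features.
subspace = BaggingClassifier(
    base,
    n_estimators=50,
    max_samples=1.0,
    bootstrap=False,
    max_features=0.5,
    bootstrap_features=False,
    random_state=0,
)

base_acc = cross_val_score(base, X, y, cv=10).mean()
ens_acc = cross_val_score(subspace, X, y, cv=10).mean()
print(f"single tree: {base_acc:.3f}  random subspace: {ens_acc:.3f}")
```

Restricting each member to a different feature subspace decorrelates the trees, which is the mechanism by which the ensemble's averaged vote can outperform any single base learner.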





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Engineering, Department of Computer Engineering, Celal Bayar University, Manisa, Turkey
