Prediction of Malignant and Benign Breast Cancer: A Data Mining Approach in Healthcare Applications

  • Vivek Kumar
  • Brojo Kishore Mishra
  • Manuel Mazzara
  • Dang N. H. Thanh
  • Abhishek Verma
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 37)


Just as data science plays a pivotal role in many fields, it also has prominent applications in health care. Breast cancer is the leading type of cancer among women, and it alone claimed 627,000 lives. This high mortality rate calls for early detection, so that preventive measures can be taken in time. As a contributor to state-of-the-art technology development, data mining has multiple applications in predicting breast cancer. This work implements different data mining classification techniques to predict malignant and benign breast cancer. The Breast Cancer Wisconsin dataset from the UCI repository is used as the experimental dataset, with the attribute clump thickness used as the evaluation class. The performance of twelve algorithms, namely AdaBoostM1, Decision Table, JRip, J48, Lazy IBk, Lazy KStar, Logistic Regression, Multiclass Classifier, Multilayer Perceptron, Naïve Bayes, Random Forest, and Random Tree, is analyzed on this dataset.
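The abstract does not show the experimental pipeline itself. As a minimal sketch, assuming the usual column layout of `breast-cancer-wisconsin.data` in the UCI repository (a sample code, nine cytological attributes scored 1 to 10, and a class column coded 2 for benign and 4 for malignant, with missing values written as `?`), the Python code below parses records, takes clump thickness as the class attribute as described above, and scores a 1-nearest-neighbour learner in the spirit of Weka's IBk with leave-one-out evaluation. All function names and the input rows are hypothetical illustrations, not records from the dataset. The abstract does not say whether the benign/malignant column stays as an input when clump thickness is the target; keeping it is an assumption. This is not the authors' Weka configuration.

```python
import math
from collections import Counter

# Column order assumed from breast-cancer-wisconsin.data in the UCI repository.
COLUMNS = [
    "sample_code", "clump_thickness", "cell_size_uniformity",
    "cell_shape_uniformity", "marginal_adhesion", "epithelial_cell_size",
    "bare_nuclei", "bland_chromatin", "normal_nucleoli", "mitoses", "class",
]


def parse_rows(lines):
    """Parse comma-separated records, skipping rows with missing ('?') values."""
    rows = []
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != len(COLUMNS) or "?" in fields:
            continue
        rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def split_target(rows, target="clump_thickness"):
    """Use `target` as the class attribute; drop the sample code (an identifier).

    Assumption: every other column, including the benign/malignant `class`,
    is kept as an input feature.
    """
    features = [c for c in COLUMNS if c not in ("sample_code", target)]
    X = [[r[c] for c in features] for r in rows]
    y = [r[target] for r in rows]
    return X, y


def knn_predict(X_train, y_train, x, k=1):
    """IBk-style prediction: majority vote of the k nearest instances (Euclidean)."""
    nearest = sorted(
        (math.dist(xt, x), yt) for xt, yt in zip(X_train, y_train)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=1):
    """Fraction of instances predicted correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if knn_predict(X_train, y_train, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical rows in the dataset's format; these are NOT actual UCI records.
HYPOTHETICAL_LINES = [
    "1,5,1,1,1,2,1,3,1,1,2",
    "2,5,1,1,1,2,1,3,1,1,2",
    "3,8,10,10,8,7,10,9,7,1,4",
    "4,8,10,10,8,7,10,9,7,1,4",
    "5,3,1,1,1,2,?,3,1,1,2",  # skipped: missing bare_nuclei value
]

rows = parse_rows(HYPOTHETICAL_LINES)
X, y = split_target(rows)
print(len(rows), leave_one_out_accuracy(X, y, k=1))
```

With these toy rows each held-out instance has an identical twin, so the printed accuracy is trivially 1.0; on real data the choice of k and of the distance function would matter.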


Keywords: Data mining · Classification techniques · UCI repository · Breast cancer · Classification algorithms



Abbreviations

MAE: Mean absolute error
RMSE: Root mean squared error
RAE: Relative absolute error
RRSE: Root relative squared error
TP: True Positive
TN: True Negative
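The algorithm names in the abstract match Weka classifiers, and Weka's evaluation summary reports these four error measures. For actual values a_i and predictions p_i over n instances, MAE is the mean of |p_i - a_i|, RMSE is the square root of the mean of (p_i - a_i)², and RAE and RRSE divide the total absolute and squared errors by those of a baseline predictor, expressed as percentages. The minimal Python sketch below assumes a baseline that always predicts the mean of the actual values. Weka instead measures against a prior-based baseline and, for nominal classes, computes the errors on predicted class probabilities, so this sketch will not reproduce its numbers exactly. The function names are hypothetical, and applying the errors to the raw 2/4 class codes is for illustration only.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired numeric values.

    Assumption: RAE and RRSE use a baseline that always predicts the mean
    of `actual`; Weka's own baseline differs.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(mean - a) for a in actual)
    base_sq = sum((mean - a) ** 2 for a in actual)
    if base_abs == 0:
        raise ValueError("baseline error is zero: all actual values are equal")
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": 100.0 * abs_err / base_abs,
        "RRSE": 100.0 * math.sqrt(sq_err / base_sq),
    }


def tp_tn(actual, predicted, positive):
    """Count true positives and true negatives, treating `positive` as the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    return tp, tn


# Illustrative labels coded as in the UCI file: 2 = benign, 4 = malignant.
actual = [2, 4, 4, 2]
predicted = [2, 4, 2, 2]
print(error_metrics(actual, predicted))
print(tp_tn(actual, predicted, positive=4))
```

For these four labels the sketch gives MAE 0.5, RMSE 1.0, RAE 50.0 % and RRSE 100.0 %, with one true positive and two true negatives when malignant (4) is the positive class.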



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. National University of Science and Technology-MiSiS, Moscow, Russian Federation
  2. GIET University, Gunupur, India
  3. Innopolis University, Kazan, Russian Federation
  4. Hue College of Industry, Hue, Vietnam
  5. Malaviya National Institute of Technology, Jaipur, India
