Analysis of Breast Cancer Dataset Using Big Data Algorithms for Accuracy of Diseases Prediction

Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 44)


Data mining techniques address the problems that arise when handling massive amounts of data, such as heterogeneous, missing, and inconsistent data. Healthcare is one of the most important applications of Big Data, and diagnosing diseases like cancer at an early stage is crucial. This paper focuses on the analysis of prediction models for diagnosing breast cancer as either benign or malignant at an early stage, since early diagnosis increases the chances of successful treatment. Predicting breast cancer while it is still benign therefore increases the survival rate of women. Data mining classification algorithms such as SVM, Naive Bayes, k-NN, and Decision Tree are compared using a variety of statistical measures: accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve. ROC curves are plotted in the R analytical tool, a promising independent tool for handling huge datasets that proved effective for predicting the breast cancer diagnosis.
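The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the R tool used in the paper, and it is not the authors' code: the confusion-matrix counts and classifier scores are made-up values, malignant is assumed to be the positive class, and the names `ConfusionCounts`, `diagnostic_metrics`, and `auc` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion matrix; malignant is assumed to be the positive class."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign
    fn: int  # malignant cases predicted benign


def diagnostic_metrics(c):
    """Return the five threshold-based measures listed in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case receives
    the higher score; tied pairs count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative values only; these are not results from the paper.
counts = ConfusionCounts(tp=40, fp=5, tn=50, fn=5)
metrics = diagnostic_metrics(counts)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {example_auc:.3f}")
```

Comparing classifiers such as SVM, Naive Bayes, k-NN, and Decision Tree would amount to computing these values once per model on the same held-out data and tabulating them side by side.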


Keywords: Big data; Cancer; Breast cancer; Data mining classification algorithm; R analytical tool; Prediction



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. KIIT Deemed University, Bhubaneswar, India
