Performance study of K-nearest neighbor classifier and K-means clustering for predicting the diagnostic accuracy

  • Original Research
International Journal of Information Technology

Abstract

A major data-management challenge lies in the healthcare sector, where the number of patients keeps rising with population growth and changing lifestyles. Data analytics and big data are increasingly used to address such analytical problems with machine learning techniques. Cancer has become a major concern in developed and developing countries alike, and it can be fatal if it is not diagnosed at an early stage: late diagnosis, and the delayed treatment that follows, lowers the chance of survival. Early detection is therefore critical to improving cancer outcomes. This study targets the early diagnosis of cancer using more efficient analytical techniques. Prediction accuracy also matters, because more accurate predictions improve the quality of care and, in turn, the survival rate. The datasets used in this study were taken from the UCI Machine Learning Repository and were prepared by the University of Wisconsin Hospitals. For diagnosis and classification, the K-Nearest Neighbor (KNN) classifier is applied with different values of K, a process referred to here as KNN Clustering. The performance of KNN is then compared with that of K-Means clustering on the same datasets.
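The abstract names two procedures: KNN classification evaluated at several values of K, and K-Means clustering applied to the same data. The following is a minimal Python sketch of both, written with the standard library only so the mechanics stay visible. It rests on assumptions that do not come from the article: the data are a hypothetical, hand-made 2-D toy set (not the UCI Wisconsin records), distances are Euclidean, K-Means is seeded with farthest-first (maximin) initialization, and K-Means is scored as a classifier by naming each cluster after the majority label of its training points. The authors' actual features, preprocessing and evaluation protocol may differ.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT the UCI Wisconsin data): 2-D feature
# vectors labelled "B" (benign) or "M" (malignant).
TRAIN = [
    ((1.0, 1.2), "B"), ((1.5, 1.8), "B"), ((2.0, 1.0), "B"), ((1.2, 2.1), "B"),
    ((6.0, 6.5), "M"), ((6.8, 5.9), "M"), ((7.2, 7.0), "M"), ((5.9, 7.3), "M"),
]
TEST = [((1.4, 1.5), "B"), ((2.2, 2.4), "B"), ((6.5, 6.2), "M"), ((5.5, 6.0), "M")]


def knn_predict(train, x, k):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    # Counter.most_common keeps first-seen order on ties, so the closer label wins.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def nearest_index(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], x))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first (maximin) initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point whose distance to its closest existing seed is largest.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(centroids, p)].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:  # converged: assignments no longer change the means
            break
        centroids = new
    return centroids


def kmeans_classifier(train, k):
    """Cluster the training features, then name each cluster after its majority label.

    This labelling step is an assumed scoring scheme, not necessarily the article's.
    """
    centroids = kmeans([x for x, _ in train], k)
    votes = [Counter() for _ in centroids]
    for x, label in train:
        votes[nearest_index(centroids, x)][label] += 1
    names = [v.most_common(1)[0][0] if v else None for v in votes]
    return lambda x: names[nearest_index(centroids, x)]


def accuracy(predict, data):
    """Fraction of (features, label) pairs that predict() labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    for k in (1, 3, 5):  # odd K values avoid two-class voting ties
        acc = accuracy(lambda x: knn_predict(TRAIN, x, k), TEST)
        print(f"KNN  k={k}: accuracy={acc:.2f}")
    km_acc = accuracy(kmeans_classifier(TRAIN, 2), TEST)
    print(f"K-means (2 clusters): accuracy={km_acc:.2f}")
```

On this deliberately well-separated toy data every run prints an accuracy of 1.00, so the numbers say nothing about the study's findings; the sketch only illustrates the procedure. With real feature vectors and a held-out split, the per-K accuracies are the quantity one would compare against the K-Means score.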

Author information


Correspondence to Kavita Mittal.

About this article

Cite this article

Mittal, K., Aggarwal, G. & Mahajan, P. Performance study of K-nearest neighbor classifier and K-means clustering for predicting the diagnostic accuracy. Int. j. inf. tecnol. 11, 535–540 (2019). https://doi.org/10.1007/s41870-018-0233-x

  • DOI: https://doi.org/10.1007/s41870-018-0233-x
