Detection of Type 2 Diabetes Using Clustering Methods – Balanced and Imbalanced Pima Indian Extended Dataset

  • S. Nivetha
  • B. Valarmathi (Email author)
  • K. Santhi
  • T. Chellatamilan
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 49)


Diabetes mellitus, widely known as diabetes, is a metabolic illness that causes high blood sugar. Insulin is a hormone produced by the pancreas, an organ situated behind the stomach. Insulin moves glucose from the blood into the cells for energy and storage. In diabetes, the body either does not produce enough insulin or cannot effectively use the insulin it does produce. Untreated high blood glucose from diabetes can damage the nerves, eyes, kidneys, and other organs. Various data mining software tools are available to predict and analyze diabetes, and researchers have made many attempts to improve the efficiency of various models. The proposed method combines dimensionality reduction with a clustering technique. It gives the highest accuracy on the larger dataset, for both the balanced and imbalanced versions. In this paper, large and small datasets are clustered using the K-means approach, the farthest first method, the density-based technique, the filtered clustering method, and the X-means approach. K-means, density-based, and X-means clustering give the highest accuracy, 75.64%, on the larger balanced dataset when compared with the smaller balanced dataset.
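
The abstract reports clustering accuracy but does not say how cluster assignments are scored against the diabetic and non-diabetic labels. The following is a minimal sketch of one common approach, not the authors' implementation: it runs plain K-means and then credits each cluster with its majority class. The function names `kmeans` and `cluster_accuracy`, the two features (glucose, BMI), and all data values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k points drawn at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # Stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
    return assignments


def cluster_accuracy(true_labels, assignments):
    """Fraction of points whose class matches their cluster's majority class."""
    correct = 0
    for c in set(assignments):
        members = [t for t, a in zip(true_labels, assignments) if a == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(true_labels)


# Hypothetical toy data: (glucose, BMI) for two well-separated groups.
points = [(90 + i, 22 + 0.1 * i) for i in range(10)]
points += [(160 + i, 33 + 0.1 * i) for i in range(10)]
labels = [0] * 10 + [1] * 10  # 0 = non-diabetic, 1 = diabetic (illustrative)

assignments = kmeans(points, k=2)
accuracy = cluster_accuracy(labels, assignments)
print(f"cluster accuracy: {accuracy:.2%}")
```

Majority-class scoring is only one way to turn an unsupervised clustering into an accuracy figure. On real, overlapping data such as the Pima records, the result depends on feature scaling, initialization, and class balance, which is why a balanced and an imbalanced version of the same dataset can score differently.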


Data mining, Indian Pima, Diabetes, Over sampling, Clustering, K-means, Density based, Filtered clustering, Farthest first, X-means



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • S. Nivetha (1)
  • B. Valarmathi (2), Email author
  • K. Santhi (3)
  • T. Chellatamilan (4)
  1. School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
  2. Department of Software and Systems Engineering, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
  3. Department of Analytics, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
  4. Department of Information Technology, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
