Performance of the K-means and fuzzy C-means algorithms in big data analytics

  • Original Research
International Journal of Information Technology

Abstract

Most organizations now rely on cloud computing resources and services to handle big data, and machine learning techniques are applied to extract the most valuable information from raw data supplied by different sources. This paper evaluates the performance of the K-means (KM) and fuzzy C-means (FCM) algorithms in terms of clustering time and accuracy. Data clustering is applied to encrypted data, as an instance of data analytics, in both distributed and centralized environments. Two datasets are used in the designed framework, one containing well-separated data and the other overlapped data, to assess the two clustering algorithms on both data types. Adding virtual machines in the distributed environment can reduce clustering time and computational overhead. Experiments show that FCM obtains better accuracy than KM on overlapped data, whereas on well-separated data KM performs better than FCM, with higher accuracy and less clustering time.
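The kind of comparison described above can be illustrated with a minimal NumPy sketch. It is not the authors' framework: it runs on plaintext data on a single machine, with no encryption or distribution. The synthetic Gaussian blobs, the farthest-point initialisation, the fuzzifier m = 2, and the majority-vote definition of accuracy are all assumptions made for illustration, and every function name (`make_blobs`, `farthest_point_init`, `kmeans`, `fuzzy_cmeans`, `clustering_accuracy`, `evaluate`) is hypothetical.

```python
import time
import numpy as np


def make_blobs(spread, n_per_class=200, seed=0):
    """Hypothetical 2-D test data: three Gaussian blobs; a larger spread means more overlap."""
    rng = np.random.default_rng(seed)
    means = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
    X = np.vstack([rng.normal(mu, spread, size=(n_per_class, 2)) for mu in means])
    y = np.repeat(np.arange(len(means)), n_per_class)
    return X, y


def pairwise_dist(X, centers):
    """Euclidean distance from every point to every center, shape (n_points, n_centers)."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)


def farthest_point_init(X, k, seed=0):
    """Assumed initialisation: a random first center, then repeatedly the farthest point."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d = pairwise_dist(X, np.array(centers)).min(axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers)


def kmeans(X, centers, n_iter=100):
    """Lloyd's K-means: hard assignment to the nearest center, then mean update."""
    for _ in range(n_iter):
        labels = pairwise_dist(X, centers).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return pairwise_dist(X, centers).argmin(axis=1), centers


def fuzzy_cmeans(X, centers, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy C-means: soft memberships u_ij proportional to d_ij^(-2/(m-1))."""
    for _ in range(n_iter):
        d = np.fmax(pairwise_dist(X, centers), 1e-12)  # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)       # each row sums to 1
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return U, centers


def clustering_accuracy(y_true, y_pred):
    """Assumed metric: map each cluster to its majority true class, then count matches."""
    correct = 0
    for cluster in np.unique(y_pred):
        correct += np.bincount(y_true[y_pred == cluster]).max()
    return correct / len(y_true)


def evaluate(X, y, k=3):
    """Run both algorithms from the same initial centers; return (accuracy, seconds) each."""
    init = farthest_point_init(X, k)
    t0 = time.perf_counter()
    km_labels, _ = kmeans(X, init)
    km_time = time.perf_counter() - t0
    t0 = time.perf_counter()
    U, _ = fuzzy_cmeans(X, init)
    fcm_time = time.perf_counter() - t0
    return {
        "KM": (clustering_accuracy(y, km_labels), km_time),
        "FCM": (clustering_accuracy(y, U.argmax(axis=1)), fcm_time),
    }


X, y = make_blobs(spread=0.5)  # well-separated case
results = evaluate(X, y)
for name, (acc, seconds) in results.items():
    print(f"{name}: accuracy={acc:.3f}, time={seconds * 1e3:.2f} ms")
```

Calling `make_blobs` with a larger `spread` (for example 2.0) gives an overlapped dataset for the same comparison. The sketch only shows how the two update rules differ, hard assignment in KM versus soft memberships in FCM, and how accuracy and clustering time could be measured side by side. It makes no claim about reproducing the reported results.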

Data availability

The datasets used in this study were obtained from the Kaggle website, https://www.kaggle.com/.

Author information

Corresponding author

Correspondence to Zainab Salman.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Salman, Z., Alomary, A. Performance of the K-means and fuzzy C-means algorithms in big data analytics. Int. j. inf. tecnol. 16, 465–470 (2024). https://doi.org/10.1007/s41870-023-01436-y

  • DOI: https://doi.org/10.1007/s41870-023-01436-y
