Abstract
Most organizations now use cloud resources and services to handle big data, and machine learning techniques are applied to extract the most valuable information from raw data drawn from different sources. This paper evaluates the performance of the K-means (KM) and fuzzy C-means (FCM) algorithms in terms of clustering time and accuracy. Data clustering is applied to encrypted data, as an instance of data analytics, in both distributed and centralized environments. Two datasets are used in the designed framework, one containing well-separated data and the other containing overlapped data, to check the performance of the two clustering algorithms on both data types. Adding virtual machines in the distributed environment can shorten the clustering time and reduce computational overhead. Experiments show that with overlapped data, FCM obtains better accuracy than KM, whereas with well-separated data, KM performs better than FCM, with higher accuracy and less clustering time.
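To make the difference between the two algorithms concrete, the following is a minimal NumPy sketch of textbook K-means (hard assignments) and fuzzy C-means (soft memberships). It is not the authors' implementation: the function names `kmeans` and `fuzzy_cmeans`, the fuzzifier `m = 2`, the iteration limits, and the synthetic two-blob data are all assumptions made for illustration, and the sketch covers neither the encryption step nor the distributed setup described above.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Textbook K-means: each point belongs to exactly one cluster."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct points sampled from X.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points;
        # keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


def fuzzy_cmeans(X, c, m=2.0, n_iter=200, tol=1e-5, seed=0):
    """Textbook fuzzy C-means: each point has a membership in every cluster."""
    rng = np.random.default_rng(seed)
    # Assumption: random initial memberships, normalized so rows sum to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Hypothetical well-separated data: two Gaussian blobs of 50 points each.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 0.5, size=(50, 2)),
    data_rng.normal(5.0, 0.5, size=(50, 2)),
])

km_centers, km_labels = kmeans(X, k=2)
fcm_centers, U = fuzzy_cmeans(X, c=2)
fcm_labels = U.argmax(axis=1)  # harden memberships for comparison with KM
```

K-means returns a single label per point, while fuzzy C-means returns a membership matrix `U` whose rows sum to 1. That per-point membership is what lets FCM express partial belonging for points that lie between overlapping clusters, at the cost of extra work per iteration.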
Data availability
The datasets used were obtained from the Kaggle website: https://www.kaggle.com/.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Salman, Z., Alomary, A. Performance of the K-means and fuzzy C-means algorithms in big data analytics. Int. j. inf. tecnol. 16, 465–470 (2024). https://doi.org/10.1007/s41870-023-01436-y
DOI: https://doi.org/10.1007/s41870-023-01436-y