Abstract
Conventional clustering methods assume that the data are stored centrally and are memory resident, which makes it difficult to arrive at solutions when dealing with large data. Centralizing huge volumes of data from multiple locations is always a challenging task, owing to the large memory space and computational time required by traditional mining methods. Traditional k-means-type clustering has been used to identify cluster prototypes that can serve as representative points in a large dataset. Its major drawback is that the cluster centers tend to distort the distribution of the underlying data, so the representative points cannot capture the complete distribution of the data, which leads to poor pattern generation. To resolve this issue, this paper proposes an empirical model (EM) that ensures the cluster centers capture the underlying data distribution. First, the proposed methodology addresses asymptotic convergence with respect to the distributed data. Second, it provides an efficient mechanism for measuring the cluster centers in practice. Finally, it proposes a methodology for distributive convergence and center optimization. The model is compared with other methods in the literature, and the results are discussed.
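To make the setting concrete, the following is a minimal sketch of generic k-means over data held at several sites, where each site shares only per-cluster sums and counts rather than its raw points. It is not the empirical model proposed in the paper; the function names (`local_stats`, `merge_and_update`, `distributed_kmeans`), the toy data, and the initial centers are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def local_stats(site_data: Sequence[Point], centers: Sequence[Point]) -> Stats:
    """Runs at one site: summarize local points as per-cluster sums and counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in site_data:
        j = nearest(point, centers)
        counts[j] += 1
        for d, value in enumerate(point):
            sums[j][d] += value
    return sums, counts


def merge_and_update(all_stats: Sequence[Stats], centers: Sequence[Point]) -> List[Point]:
    """Runs at a coordinator: combine site summaries into new global centers."""
    new_centers: List[Point] = []
    for j, old in enumerate(centers):
        total = sum(counts[j] for _, counts in all_stats)
        if total == 0:
            new_centers.append(tuple(old))  # keep an empty cluster's center unchanged
            continue
        new_centers.append(
            tuple(sum(sums[j][d] for sums, _ in all_stats) / total for d in range(len(old)))
        )
    return new_centers


def distributed_kmeans(sites, centers, iterations: int = 10) -> List[Point]:
    """Alternate local summarization and global merging for a fixed number of rounds."""
    for _ in range(iterations):
        stats = [local_stats(site, centers) for site in sites]
        centers = merge_and_update(stats, centers)
    return centers


# Hypothetical toy data spread over three sites (illustrative only).
sites = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(2.0, 0.0), (2.0, 2.0), (10.0, 10.0)],
    [(10.0, 12.0), (12.0, 10.0), (12.0, 12.0)],
]
initial = [(0.0, 0.0), (12.0, 12.0)]
result = distributed_kmeans(sites, initial)
print(result)  # [(1.0, 1.0), (11.0, 11.0)]
```

Because the merged sums and counts are exactly those a centralized run would compute, this sketch yields the same centers as plain k-means on the pooled data. It therefore illustrates only the communication pattern, not any remedy for the center-distortion issue the abstract describes.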
Change history
30 May 2022
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s12652-022-03978-8
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at Majmaah University for funding this work under project number RGP-2019-25.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sharma, S.K. RETRACTED ARTICLE: An empirical model (EM: CCO) for clustering, convergence and center optimization in distributive databases. J Ambient Intell Human Comput 12, 5045–5054 (2021). https://doi.org/10.1007/s12652-020-01955-7
DOI: https://doi.org/10.1007/s12652-020-01955-7