RETRACTED ARTICLE: An empirical model (EM: CCO) for clustering, convergence and center optimization in distributive databases

Sharma, Sunil Kumar

doi:10.1007/s12652-020-01955-7

Original Research
Published: 20 April 2020

Volume 12, pages 5045–5054, (2021)

Journal of Ambient Intelligence and Humanized Computing

Sunil Kumar Sharma ORCID: orcid.org/0000-0002-1732-2677¹

This article was retracted on 30 May 2022

This article has been updated

Abstract

Conventional clustering methods assume that data is stored centrally and is memory resident, which makes it difficult to arrive at solutions when dealing with large data. Centralizing huge volumes of data from multiple locations is always a challenging task owing to the large memory space and computational time required by traditional mining methods. Traditional k-means-type clustering has been used to identify cluster prototypes that can serve as representative points in a large dataset. Its major drawback is that the cluster centers tend to distort the distribution of the underlying data, leaving the representative points unable to capture the complete distribution of the data and leading to poor pattern generation. To resolve this issue, this paper proposes an empirical model (EM) that ensures the cluster centers capture the underlying data distribution. First, the proposed methodology considers asymptotic convergence with respect to the distributed data. Second, it presents an efficient mechanism for measuring the cluster centers in practice. Finally, it proposes a methodology for distributive convergence and center optimization. The model is compared with other methods in the literature, and the results are discussed.
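
The abstract does not give the details of the proposed model, so the following is only a generic illustration of the setting it describes, not the article's method. It is a minimal sketch of k-means over data that stays at several sites: each site sends per-cluster coordinate sums and counts to a coordinator, which recomputes the centers without ever gathering the raw points in one place. All names (`local_stats`, `merge`, `distributed_kmeans`) and the tolerance and iteration defaults are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def local_stats(points: Sequence[Point], centers: Sequence[Point]) -> Stats:
    """Runs at one site: summarize local points as per-cluster sums and counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for pt in points:
        j = nearest(pt, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += pt[d]
    return sums, counts


def merge(stats: Sequence[Stats], centers: Sequence[Point]) -> List[Point]:
    """Runs at the coordinator: combine site summaries into new centers."""
    dim = len(centers[0])
    new_centers = []
    for j, old in enumerate(centers):
        total = sum(counts[j] for _, counts in stats)
        if total == 0:
            new_centers.append(tuple(old))  # empty cluster: keep the old center
        else:
            new_centers.append(
                tuple(sum(sums[j][d] for sums, _ in stats) / total for d in range(dim))
            )
    return new_centers


def distributed_kmeans(sites, centers, max_iter=100, tol=1e-9):
    """Iterate local summaries and merges until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        stats = [local_stats(site, centers) for site in sites]
        updated = merge(stats, centers)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(u, c)) for u, c in zip(updated, centers)
        )
        centers = updated
        if shift <= tol:
            break
    return centers
```

Because sums and counts are additive, the merged centers equal those a centralized k-means step would produce from the same assignments; only the summaries travel between sites.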

Change history

30 May 2022
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s12652-022-03978-8

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at Majmaah University for funding this work under project number RGP-2019-25.

Author information

Authors and Affiliations

College of Computer and Information Sciences, Majmaah University, Majmaah, 11952, Saudi Arabia
Sunil Kumar Sharma

Corresponding author

Correspondence to Sunil Kumar Sharma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been retracted. Please see the retraction notice for more detail: https://doi.org/10.1007/s12652-022-03978-8

About this article

Cite this article

Sharma, S.K. RETRACTED ARTICLE: An empirical model (EM: CCO) for clustering, convergence and center optimization in distributive databases. J Ambient Intell Human Comput 12, 5045–5054 (2021). https://doi.org/10.1007/s12652-020-01955-7

Received: 14 February 2020
Accepted: 06 April 2020
Published: 20 April 2020
Issue Date: May 2021
DOI: https://doi.org/10.1007/s12652-020-01955-7
