RETRACTED ARTICLE: An empirical model (EM: CCO) for clustering, convergence and center optimization in distributive databases

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

This article was retracted on 30 May 2022

This article has been updated

Abstract

Conventional clustering methods assume that data are stored centrally and are memory resident, which makes it difficult to arrive at solutions when dealing with large data. Centralizing huge volumes of data from multiple locations is always a challenging task, owing to the large memory space and computational time required by traditional mining methods. Traditional k-means-type clustering has been used to identify cluster prototypes that can serve as representative points in a large dataset. Its major drawback is that the cluster centers tend to distort the distribution of the underlying data, so the representative points cannot capture the complete distribution of the data, which leads to poor pattern generation. To resolve this issue, this paper proposes an empirical model (EM) that ensures the cluster centers capture the underlying data distribution. First, the proposed methodology addresses asymptotic convergence on the distributed data. Second, it provides an efficient mechanism for measuring the cluster centers in practice. Finally, it proposes a methodology for distributive convergence and center optimization. The model is compared with other methods in the literature, and the results are discussed.
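The abstract does not specify the EM: CCO procedure, so the following is not the paper's model. It is a minimal sketch of generic distributed k-means, shown only to illustrate how cluster centers can be updated without centralizing the data: each site sends per-cluster sums and counts, and a coordinator merges them. All names (`local_stats`, `merge`, `distributed_kmeans`), the two example sites, and the starting centers are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centers: List[Point]) -> int:
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def local_stats(points: List[Point], centers: List[Point]):
    """Run at one site: per-cluster coordinate sums and counts only."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in points:
        j = nearest(p, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge(stats, centers: List[Point]) -> List[Point]:
    """Run at the coordinator: combine site summaries into updated centers."""
    dim = len(centers[0])
    new_centers = []
    for j, old in enumerate(centers):
        total = sum(counts[j] for _, counts in stats)
        if total == 0:
            new_centers.append(old)  # an empty cluster keeps its old center
            continue
        new_centers.append(
            tuple(sum(sums[j][d] for sums, _ in stats) / total for d in range(dim))
        )
    return new_centers


def distributed_kmeans(sites, centers, max_iter=100, tol=1e-9):
    """Alternate local summarisation and global merging until centers settle."""
    for _ in range(max_iter):
        stats = [local_stats(pts, centers) for pts in sites]
        updated = merge(stats, centers)
        # Largest squared movement of any center in this round.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(u, c)) for u, c in zip(updated, centers)
        )
        centers = updated
        if shift <= tol:
            break
    return centers


# Two hypothetical sites, each holding part of the data (assumed values).
sites = [
    [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
    [(1.0, 0.0), (1.0, 1.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)],
]
final_centers = distributed_kmeans(sites, [(0.0, 0.0), (10.0, 10.0)])
```

Because the merged sums and counts equal those computed over the pooled data, each round in this sketch matches one step of centralized k-means from the same starting centers. Only the summaries cross the network, not the raw points.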

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at Majmaah University for funding this work under project number RGP-2019-25.

Author information

Corresponding author

Correspondence to Sunil Kumar Sharma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been retracted. Please see the retraction notice for more detail: https://doi.org/10.1007/s12652-022-03978-8

About this article

Cite this article

Sharma, S.K. RETRACTED ARTICLE: An empirical model (EM: CCO) for clustering, convergence and center optimization in distributive databases. J Ambient Intell Human Comput 12, 5045–5054 (2021). https://doi.org/10.1007/s12652-020-01955-7

  • DOI: https://doi.org/10.1007/s12652-020-01955-7
