Towards Efficient Ensemble Hierarchical Clustering with MapReduce-based Clusters Clustering Technique and the Innovative Similarity Criterion

Journal of Grid Computing

Abstract

Today, data plays a fundamental role in our daily lives, and the growing volume of data production has led to the big data revolution. Managing and analyzing this data, which is often unlabeled, is a major practical challenge. Clustering is one of the most important branches of data mining for data analysis; its purpose is to divide the data into meaningful subsets called clusters. Hierarchical clustering is an unsupervised learning approach that groups data points with similar properties by constructing and analyzing a dendrogram. Over the decades, many clustering algorithms with different approaches have been developed. In this paper, we introduce an efficient ensemble hierarchical clustering algorithm based on a MapReduce-based clusters clustering technique and an innovative similarity criterion. The main idea of ensemble clustering is to combine the results of several single clustering methods. Because they combine multiple learners, ensemble techniques usually produce better results than single methods, so aggregating hierarchical clustering methods can be expected to yield higher-quality clusters. MapReduce is a programming model for implementing big data applications, and we use it to implement the hierarchical clustering methods. The similarity between samples is computed with an innovative similarity criterion. The proposed approach consists of three steps. In the first step, the data are clustered by several single hierarchical clustering methods. In the second step, hyper-clusters are generated by applying the clusters clustering technique. Finally, in the third step, the final clusters are formed by allocating the samples to the hyper-clusters. Simulations on multiple real-world datasets show that the proposed approach performs better than algorithms such as CHC and RCESCC.
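The three steps described above can be sketched in a few functions. This is a minimal, illustrative Python sketch, not the authors' implementation, and it rests on several assumptions: Euclidean distance stands in for the innovative similarity criterion (which the abstract does not define); single, complete and average linkage serve as the base hierarchical methods; the clusters clustering step groups base clusters by the Jaccard distance between their member sets; and each sample is allocated to the hyper-cluster that contains it most often. All function names are hypothetical.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def agglomerate(d, k, linkage="average"):
    """Naive agglomerative clustering on a precomputed distance matrix d.

    Repeatedly merges the closest pair of clusters until k clusters remain
    and returns them as lists of item indices.
    """
    clusters = [[i] for i in range(len(d))]

    def link(a, b):
        pair_dists = [d[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair_dists)
        if linkage == "complete":
            return max(pair_dists)
        return sum(pair_dists) / len(pair_dists)  # average linkage

    while len(clusters) > k:
        # combinations yields (i, j) with i < j, so deleting j keeps i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def base_clusterings(points, k, linkages=("single", "complete", "average")):
    """Step 1: run several single hierarchical methods on the samples.

    Assumption: Euclidean distance replaces the paper's similarity criterion.
    """
    d = [[dist(p, q) for q in points] for p in points]
    return [agglomerate(d, k, m) for m in linkages]


def jaccard_distance(a, b):
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)


def hyper_clusters(base, k):
    """Step 2 (assumed form): cluster the base clusters into k hyper-clusters."""
    clusters = [c for partition in base for c in partition]
    d = [[jaccard_distance(a, b) for b in clusters] for a in clusters]
    groups = agglomerate(d, k, "average")
    return [[clusters[g] for g in group] for group in groups]


def allocate(n, hypers):
    """Step 3: give each sample the hyper-cluster that contains it most often."""
    labels = []
    for s in range(n):
        votes = [sum(s in c for c in h) for h in hypers]
        labels.append(votes.index(max(votes)))
    return labels


def ensemble_hierarchical(points, k):
    base = base_clusterings(points, k)
    return allocate(len(points), hyper_clusters(base, k))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(ensemble_hierarchical(points, 2))  # [0, 0, 0, 1, 1, 1]
```

On these two well-separated groups, all three base methods agree, so the first three points end up in one final cluster and the last three in the other. The naive merge loop scans every cluster pair on each iteration, so the sketch only suits small inputs; with real data a hyper-cluster may also end up with no samples allocated to it.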
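The role of MapReduce can be illustrated with the pairwise-similarity computation that hierarchical methods depend on. The following sequential Python simulation of the map, shuffle and reduce phases is a hedged sketch only: the abstract does not describe how the authors distribute their methods, Euclidean distance again replaces the innovative similarity criterion, every mapper is assumed to be able to read the full point set, and the job and function names are hypothetical. Each reducer returns the nearest neighbour of one sample, and the smallest of those distances identifies the pair of samples that the first single-linkage merge would join.

```python
from collections import defaultdict
from math import dist  # Python 3.8+


def mapper(split, points):
    """Map: for each (i, p) in this input split, emit (i, (j, distance))."""
    for i, p in split:
        for j, q in enumerate(points):
            if i != j:
                yield i, (j, dist(p, q))


def shuffle(pairs):
    """Shuffle: group the emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(i, values):
    """Reduce: keep the nearest neighbour of sample i."""
    j, d = min(values, key=lambda v: v[1])
    return i, j, d


def nearest_neighbour_job(points, n_splits=2):
    indexed = list(enumerate(points))
    splits = [indexed[s::n_splits] for s in range(n_splits)]
    emitted = (kv for split in splits for kv in mapper(split, points))
    return sorted(reducer(i, vs) for i, vs in shuffle(emitted).items())


points = [(0, 0), (0, 1), (5, 5), (5, 7)]
rows = nearest_neighbour_job(points)
print(rows)  # [(0, 1, 1.0), (1, 0, 1.0), (2, 3, 2.0), (3, 2, 2.0)]
print(min(rows, key=lambda r: r[2]))  # (0, 1, 1.0)
```

In a real Hadoop deployment the splits would be processed by separate mapper tasks and the shuffle would be performed by the framework; here the phases run in one process so the data flow is easy to follow.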

Availability of data and material

Data sharing is not applicable to this manuscript, as no datasets were generated or analyzed during the current study.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Contributions

All authors contributed to the design and implementation of the research, to the analysis of the results and to the writing of the manuscript.

Corresponding author

Correspondence to Ping Tian.

Ethics declarations

Ethics approval and consent to participate

This material is the authors' own original work, which has not been previously published elsewhere.

Consent for publication

Informed consent was obtained from all individual participants included in the study.

Competing interests

We certify that there is no actual or potential conflict of interest in relation to this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tian, P., Shen, H. & Abolfathi, A. Towards Efficient Ensemble Hierarchical Clustering with MapReduce-based Clusters Clustering Technique and the Innovative Similarity Criterion. J Grid Computing 20, 34 (2022). https://doi.org/10.1007/s10723-022-09623-0

  • DOI: https://doi.org/10.1007/s10723-022-09623-0
