Towards Efficient Ensemble Hierarchical Clustering with MapReduce-based Clusters Clustering Technique and the Innovative Similarity Criterion

Tian, Ping; Shen, Huitao; Abolfathi, Ahad

doi:10.1007/s10723-022-09623-0

Published: 20 September 2022

Volume 20, article number 34, (2022)
Journal of Grid Computing

Ping Tian¹, Huitao Shen¹ & Ahad Abolfathi²

Abstract

Data plays a fundamental role in daily life, and the rapid growth of data production has led to the big data revolution. Managing and analyzing this data, which is often unlabeled, is a major practical challenge. Clustering is one of the most important branches of data mining; its purpose is to divide data into meaningful subsets called clusters. Hierarchical clustering is an unsupervised learning approach that groups data points with similar properties and is based on the construction and analysis of dendrograms. Over the decades, many clustering algorithms with different approaches have been developed. In this paper, an efficient ensemble hierarchical clustering algorithm based on a MapReduce-based clusters clustering technique and an innovative similarity criterion is introduced. The main idea of ensemble clustering is to combine the results of different single clustering methods. Ensemble techniques usually produce better results than single methods because they combine multiple learners. Accordingly, aggregating hierarchical clustering methods can be expected to yield higher-quality clustering. MapReduce is a programming model for implementing big data applications, and it is used here to implement the hierarchical clustering methods. The similarity between samples is calculated through an innovative similarity criterion. The proposed approach consists of three steps. In the first step, the data are clustered by several single hierarchical clustering methods. In the second step, hyper-clusters are generated by applying the clusters clustering technique. In the third step, the final clusters are formed by allocating samples to hyper-clusters.
Simulations on multiple real-world datasets show that the proposed approach performs better than algorithms such as CHC and RCESCC.
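
The three steps described in the abstract can be illustrated with a minimal, sequential Python sketch. It is not the authors' implementation: the abstract does not define the innovative similarity criterion, so Jaccard similarity between clusters is used here as a stand-in, and the MapReduce distribution of the work is omitted entirely. The function names (`agglomerate`, `ensemble_cluster`), the choice of single, complete, and average linkage as base methods, and the majority-vote allocation rule are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def average(values):
    """Mean of an iterable of numbers (used as average linkage)."""
    values = list(values)
    return sum(values) / len(values)


def agglomerate(n_items, distance, k, linkage):
    """Naive agglomerative clustering: merge the closest pair of groups until k remain.

    `distance(i, j)` gives the distance between items i and j; `linkage` reduces
    the item-to-item distances between two groups to one number (min, max, average).
    """
    groups = [[i] for i in range(n_items)]
    while len(groups) > k:
        def group_distance(pair):
            a, b = pair
            return linkage(distance(i, j) for i in groups[a] for j in groups[b])

        # combinations() yields a < b, so deleting groups[b] leaves groups[a] in place.
        a, b = min(combinations(range(len(groups)), 2), key=group_distance)
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


def jaccard_distance(a, b):
    """Stand-in cluster dissimilarity (assumption; not the paper's criterion)."""
    return 1.0 - len(a & b) / len(a | b)


def ensemble_cluster(points, k, linkages=(min, max, average)):
    n = len(points)

    def point_distance(i, j):
        return dist(points[i], points[j])

    # Step 1: cluster the data with several single hierarchical methods.
    base_clusters = []
    for linkage in linkages:
        for group in agglomerate(n, point_distance, k, linkage):
            base_clusters.append(frozenset(group))

    # Step 2: cluster the base clusters themselves into k hyper-clusters.
    def cluster_distance(i, j):
        return jaccard_distance(base_clusters[i], base_clusters[j])

    hyper_clusters = agglomerate(len(base_clusters), cluster_distance, k, average)

    # Step 3: allocate each sample to the hyper-cluster whose member
    # clusters contain it most often (simple majority vote, an assumption).
    labels = []
    for sample in range(n):
        votes = [sum(sample in base_clusters[c] for c in hyper)
                 for hyper in hyper_clusters]
        labels.append(votes.index(max(votes)))
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ensemble_cluster(points, k=2)
```

On the six toy points above, the first three samples receive one label and the last three receive the other. In a MapReduce setting, the independent base clusterings of Step 1 would be natural candidates for parallel map tasks, but how the paper actually partitions the work is not stated in the abstract.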

Availability of data and material

Data sharing not applicable to this manuscript as no datasets were generated or analyzed during the current study.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Institute of Applied Mathematics, Xuchang University, Xuchang, 461000, Henan, China
Ping Tian & Huitao Shen
Department of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
Ahad Abolfathi

Contributions

All authors contributed to the design and implementation of the research, to the analysis of the results and to the writing of the manuscript.

Corresponding author

Correspondence to Ping Tian.

Ethics declarations

Ethics approval and consent to participate

This material is the authors' own original work, which has not been previously published elsewhere.

Consent for publication

Informed consent was obtained from all individual participants included in the study.

Competing interests

We certify that there is no actual or potential conflict of interest in relation to this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tian, P., Shen, H. & Abolfathi, A. Towards Efficient Ensemble Hierarchical Clustering with MapReduce-based Clusters Clustering Technique and the Innovative Similarity Criterion. J Grid Computing 20, 34 (2022). https://doi.org/10.1007/s10723-022-09623-0

Received: 11 January 2022
Accepted: 09 September 2022
Published: 20 September 2022
DOI: https://doi.org/10.1007/s10723-022-09623-0
