Abstract
This study proposes two new hierarchical clustering methods, weighted and neighbourhood, to overcome shortcomings of existing hierarchical clustering methods: low accuracy, an inability to separate clusters properly, and a tendency to produce too many clusters. We also propose three new criteria for assessing the performance of clustering methods: (1) overall effectiveness, the product of overall efficiency and cluster accuracy, which evaluates hierarchical clustering methods on class-labelled datasets; (2) a modified structure strength S(c), which addresses the difficulty of using hierarchical clustering methods to determine the number of clusters in datasets without class labels; and (3) the R-value, the ratio of the determinant of the between-clusters sum of squares and cross products matrix to the determinant of the within-clusters sum of squares and cross products matrix, which validates the performance of hierarchical clustering methods on datasets without class labels. The proposed algorithms achieved high accuracy, separated the clusters properly and grouped the data into fewer clusters. Compared with existing algorithms on the newly developed performance criteria, the new algorithms performed better. The experiments used twelve class-labelled datasets and six datasets without class labels.
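As an illustration of the third criterion, the following minimal sketch computes a determinant ratio from the verbal definition given in the abstract. It is not the authors' implementation: the function names `sscp_matrices` and `r_value`, the use of NumPy, and the toy data are assumptions made here for illustration. The between-clusters matrix has rank at most (number of clusters minus 1), so its determinant is zero whenever the number of features exceeds that; the toy example therefore uses two features and three clusters.

```python
import numpy as np

def sscp_matrices(X, labels):
    """Return (B, W): between- and within-clusters SSCP matrices.

    Hypothetical helper, written from the verbal definition only.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        # Between: cluster size times outer product of the mean offset.
        d = (mean_c - grand_mean).reshape(-1, 1)
        B += Xc.shape[0] * (d @ d.T)
        # Within: cross products of deviations from the cluster mean.
        R = Xc - mean_c
        W += R.T @ R
    return B, W

def r_value(X, labels):
    """Ratio det(B) / det(W); assumes W is non-singular."""
    B, W = sscp_matrices(X, labels)
    return np.linalg.det(B) / np.linalg.det(W)

# Toy data (an assumption, not from the paper): three compact,
# well-separated clusters in two dimensions.
X_demo = np.array([
    [0, 0], [0, 1], [1, 0],
    [10, 0], [10, 1], [11, 0],
    [0, 10], [0, 11], [1, 10],
], dtype=float)
labels_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

B_demo, W_demo = sscp_matrices(X_demo, labels_demo)
r_demo = r_value(X_demo, labels_demo)
print(r_demo)
```

Under this reading, a larger ratio indicates clusters that are far apart relative to their internal spread, so a partition with tight, well-separated clusters scores higher than one that mixes the groups.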
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Vijaya Prabhagar, M., Punniyamoorthy, M. Development of new agglomerative and performance evaluation models for classification. Neural Comput & Applic 32, 2589–2600 (2020). https://doi.org/10.1007/s00521-019-04297-4
DOI: https://doi.org/10.1007/s00521-019-04297-4