Development of new agglomerative and performance evaluation models for classification

Abstract

This study proposes two new hierarchical clustering methods, weighted and neighbourhood, to overcome problems found in existing hierarchical clustering methods: low accuracy, inability to separate clusters properly, and the grouping of data into too many clusters. We also propose three new criteria for assessing the performance of clustering methods: (1) overall effectiveness, defined as the product of the overall efficiency and the accuracy of the clusters, which is used to evaluate hierarchical clustering methods on class-label datasets; (2) a modified structure strength S(c), which overcomes a usage problem in hierarchical clustering methods when determining the number of clusters for non-class-label datasets; and (3) the R-value, the ratio of the determinant of the between-cluster sum of squares and cross-products (SSCP) matrix to the determinant of the within-cluster SSCP matrix, which helps validate the performance of hierarchical clustering methods on non-class-label datasets. The new algorithms achieved high accuracy, separated the clusters properly and grouped the data into fewer clusters. Compared with existing algorithms under the newly developed performance criteria, the new algorithms performed better. The whole evaluation was carried out on twelve class-label and six non-class-label datasets.
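The R-value criterion described in the abstract can be computed from a data matrix and a set of cluster labels. The following is a minimal sketch, not the authors' code. It assumes the standard textbook definitions: the within-cluster SSCP matrix W accumulates deviations of each observation from its own cluster mean, and the between-cluster SSCP matrix B accumulates deviations of each cluster mean from the grand mean, weighted by cluster size. The function names `sscp_matrices`, `determinant`, `total_sscp` and `r_value` are hypothetical, and plain Python is used so the example has no dependencies.

```python
from collections import defaultdict


def _mean(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _add_outer(acc, u, v, weight=1.0):
    """In place: acc += weight * u v^T."""
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            acc[i][j] += weight * ui * vj


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def sscp_matrices(points, labels):
    """Return (W, B): within- and between-cluster SSCP matrices (assumed definitions)."""
    p = len(points[0])
    grand = _mean(points)
    clusters = defaultdict(list)
    for x, label in zip(points, labels):
        clusters[label].append(x)
    W = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    for members in clusters.values():
        centre = _mean(members)
        for x in members:
            d = [xi - ci for xi, ci in zip(x, centre)]
            _add_outer(W, d, d)  # spread of points around their cluster mean
        g = [ci - gi for ci, gi in zip(centre, grand)]
        _add_outer(B, g, g, weight=len(members))  # size-weighted spread of cluster means
    return W, B


def total_sscp(points):
    """Total SSCP matrix about the grand mean; equals W + B."""
    p = len(points[0])
    grand = _mean(points)
    T = [[0.0] * p for _ in range(p)]
    for x in points:
        d = [xi - gi for xi, gi in zip(x, grand)]
        _add_outer(T, d, d)
    return T


def r_value(points, labels):
    """R = det(B) / det(W) for a labelled partition of the points."""
    W, B = sscp_matrices(points, labels)
    det_w = determinant(W)
    if det_w == 0.0:
        raise ValueError("within-cluster SSCP matrix is singular")
    return determinant(B) / det_w
```

For example, `r_value([[0], [2], [10], [12]], [0, 0, 1, 1])` gives W = 4 and B = 100, hence R = 25. Under this definition, a larger R suggests clusters that lie far apart relative to their internal spread. Because the size-weighted deviations of the K cluster means from the grand mean sum to zero, B has rank at most K − 1, so det(B) is zero whenever the number of clusters does not exceed the number of variables; how the paper handles that case is not stated in the abstract.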

Author information

Corresponding author

Correspondence to M. Punniyamoorthy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vijaya Prabhagar, M., Punniyamoorthy, M. Development of new agglomerative and performance evaluation models for classification. Neural Comput & Applic 32, 2589–2600 (2020). https://doi.org/10.1007/s00521-019-04297-4

Keywords

  • Clustering analysis
  • Hierarchical clustering
  • Weighted clustering
  • Neighbourhood clustering
  • Structure strength