Feature Relevance in Ward’s Hierarchical Clustering Using the L_p Norm

Journal of Classification

Abstract

In this paper we introduce a new hierarchical clustering algorithm called Ward_p. Unlike the original Ward method, Ward_p generates feature weights, which can be seen as feature rescaling factors thanks to the use of the L_p norm. The feature weights are cluster dependent, allowing a feature to have different degrees of relevance in different clusters.
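One way to read the rescaling remark: if a dissimilarity sums w_v^p |x_v - y_v|^p over the features v, it equals the unweighted L_p dissimilarity computed after multiplying each feature by its weight. The minimal Python sketch below checks this identity numerically. The weighted form, the function names, the sample vectors and the weight values are all assumptions made for illustration; the abstract does not state the exact criterion Ward_p uses.

```python
import numpy as np


def weighted_lp_dissimilarity(x, y, w, p):
    """Sum over features of w_v**p * |x_v - y_v|**p (no p-th root is taken).

    Assumed form for illustration only; not taken from the abstract.
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    return float(np.sum((w ** p) * np.abs(x - y) ** p))


def lp_dissimilarity(x, y, p):
    """Unweighted counterpart: sum over features of |x_v - y_v|**p."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p))


# Hypothetical data: two points and one cluster's non-negative feature weights.
x = np.array([1.0, 4.0, 0.5])
y = np.array([2.0, 1.0, 0.9])
w = np.array([0.6, 0.3, 0.1])
p = 1.7

weighted = weighted_lp_dissimilarity(x, y, w, p)
rescaled = lp_dissimilarity(w * x, w * y, p)

# Weighting the dissimilarity and rescaling the features give the same value.
print(np.isclose(weighted, rescaled))
```

Because the weights enter raised to the same power p as the feature differences, a small weight shrinks a feature's axis, which is what makes the "rescaling factor" interpretation available for non-negative weights.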

We validate our method by performing experiments on a total of 75 real-world and synthetic datasets, with and without added features made of uniformly random noise. Our experiments show that: (i) the use of our feature weighting method produces results that are superior to those produced by the original Ward method on datasets containing noise features; (ii) it is indeed possible to estimate a good exponent p under a totally unsupervised framework. The clusterings produced by Ward_p depend on p. This makes the estimation of a good value for this exponent a requirement for this algorithm, and indeed for any other algorithm based on the L_p norm.
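To see why the exponent matters, the small sketch below (illustrative only, and not the Ward_p merge criterion) shows that which of two points lies closer to a third can change with p under the plain Minkowski distance. The points and function names are hypothetical.

```python
def minkowski(x, y, p):
    """Plain Minkowski (L_p) distance between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Hypothetical points chosen so that the nearest neighbour depends on p.
origin = (0.0, 0.0)
a = (2.0, 2.0)
b = (3.0, 0.0)


def nearest(p):
    """Return the label of the point closer to the origin under L_p."""
    return "a" if minkowski(origin, a, p) < minkowski(origin, b, p) else "b"


for p in (1.0, 2.0):
    print(p, nearest(p))
```

With p = 1 the distances are 4 and 3, so b is nearer; with p = 2 they are about 2.83 and 3, so a is nearer. Agglomerative methods repeatedly merge the lowest-cost pair, so a change of ordering like this can plausibly alter the resulting hierarchy.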

Author information

Corresponding author

Correspondence to Renato Cordeiro de Amorim.

Cite this article

de Amorim, R.C. Feature Relevance in Ward’s Hierarchical Clustering Using the L_p Norm. J Classif 32, 46–62 (2015). https://doi.org/10.1007/s00357-015-9167-1

  • DOI: https://doi.org/10.1007/s00357-015-9167-1
