Feature Relevance in Ward’s Hierarchical Clustering Using the L_p Norm

de Amorim, Renato Cordeiro

doi:10.1007/s00357-015-9167-1

Published: 11 March 2015

Volume 32, pages 46–62, (2015)

Journal of Classification

Renato Cordeiro de Amorim¹

Abstract

In this paper we introduce a new hierarchical clustering algorithm called Ward_p. Unlike the original Ward method, Ward_p generates feature weights, which can be seen as feature rescaling factors thanks to the use of the L_p norm. The feature weights are cluster dependent, allowing a feature to have different degrees of relevance in different clusters.
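The abstract gives no formulas, so the following Python sketch is only an illustration of how cluster-dependent feature weights could enter a Ward-style merge cost. It assumes a weighting of the kind used in Minkowski-weighted clustering, where a feature with low within-cluster dispersion receives a high weight, and it uses the cluster mean as the centre for every p, which is a simplification. The function names `centroid`, `feature_weights` and `merge_cost`, the `eps` guard, and the averaging of the two clusters' weights are assumptions made for this example, not details confirmed by the text above.

```python
def centroid(points):
    """Component-wise mean of a cluster (used here as its centre for any p)."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def feature_weights(points, p, eps=1e-9):
    """Assumed cluster-specific weights: low dispersion gives a high weight.

    The weights are non-negative and sum to 1 within the cluster.
    """
    if p <= 1:
        raise ValueError("this sketch requires p > 1")
    c = centroid(points)
    # Per-feature L_p dispersion around the centre; eps avoids division by zero.
    disp = [sum(abs(x[v] - c[v]) ** p for x in points) + eps
            for v in range(len(c))]
    exponent = 1.0 / (p - 1.0)
    return [1.0 / sum((dv / du) ** exponent for du in disp) for dv in disp]


def merge_cost(a, b, p):
    """Ward-style cost of merging clusters a and b under a weighted L_p norm."""
    ca, cb = centroid(a), centroid(b)
    wa, wb = feature_weights(a, p), feature_weights(b, p)
    na, nb = len(a), len(b)
    scale = na * nb / (na + nb)  # the usual Ward size factor
    return scale * sum(
        ((wa[v] + wb[v]) / 2.0) ** p * abs(ca[v] - cb[v]) ** p
        for v in range(len(ca))
    )
```

An agglomerative procedure built on this sketch would start from singletons and repeatedly merge the pair of clusters with the lowest `merge_cost`, recomputing the weights of the merged cluster after each step.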

We validate our method by performing experiments on a total of 75 real-world and synthetic datasets, with and without added features made of uniformly random noise. Our experiments show that: (i) the use of our feature weighting method produces results that are superior to those produced by the original Ward method on datasets containing noise features; (ii) it is indeed possible to estimate a good exponent p under a fully unsupervised framework. The clusterings produced by Ward_p are dependent on p. This makes the estimation of a good value for this exponent a requirement for this algorithm, and indeed for any other algorithm based on the L_p norm.
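As a rough illustration of the "added features made of uniformly random noise" mentioned above, the sketch below appends uniformly distributed columns to a dataset. The abstract does not state the range or number of noise features, so drawing them between the smallest and largest value in the data is an assumption, as are the function name `add_noise_features` and its `seed` parameter.

```python
import random


def add_noise_features(data, n_noise, seed=0):
    """Return a copy of data with n_noise uniform-noise columns appended.

    Assumption: noise is drawn uniformly between the global minimum and
    maximum of the original values.
    """
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    lo = min(min(row) for row in data)
    hi = max(max(row) for row in data)
    return [
        list(row) + [rng.uniform(lo, hi) for _ in range(n_noise)]
        for row in data
    ]
```

Clustering the original and the noise-extended versions of the same dataset, and comparing each result against known labels, is one way to see how much a method degrades when irrelevant features are present.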

Author information

Authors and Affiliations

Department of Computer Science and Information Systems, Birkbeck University of London, Malet Street, London, WC1E 7HX, UK
Renato Cordeiro de Amorim

Corresponding author

Correspondence to Renato Cordeiro de Amorim.

About this article

Cite this article

de Amorim, R.C. Feature Relevance in Ward’s Hierarchical Clustering Using the L_p Norm. J Classif 32, 46–62 (2015). https://doi.org/10.1007/s00357-015-9167-1

Published: 11 March 2015
Issue Date: April 2015
DOI: https://doi.org/10.1007/s00357-015-9167-1
