Variable Selection in Cluster Analysis: An Approach Based on a New Index

  • Isabella Morlini
  • Sergio Zani
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

In cluster analysis, the inclusion of unnecessary variables may mask the true group structure. To select the best subset of variables, we suggest the use of two overall indices. The first index is a distance between two hierarchical clusterings; the second is a similarity index obtained as the complement to one of that distance (that is, one minus the distance). Both criteria can be used to measure the similarity between clusterings obtained with different subsets of variables. An application to a real data set on the economic welfare of the Italian regions shows the benefits gained with the suggested procedure.
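The authors' index is not defined in this abstract. Purely as an illustration of the general idea of comparing clusterings obtained from different variable subsets, the sketch below uses the classical Rand index (pairwise agreement between two partitions) as a stand-in, and takes the distance as one minus the similarity. The function name `rand_index` and the label vectors are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same cluster, or both put them in different clusters.
    This is the Rand index, used here only as a stand-in similarity.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster labels for six objects: one partition obtained
# with all variables, one obtained with a subset of the variables.
full = [0, 0, 0, 1, 1, 1]
subset = [0, 0, 1, 1, 1, 1]

similarity = rand_index(full, subset)
distance = 1.0 - similarity  # complement to one of the similarity
print(f"similarity={similarity:.3f}, distance={distance:.3f}")
```

Under this stand-in measure, a variable subset whose partition stays close to the reference partition (small distance) would be judged to preserve the group structure. The paper's own index is defined on hierarchical clusterings and may behave differently.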

Keywords

Variable Selection; Similarity Index; Italian Region; Pairwise Similarity; Rand Index
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Economics, University of Modena and Reggio Emilia, Modena, Italy
  2. Department of Economics, University of Parma, Parma, Italy
