
Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis

  • Marek Walesiak
  • Andrzej Dudek
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

This paper proposes an extended version of the HINoV method for identifying noisy variables (Carmone et al., 1999) that handles nonmetric, mixed, and symbolic interval data. The proposed modifications are evaluated on data simulated from a variety of models with a known cluster structure. Each model also contains a varying number of noisy (irrelevant) variables, added to obscure the underlying structure to be recovered.
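The evaluation design described above can be illustrated with a rough sketch. It follows the general idea of the original HINoV method of Carmone et al. (1999) for metric data only, not the extended version proposed in this paper: each variable is clustered on its own, the resulting partitions are compared pairwise with the adjusted Rand index (Hubert and Arabie, 1985), and the indices are summed per variable. The function names, the simple one-dimensional k-means, and the simulated model (two clusters, two informative and two noisy variables) are all assumptions made for illustration.

```python
import random
from math import comb


def adjusted_rand(a, b):
    """Adjusted Rand index (Hubert and Arabie, 1985) of two label lists."""
    pairs, rows, cols = {}, {}, {}
    for x, y in zip(a, b):
        pairs[(x, y)] = pairs.get((x, y), 0) + 1
        rows[x] = rows.get(x, 0) + 1
        cols[y] = cols.get(y, 0) + 1
    s_ij = sum(comb(c, 2) for c in pairs.values())
    s_a = sum(comb(c, 2) for c in rows.values())
    s_b = sum(comb(c, 2) for c in cols.values())
    expected = s_a * s_b / comb(len(a), 2)
    max_index = (s_a + s_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (s_ij - expected) / (max_index - expected)


def kmeans_1d(values, k, iters=50):
    """Plain k-means on a single variable, with quantile-based starting centres."""
    srt = sorted(values)
    n = len(srt)
    centers = [srt[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def hinov_topri(data, k):
    """Per-variable sum of adjusted Rand indices against all other variables."""
    m = len(data[0])
    parts = [kmeans_1d([row[j] for row in data], k) for j in range(m)]
    return [
        sum(adjusted_rand(parts[j], parts[h]) for h in range(m) if h != j)
        for j in range(m)
    ]


# Assumed toy model: two clusters of 40 objects; variables 0 and 1 carry the
# cluster structure, variables 2 and 3 are uniform noise.
rng = random.Random(0)
data = []
for i in range(80):
    centre = 0.0 if i < 40 else 8.0
    data.append([
        rng.gauss(centre, 1.0),
        rng.gauss(centre, 1.0),
        rng.uniform(0.0, 8.0),
        rng.uniform(0.0, 8.0),
    ])

topri = hinov_topri(data, k=2)
for j, value in enumerate(topri):
    print(f"variable {j}: topri = {value:.3f}")
```

Variables whose summed index is low relative to the others are candidates for noisy variables. Handling nonmetric, mixed, or symbolic interval data would require a different per-variable clustering step, which this sketch does not attempt.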

Keywords

Cluster Structure; Ordinal Data; Symbolic Data; Multivariate Normal Distribution; Rand Index
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. BILLARD, L., DIDAY, E. (2006): Symbolic data analysis. Conceptual statistics and data mining, Wiley, Chichester.
  2. CARMONE, F.J., KARA, A. and MAXWELL, S. (1999): HINoV: a new method to improve market segment definition by identifying noisy variables, Journal of Marketing Research, vol. 36, November, 501-509.
  3. GNANADESIKAN, R., KETTENRING, J.R., and TSAO, S.L. (1995): Weighting and selection of variables for cluster analysis, Journal of Classification, vol. 12, no. 1, 113-136.
  4. HUBERT, L.J., ARABIE, P. (1985): Comparing partitions, Journal of Classification, vol. 2, no. 1, 193-218.
  5. JAJUGA, K., WALESIAK, M., BAK, A. (2003): On the General Distance Measure, In: M. Schwaiger and O. Opitz (Eds.), Exploratory data analysis in empirical research, Springer-Verlag, Berlin, Heidelberg, 104-109.
  6. MILLIGAN, G.W. (1996): Clustering validation: results and implications for applied analyses, In: P. Arabie, L.J. Hubert, G. de Soete (Eds.), Clustering and classification, World Scientific, Singapore, 341-375.
  7. TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001): Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society, ser. B, vol. 63, part 2, 411-423.
  8. WALESIAK, M. (2005): Variable selection for cluster analysis - approaches, problems, methods, Plenary Session of the Committee on Statistics and Econometrics of the Polish Academy of Sciences, 15, March, Wroclaw.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Marek Walesiak (1)
  • Andrzej Dudek (1)
  1. Department of Econometrics and Computer Science, Wroclaw University of Economics, Jelenia Gora, Poland
