Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis
This paper proposes an extended version of the HINoV method for identifying noisy variables (Carmone et al. (1999)) that handles nonmetric, mixed, and symbolic interval data. The proposed modifications are evaluated on simulated data from a variety of models. Each model contains a known cluster structure together with a varying number of noisy (irrelevant) variables added to obscure the underlying structure to be recovered.
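The core HINoV idea can be illustrated with a minimal sketch; it is not the paper's implementation. It assumes the usual description of the method: each variable induces its own partition of the objects, the adjusted Rand index is computed for every pair of partitions, and the per-variable sums (often called "topri" scores) are ranked, with low scores suggesting noisy variables. As a further assumption made only for this illustration, each nominal variable's categories are taken directly as its partition. The toy data and the names `adjusted_rand_index`, `topri_scores`, `data`, and `scores` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index between two labelings of the same objects."""
    n = len(a)
    # Pairs of objects placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def topri_scores(columns):
    """Sum, for each variable, its adjusted Rand index with every other variable."""
    scores = dict.fromkeys(columns, 0.0)
    for u, v in combinations(columns, 2):
        ari = adjusted_rand_index(columns[u], columns[v])
        scores[u] += ari
        scores[v] += ari
    return scores


# Hypothetical toy data: 12 objects in 3 true clusters of 4 objects each.
# x1, x2, x3 are nominal variables that follow the cluster structure
# (x3 misplaces one object); "noise" cycles through its categories
# independently of the clusters.
data = {
    "x1": list("aaaabbbbcccc"),
    "x2": list("ppppqqqqrrrr"),
    "x3": list("AAABBBBBCCCC"),
    "noise": list("uvwuvwuvwuvw"),
}

scores = topri_scores(data)
# Rank variables from most to least consistent with the others.
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

In this toy example the variable unrelated to the cluster structure receives the lowest score, which is the signal HINoV uses to flag a variable as noisy. For metric variables the per-variable partitions would instead come from a clustering step such as k-means, which is omitted here.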
Keywords: Cluster Structure, Ordinal Data, Symbolic Data, Multivariate Normal Distribution, Rand Index
- JAJUGA, K., WALESIAK, M., BAK, A. (2003): On the General Distance Measure, In: M. Schwaiger and O. Opitz (Eds.), Exploratory data analysis in empirical research, Springer-Verlag, Berlin, Heidelberg, 104-109.
- MILLIGAN, G.W. (1996): Clustering validation: results and implications for applied analyses, In: P. Arabie, L.J. Hubert, G. de Soete (Eds.), Clustering and classification, World Scientific, Singapore, 341-375.
- WALESIAK, M. (2005): Variable selection for cluster analysis - approaches, problems, methods, Plenary Session of the Committee on Statistics and Econometrics of the Polish Academy of Sciences, 15 March, Wroclaw.