Journal of Classification, Volume 29, Issue 2, pp 199–226

A New Class of Weighted Similarity Indices Using Polytomous Variables

Authors

  • I. Morlini
    • Department of Economics, University of Modena and Reggio Emilia
  • Sergio Zani
    • University of Parma

DOI: 10.1007/s00357-012-9107-2

Cite this article as:
Morlini, I. & Zani, S. J Classif (2012) 29: 199. doi:10.1007/s00357-012-9107-2

Abstract

We introduce new similarity measures between two subjects described by variables with multiple categories. In contrast to traditionally used similarity indices, they also take into account the frequency of the categories of each attribute in the sample. This feature is useful when dealing with rare categories, since it makes sense to evaluate the pairwise presence of a rare category differently from the pairwise presence of a widespread one. A weighting criterion for each category, derived from Shannon’s information theory, is suggested. There are two versions of the weighted index: one for independent categorical variables and one for dependent variables. The suitability of the proposed indices is shown using both simulated and real-world data sets.
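
To make the weighting idea concrete, the following is a minimal sketch and not the index defined in the article. It assumes that each category is weighted by its Shannon information, minus the logarithm of its relative frequency in the sample, and that variables are treated separately (the independent case). The function names `category_weights` and `weighted_similarity`, the normalisation in the denominator, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def category_weights(column):
    """Assumed weighting: -log(relative frequency) of each category."""
    n = len(column)
    counts = Counter(column)
    return {cat: -math.log(cnt / n) for cat, cnt in counts.items()}


def weighted_similarity(data, i, j):
    """Illustrative weighted matching similarity between subjects i and j.

    Numerator: sum of the weights of the categories shared by i and j.
    Denominator (an assumption of this sketch): for each variable, the
    larger of the two subjects' category weights, so the value lies in
    [0, 1] and equals 1 for identical profiles with positive weights.
    """
    num = 0.0
    den = 0.0
    n_vars = len(data[0])
    for v in range(n_vars):
        column = [row[v] for row in data]
        w = category_weights(column)
        wi, wj = w[data[i][v]], w[data[j][v]]
        if data[i][v] == data[j][v]:
            num += wi
        den += max(wi, wj)
    # If every variable is constant in the sample, all weights are zero.
    return num / den if den > 0 else 0.0


# Hypothetical sample: first variable has a rare and a common category,
# second variable has two equally frequent categories.
data = [
    ("rare", "p"),
    ("rare", "q"),
    ("common", "p"),
    ("common", "q"),
    ("common", "p"),
    ("common", "q"),
]

sim_rare = weighted_similarity(data, 0, 1)    # pair sharing the rare category
sim_common = weighted_similarity(data, 2, 3)  # pair sharing the common category
print(sim_rare > sim_common)
```

Under simple unweighted matching both pairs would agree on one variable out of two and receive the same score; with frequency-based weights, the pair that shares the rare category is rated as more similar.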

Keywords

Chi-square distance, Cluster analysis, Variable weighting, Information theory

Copyright information

© Springer Science+Business Media, LLC 2012