Association-Based Dissimilarity Measures for Categorical Data: Limitation and Improvement
Measuring the similarity of categorical data is a challenging task in data mining because of the poor structure of categorical data: their values have no inherent order or distance. This paper presents a dissimilarity measure for categorical data based on the relations among attributes. The measure not only retains the advantage of value variance but also overcomes the limitation of the conditional probability-based measure when it is applied to databases whose attributes are independent. Experiments on 30 databases also showed that the proposed measure improved the accuracy of Nearest Neighbor classification compared with the other measures tested.
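The abstract does not give the formula for either measure. To make the stated limitation concrete, here is a minimal sketch of a *generic* conditional probability-based dissimilarity. It is an assumption for illustration and is not the paper's proposed measure. Two values `x` and `y` of attribute `i` are compared by averaging, over every other attribute `j`, the total variation distance between the empirical distributions P(A_j | A_i = x) and P(A_j | A_i = y). The function names `conditional_dists` and `value_dissimilarity`, and the choice of total variation distance, are hypothetical. When A_i is independent of every other attribute, these conditional distributions coincide, so every pair of values of A_i gets dissimilarity 0 and becomes indistinguishable.

```python
from collections import Counter, defaultdict


def conditional_dists(data, i, j):
    """Empirical P(A_j = v | A_i = x) for every observed value x of attribute i."""
    counts = defaultdict(Counter)
    for row in data:
        counts[row[i]][row[j]] += 1
    return {
        x: {v: n / sum(c.values()) for v, n in c.items()}
        for x, c in counts.items()
    }


def value_dissimilarity(data, i, x, y):
    """Hypothetical conditional-probability dissimilarity between values x and y
    of attribute i: the mean total variation distance between
    P(A_j | A_i = x) and P(A_j | A_i = y) over all other attributes j."""
    others = [j for j in range(len(data[0])) if j != i]
    if not others:
        raise ValueError("need at least two attributes")
    total = 0.0
    for j in others:
        dists = conditional_dists(data, i, j)
        px, py = dists[x], dists[y]
        support = set(px) | set(py)
        # Total variation distance: half the L1 distance between the distributions.
        total += 0.5 * sum(abs(px.get(v, 0.0) - py.get(v, 0.0)) for v in support)
    return total / len(others)


# Attribute 0 fully determines attribute 1: "a" and "b" are far apart.
dependent = [("a", "p"), ("a", "p"), ("b", "q"), ("b", "q")]
print(value_dissimilarity(dependent, 0, "a", "b"))    # → 1.0

# Attribute 0 is independent of attribute 1: "a" and "b" collapse to 0.
independent = [("a", "p"), ("a", "q"), ("b", "p"), ("b", "q")]
print(value_dissimilarity(independent, 0, "a", "b"))  # → 0.0
```

The second case is the failure mode the abstract describes. A measure that relies only on co-occurrence with other attributes has no signal when those attributes carry no information about A_i, so a Nearest Neighbor classifier using it cannot tell such values apart.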
Keywords: Categorical Data; Binary Vector; Near Neighbor; Association Rule Mining; Dissimilarity Measure