Clustering of Mixed-Type Data Considering Concept Hierarchies
Most clustering algorithms have been designed only for pure numerical or pure categorical data sets while nowadays many applications generate mixed data. It arises the question how to integrate various types of attributes so that one could efficiently group objects without loss of information. It is already well understood that a simple conversion of categorical attributes into a numerical domain is not sufficient since relationships between values such as a certain order are artificially introduced. Leveraging the natural conceptual hierarchy among categorical information, concept trees summarize the categorical attributes. In this paper we propose the algorithm ClicoT (CLustering mixed-type data Including COncept Trees) which is based on the Minimum Description Length (MDL) principle. Profiting of the conceptual hierarchies, ClicoT integrates categorical and numerical attributes by means of a MDL based objective function. The result of ClicoT is well interpretable since concept trees provide insights of categorical data. Extensive experiments on synthetic and real data set illustrate that ClicoT is noise-robust and yields well interpretable results in a short runtime.
- 3.Böhm, C., Faloutsos, C., Pan, J., Plant, C.: Robust information-theoretic clustering. In: KDD (2006)Google Scholar
- 4.Böhm, C., Goebl, S., Oswald, A., Plant, C., Plavinski, M., Wackersreuther, B.: Integrative parameter-free clustering of data with mixed type attributes. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) PAKDD 2010. LNCS (LNAI), vol. 6118, pp. 38–47. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13657-3_7CrossRefGoogle Scholar
- 5.He, Z., Xu, X., Deng, S.: Clustering mixed numeric and categorical data: a cluster ensemble approach. CoRR abs/cs/0509011 (2005)Google Scholar
- 10.Plant, C., Böhm, C.: INCONCO: interpretable clustering of numerical and categorical objects. In: KDD, pp. 1127–1135 (2011)Google Scholar
- 12.Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: ICML (2009)Google Scholar