Abstract
This paper presents an alternative approach to clustering mixed (categorical and numerical) databases. The main idea is a general method for clustering mixed data sets that is simple, yet reaches performance levels similar to those of some well-established algorithms. The approach codifies the categorical attributes and then applies a numerical clustering algorithm to the resulting database. The proposed codification is based on polar or spherical coordinates. It is easy to understand and apply, it does not increase the size of the input matrix excessively, and its codification error can be determined in each case. Combined with the well-known k-means algorithm, the proposed codification performed very well on several benchmarks. It has been compared with both other codifications and other mixed-data clustering algorithms, showing better or comparable performance in all cases.
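The pipeline described above (codify the categorical attributes, then run a numerical clusterer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each categorical attribute with n values is assumed to be mapped to n equally spaced points on the unit circle, giving two numeric columns per attribute whatever n is. Numeric columns are assumed to be already scaled to a range comparable with the codes, and the spherical variant is not shown. The helper names (`polar_codes`, `encode`, `codification_error`, `kmeans`) are hypothetical. `codification_error` is only one plausible way to quantify how far the coded categories are from being equidistant; the paper's own error definition may differ.

```python
import math
import random


def polar_codes(categories):
    """Hypothetical polar codification: category k of n -> angle 2*pi*k/n.

    Every categorical attribute becomes two numeric columns (cos, sin),
    however many categories it has.
    """
    n = len(categories)
    return {
        c: (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k, c in enumerate(categories)
    }


def codification_error(codes):
    """Hypothetical error: relative spread of pairwise distances between codes.

    Returns 0.0 when all categories are equidistant from each other.
    """
    pts = list(codes.values())
    d = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
    return (max(d) - min(d)) / max(d) if d else 0.0


def encode(rows, cat_idx):
    """Replace categorical columns by their polar codes; keep numeric columns.

    Numeric columns are assumed to be pre-scaled to a range comparable
    with the unit-circle codes.
    """
    books = {j: polar_codes(sorted({r[j] for r in rows})) for j in cat_idx}
    X = []
    for r in rows:
        vec = []
        for j, v in enumerate(r):
            if j in books:
                vec.extend(books[j][v])
            else:
                vec.append(float(v))
        X.append(vec)
    return X, books


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd-style k-means with Euclidean distance on the encoded matrix."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(X, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy mixed data: one numeric column (already in [0, 1]) and one categorical column.
rows = [
    (0.10, "red"), (0.15, "red"), (0.12, "red"),
    (0.90, "blue"), (0.85, "blue"), (0.95, "blue"),
]
X, books = encode(rows, cat_idx={1})
labels, _ = kmeans(X, k=2)
print(labels)  # the three "red" rows and the three "blue" rows fall in different clusters
print(round(codification_error(polar_codes(["a", "b", "c", "d"])), 4))  # 0.2929
```

With this layout, two or three categories are exactly equidistant. Four categories placed on a square give an error of 1 - sqrt(2)/2 ≈ 0.29 under this measure, because the diagonals are longer than the sides. This is the kind of distortion that a per-case codification error is meant to expose.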
Acknowledgements
The authors acknowledge partial funding of this work from the national projects DPI2007-66728-C02-01 and DPI2008-06737-C02-01.
Cite this article
Barcelo-Rico, F., Diez, JL. Geometrical codification for clustering mixed categorical and numerical databases. J Intell Inf Syst 39, 167–185 (2012). https://doi.org/10.1007/s10844-011-0187-y
DOI: https://doi.org/10.1007/s10844-011-0187-y