Geometrical codification for clustering mixed categorical and numerical databases

Barcelo-Rico, Fatima; Diez, Jose-Luis

doi:10.1007/s10844-011-0187-y

Published: 06 December 2011

Volume 39, pages 167–185, (2012)

Journal of Intelligent Information Systems

Abstract

This paper presents an alternative approach to clustering mixed databases. The aim is a general method for clustering mixed data sets that is not very complex yet reaches performance levels similar to those of some good algorithms. The proposed approach codifies the categorical attributes and then applies a numerical clustering algorithm to the resulting database. The proposed codification is based on polar or spherical coordinates: it is easy to understand and to apply, the increase in the length of the input matrix is not excessively large, and the codification error can be determined for each case. Combined with the well-known k-means algorithm, the proposed codification performed very well on different benchmarks. It was compared both with other codifications and with other mixed clustering algorithms, and showed better or comparable performance in all cases.
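As a rough illustration of the idea in the abstract, the sketch below codifies one categorical attribute as points on the unit circle (polar coordinates) and then runs a plain k-means on the resulting numerical rows. It is not the authors' exact codification, which the abstract does not spell out: the equally spaced angles, the function names `polar_codes`, `encode` and `kmeans`, the first-k-points initialisation, and the toy data are all assumptions made for this example.

```python
import math

def polar_codes(categories):
    """Map each distinct category to a point on the unit circle.

    Assumption: categories are spread at equal angles 2*pi*i/m;
    the paper's actual layout may differ.
    """
    values = sorted(set(categories))
    m = len(values)
    return {
        v: (math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m))
        for i, v in enumerate(values)
    }

def encode(rows, cat_index):
    """Replace the categorical column at cat_index with two numeric columns."""
    codes = polar_codes([r[cat_index] for r in rows])
    encoded = []
    for r in rows:
        x, y = codes[r[cat_index]]
        numeric = [float(v) for j, v in enumerate(r) if j != cat_index]
        encoded.append(numeric + [x, y])
    return encoded

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with Euclidean distance.

    Assumption: centers start at the first k points, which keeps the
    example deterministic but is not a recommended initialisation.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: one numerical and one categorical attribute.
rows = [
    (0.10, "red"), (0.20, "red"), (0.15, "red"),
    (0.90, "blue"), (1.00, "blue"), (0.95, "blue"),
]
encoded = encode(rows, cat_index=1)
labels = kmeans(encoded, k=2)
print(labels)
```

Each categorical attribute adds only two columns here, regardless of how many categories it has, which is in line with the abstract's remark that the input matrix does not grow excessively. With more than three categories, equally spaced points on a circle are no longer mutually equidistant, so some category pairs end up closer than others in the encoded space.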

Acknowledgements

The authors acknowledge the partial funding of this work by the National projects DPI2007-66728-C02-01 and DPI2008-06737-C02-01.

Author information

Authors and Affiliations

Institut d’Automatica i Informatica Industrial, Universitat Politecnica de Valencia, Cami de Vera s/n CP 46022, Valencia, Spain
Fatima Barcelo-Rico & Jose-Luis Diez

Corresponding author

Correspondence to Fatima Barcelo-Rico.

About this article

Cite this article

Barcelo-Rico, F., Diez, JL. Geometrical codification for clustering mixed categorical and numerical databases. J Intell Inf Syst 39, 167–185 (2012). https://doi.org/10.1007/s10844-011-0187-y

Received: 20 July 2011
Revised: 13 November 2011
Accepted: 15 November 2011
Published: 06 December 2011
Issue Date: August 2012
DOI: https://doi.org/10.1007/s10844-011-0187-y
