Geometrical codification for clustering mixed categorical and numerical databases

Journal of Intelligent Information Systems

Abstract

This paper presents an alternative approach to clustering mixed databases. The aim is a general method for clustering mixed data sets that is not very complex yet reaches performance levels similar to those of some good existing algorithms. The proposed approach codifies the categorical attributes and then applies a numerical clustering algorithm to the resulting database. The codification is based on polar or spherical coordinates; it is easy to understand and to apply, the increase in the length of the input matrix is not excessively large, and the codification error can be determined for each case. Combined with the well-known k-means algorithm, the proposed codification performed very well on different benchmarks. It was compared both with other codifications and with other mixed clustering algorithms, showing better or comparable performance in all cases.
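
The abstract does not spell out the codification, so the following is only a minimal sketch of the general idea of codifying a categorical attribute geometrically and then running a numerical clustering algorithm. It assumes, purely for illustration, that each category is placed at an equally spaced angle on the unit circle; the paper's actual polar or spherical scheme may differ. The names `polar_codes`, `encode_rows`, and `kmeans` are hypothetical helpers, and the k-means shown is a plain Lloyd-style loop rather than any specific published variant.

```python
import math
import random


def polar_codes(categories):
    """Map each distinct category to a point on the unit circle.

    Assumption for illustration: equally spaced angles, not the paper's exact scheme.
    """
    levels = sorted(set(categories))
    k = len(levels)
    return {
        level: (math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
        for i, level in enumerate(levels)
    }


def encode_rows(rows, cat_index):
    """Replace the categorical column at cat_index with its two coordinates."""
    codes = polar_codes([row[cat_index] for row in rows])
    encoded = []
    for row in rows:
        numeric = [float(v) for j, v in enumerate(row) if j != cat_index]
        encoded.append(numeric + list(codes[row[cat_index]]))
    return encoded


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd-style k-means on lists of floats; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy mixed data set (made up for this sketch): one numerical and one categorical attribute.
rows = [
    (0.1, "red"), (0.2, "red"), (0.15, "red"),
    (0.9, "blue"), (1.0, "blue"), (0.95, "blue"),
]
encoded = encode_rows(rows, cat_index=1)
labels, centroids = kmeans(encoded, k=2)
```

Under this assumed scheme each categorical attribute adds only two columns to the input matrix, however many categories it has. The trade-off is that with more than three categories the pairwise distances between codes are no longer all equal: for four categories, adjacent codes are √2 apart and opposite codes are 2 apart. This is one way a codification could introduce a measurable distortion, although the abstract does not say how its codification error is defined.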



Acknowledgements

The authors acknowledge the partial funding of this work by the National projects DPI2007-66728-C02-01 and DPI2008-06737-C02-01.

Author information

Corresponding author

Correspondence to Fatima Barcelo-Rico.

About this article

Cite this article

Barcelo-Rico, F., Diez, JL. Geometrical codification for clustering mixed categorical and numerical databases. J Intell Inf Syst 39, 167–185 (2012). https://doi.org/10.1007/s10844-011-0187-y

  • DOI: https://doi.org/10.1007/s10844-011-0187-y
