Computational Statistics, Volume 31, Issue 2, pp 771–798

Density-based clustering with non-continuous data

Original Paper

DOI: 10.1007/s00180-016-0644-8

Cite this article as:
Azzalini, A. & Menardi, G. Comput Stat (2016) 31: 771. doi:10.1007/s00180-016-0644-8

Abstract

Density-based clustering relies on the idea of associating groups with regions of the sample space characterized by high density of the probability distribution underlying the observations. While this approach to cluster analysis exhibits some desirable properties, its use is necessarily limited to continuous data. The present contribution proposes a simple but effective way to circumvent this problem, based on the identification of continuous components underlying the non-continuous variables. The basic idea is explored in a number of variants applied to simulated data, confirming the effectiveness of the technique and leading to recommendations for its practical use. Some illustrations using real data are also presented.

Keywords

Density estimation · Mixed variables · Modal clustering · Model-based clustering · Multidimensional scaling

Supplementary material

180_2016_644_MOESM1_ESM.pdf (508 kb)
Supplementary material 1 (pdf 507 KB)

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Dipartimento di Scienze Statistiche, Università degli Studi di Padova, Padova, Italy
