Statistics and Computing

Volume 24, Issue 5, pp 753–767

An advancement in clustering via nonparametric density estimation


DOI: 10.1007/s11222-013-9400-x

Cite this article as:
Menardi, G. & Azzalini, A. Stat Comput (2014) 24: 753. doi:10.1007/s11222-013-9400-x

Abstract

Density-based clustering methods hinge on the idea of associating groups with the connected components of the level sets of the density underlying the data, which is to be estimated by a nonparametric method. These methods claim some desirable properties and generally good performance, but they involve a non-trivial computational effort, needed to identify the connected regions. In a previous work, the use of a spatial tessellation such as the Delaunay triangulation was proposed, because it suitably generalizes the univariate procedure for detecting the connected components. However, its computational complexity grows exponentially with the dimensionality of the data, making the triangulation unfeasible in high dimensions. Our aim is to overcome the limitations of the Delaunay triangulation. We discuss the use of an alternative procedure for identifying the connected regions associated with the level sets of the density. By measuring the extent of possible valleys of the density along the segment connecting pairs of observations, the proposed procedure shifts the formulation from a space of arbitrary dimension to a univariate one, thus bringing benefits both in computation and in visualization.
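The abstract only outlines the pairwise idea, so the following is a minimal sketch under our own assumptions, not the authors' implementation. It assumes an isotropic Gaussian kernel with a fixed bandwidth `h`, a single density level `level`, and it reduces "the extent of a valley" to the lowest estimated density on an evenly spaced grid of `steps + 1` points along each segment. Two observations in the upper level set are joined when that minimum stays at or above `level`, and clusters are the connected components of the resulting graph. The names `make_kde`, `segment_min_density` and `cluster_at_level` are hypothetical helpers (in particular, `make_kde` is hand-rolled and is not SciPy's `gaussian_kde`).

```python
import math
from itertools import combinations


def make_kde(data, h):
    """Hypothetical helper: isotropic Gaussian kernel density estimate, bandwidth h (assumed)."""
    n, d = len(data), len(data[0])
    norm = n * (math.sqrt(2 * math.pi) * h) ** d

    def density(x):
        total = 0.0
        for p in data:
            sq = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            total += math.exp(-sq / (2 * h * h))
        return total / norm

    return density


def segment_min_density(density, a, b, steps=20):
    """Lowest estimated density on an evenly spaced grid along the segment from a to b.

    The d-dimensional question becomes a univariate profile in t over [0, 1];
    a low minimum signals a valley between the two endpoints.
    """
    return min(
        density([ai + (k / steps) * (bi - ai) for ai, bi in zip(a, b)])
        for k in range(steps + 1)
    )


def cluster_at_level(data, h, level, steps=20):
    """Label observations whose estimated density is >= level.

    Two such observations share a label when a chain of segments links them
    without the density dropping below level. Observations outside the
    level set get label -1.
    """
    density = make_kde(data, h)
    high = [i for i, p in enumerate(data) if density(p) >= level]
    parent = {i: i for i in high}  # union-find over the high-density points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(high, 2):
        if segment_min_density(density, data[i], data[j], steps) >= level:
            parent[find(i)] = find(j)

    labels, roots = [-1] * len(data), {}
    for i in high:
        labels[i] = roots.setdefault(find(i), len(roots))
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]
    print(cluster_at_level(pts, h=0.5, level=0.05))  # → [0, 0, 0, 1, 1, 1]
```

In this sketch each kernel evaluation costs time linear in the dimension d, so the work grows with the number of pairs and grid points rather than exponentially with d. The price is a quadratic number of segment checks in the sample size, and the grid resolution `steps` is an arbitrary choice that could miss a very narrow valley.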

Keywords

Cluster analysis, Connected sets, Nonparametric density estimation, Kernel method

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Statistical Sciences, University of Padua, Padova, Italy