Statistics and Computing, Volume 17, Issue 1, pp 71–80

Clustering via nonparametric density estimation

Abstract

Although Hartigan (1975) had already put forward the idea of associating the identification of subpopulations with regions of high density of the underlying probability distribution, the actual development of methods for cluster analysis has largely moved in other directions, for reasons of computational convenience. Current computational resources allow us to reconsider this formulation and to develop clustering techniques aimed directly at identifying the local modes of the density. Given a set of observations, a nonparametric estimate of the underlying density function is constructed, and subsets of points with high density are formed through suitable manipulation of the associated Delaunay triangulation. The method is illustrated with some numerical examples.
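The pipeline outlined in the abstract (kernel density estimate, then grouping high-density points through the Delaunay triangulation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a Gaussian product kernel with a normal-reference bandwidth, a single density threshold controlled by the hypothetical parameter `keep_fraction`, and it takes as groups the connected components of the Delaunay graph restricted to the points above that threshold. The function names are illustrative only. The triangulation comes from `scipy.spatial.Delaunay`, which is built on Qhull, the Quickhull implementation of Barber et al. (1996).

```python
from itertools import combinations

import numpy as np
from scipy.spatial import Delaunay


def kernel_density_at_points(x, h):
    """Gaussian product-kernel estimate f_hat evaluated at every row of x."""
    n, d = x.shape
    u = (x[:, None, :] - x[None, :, :]) / h       # scaled pairwise differences, shape (n, n, d)
    k = np.exp(-0.5 * np.sum(u ** 2, axis=2))      # unnormalised kernel weights
    return k.sum(axis=1) / (n * np.prod(h) * (2 * np.pi) ** (d / 2))


def density_clusters(x, keep_fraction=0.5):
    """Cluster labels from the Delaunay graph restricted to high-density points.

    `keep_fraction` is a hypothetical tuning parameter: the share of points,
    ranked by estimated density, treated as "high density". All other points
    are left unallocated and labelled -1. Requires x of shape (n, d), d >= 2.
    """
    n, d = x.shape
    # Assumption: normal-reference bandwidth sd_j * (4 / ((d + 2) n)) ** (1 / (d + 4)).
    h = x.std(axis=0, ddof=1) * (4 / ((d + 2) * n)) ** (1 / (d + 4))
    dens = kernel_density_at_points(x, h)

    high = np.zeros(n, dtype=bool)
    high[np.argsort(-dens)[: max(1, round(keep_fraction * n))]] = True

    parent = list(range(n))                         # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]           # path halving
            i = parent[i]
        return i

    # Merge high-density points joined by an edge of some Delaunay simplex.
    for simplex in Delaunay(x).simplices:
        for a, b in combinations(simplex, 2):
            a, b = int(a), int(b)
            if high[a] and high[b]:
                parent[find(a)] = find(b)

    # Number the connected components 0, 1, 2, ... in order of first appearance.
    labels = np.full(n, -1)
    roots = {}
    for i in np.flatnonzero(high):
        labels[i] = roots.setdefault(find(int(i)), len(roots))
    return labels


# Two well-separated synthetic groups (illustrative data, not from the paper).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.3, size=(40, 2)),
               rng.normal(5.0, 0.3, size=(40, 2))])
print(density_clusters(x, keep_fraction=0.5))
```

Lowering `keep_fraction` raises the density threshold. Repeating the computation over a grid of values and recording the number of components shows which high-density groups persist across levels. The sketch leaves low-density points unallocated; assigning them to the groups would be a separate step that is not shown here.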

Keywords

Cluster analysis; Delaunay triangulation; Voronoi tessellation; Nonparametric density estimation; Kernel method

References

  1. Aitchison J. 1986. The Statistical Analysis of Compositional Data. Chapman & Hall, London.
  2. Ankerst M., Breunig M.M., Kriegel H.P., and Sander J. 1999. OPTICS: ordering points to identify the clustering structure. In: International Conference on Management of Data (SIGMOD’99), ACM, pp. 49–60.
  3. Barber C.B., Dobkin D.P., and Huhdanpaa H. 1996. The Quickhull algorithm for convex hulls. ACM Trans. Math. Software 22: 469–483.
  4. Bowman A. and Foster P. 1993. Density based exploration of bivariate data. Statistics and Computing 3: 171–177.
  5. Bowman A.W. and Azzalini A. 1997. Applied Smoothing Techniques for Data Analysis. Clarendon Press, Oxford.
  6. Cuevas A., Febrero M., and Fraiman R. 2000. Estimating the number of clusters. Canad. J. Stat. 28: 367–382.
  7. Cuevas A., Febrero M., and Fraiman R. 2001. Cluster analysis: a further approach based on density estimation. Computational Statistics & Data Analysis 36: 441–459.
  8. Devroye L.P. and Wagner T.J. 1980. The strong uniform consistency of kernel density estimates. In: Multivariate Analysis, Vol. 5, North-Holland, pp. 59–77.
  9. Ester M., Kriegel H.P., Sander J., and Xu X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, USA. AAAI Press, pp. 226–231.
  10. Forina M., Armanino C., Lanteri S., and Tiscornia E. 1983. Classification of olive oils from their fatty acid composition. In: H. Martens and H.J. Russwurm (Eds.), Food Research and Data Analysis, Applied Science Publishers, London, pp. 189–214.
  11. Hartigan J.A. 1975. Clustering Algorithms. J. Wiley & Sons, New York.
  12. Hubert L. and Arabie P. 1985. Comparing partitions. Journal of Classification 2: 193–218.
  13. Nadaraya É.A. 1965. On non-parametric estimates of density functions and regression curves. Theory of Probability and its Applications (transl. of Teorija Verojatnostei i ee Primenenija) 10: 186–190.
  14. Okabe A., Boots B.N., and Sugihara K. 1992. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. J. Wiley & Sons, New York.
  15. R Development Core Team 2004. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
  16. Rosolin T., Azzalini A., and Torelli N. 2003. Detecting clusters via nonparametric density estimation. In: Convegno SIS analisi statistica multivariata per le scienze economico-sociali, le scienze naturali e la tecnologia, Napoli, Italy. Società Italiana di Statistica, RCE edizioni.
  17. Stuetzle W. 2003. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. Journal of Classification 20: 25–47.
  18. Wong M.A. and Lane T. 1983. The kth nearest neighbour clustering procedure. Journal of the Royal Statistical Society, Series B 45: 362–368.

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Dipartimento di Scienze Statistiche, Università di Padova, Padova, Italy
  2. Dipartimento di Scienze Economiche e Statistiche, Università di Trieste, Trieste, Italy

Personalised recommendations