Statistics and Computing, Volume 17, Issue 3, pp 253–262

Cluster analysis of massive datasets in astronomy


DOI: 10.1007/s11222-007-9027-x

Cite this article as:
Jang, W. & Hendry, M. Stat Comput (2007) 17: 253. doi:10.1007/s11222-007-9027-x


Clusters of galaxies are a useful proxy for tracing the distribution of mass in the universe. By measuring the masses of galaxy clusters on different scales, one can follow the evolution of the mass distribution (Martínez and Saar, Statistics of the Galaxy Distribution, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, Clustering Algorithms, 1975): connected components of the level set $S_c \equiv \{f > c\}$, where $f$ is a probability density function. Cuevas et al. (Can. J. Stat. 28, 367–382, 2000; Comput. Stat. Data Anal. 36, 441–459, 2001) proposed a nonparametric method that finds density contour clusters by means of the minimal spanning tree. While their algorithm is conceptually simple, it requires intensive computation for large datasets. We propose a more efficient clustering method that builds on their algorithm using the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering in large astronomical sky survey data.
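The minimal-spanning-tree idea can be illustrated with a simplified sketch in the spirit of Cuevas et al., not their exact procedure: estimate $f$ at each observation with a Gaussian kernel, keep the observations with $\hat f > c$, build the MST of the kept points, and delete MST edges longer than a linking length. The trees that remain are the estimated density contour clusters. The function names and the parameters `bandwidth`, `c` and `max_edge` are hypothetical choices for illustration. Both the density step and Prim's algorithm here cost $O(n^2)$, which shows why this approach becomes expensive for large datasets.

```python
import math

def gaussian_kde_at_points(points, bandwidth):
    """Evaluate a 2-D Gaussian kernel density estimate at each sample point (O(n^2))."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    dens = []
    for (xi, yi) in points:
        s = 0.0
        for (xj, yj) in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            s += math.exp(-d2 / (2.0 * bandwidth ** 2))
        dens.append(norm * s)
    return dens

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (u, v, length) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the closest vertex not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

def density_contour_clusters(points, c, bandwidth, max_edge):
    """Cluster label per point; -1 marks points outside the estimated level set {f_hat > c}."""
    dens = gaussian_kde_at_points(points, bandwidth)
    keep = [i for i, f in enumerate(dens) if f > c]
    sub = [points[i] for i in keep]
    root = list(range(len(sub)))   # union-find forest over the kept points

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]   # path halving
            a = root[a]
        return a

    for u, v, w in minimum_spanning_tree(sub):
        if w <= max_edge:   # short MST edges join points; long edges separate clusters
            root[find(u)] = find(v)
    labels = [-1] * len(points)
    ids = {}
    for k, i in enumerate(keep):
        labels[i] = ids.setdefault(find(k), len(ids))
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
       (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (10, -10)]
print(density_contour_clusters(pts, c=0.1, bandwidth=0.5, max_edge=1.0))
# → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Cutting every MST edge longer than `max_edge` yields the same components as linking all pairs closer than `max_edge`, which is why the MST is a convenient data structure for this step. The isolated point falls below the density level and is left unlabelled.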
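The abstract does not say exactly how the FFT enters the proposed method. One standard way an FFT speeds up density estimation is binned kernel density estimation: bin the data on a regular grid, then compute the kernel estimate on the whole grid as a discrete convolution, which the FFT evaluates in $O(M \log M)$ for $M$ grid cells instead of $O(nM)$ directly. On a grid, the connected components of $\{\hat f > c\}$ are simply maximal runs of cells above $c$. The 1-D sketch below works under that assumption only; the names `binned_kde_fft` and `level_set_intervals` and the parameters `m`, `bandwidth` and `c` are hypothetical, and the hand-written radix-2 FFT stands in for a library routine.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(a):
    """Inverse FFT via the conjugation trick."""
    n = len(a)
    return [x.conjugate() / n for x in fft([x.conjugate() for x in a])]

def binned_kde_fft(data, lo, hi, m, bandwidth):
    """Gaussian KDE on an m-point grid over [lo, hi] via nearest-bin binning + FFT convolution."""
    delta = (hi - lo) / (m - 1)
    grid = [lo + i * delta for i in range(m)]
    counts = [0.0] * m
    for x in data:
        i = round((x - lo) / delta)
        if 0 <= i < m:
            counts[i] += 1.0
    n = len(data)
    L = min(m - 1, math.ceil(4 * bandwidth / delta))   # truncate kernel at ~4 bandwidths
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    size = 1
    while size < m + 2 * L:   # zero-pad so circular convolution equals linear convolution
        size *= 2
    a = counts + [0.0] * (size - m)
    b = [0.0] * size
    for j in range(-L, L + 1):   # kernel weight at lag j, negative lags wrap to the end
        b[j % size] = scale * math.exp(-0.5 * (j * delta / bandwidth) ** 2)
    conv = ifft([x * y for x, y in zip(fft(a), fft(b))])
    return grid, [conv[i].real for i in range(m)]

def level_set_intervals(grid, dens, c):
    """Maximal runs of grid points with dens > c, i.e. components of {f_hat > c} on the grid."""
    intervals, start, prev = [], None, None
    for x, f in zip(grid, dens):
        if f > c:
            if start is None:
                start = x
            prev = x
        elif start is not None:
            intervals.append((start, prev))
            start = None
    if start is not None:
        intervals.append((start, prev))
    return intervals

data = [1.0, 1.1, 0.9, 1.05, 0.95, 4.0, 4.1, 3.9, 4.05, 3.95]
grid, dens = binned_kde_fft(data, 0.0, 5.0, 51, 0.2)
print(level_set_intervals(grid, dens, 0.1))   # two intervals, one around 1 and one around 4
```

Nearest-bin assignment keeps the sketch short; linear binning is a common refinement that reduces binning error. For sky survey positions the same idea extends to a 2-D grid, with a 2-D FFT for the convolution and connected-component labelling of the cells above $c$.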


Keywords: Density contour cluster, Level set, Clustering, Fast Fourier transform

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Epidemiology and Biostatistics, University of Georgia, Athens, USA
  2. Department of Physics and Astronomy, University of Glasgow, Glasgow, UK