Abstract
Clusters of galaxies are a useful proxy for tracing the distribution of mass in the universe. By measuring the mass of galaxy clusters on different scales, one can follow the evolution of the mass distribution (Martínez and Saar, Statistics of the Galaxy Distribution, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, Clustering Algorithms, 1975): connected components of the level set S_c ≡ {f > c}, where f is a probability density function. Cuevas et al. (Can. J. Stat. 28, 367–382, 2000; Comput. Stat. Data Anal. 36, 441–459, 2001) proposed a nonparametric method that finds density contour clusters using the minimal spanning tree. While their algorithm is conceptually simple, it is computationally intensive for large datasets. We propose a more efficient clustering method that builds on their algorithm and uses the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering in large astronomical sky survey data.
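To make the idea of a density contour cluster concrete, the following is a minimal Python sketch, not the algorithm of the paper. It assumes two-dimensional points in the unit square, estimates f with a binned Gaussian kernel estimator computed by FFT (in the spirit of Silverman 1982 and Wand 1994), and then labels the connected components of the grid cells where the estimate exceeds c. The grid-based component search is a simplification swapped in for the minimal spanning tree step, and the function names, grid size, bandwidth, and level are all hypothetical choices made for illustration.

```python
from collections import deque

import numpy as np


def fft_binned_kde(points, grid_size=64, bandwidth=0.05, extent=(0.0, 1.0)):
    """Binned Gaussian kernel density estimate on a square grid via FFT."""
    lo, hi = extent
    edges = np.linspace(lo, hi, grid_size + 1)
    # Bin the data; points outside the extent are silently dropped.
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[edges, edges])
    cell = (hi - lo) / grid_size
    # Zero-pad to twice the grid size so the circular convolution
    # computed by the FFT does not wrap around the edges.
    m = 2 * grid_size
    # Grid offsets in FFT order: 0, 1, ..., m/2 - 1, -m/2, ..., -1 (times cell).
    offsets = np.fft.fftfreq(m, d=1.0 / m) * cell
    k1d = np.exp(-0.5 * (offsets / bandwidth) ** 2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    kernel = np.outer(k1d, k1d)
    padded = np.zeros((m, m))
    padded[:grid_size, :grid_size] = counts
    conv = np.fft.irfft2(np.fft.rfft2(padded) * np.fft.rfft2(kernel), s=(m, m))
    density = conv[:grid_size, :grid_size] / len(points)
    # Remove tiny negative values caused by floating-point round-off.
    return np.clip(density, 0.0, None)


def density_contour_clusters(density, level):
    """Label 4-connected components of the grid cells where density > level."""
    mask = density > level
    labels = np.zeros(density.shape, dtype=int)
    n_clusters = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already assigned to an earlier component
        n_clusters += 1
        labels[start] = n_clusters
        queue = deque([start])
        while queue:  # breadth-first search over neighbouring cells
            i, j = queue.popleft()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                inside = 0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                if inside and mask[ni, nj] and not labels[ni, nj]:
                    labels[ni, nj] = n_clusters
                    queue.append((ni, nj))
    return labels, n_clusters


# Synthetic example (not survey data): two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal([0.3, 0.3], 0.03, size=(500, 2))
blob_b = rng.normal([0.7, 0.7], 0.03, size=(500, 2))
points = np.vstack([blob_a, blob_b])

density = fft_binned_kde(points)
labels, n_clusters = density_contour_clusters(density, level=5.0)
print("density contour clusters found:", n_clusters)
```

The cost of the density step depends on the grid size rather than on the number of pairwise distances between points, which is why binning plus FFT is attractive for large catalogues. The price is a discretization error controlled by the cell width relative to the bandwidth (Hall and Wand 1996).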
References
Bahcall, N., et al.: The cluster mass function from early Sloan digital sky survey data: cosmological implications. Astrophys. J. 585, 182–190 (2003)
Báillo, A., Cuesta-Albertos, J.A., Cuevas, A.: Convergence rates in nonparametric estimation of level sets. Stat. Probab. Lett. 53, 27–35 (2001)
Chaudhuri, A.R., Basu, A., Bhandari, S., Chaudhuri, B.: An efficient approach to consistent set estimation. Sankhyā Ser. B 61, 496–513 (1999)
Cole, S., Hatton, S., Weinberg, D.H., Frenk, C.S.: Mock 2dF and SDSS galaxy redshift surveys. Mon. Not. Roy. Astron. Soc. 300, 945–966 (1998)
Cooray, A., Sheth, R.K.: Halo models of large scale structure. Phys. Rep. 372, 1–129 (2002)
Cuevas, A., Fraiman, R.: A plug-in approach to support estimation. Ann. Stat. 25, 2300–2312 (1997)
Cuevas, A., Rodriguez-Casal, A.: On boundary estimation. Adv. Appl. Probab. 36, 340–354 (2004)
Cuevas, A., Febrero, M., Fraiman, R.: Estimating the number of clusters. Can. J. Stat. 28, 367–382 (2000)
Cuevas, A., Febrero, M., Fraiman, R.: Cluster analysis: a further approach based on density estimation. Comput. Stat. Data Anal. 36, 441–459 (2001)
Dekel, A., Lahav, O.: Stochastic non-linear galaxy biasing. Astrophys. J. 520, 24–34 (1999)
Devroye, L., Wise, G.: Detection of abnormal behavior via nonparametric estimation of the support. SIAM J. Appl. Math. 38, 480–488 (1980)
Dodelson, S.: Modern Cosmology. Academic, New York (2003)
Fraley, C., Raftery, A.: Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002)
Genovese, C.R., Miller, C.J., Nichol, R.C., Arjunwadkar, M., Wasserman, L.: Nonparametric inference for the cosmic microwave background. Stat. Sci. 19, 308–321 (2004)
Gray, A., Moore, A.: Rapid evaluation of multiple density models. Artif. Intell. Stat. (2003)
Hall, P., Wand, M.: On the accuracy of binned kernel density estimators. J. Multivar. Anal. 56, 165–184 (1996)
Hartigan, J.: Clustering Algorithms. Wiley, New York (1975)
Hartigan, J.: Statistical clustering. Tech. Rep., Department of Statistics, Yale University (2000)
Jang, W.: Nonparametric density estimation and galaxy clustering. In: Statistical Challenges in Astronomy, pp. 443–445. Springer, New York (2003)
Jang, W.: Nonparametric density estimation and clustering in astronomical sky surveys. Comput. Stat. Data Anal. 50, 760–774 (2006)
Jang, W., Genovese, C., Wasserman, L.: Nonparametric confidence sets for densities. Tech. Rep. 795, Department of Statistics, Carnegie Mellon University (2004)
Kaiser, N.: Clustering in real space and in redshift space. Mon. Not. Roy. Astron. Soc. 227, 1–21 (1987)
Korostelev, A., Tsybakov, A.: Minimax Theory of Image Reconstruction. Springer, New York (1993)
Martínez, V., Saar, E.: Statistics of the Galaxy Distribution. Chapman and Hall, London (2002)
McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
Moore, A.: Very fast EM-based mixture model clustering using multiresolution kd-trees. In: Advances in Neural Information Processing Systems, pp. 543–549 (1999)
Narasimhan, G., Zhu, J., Zachariasen, M.: Experiments with computing geometric minimum spanning trees. In: Proceedings of ALENEX’00. Lecture Notes in Computer Science, pp. 183–196. Springer, New York (2000)
Nichol, R.: Private communication (2006)
Nichol, R.C., Connolly, A.J., Moore, A.W., Schneider, J., Genovese, C., Wasserman, L.: Computational AstroStatistics: fast algorithms and efficient statistics for density estimation in large astronomical datasets. In: Proceedings of Virtual Observatories of the Future. ASP Conference Series, vol. 225, p. 265. San Francisco (2001)
Parkinson, D., Mukherjee, P., Liddle, A.R.: Bayesian model selection analysis of WMAP3. Phys. Rev. D 73, 123523 (2006)
Press, W.H., Schechter, P.: Formation of galaxies and clusters of galaxies by self-similar gravitational condensation. Astrophys. J. 187, 425–438 (1974)
R Development Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, http://www.R-project.org (2006)
Reichart, D., Nichol, R., Castander, F., Burke, D., Romer, A.K., Holden, B., Collins, C., Ulmer, M.: A deficit of high-redshift, high-luminosity X-ray clusters: evidence for a high value of Ω_m? Astrophys. J. 518, 521–532 (1999)
Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice-Hall, Upper Saddle River (2002)
Scott, C.D., Nowak, R.D.: Learning minimum volume sets. J. Mach. Learn. Res. 7, 665–704 (2006)
Silverman, B.W.: Algorithm AS 176: Kernel density estimation using the fast Fourier transform. Appl. Stat. 31, 93–99 (1982)
Steinwart, I., Hush, D., Scovel, C.: A classification framework for anomaly detection. J. Mach. Learn. Res. 6, 211–232 (2005)
Stuetzle, W.: Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J. Classif. 20, 25–47 (2003)
Wand, M.: Fast computation of multivariate kernel estimators. J. Comput. Graph. Stat. 3, 433–445 (1994)
Wand, M., Jones, M.: Kernel Smoothing. Chapman and Hall, London (1995)
Willett, R., Nowak, R.: Minimax optimal level set estimation. Unpublished manuscript, http://www.ee.duke.edu/~willett/ (2006)
Wong, W.-K., Moore, A.: Efficient algorithms for non-parametric clustering with clutter. Comput. Sci. Stat. 34, 541–553 (2002)
Zhou, R., Hansen, E.A.: A breadth-first approach to memory-efficient graph search. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), Boston, MA (2006)
Cite this article
Jang, W., Hendry, M. Cluster analysis of massive datasets in astronomy. Stat Comput 17, 253–262 (2007). https://doi.org/10.1007/s11222-007-9027-x
DOI: https://doi.org/10.1007/s11222-007-9027-x