Fast Hierarchical Clustering from the Baire Distance

  • Pedro Contreras
  • Fionn Murtagh
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


The Baire, or longest common prefix, ultrametric allows a hierarchy (a multiway tree, or ultrametric topology embedding) to be constructed very efficiently. The Baire distance is a 1-bounded ultrametric. For high-dimensional data, one approach to using the Baire distance is to base the hierarchy construction on random projections. In this paper we use the Baire distance on the Sloan Digital Sky Survey (SDSS) archive. We address the regression of (high quality, more costly to collect) spectroscopic redshifts and (lower quality, more readily available) photometric redshifts. Nonlinear regression is used for mapping photometric and astrometric redshifts.
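The ideas in the abstract can be illustrated with a minimal Python sketch, written under our own assumptions rather than taken from the paper. For two values in [0, 1) written to a fixed precision of K decimal digits, the Baire distance is taken here as 10^(-n), where n is the length of their longest common digit prefix (capped at K). It is therefore 1 when the first digits differ, which is what makes it 1-bounded. Grouping values that share their first k digits, for k = 1..K, yields the multiway tree in a single pass per level. For high-dimensional rows, the sketch projects onto one Gaussian random direction and min-max rescales the result into [0, 1). The function names (`digits`, `baire_distance`, `baire_hierarchy`, `random_projection_unit`), the 3-digit precision, and that rescaling scheme are all hypothetical choices, not the authors' implementation.

```python
from collections import defaultdict
from decimal import Decimal
import random


def digits(x, precision):
    """Return the first `precision` decimal digits of x (0 <= x < 1), truncated."""
    d = Decimal(str(x))  # exact decimal form of the float's shortest repr
    if not (0 <= d < 1):
        raise ValueError("value must lie in [0, 1)")
    scaled = int(d * (10 ** precision))  # int() truncates toward zero
    return str(scaled).zfill(precision)


def baire_distance(x, y, precision=3, base=10):
    """1-bounded Baire distance: base ** -n, n = length of the common digit prefix."""
    a, b = digits(x, precision), digits(y, precision)
    n = 0
    while n < precision and a[n] == b[n]:
        n += 1
    return base ** -n  # equals 1 when the first digits already differ


def baire_hierarchy(values, precision=3):
    """For each prefix length k, group item indices that share their first k digits."""
    codes = [digits(v, precision) for v in values]
    levels = {}
    for k in range(1, precision + 1):
        clusters = defaultdict(list)
        for i, code in enumerate(codes):
            clusters[code[:k]].append(i)
        levels[k] = dict(clusters)
    return levels


def random_projection_unit(data, seed=0):
    """Project rows onto one Gaussian random direction, then rescale into [0, 1)."""
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(data[0]))]
    proj = [sum(w * x for w, x in zip(direction, row)) for row in data]
    lo, hi = min(proj), max(proj)
    span = (hi - lo) or 1.0  # avoid division by zero when all projections coincide
    # The maximum maps to exactly 1.0; nudge it below 1 so digits() accepts it
    return [min((p - lo) / span, 1.0 - 1e-12) for p in proj]
```

For example, `baire_distance(0.423, 0.427)` shares the prefix "42" and gives 0.01, while `baire_hierarchy(random_projection_unit(rows), precision=3)` builds the prefix tree for a list of numeric rows. Each level in this sketch is one linear pass over the data, so the whole tree costs O(nK) for n items and K digits. Values that agree on all K digits get distance 10^(-K) rather than 0, because the precision caps the prefix length.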





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Department of Computer Science, Royal Holloway, University of London, Egham, England
