Characteristics of Distance Matrices Based on Euclidean, Manhattan and Hausdorff Coefficients

  • Original Research
  • Journal of Classification

Abstract

From n-size samples of k-variate points, we construct n × n distance-matrices based on the widely used Euclidean, Manhattan and Hausdorff coefficients and study (individually and in pairs) their properties P, R and ρ using theoretical analysis and both computer-generated and empirical data. The concordance PEM is shown by analysis of uniformly-distributed data to decrease asymptotically as k → ∞ to exp [‒ exp [‒ γ]] ≅ 0.5704, and PEH and PMH to decrease to zero, as also in generated N(0,1) and empirical data. In geological data, PEM is higher than predicted for 10 < k < 50. The robustness R of single matrices increases asymptotically as k → ∞ to 1.0, but the benefit to phenetic analysis is offset by the opposite response in P. For paired matrices, the distance-correlation coefficient ρE2 is > 0.9 and independent of k for both U(0,1) and N(0,1); ρEH and ρEM decline to zero as k → ∞.

Data Availability

Two of the three datasets used as exemplars in this paper were published in original papers cited in the text, whereas the third is unavailable due to the death of the author.

Notes

  1. In (parenthesised) lists, commas represent additions; “/” is ordered choice between alternatives (the “/”s corresponding in 2 related lists), whereas “|” is unordered choice (being subordinate to “/”).

  2. Robustness was originally defined (Temple, 1982, pp. 675–676) in terms of the removal of a single attribute/dimension. It is here re-defined, and the relevant calculations made, in terms of addition (say, as R+), rather than removal (say, as R), primarily because in this way R becomes definable for k = 1. In fact, the dimensionality of R is not integral, since calculation of R+ for (e.g.) k = 2 involves scrutiny and comparison of nearest-neighbour lists for k = 2 and k = 3 and thus should most appropriately be assigned to k = 2.5. In the context of the present article, however, the definition and dimensionality of R are of lesser importance than its increase with increasing k (see Sect. 7).

  3. We use the conventional notations for these two distributions, but the similarity between the two formulations is misleading in that, although for both distributions the notation indicates standardisation, the significance of the numbers 0 and 1 is quite different in the two cases. In U(0,1) these numbers represent the limiting values of a variate x, between which the probability of an event is equal to unity, and beyond which it is everywhere zero. In N(0,1), on the other hand, the numbers are respectively the mean and standard deviation of the probability density function.

  4. For cell-notation, see note 4 of Table 3.

References

  • Abramowitz, M., & Stegun, I. A. (1972). Handbook of mathematical functions with formulas, graphs and mathematical tables (9th printing). Dover. xiv + 1046 pp.

  • Bateman, R. M. (2022). Obituary: John T. Temple. Geological Society Obituaries. https://www.geolsoc.org.uk/About/History/Obituaries-2001-onwards/Obituaries-2022

  • Bookstein, F. L. (1991). Morphometric tools for landmark data: Geometry and biology. Cambridge University Press.

  • Chazal, F., Cohen-Steiner, D., & Mérigot, Q. (2011). Geometric inference for measures based on distance functions. Foundations of Computational Mathematics, 11, 733–751.

  • Chazal, F., de Silva, V., Glisse, M., & Oudot, S. (2016). The structure and stability of persistence modules. Springer.

  • Coxeter, H. S. M. (1973). Regular polytopes (3rd edn). Dover. xiv + 321 pp.

  • Deza, M. M., & Deza, E. (2009). Encyclopedia of distances. Springer.

  • Digby, P. G. N., & Kempton, R. A. (1987). Multivariate analysis of ecological communities. Chapman & Hall.

  • Dunn, G., & Everitt, B. S. (1982). An introduction to mathematical taxonomy. Cambridge University Press.

  • Feller, W. (1968). An introduction to probability theory and its applications (3rd edn), vol. 1. Wiley. xviii + 509 pp.

  • Gower, J. C., & Legendre, P. (1986). Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3, 5–48.

  • Harter, H. L. (1960). Tables of range and Studentized range. Annals of Mathematical Statistics, 31, 1122–1147.

  • Hoel, P. G. (1971). Introduction to mathematical statistics (4th edn). Wiley. x + 409 pp.

  • MacLeod, N. (2008). Understanding morphology in systematic contexts: three-dimensional specimen ordination and recognition. In Q. D. Wheeler (Ed.), The new taxonomy (pp. 143–209). CRC/Systematics Association.

  • Pielou, E. C. (1977). Mathematical ecology. Wiley. x + 385 pp.

  • Selby, S. M. (1969). CRC standard mathematical tables (17th edn). Chemical Rubber Company. xii + 724 pp.

  • Sneath, P. H. A. (1957). The application of computers to taxonomy. Journal of General Microbiology, 17, 201–226.

  • Sneath, P. H. A., & Sokal, R. R. (1973). Numerical taxonomy: The principles and practice of numerical classification. W.H. Freeman.

  • Sokal, R. R., & Sneath, P. H. A. (1963). Principles of numerical taxonomy. W.H. Freeman.

  • Stirzaker, D. (1994). Elementary probability. Cambridge University Press. x + 406 pp.

  • Temple, J. T. (1980). A numerical taxonomic study of species of Trinucleidae (Trilobita) from the British Isles. Transactions of the Royal Society of Edinburgh (Earth Sciences), 71, 213–233.

  • Temple, J. T. (1982). An empirical study of robustness of nearest-neighbor relations in numerical taxonomy. Mathematical Geology, 14, 675–678.

  • Temple, J. T. (1992). The progress of quantitative methods in palaeontology. Palaeontology, 35, 475–484.

  • Temple, J. T., & Hong-Ji, W. (1990). Numerical taxonomy of Encrinurinae (Trilobita): Additional species from China and elsewhere. Transactions of the Royal Society of Edinburgh (Earth Sciences), 81, 209–219.

  • Temple, J. T., & Tripp, R. P. (1979). An investigation of the Encrinurinae (Trilobita) by numerical taxonomic methods. Transactions of the Royal Society of Edinburgh, 70, 223–250.

  • Weisstein, E. W. (1999). CRC concise encyclopaedia of mathematics. CRC Press. vii + 1969 pp.

  • Zelditch, M. L., Swiderski, D. L., & Fink, W. L. (2000). Discovery of phylogenetic characters in morphometric data. In J. J. Wiens (Ed.), Phylogenetic analysis of morphological data (pp. 37–83). Smithsonian Institution Press.

Acknowledgements

I am grateful to many patient friends and colleagues for help provided during the quarter-century genesis of this paper. For contributions to the original ca year 2000 draft, I must thank A. Abakuks, C. P. Chalmers and P. Holgate for much statistical advice; the late E. Kronheimer for showing me how to perform multidimensional geometry and for suggesting investigation of the Hausdorff coefficient; J. S. Temple and K. N. Brunt for help in writing and implementing programs; J. C. Tipper and the late J. C. Gower for advice and criticism of early versions of the manuscript; and the then Geology Department of Birkbeck College (University of London) for providing computing facilities.

More recently, this manuscript has been saved from oblivion through encouragement and advice from several of my friends: the late Tony Cross, whose recent death was a stark memento mori for me to take up the manuscript again; Bryan Cain, who has injected into the text more mathematical rigour than I could rightly claim for myself (albeit still too little in his view); John Simister, who kindly extended my computed random number experiments from k = 100 to k = 500; Joseph Harris, who patiently resolved my recurrent computing problems, even from halfway round the world; Roger Lawson, who turned the old typescript into an editable Word document; Jessica Verschoyle, who provided LaTeX and Python expertise; and Richard Bateman for his constructively critical reading of the final MS, and for advice on preparing and submitting the MS in the electronic era; also Richard Temple, and Christopher and Penny Coombe. This, my final paper, is dedicated to the memory of my beloved wife Dorothy.

Personal Note. Sadly, our friend John Temple died on 1st May 2022, aged 94 (Bateman, 2022), while actively but painstakingly revising this manuscript to accommodate the modest changes suggested in review. Having long played advisory roles in the preparation of this manuscript, and where feasible utilising John’s final annotations, we formed a triumvirate specifically to complete and resubmit the revision, in order to ensure that John’s important insights finally see the light of day: Richard Bateman (Royal Botanic Gardens Kew), Bryan Cain (Oxford University DCE) and John Simister (Manchester Metropolitan University).

Additional information

J. T. Temple died before publication of this work was completed.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

For further correspondence contact Richard Mark Bateman, Royal Botanic Gardens Kew, Richmond, Surrey, TW9 3DS, UK (email: r.bateman@kew.org)

Appendices

Appendix 1 Volumes of k-Spheres and k-Octahedra

The k-volume of a k-dimensional sphere of radius r (cf. Coxeter, 1973, p. 125, Eq. 7.33) is \(S_{k} r^{k}\), where \(S_{k}\) is the k-volume of a k-sphere of unit radius, given by:

$${S}_{k}=\frac{{\pi }^{\frac{k}{2}}}{\Gamma \left(1+\frac{k}{2}\right)}$$

For even values of k, this expression reduces to \(S_{k}=\pi^{k/2}/(k/2)!\), since Γ(n + 1) = n! when n is a non-negative integer. The (k ‒ 1)-volume (“surface”) of the sphere is \(k S_{k} r^{k-1}\).

The k-volume of a k-octahedron \(\beta_{k}\) of edge-length l (Coxeter, 1973, Table 1 (iii)) is \(2^{k/2}\, l^{k}/k!\), and the (k ‒ 1)-volume is

$$\frac{{\left({2}^{k+1}k\right)}^{\frac{1}{2}}{l}^{k-1}}{\left(k-1\right)!}$$
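The four expressions in this appendix can be evaluated directly. The following is a minimal sketch in Python, not code from the paper; the function names are illustrative assumptions, and only the standard `math` module is used.

```python
import math


def sphere_volume(k: int, r: float = 1.0) -> float:
    """k-volume of a k-sphere of radius r, i.e. S_k * r**k."""
    s_k = math.pi ** (k / 2) / math.gamma(1 + k / 2)  # unit-radius volume S_k
    return s_k * r ** k


def sphere_surface(k: int, r: float = 1.0) -> float:
    """(k-1)-volume ("surface") of the k-sphere: k * S_k * r**(k-1)."""
    return k * sphere_volume(k, 1.0) * r ** (k - 1)


def octahedron_volume(k: int, edge: float = 1.0) -> float:
    """k-volume of a k-octahedron of the given edge-length: 2**(k/2) * l**k / k!."""
    return 2 ** (k / 2) * edge ** k / math.factorial(k)


def octahedron_surface(k: int, edge: float = 1.0) -> float:
    """(k-1)-volume of the k-octahedron: sqrt(2**(k+1) * k) * l**(k-1) / (k-1)!."""
    return math.sqrt(2 ** (k + 1) * k) * edge ** (k - 1) / math.factorial(k - 1)
```

As a sanity check on the familiar low-dimensional cases, `sphere_volume(2)` gives the area π of the unit circle, and `octahedron_surface(2, 1.0)` gives the perimeter 4 of a square of unit edge.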

Appendix 2 Limit of \([\Gamma(1 + 1/k)]^{k}\)

We find initially the limit of the logarithm by writing z = 1/k. Then,

$$\begin{array}{l}\ln\left[\Gamma\left(1+\frac{1}{k}\right)\right]^{k}=k\,\ln\Gamma\left(1+\frac{1}{k}\right)\\=\frac{1}{z}\,\ln\Gamma\left(1+z\right)\\=\frac{1}{z}\left[-\ln\left(1+z\right)+z\left(1-\gamma\right)+\sum\nolimits_{n=2}^{\infty}\frac{\left(-1\right)^{n}\left[\zeta\left(n\right)-1\right]z^{n}}{n}\right]\end{array}$$

where γ is Euler’s constant, i.e. \({\mathrm{lim}}_{m\to \infty }[1+\frac{1}{2}+\frac{1}{3}+\dots +\frac{1}{m}-\mathrm{ln}\ m]\cong 0.5772\), and ζ is the Riemann Zeta Function (Abramowitz & Stegun, 1972, 6.1.33). Thus,

$$\ln\left[\Gamma\left(1+\frac{1}{k}\right)\right]^{k}=-1+\frac{z}{2}-\frac{{z}^{2}}{3}+\frac{{z}^{3}}{4}-\dots +\left(1-\gamma \right)+\frac{\left[\zeta \left(2\right)-1\right]z}{2}-\frac{\left[\zeta \left(3\right)-1\right]{z}^{2}}{3}+\frac{\left[\zeta \left(4\right)-1\right]{z}^{3}}{4}-\dots \to -\gamma ,\ \mathrm{as}\ z\to 0.$$

So \([\Gamma(1 + 1/k)]^{k} \to \exp[-\gamma] \cong 0.5615\) as k → ∞.
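The limit can be checked numerically. The sketch below is an illustration rather than part of the original analysis: it evaluates \([\Gamma(1 + 1/k)]^{k}\) through the log-gamma function for increasing k and compares it with exp[‒γ]. The helper name `gamma_power` and the hard-coded value of Euler's constant are assumptions of the sketch.

```python
import math

# Euler's constant, hard-coded because the math module does not provide it.
EULER_GAMMA = 0.5772156649015329


def gamma_power(k: int) -> float:
    """[Gamma(1 + 1/k)]**k, computed as exp(k * ln Gamma(1 + 1/k))."""
    return math.exp(k * math.lgamma(1.0 + 1.0 / k))


limit = math.exp(-EULER_GAMMA)  # exp[-gamma], about 0.5615

# The values approach the limit from above as k grows.
for k in (1, 10, 100, 1000, 10_000):
    print(k, gamma_power(k), gamma_power(k) - limit)

# The asymptotic concordance quoted in the abstract, exp[-exp[-gamma]].
print(math.exp(-limit))
```

The last line reproduces the value of about 0.5704 quoted in the abstract for exp[‒ exp[‒ γ]].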

Appendix 3 Producing a Uniform Random Scatter of Points on the Surface of, or Within the Positive Quadrant of, a k-Sphere

For a uniform random scatter of points on the surface of a k-sphere, A. Abakuks points out that the joint distribution of k independent normal variates (zero mean, unit variance) forms a spherical cluster centred on the origin, so that if \(x_{ij}\) (i = 1, 2, ..., n; j = 1, 2, ..., k) are k-variate points normally distributed with zero mean and unit variance, the vector

$${y}_{ij}=\frac{{x}_{ij}}{\sqrt{{\sum }_{j=1}^{k}{x}_{ij}{^{2}}}}$$

is uniformly randomly distributed over the surface of a unit k-sphere. If we then form the product \(z_{ij} = |y_{ij}|\,U_{i}^{1/k}\), where \(U_{i}\) are uniformly randomly distributed in the interval (0,1), \(z_{ij}\) will be uniformly randomly distributed within the all-positive quadrant of the unit sphere.
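The two-step construction can be sketched as follows. This is a minimal illustration using only the Python standard library; it is not the program used for the paper, and the function names, seed and sample sizes are arbitrary assumptions.

```python
import math
import random


def sphere_surface_points(n, k, rng):
    """n points y_i uniformly distributed on the surface of the unit k-sphere."""
    points = []
    for _ in range(n):
        # k independent N(0,1) variates, normalised to unit length.
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(v * v for v in x))
        points.append([v / norm for v in x])
    return points


def positive_quadrant_points(n, k, rng):
    """n points z_i uniform within the all-positive quadrant of the unit k-sphere."""
    points = []
    for y in sphere_surface_points(n, k, rng):
        # One U(0,1) draw per point; the 1/k power makes the radius uniform in volume.
        radius = rng.random() ** (1.0 / k)
        points.append([abs(v) * radius for v in y])
    return points


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
surface = sphere_surface_points(1000, 5, rng)
inside = positive_quadrant_points(1000, 5, rng)
```

Taking the k-th root of \(U_{i}\) is what spreads the points evenly through the volume: the fraction of the sphere's volume lying within radius r is \(r^{k}\), so a radius drawn as \(U^{1/k}\) has exactly that cumulative distribution.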

Appendix 4 Relations Between \(F_{os}\), \(f_{s\subset o}\) and \(f_{o\subset s}\)

The following relations hold between \(F_{os}\), \(f_{s\subset o}\) and \(f_{o\subset s}\) (for definitions, see Table 2):

  (i) \(F_{os} = f_{s\subset o} + f_{o\subset s}\)

  (ii) \(f_{s\subset o} = a F_{os}\)

  (iii) \(f_{o\subset s} = (1 - a) F_{os}\), where

    $$\begin{array}{c}a=\frac{\pi -4\theta }{\pi -4\left(\theta -\mathrm{sin}\theta \mathrm{cos}\theta \right)}=0.4252\cdots \\ \theta ={\mathrm{cos}}^{-1}\left(\frac{\sqrt{\pi }}{2}\right)\end{array}$$

that is, the sphere/octahedron angle for λ = 2 (Fig. 1). The first of these relations is the multi-dimensional extension of a simple 2-dimensional relationship. Similar relations hold between \(F_{cs}\), \(f_{s\subset c}\) and \(f_{c\subset s}\).
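The constant a follows directly from the expression for θ. The sketch below is an illustration only: it evaluates θ and a with the standard `math` module and applies relations (ii) and (iii) to a given value of F_os. The function name `split_f_os` is a hypothetical helper, not notation from the paper.

```python
import math

# Sphere/octahedron angle for lambda = 2, as defined above.
theta = math.acos(math.sqrt(math.pi) / 2)

# Proportion a of F_os assigned to f_{s⊂o}.
a = (math.pi - 4 * theta) / (
    math.pi - 4 * (theta - math.sin(theta) * math.cos(theta))
)


def split_f_os(f_os: float) -> tuple[float, float]:
    """Return (f_s_in_o, f_o_in_s) from F_os using relations (ii) and (iii)."""
    return a * f_os, (1 - a) * f_os


print(theta, a)
```

By construction the two returned parts sum to F_os, which is relation (i).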

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Temple, J.T. Characteristics of Distance Matrices Based on Euclidean, Manhattan and Hausdorff Coefficients. J Classif 40, 214–232 (2023). https://doi.org/10.1007/s00357-023-09435-1
