
A method for testing the distinctness of clusters: A test of the disjunction of two clusters in Euclidean space as measured by their overlap

Journal of the International Association for Mathematical Geology

Abstract

A method is described for testing the distinctness of two clusters in Euclidean space. One first calculates the projections, $q$, of the $N_1$ and $N_2$ members of the clusters onto the line joining the cluster centroids. From the distributions of $q$ an index of disjunction, $W$, is calculated, which corresponds to an index of overlap, $V_G$. The quantity $W\sqrt{N_1+N_2}$ is distributed as noncentral $t$, subject to assumptions on the multivariate normal distribution of the clusters. This allows a test of whether the observed disjunction is significantly greater than a chosen figure, which is equivalent to testing whether the overlap of the clusters is significantly less than a corresponding value of $V_G$.

Two clusters that appear distinct may be produced simply by the partitioning of a homogeneous swarm into two contiguous regions. Provided that the clusters form a dichotomy in a dendrogram, and that the clustering method yields geometrically convex clusters, a conservative test of this situation can be derived by determining the excess of $W$ over the value expected for a rectangular distribution.
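The projection step and the noncentral t test described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's procedure. The abstract does not give the formula for W, so the sketch takes W as an input computed elsewhere. It also assumes that, for a chosen figure W0, the noncentrality parameter is W0·√(N1+N2) and that the degrees of freedom are supplied by the caller; neither is stated in the abstract. All function names are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _centroid(points: Sequence[Point]) -> list[float]:
    # Coordinate-wise mean of the members of one cluster.
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def projections_on_centroid_line(
    cluster1: Sequence[Point], cluster2: Sequence[Point]
) -> tuple[list[float], list[float]]:
    """Project every member of both clusters onto the line joining the centroids.

    Each q is a signed distance along the unit vector pointing from centroid 1
    to centroid 2, with centroid 1 as origin (an arbitrary, assumed convention).
    """
    m1, m2 = _centroid(cluster1), _centroid(cluster2)
    diff = [b - a for a, b in zip(m1, m2)]
    length = math.sqrt(sum(d * d for d in diff))
    if length == 0.0:
        raise ValueError("centroids coincide; the joining line is undefined")
    unit = [d / length for d in diff]

    def q(p: Point) -> float:
        return sum((x - a) * u for x, a, u in zip(p, m1, unit))

    return [q(p) for p in cluster1], [q(p) for p in cluster2]


def noncentral_t_cdf(t: float, df: int, nc: float, steps: int = 4000) -> float:
    """P(T <= t) for T = (Z + nc) / S, where S = sqrt(chi2_df / df).

    Uses P(T <= t) = E[Phi(t*S - nc)], integrating over the density of S
    with the midpoint rule (which never evaluates the density at S = 0).
    """
    upper = 1.0 + 10.0 / math.sqrt(df)  # density of S is negligible beyond this
    h = upper / steps
    # Log of the normalising constant of the density of S.
    log_const = (math.log(2.0) + 0.5 * df * math.log(df)
                 - 0.5 * df * math.log(2.0) - math.lgamma(0.5 * df))
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        log_density = log_const + (df - 1) * math.log(s) - 0.5 * df * s * s
        phi = 0.5 * (1.0 + math.erf((t * s - nc) / math.sqrt(2.0)))
        total += phi * math.exp(log_density)
    return total * h


def disjunction_p_value(w: float, w0: float, n1: int, n2: int, df: int) -> float:
    """One-sided p-value for H0: disjunction = w0 against disjunction > w0.

    ASSUMPTIONS (not stated in the abstract): the noncentrality parameter is
    w0 * sqrt(n1 + n2), and the degrees of freedom `df` are supplied by the caller.
    """
    root_n = math.sqrt(n1 + n2)
    tail = 1.0 - noncentral_t_cdf(w * root_n, df, w0 * root_n)
    return min(1.0, max(0.0, tail))  # clamp tiny numerical overshoot
```

For example, `projections_on_centroid_line([(0, 0), (2, 0)], [(4, 0), (6, 0)])` returns `([-1.0, 1.0], [3.0, 5.0])`. The noncentral t CDF is computed by direct numerical integration so that the sketch needs only the standard library; in practice a tested routine such as SciPy's `scipy.stats.nct` would be the safer choice.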




Cite this article

Sneath, P.H.A. A method for testing the distinctness of clusters: A test of the disjunction of two clusters in Euclidean space as measured by their overlap. Mathematical Geology 9, 123–143 (1977). https://doi.org/10.1007/BF02312508


  • DOI: https://doi.org/10.1007/BF02312508
