Abstract
A method is described for testing the distinctness of two clusters in Euclidean space. One first calculates the projections, q, of the N1 and N2 members of the clusters onto the line joining the cluster centroids. From the distributions of q an index of disjunction, W, is calculated, which corresponds to an index of overlap, VG. The quantity W√(N1+N2) is distributed as noncentral t, subject to assumptions on the multivariate normal distribution of the clusters. This allows a test of whether the observed disjunction is significantly greater than a chosen figure, which is equivalent to testing whether the overlap of the clusters is significantly less than a corresponding value of VG. Two clusters that appear distinct may be produced simply by the partitioning of a homogeneous swarm into two contiguous regions. Provided that the clusters form a dichotomy in a dendrogram, and that the clustering method yields geometrically convex clusters, a conservative test of this situation can be derived by determining the excess of W over the value expected for a rectangular distribution.
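The first step of the method, projecting cluster members onto the line joining the two centroids, can be illustrated with a minimal sketch. The function names, the choice of the first centroid as the origin of q, and the toy data are assumptions made for illustration. The abstract does not state the formula for W, so the sketch stops at the projections and their summary statistics rather than guessing at the index itself.

```python
import math
from statistics import mean, stdev


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [mean(coords) for coords in zip(*points)]


def project_onto_centroid_axis(cluster_1, cluster_2):
    """Project both clusters onto the line joining their centroids.

    Returns (q1, q2, d), where q is measured from the first centroid
    (an assumed convention) and d is the distance between centroids.
    """
    c1 = centroid(cluster_1)
    c2 = centroid(cluster_2)
    axis = [b - a for a, b in zip(c1, c2)]
    d = math.sqrt(sum(a * a for a in axis))
    if d == 0.0:
        raise ValueError("centroids coincide; the joining line is undefined")
    unit = [a / d for a in axis]

    def q(point):
        # Signed length of (point - c1) along the unit vector from c1 to c2.
        return sum((x - c) * u for x, c, u in zip(point, c1, unit))

    return [q(p) for p in cluster_1], [q(p) for p in cluster_2], d


# Toy data (hypothetical): two small clusters in the plane.
cluster_1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
cluster_2 = [(4.0, 0.0), (5.0, 0.0), (4.0, 1.0), (5.0, 1.0)]

q1, q2, d = project_onto_centroid_axis(cluster_1, cluster_2)
print("mean and s.d. of q, cluster 1:", mean(q1), stdev(q1))
print("mean and s.d. of q, cluster 2:", mean(q2), stdev(q2))
print("distance between centroids:", d)
```

With this convention the projections of the first cluster average to zero and those of the second average to the intercentroid distance, so the two distributions of q can be compared directly along a single axis.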
Cite this article
Sneath, P.H.A. A method for testing the distinctness of clusters: A test of the disjunction of two clusters in Euclidean space as measured by their overlap. Mathematical Geology 9, 123–143 (1977). https://doi.org/10.1007/BF02312508
DOI: https://doi.org/10.1007/BF02312508