
Significance tests for multivariate normality of clusters from branching patterns in dendrograms

Mathematical Geology

Abstract

A significance test is presented for whether a cluster is drawn from a multivariate normal distribution, based on the levels of the branches in a dendrogram. The method compares the observed cumulative graph of the number of branches with a graph derived from a simple logistic function. Provided the number of objects or variables is not small, the difference between the graphs can be tested with the Kolmogorov-Smirnov, Cramér-von Mises, and Lilliefors statistics.
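The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's program: the two-parameter logistic form and the names `location` and `scale` are placeholders, since the simulated logistic parameters and the critical values are given in the full paper and not in this abstract. The sketch computes a Kolmogorov-Smirnov type distance between the empirical cumulative curve of branch levels and a logistic curve.

```python
import math


def logistic_cdf(x, location, scale):
    """Two-parameter logistic curve (assumed form; parameters are placeholders)."""
    return 1.0 / (1.0 + math.exp(-(x - location) / scale))


def ks_distance(branch_levels, location, scale):
    """Largest vertical gap between the empirical cumulative curve of
    branch levels and the logistic curve."""
    levels = sorted(branch_levels)
    n = len(levels)
    if n == 0:
        raise ValueError("at least one branch level is required")
    d = 0.0
    for i, x in enumerate(levels):
        f = logistic_cdf(x, location, scale)
        # The empirical step function jumps from i/n to (i+1)/n at x,
        # so both sides of the step are checked.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


# Hypothetical branch levels read off a dendrogram (illustrative values only).
example_levels = [0.8, 1.1, 1.3, 1.4, 1.6, 1.9, 2.3]
distance = ks_distance(example_levels, location=1.5, scale=0.4)
```

The resulting distance would then be compared with a critical value appropriate to the chosen statistic and sample size, which this sketch does not supply.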

Logistic functions were obtained by simulation and are available for three similarity measures: (1) Euclidean distances, (2) squared Euclidean distances, and (3) simple matching coefficients; and for five clustering methods: (1) WPGMA, (2) UPGMA, (3) single linkage (or minimum spanning trees), (4) complete linkage, and (5) Ward's increase in sums of squares. For the simple matching coefficient, the mean intracluster similarity is also required.
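For the simple matching case, the extra quantity mentioned above can be computed directly from binary character data. The sketch below assumes that the mean intracluster similarity is the average of the simple matching coefficient over all pairs of objects in the cluster; the abstract does not spell out the definition, so treat that as an assumption. The function names and the example data are hypothetical.

```python
from itertools import combinations


def simple_matching(a, b):
    """Fraction of characters on which two objects agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("objects must have the same non-zero number of characters")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_intracluster_similarity(objects):
    """Average simple matching coefficient over all pairs in one cluster
    (assumed definition of the mean intracluster similarity)."""
    pairs = list(combinations(objects, 2))
    if not pairs:
        raise ValueError("at least two objects are required")
    return sum(simple_matching(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical objects scored on four binary characters.
cluster = [(1, 0, 1, 1), (1, 1, 1, 0), (1, 0, 1, 0)]
mean_similarity = mean_intracluster_similarity(cluster)
```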

The method allows a test of whether the dendrogram could be from a cluster of smaller dimensionality due to character correlations. A good fit of the data to an abnormally large or small dimensionality is an important warning for interpretation of the dendrogram. Quantiles of the test statistics were found by simulation to be well approximated by logistic functions. The Lilliefors test is recommended for general use; if a conservative test is required, the two-tailed Kolmogorov-Smirnov test is most suitable. The method is suitable for use with a hand calculator, and a computer program for it is available from the author.




About this article

Cite this article

Sneath, P.H.A. Significance tests for multivariate normality of clusters from branching patterns in dendrograms. Math Geol 18, 3–32 (1986). https://doi.org/10.1007/BF00897653


  • DOI: https://doi.org/10.1007/BF00897653
