Abstract
Clusters have been defined in many alternative ways. Perhaps the most frequent choice has been the geometric concept of a cluster as a set of points ‘close’ in some space, a concept related to notions of probability density functions and hence to the framework of mathematical statistics. However, such a model is not everywhere suitable, and in this paper I shall also examine some of the alternatives, chosen from models which have been used in deciding the number of clusters present. The aim of this examination is twofold: first, to indicate what alternatives have in fact been suggested, for many of them are neither well known nor widely applied; second, to explore the situations in which one definition might be more appropriate than another. Ultimately such a decision must rest with the analyst, or agent, for approaches to testing for the existence of clusters, and for determining the number of clusters, are closely related to the nature of the clusters being sought.
References
Abel, D.J. and W.T. Williams. 1981. NEBALL and F1NGRP: new programs for multiple nearest neighbour analysis. Austral. Comput. J. 13: 24–26.
Arabie, P. and J.D. Carroll. 1980. MAPCLUS: a mathematical programming approach to fitting the ADCLUS model. Psychometrika 45: 211–235.
Backer, E. 1978. Cluster Analysis by Optimal Decomposition of Induced Fuzzy Sets. Delft Univ. Press, pp. 235.
Bailey, T. and J. Cowles. 1984. Cluster definition by optimization of a simple measure. IEEE Trans. Patt. Anal. Mach. Intel. PAMI-6: 645–652.
Baroni-Urbani, C. 1980. A statistical table for the degree of coexistence between two species. Oecologia (Berl) 44: 287–289.
Baroni-Urbani, C. and H.W. Buser. 1976. Similarity of binary data. Syst. Zool. 25: 251–259.
Basford, K. and G.J. McLachlan. 1985a. Estimation of allocation rates in a cluster analysis context. J. Amer. Statist. Assoc. 80: 286–293.
Basford, K. and G.J. McLachlan. 1985b. The mixture method of clustering applied to three-way data. J. Classif. 2: 109–125.
Best, D J., M.A. Cameron and J.K. Eagleson. 1983. A test for comparing large sets of tau values. Biometrika 70: 447–453.
Bezdek, J.C. and I.A. Anderson. 1985. An application of the c-varieties clustering algorithms to polygonal curve fitting. IEEE Trans. Systems, Man and Cybernetics SMC-15: 637–641.
Bezdek, J.C., C. Coray, R. Gunderson and J. Watson. 1981a. Detection and characterization of cluster substructure I. Linear structure: fuzzy c-lines. Siam J. Appl. Math. 40: 339–371.
Bezdek, J.C., C. Coray, R. Gunderson and J. Watson. 1981b. Detection and characterization of cluster substructure II. Fuzzy c-varieties and convex combinations thereof. SIAM J. Appl. Math. 40: 358–372.
Bezdek, J.C. M.P. Windham and R. Ehrlick, 1980. Statistical parameters of cluster validity functionals. Intern. J. Comput. Inform. Sci. 9: 323–336.
Bhapkar, V.P. and K.W. Patterson. 1977. On some nonparametric tests for profile analysis of several multivariate samples. J. Multivar. Anal. 7: 265–273.
Binder, D.A. 1978. Bayesian cluster analysis. Biometrika 65: 31–38.
Binder, D.A. 1981. Approximations to Bayesian clustering rules. Biometrika 68: 275–285.
Bock, H.H. 1985. On some significance tests in cluster analysis. J. Classif. 2: 77–108.
Breiman, L., J.H. Friedman, R.A. Olshen and C.J. Stone. 1984. ‘Classification and Regression Trees’. Wordsworth, Belmont, Ca.
Burtin, Yu.D. 1974. On extreme metric parameters of a random graph I. Asymptotic estimates. Theory Probab. Appl. 19: 710–725.
Cattell, R.B. and M.A. Coulter. 1966. Principles of behavioural taxonomy and the mathematical basis of the TAXONOME computer program. Brit. J. Math. Statist. Psychol. 19: 237–269.
Česka, A. and H. Roemer. 1971. A computer program for identifying species-relevé groups in vegetation studies. Vegetatio 23: 255–276.
Chen, Z. and K.S. Fu. 1975. On the connectivity of clusters. Inform. Sci. 8: 283–299.
Chiu, D.K.V. and A.K.C. Wong. 1986. Synthesizing knowledge: a cluster analysis approach using event covering. IEEE Trans. Systems, Man and Cybernetics SMC-16: 251–259.
Cliff, N., D.J. McCormick, J.L. Zatkin, R.A. Cudeck and L.M. Collins. 1986. BINCLUS: nonhierarchical clustering of binary data. Multivar. Behav. Res. 21: 201–227.
Clifford, H.T. and D.W. Goodall. 1967. A numerical contribution to the classification of the Poaceae. Austral. J. Bot. 15: 499–519.
Cohen, V. and J. Obadia. 1974. Inverse data analysis COMPSTAT 1974. pp 141–148.
Cole, A.J. and D. Wishart. 1970. An improved algorithm for the Jardine-Sibson method of generating overlapping clusters. Comput. J. 13: 156–163.
Colless, D.H. 1984. A method for hierarchical clustering based on predictivity. Syst. Zool. 33: 64–68.
Cook, C.M. 1974. Grammatical Inference by Heuristic Search. Dept. Comput. Sci., Univ. Maryland, College Park, Maryland. Rep. TR-287. 109 pp.
Cook, C.M. and A. Rosenfeld. 1976. Some experiments in grammatical inference. In: J.C. Simon (ed.) Computer Oriented Learning Processes. Nordhoolt, Leiden, pp. 157–174.
Crawford, R.M.M. and D. Wishart. 1967. A rapid multivariate method for the detection and classification of groups of ecologically related species. J. Ecol. 55: 505–524.
Cross, G. 1980. Some approaches to measuring clustering tendency. Dept. Comput. Sci., Coll. Engng, Michigan State Univ. Tech. Rep. TR-80-03. pp. 69.
Dale, M.B. 1979. On linguistic approaches to ecosystems and their classification. In: Multivariate Methods in Ecological Work L. Orlóci, C.R. Rao and M.W. Stiteler (eds.) Statistical Ecology ser. 7. pp. 11–20. Internatl. Coop. Publish. House, Maryland.
Dale, M.B. 1985. On the comparison of conceptual clustering and numerical taxonomy. IEEE Trans. Patt. Anal. Mach. Intel. PAMI-7: 241–244.
Dale, M.B. and D.J. Anderson. 1973. Inosculate analysis of vegetation data. Austral. J. Bot. 21: 253–276.
Dale, M.B., H.T. Clifford and D.R. Ross. 1984. Species, equivalence and morphological redescription: a Stradbroke Island vegetation study. In: R.J. Coleman, J. Covacevich and P. Davie (eds.) Focus on Stradbroke: New Information on North Stradbroke Island and surrounding areas, 1974–1984. Boolarong Publ., Brisbane and Stradbroke Island Management Organization, Amity Point.
Dale, M.B. and D. Walker, 1970. Information analysis of pollen diagrams. Pollen et Spores 12: 21–37.
Dale, M.B. and L.J. Webb. 1975. Numerical methods for the establishment of Associations. Vegetatio 30: 77–87.
Dale, P.E.R., K. Hulsman, B.R. Jahnke and M.B. Dale. 1984. Vegetation and nesting preferences of black noddies at Masthead island., Great Barrier Reef. I. Patterns at the macroscale. Austral. J. Ecol. 9: 335–341.
Dallwitz, M.J. 1974. A flexible computer program for generating identification keys. Syst. Zool. 23: 50–57.
D’Andrade, R.G. 1978. U-statistic hierarchical clustering. Psychometrika 43: 59–67.
Davis, B.R. 1985. An associative hierarchical self-organising system. IEEE Trans. Systems Man and Cybernetics SMC-15: 570–579.
Day, N.E. 1969a. Estimating the components of a mixture of normal distributions. Biometrika 56: 463–474.
Day, N.E. 1969b. Divisive cluster analysis and a test for ultivariate normality. Internatl. Statist. Inst. Bull. 43: 110–112.
Day, W.H.E. 1977. Validity of clusters formed by graph theoretic methods. Math. Bio. Sci. 36: 299–317.
Day, W.H.E. and D.P. Faith. 1986. A model in partial orders for comparing objects by dualistic measures. Math. Bio Sci. 78: 179–192.
Demimirmen, F. 1969. Multivariate procedures and FORTRAN IV programs for evaluation and improvement of classification. Kansas Geolog. Surv. Comput. Contrbtn. 31. pp. 51.
De Soete, G., W.S. de Sarbo and J.D. Carroll. 1985. Optimalvariable weighting for hierarchical clustering: an alternating least squares algorithm. J. Classif. 2: 173–192.
Diday, E. and G. Goveart. 1974. Classification avec distance adaptive. C.R. Acad. Sci. Paris, A 993–995.
Di Gesù, V. and M.C. Maccarone. 1986. Feature selection and ‘possibility theory’ Patt. Recog. 19: 63–72.
Dubes, R.C. and A.K. Jain. 1976. Clustering techniques: the user’s dilemma. Patt. Recog. 8: 247–260.
Dubes, R. and A.K. Jain. 1979. Validity studies in clustering methodologies. Patt. Recog. 11: 235–254.
Dubes, R. and A.K. Jain. 1980. Clustering methodologies in exploratory data analysis. Adv. Comput. 19: 113–228.
Dubes, R.C. and R.L. Hoffman. 1986. Remarks on some statistical properties of the minimum spanning forest. Patt. Recog. 19: 49–53.
Dunn, J.C. 1974. A fuzzy relative of the ISODATA process and its use in detecting compact, well separated clusters. J. Cybernet. 3: 22–57.
Ecob, R. 1978. An empirical evaluation of the behaviour of selected measures of tree and partition similarity in relation to the investigating of the sampling statistics of AID. Egyptian Statist. J. 22: 1–27.
Edelbrock, C. 1979. Mixture model tests of hierarchical clustering algorithms: the problem of classifying everybody. Multiv. Behav. Res. 14: 367–384.
Eigen, D.J., R.F. Fromm and R.A. Northouse. 1974. Clusteranalysis based on dimensional information with application to feature selection and classification. IEEE Trans. Systems Man and Cybernetics SMC-4: 284–294.
Engelman, L. and J.A. Hartigan. 1969. Percentage points of a test for clusters. Amer. Statist. Assoc. J. 64: 1647–1648.
Esty, W.W. 1985. Estimation of the number of classes in a population and the coverage of a sample. Math. Scientist. 10: 41–50.
Eye, A. von 1977. Über die Verwendung von Quadriken zur einbeschreibenden Klassifikation. Biom. J. 19: 283–290.
Eye, A. von and M. Wirsing. 1978. An attempt for a mathematical foundation and evaluation of MACS, a method for multidimensional automatical cluster detection. Biom. J. 20: 655–666.
Eye, A. von and M. Wirsing. 1980. Cluster search by enveloping space density maxima. COMPSTAT 1980, Physicaverlag, Vienna, pp. 447–45.
Faith, D.P. 1985. A model of immunological distance in systematics. J. Theor. Biol. 114: 511–526.
Farris, J.S., A.G. Kluge and M.J. Eckhardt. 1970. A numerical approach to phylogenetic systematics. Syst. Zool. 19: 172–189.
Felsenstein, J. 1983. Parsimony in systematics: biological and statistical issues. Ann. Rev. Ecol. Syst. 14: 313–333.
Feoli, E. and M. Lagonegro. 1983. A resemblance function based on probability: applications to field and simulated data. Vegetatio 53: 3–9.
Feoli, E. and M. Lagonegro. 1984. Effects of sampling intensity and random noise on detection of species groups by intersection analysis. Studia Geobotanica 4: 101–108.
Feoli, E. and D. Lausi. 1980. Hierarchical levels in syntaxonomy based on information functions. Vegetatio 42: 113–115.
Frank, O. 1978a. Inferences concerning cluster structure. Dept. Statistics, Univ. Lund, CODEN: LUSADG/STAT-3050/1–7.
Frank, O. 1978b. Estimation of the number of connected components in a graph by using a sampled subgraph. Scand. J. Statist. 5: 177–188.
Frank, O. and F. Harary. 1982. Cluster inference by using transitivity indices in empirical graphs. Amer. Statist. Assoc. J. 77: 835–840.
Frank, O. and K. Svensson. 1981. On probability distributions of single linkage dendrograms. J. Statist. Comput. Simul. 12: 121–131.
Frey, T. 1966. On the significance of Czekanowki’s index of similarity. Applicationes Mathematicae 9: 1–7.
Frid, L.M. 1970. Minimization of a function specified over a tree. Kybernetika 4: 115–119.
Friedman. J. and L.C. Rafsky. 1979. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. Ann. Statist. 7: 697–717.
Fukunaga, K. and T.E. Flick. 1986. A test of the Gaussianness of a data set using clustering. IEEE Trans. Patt. Anal. Mach. Intel. PAMI-8: 240–247.
Futo, P. 1977. A new model and algorithm for cluster analysis. Szigma 10: 199–220.
Ganasalingam, S. and G.J. McLachlan. 1979. A case study of two clustering methods based on maximum likelihood. Statist. Neerland. 33: 81–90.
Ganesalingam, S. and G.J. McLachlan. 1980. A comparison of mixture and classification approaches to cluster analysis. Commun. Statist.-Theor. Meth. A9: 923–933.
Gasking, D. 1960. Clusters. Australas. J. Phil. 38: 1–36.
Gavrishin, A.I., A. Coradini, and M. Fulchignoni. 1976. On the formulation of the new z 2 criterion. Lab. Astrofisica Spaziale, Rap. 19, Frascati.
Ghosh, S.P. 1975. Consecutive storage of relevant records with redundancy. Commun. A.C.M. 18: 464–471.
Gilbert, N. and T.C.E. Wells. 1966. The analysis of quadrat data. J. Ecol. 54: 675–685.
Giles, R. 1976. Lukasiewicz logic and fuzzy set theory. Int. J. Man-Machine Stud. 8: 313–327.
Gitman, I. 1973. An algorithm for nonsupervised pattern classification. IEEE Trans. Systems Man and Cybernetics SMC-3: 66–74.
Golden, R.R. and P.E. Meehl. 1980. Detection of biological sex: an empirical test of cluster methods. Multiv. Behav. Res. 15: 475–496.
Goodall, D.W. 1953. Objective methods for the classification of vegetation I. The use of positive interspecific correlation. Austral. J. Bot. 1: 39–63.
Goodall, D.W. 1964. A probabilistic similarity index. Nature 203–1098.
Goodall, D.W. 1967. The distribution of the matching coefficient. Biometrics 23: 647–656.
Goodall, D.W. 1969. A procedure for the recognition of uncommon species combinations in sets of vegetation samples. Vegetatio 18: 19–35.
Goodall, D.W, 1973. Sampling similarity and species correlation. In: R.H. Whittaker (ed.), Handbook of Vegetation Science, Vol. 5, pp. 105–156. Junk, The Hague.
Gordesch, J. and P.P. Sint. 1974. Clustering structures. COMPSTAT 74: 82–92.
Gotoh, O. 1986. Alignment of three biological sequences with an efficient traceback procedure. J. Theoret. Biol. 121: 327–337.
Gower, J.C. 1974. Maximal predictive classification. Biometrics 30: 643–654.
Gower, J.C. and C.F. Banfield. 1978. Goodness of fit criteria for hierarchic classification and their empirical functions. Proc. 8th Internatl. Biometrics Symp. Constanz. pp. 347–361.
Gunderson, R.W. 1982. Choosing the r-dimension for the FCV family of clustering algorithms. BIT 22: 140–149.
Gunderson, R.W. 1983. An adaptive FCV clustering algorithm. Interntl. J. Man-Mach. Stud. 19: 97–104.
Gustafson, D.E. and W.E. Kessel. 1978. Fuzzy clustering with a fuzzy covariance matrix. In: D.S. Fu (ed.) Proc. IEEE Conf. Decision Control. pp. 761–76.
Haefner, J.W. 1978. Ecosystem assembly grammars: generative capacity and empirical adequacy. J. Theor. Biol 73: 293–318.
Hájek, P. and T. Havránek. 1978. The GUHA method — its aims and techniques (twenty-four questions and answers). Int. J. Man Mach. Stud. 10: 3–22.
Harper, C.W. Jr. 1978. Groupings by locality in community ecology and palaeoecology. Lethaia 11: 251–257.
Hartigan, J. 1972. Direct clustering of a data matrix. Amer. Statist. Assoc. J. 67: 123–129.
Hartigan, J. 1978. Asymptotic distribution of a clustering criterion. Ann. Statist. 6: 117–131.
Hartigan, J. 1981. Consistency of single linkage for high density clusters. Amer. Statist. Assoc. 76: 388–396.
Hartigan, J.A. 1985. Statistical theory in clustering. J. Classif. 2: 63–76.
Hartigan, P. 1985. Algorithm AS 217. Computation of the Dip statistic to test for unimodality. Appl. Stat. 34: 320–325.
Hawkins, D.M. 1979. Fractiles of an extended multiple outlier test. J. Statist. Comput. Simul. 5: 227–336.
Hayes, W.B. 1978. Some sampling properties of the Fager index for recurrent species groups. Ecology 59: 194–196.
Hill, M.O., R.G.H. Bunce and M.W. Shaw. 1975. Indicator species analysis, a divisive polythetic method of classification and its application to a survey of native pine-woods in Scotland. J. Ecol. 63: 597–613.
Hill, R.S. 1980. A stopping rule for partitioning dendrograms. Bot. Gaz. 141: 321–324.
Hogeweg, P. 1976. Iterative character weighting in numerical taxonomy. Comput. Biol. Med. 6: 199–211.
Hogeweg, P. and B. Hesper. 1974. A model study of biomorphological description. Patt. Recog. 6: 165–179.
Hogeweg, P. and B. Hesper. 1984. The alignment of sets of sequences and the construction of phyletic trees: an integrated method. J. Mol. Evol. 20: 175–156.
Holzner, W. and F. Stockinger. 1973. Der Einsatz von Elektonenrechnern bei der pflanzensoziologischen Tabellenarbeit. Österr. Bot. Z. 121: 303–309.
Hsu, Y-S., J.J. Walker and D.E. Ogren. 1986. A stepwise method for determining the number of component distributions in a mixture. Math. Geol. 18: 153–160.
Hubert, L.J. 1953. Inference procedures for the evaluation and comparison of proximity matrices. In: J. Felsenstein (ed.), Numerical Taxonomy, pp. 209–225. Springer-Verlag, Berlin.
Hubert, L.J. and P. Arabie. 1985. Comparing partitions. J. Classif. 2: 193–218.
Hubert, L.J. and F.B. Baker. 1977. An empirical comparison of baseline models for goodness-of-fit in r-diameter hierarchical clustering. In: J. van Ryzin (ed.), Classification and Clustering, pp. 131–151. Academic Press, New York.
Huxley, A. 1937. ‘Ends and Means.’ (An enquiry into the nature of ideals and into the methods employed for their realization.) Chatto and Windus, London.
Jackson, D.M. 1970. The stability of classifications of binary data. Classif. Soc. Bull. 2: 44–46.
Jackson, D.M. 1972. Stability problems in non-statistical classification theory. Comput. J. 15: 214–221.
Jain, N.C., A. Indrayan and L.R. Goel. 1956. Monte Carlo comparisons of six hierarchical clustering methods on random data. Patt. Recog. 19: 95–99.
Jancey, R.C. 1974. Algorithm for the detection of discontinuities in data sets. Vegetatio 29: 131–133.
Jardine, N. and R. Sibson. 1965. The construction of hierarchic and nonhierarchic classifications. Comput. J. 11: 177–184.
Kashyap, R.L. and B.J. Oommen. 1983. Similarity measures for sets of strings. Intern. J. Comput. Math. 13: 95–104.
Katajainen, J. and O. Nevalainen. 1986. Computing relative neighbourhood graphs in the plane. Patt. Recog. 19: 221–228.
Kendall, M.G. 1945. Rank Correlation Methods. Griffin, London.
Klauber, M.R. 1975. Space-time clustering test for more than two samples. Biometrics 31: 719–726.
Klopman, G. and O.T. Macina. 1985. Use of the computer automated structure evaluation program in determining quantitative structure-activity relationships with hallucinogenic phenylalkylamines. J. Theor. Biol. 113: 637–648.
Korhonen, T. 1984. Self-Organization and Associative Memory, pp. 125–188. Springer-Verlag, Berlin.
Krishna-Iyer, P.V. 1949. The first and second moments of some probability distributions arising from points on a lattice and their application. Biometrika 36: 135–141.
Lambert, J.M. and W.T. Williams. 1962. Multivariate methods in plant ecology. IV. Nodal analysis. J. Ecol. 50: 775–802.
Lance, G.N. and W.T. Williams. 1967. A general theory of classificatory sorting strategies. I. Hierarchical systems. Comput. J. 9: 373–380.
Lance, G.N. 1970 Mixed and discontinuous data. In: R.S. Anderssen and M.R. Osborne (eds.), Data Representation, pp. 102–107. Univ. Queensland Press, St. Lucia, Qld.
Lance, G.N. and W.T. Williams. 1977. Attribute contributions to a classification. Austral. Comput. J. 9: 128–129.
Langridge, D.J. 1971. On the Computation of Shape. Intern. Conf. Frontiers Patt. Recog. 35 pp. Honolulu, Hawaii.
Le Quesne, W.J. 1974. The uniquely derived character concept and its cladistic application. Syst. Zool. 23: 513–517.
Lee, K.L. 1979. Multivariate tests for clusters. Amer. Statist. Assoc. J. 74: 708–714.
Lee, R.C.T., J.R. Slagle and C.T. Mong. 1976. Application of Clustering to Estimate missing Values and Improve Data Integrity. Proc. 2nd Internatl. Conf. Software Engrng, San Francisco, pp. 539–544.
Lefkovitch, L.P. 1975. Choosing clustering levels for nonhierarchical procedures. In: G.F. Estabrook (ed.), Proc. 8th Internatl. Conf. Numerical Taxonomy, pp. 132–142.
Lefkovitch, L.P. 1976. A loss function minimization strategy for grouping from dendrograms. Syst. Zool. 25: 41–48.
Lefkovitch, L.P. 1978. Cluster generation and grouping using mathematical programming. Math. Bio. Sci. 41: 91–110.
Lefkovitch, L.P. 1950. Conditional clustering. Biometrics 36: 43–58.
Lefkovitch, L.P. 1952. Conditional clusters, musters and probability. Math. Bio. Sci. 60: 207–234.
Lefkovitch, L.P. 1955. Further nonparametric tests for comparing dissimilarity matrices based on the relative neighbourhood graph. Math. Bio. Sci. 73: 71–88.
Lehert, P. 1982. Clustering by connected components in 0 (n) expected time. RAIRO Informat. 15: 207–218.
Lennington, R.K. and R.H. Flake. 1975. Statistical evaluation of a family of clustering methods. In: G.F. Estabrook (ed.), Proc. 5th Interntl. Conf. Numerical Taxonomy, pp. 1–37.
Lessig, V.P. 1972. Comparing cluster analyses with cophenetic correlation. J. Marketing Res. 9: 82–84.
Lewis, P.A.W., P.B. Baxendale and J.L. Bennet. 1967. Statistical discrimination of the Synonymy/Antonymy relationship between words. Assoc. Comput. Mach. J. 14: 20–44.
Lim, T.M. and H.W. Khoo. 1985. Sampling properties of Gower’s general coefficient of similarity. Ecology 66: 1682–1685.
Ling, R.F. 1972. On the theory and construction of k-clusters. Comput. J. 15: 326–332.
Ling, R.F. 1973a. The expected number of components in random linear graphs. Ann. Probab. 1: 876–881.
Ling, R.F. 1973b. A probability theory of cluster analysis. Amer. Statist. Assoc. J. 68: 159–154.
Ling, R.F. 1975. An exact probability distribution on the connectivity of graphs. J. Math. Psychol. 12: 90–96.
Ling, R.A. and G.C. Killough. 1976. Probability tables for cluster analysis based on a theory of random graphs. Amer. Statist. Assoc. J. 71: 293–300.
Lingoes, J. and T. Cooper. 1971. PEP-I. A FORTRAN IV (G) program for Guttman-Lingoes nonmetric probability clustering. Behav. Sci. 16: 259–261.
Libert, G. and M. Roubens. 1983. New experimental results in cluster validity of fuzzy clustering algorithms. In: J. Janssen, J-F. Marcotorchino and J-M. Proth (eds.), New Trends in Data Analysis and Applications, pp. 205–218. North-Holland, Amsterdam.
Little, I.P. and D.R. Ross. 1985. The Levenshtein metric, a new means for soil classification tested by data from a sandpodzol chronosequence and evaluated by discriminant analysis. Aust. J. Soil. Res. 23: 115–130.
López De Màntaras, R. and J. Aguilar-Martin. 1985. Selflearning pattern classification using a sequential clustering technique. Patt. Recog. 18: 271–277.
Lukasová, A. 1979. Hierarchical agglomerative clustering procedure. Patt. Recog. 11: 365–381.
Lumelsky, V.J. 1982. A combined algorithm for weighting the variables and clustering in the clustering problem. Patt. Recog. 15: 53–60.
Macnaughton-Smith, P. 1965. Some Statistical and Other Numerical Methods for Classifying Individuals. Home Office Res. Unit. Rep. 6. 65 pp.
Mantel, N. 1967. The detection of disease clustering and a generalized regression. Cancer Res. 27: 209–220.
Margules, C.R., D.P. Faith and L. Belbin. 1985. An adjacency constraint in agglomerative hierarchical classifications of geographic data. Environ. Planning A-17: 397–412.
Massart, D.L., F. Plastria and L. Kaufman. 1983. Nonhierarchical clustering with MASLOC. Patt. Recog. 16: 507–516.
Matula, D.W. 1983. Cluster validity by concurrent chaining. In: J. Felsenstein (ed.) Numerical Taxonomy, pp. 156–166. Springer-Verlag, Berlin.
McBratney, A.R. and A.W. Moore. 1985. Application of fuzzy sets to climatic classifications. Agric. For. Meteor. 35: 165–185.
Michalski, R.S. 1980a. Pattern recognition as rule-guided inductive inference. IEEE. Trans. Patt. Anal. Mach. Intel. PAMI-2: 349–361.
Michalski, R.S. 1980b. Knowledge acquisition through conceptual clustering: a theoretical framework and an algorithm for partitioning data into conjunctive concepts. J. Policy Anal. Inform. Sci. 4: 219–244.
Michalski, S. and R.E. Stepp. 1985. Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans. Patt. Anal. Mach Intel. PAMI-5: 396–410.
Michaud, P. 1983. Opinions aggregation. In: J. Janssen, J-P. Marcotorchino and J-M. Proth (eds.), New Trends in Data Analysis and Applications, pp. 5–27, North-Holland, Amsterdam.
Milligan, G.W. 1981. A review of Monte Carlo tests for clustering. Multiv. Behav. Res. 16: 379–407.
Milligan, G.W. and P.D. Isaac. 1980. The validation of four ultrametric clustering algorithms. Patt. Recog. 13: 41–50.
Milligan, G.W. and V. Mahajan. 1980. A note on procedures for testing the quality of a clustering of a set of objects. Decis. Sci. 11: 669–677.
Milligan, G.W., S.C. Soon and L.M. Sokol. 1983. The effect of cluster size, dimensionality and the number of clusters on recovery of true cluster structure. IEEE Trans. Patt. Anal. Mach. Intel. PAMI-5: 40–47.
Minkoff, E.C. 1965. The effect on classification of slight alterations in numerical taxonomy. Syst. Zool. 15: 196–213.
Mojena, R. 1977. Hierarchical grouping methods and stopping rules: an evaluation. Comput. J. 20: 359–363.
Molander, P. 1986. Induction of categories: the problem of multiple equilibria. J. Math. Psychol. 30: 42–54.
Moller-Anderson, N. 1978. Some principles and methods of cladistic analysis with notes on the uses of cladistics in classification and biogeography. Z. Zool. Syst. Evolutionsforsch. 16: 242–255.
Mountford, M.D. 1971. A test of the difference between two clusters. In: Patil, G.P., Pielou, E.C. and Waters, W.E. ‘Statistical Ecology 3.’ Penn. State Univ. Press pp. 237–251.
Murtagh, F. 1983. A probability theory of hierarchic clustering using random dendrograms. J. Statist. Comput. Simul. 18: 145–157.
Nakamura, K. and S. Iwai. 1982. A representation of analogical inference by fuzzy sets and its application to information retrieval system. In: M.M. Gupta and E. Sanchez (eds.), Fuzzy Information and Decision Processes, pp. 373–386. North Holland, Amsterdam.
Naur, J.I. and L. Rabinowitz. 1975. The expectation and variance of the number of components in random linear graphs. Ann. Probab. 3: 159–161.
Noy-Meir, I. 1973. Data transformations in ecological ordination. I. Some advantages of non-centering J. Ecol. 61: 329–341.
O’Callaghan, J.F. 1976. A model for recovering perceptual organization from dot patterns. IEEE 3rd Internatl. Conf. Patt. Recog. Proc. pp 294–298.
O’Gilvie, J.C. 1969. The distribution of number and size of connected components in a random graphs of medium size. Information Processing 68, North-Holland, Amsterdam, pp. 1527–1530.
O’Gorman, L. and A.C. Sanderson. 1984. The converging squares algorithm: an efficient method for locating peaks in multidimensions. IEEE. Trans. Patt. Anal. Mach. Intel. PAMI-6: 280–288.
Orford, J.D, 1976. Implementation of criteria for partitioning a dendrogram. Math. Geol. 8: 75–84,
Ozawa, K. 1983. CLASSIC: a hierarchical clustering algorithm based on asymmetric similarities. Patt. Recog. 16: 201–211.
Ozawa, K. 1985. A stratificational overlapping cluster scheme. Patt. Recog. 18: 279–286.
Palka, Z. 1982. Isolated trees on a random graph. Zastos. Matem. 17: 309–316.
Panayirci, E. and R.C. Dubes. 1983. A test for multidimensional clustering tendency. Patt. Recog. 16: 433–444.
Pawlak, Z. 1984. Rough classification. Int. J. Man-Mach. Studies 20: 469–483.
Peay, E.H. 1975. Nonmetric grouping: Clusters and cliques. Psychometrika 40: 297–313.
Perillo, G.M.E. and E. Marone. 1986a. Determining optimal numbers of class intervals using maximal entropy. Math. Geol. 18: 401–407.
Perillo, G.M.E. and E. Marone. 1986b. Applications of the maximal entropy and optimal number of class interval concept: two examples. Math. Geol. 18: 465–475.
Phillips, T.H. and A. Rosenfeld. 1986. A simplified method of detecting structure in Glass patterns. Patt. Recog. Lett 4: 213–217.
Pirktl, L. 1983. On the use of cluster analysis for partitioning and allocating computational objects in distributed computing systems. In: J.E. Gentleman (ed.), Computer Science and Statistics: the Interface, pp. 361–364. North Holland, Amsterdam.
Plastria, F. 1986. Two hierarchies associated with each clustering scheme. Patt. Recog. 19: 193–196.
Popma, J., L. Mucina, O. van Tongeren and E. van der Maarel, 1983. On the determination of optimal levels in phytosociological classification. Vegetatio 52: 65–76.
Rachman, M.I. and S.Ja. Kozýakov. 1986. A statistical method for comparison of two structures and its biological application. Biom. J. 2: 183–195.
Rahman, M.A. 1962. On the sampling distribution of the studentized Penrose measure of distance. Ann. Human Genet. 26: 97–106.
Ratkowsky, D. and G.M. Lance. 1978. A criterion for determining the number of groups in a classification. Austral. Comput. J. 10: 115–117.
Rogers, C.C.G. 1978. The probability that 2 samples in the plane have disjoint convex hulls. J. Appl. Prob. 15: 790–802.
Rohlf, F.J. 1975. Generalization of the gap test for the detection of multivariate outliers. Biometrics 31: 93–101.
Rose, M.J. 1965. Classification of a set of elements. Comput. J. 7: 208–224.
Ross, D.R. 1979. TAXON Users Manual, ed. P3. CSIRO, Division Computing Research, Canberra, A.C.T.
Roubens, N. 1978. Pattern classification problems and fuzzy sets. Fuzzy sets and systems 1: 239–253.
Roubens, M. 1982. Fuzzy clustering algorithms and their cluster validity. Eur. J. Oper. Res. 10: 294–301.
Rousseau, P. 1978. Maximum likelihood clustering of binary data sets. Classif. Soc. Bull. 4.
Ruspini, E.H. 1982. A new approach to clustering. Inf. Control. 15: 22–32.
Sandland, R.L. and P.C. Young. 1979. Probabilistic tests and stopping rules associated with hierarchical classification techniques. Austral. J. Ecol. 4: 399–406.
Sankoff, D. and J.B. Kruskal. 1983. Time Warps, String Edits and Macromolecules: the theory and practice of sequence comparison. Addison-Wesley, London, pp. 382.
Sattath, S. and A. Tversky. 1977. Additive similarity trees. Psychometrika 42: 319–345.
Särndal, C.E.A. 1976. A Monte Carlo study of some asymmetric association measures. Brit. J. Math. Statist. Psychol. 29: 94–102.
Schaeben, H. 1984. A new cluster algorithm for orientation data. Math. Geol. 16: 139–153.
Scher, A., M. Schneier, and A. Rosenfeld. 1982. Clustering of collinear line segments. Patt. Recog. 15: 85–91.
Schueler, L. and H. Wolff. 1980. Automatic classification in the case of unknown number of clusters using global density estimates. Biom. J. 22: 745–754.
Schultz, J.V. and L.J. Hubert. 1973. Data analysis and the connectivity of random graphs. J. Math. Psychol. 10: 421–435.
Schultz, J.V. and L.J. Hubert. 1975. Empirical evaluation of an approximate result in random graph theory. Brit. J. Math. Statist. Psychol. 28: 103–111.
Sclove, S.L. 1977. Population mixture models and clustering algorithms. Commun. Statist.-Theor. Meth. A6: 417–434.
Scott, A.J. and M. Knott. 1976. An approximate test for use with AID. Appl. Statist. 25: 103–109.
Scott, D.W. and J.R. Thompson. 1983. Probability density estimates in higher dimensions. In: J.E. Gentleman (ed.), Computer Science and Statistics, the Interface, pp. 173–179. North Holland, Amsterdam.
Segen, J. and A.C. Sanderson. 1979. A minimal representation criterion for clustering. In: J.F. Gentleman (ed.), Comput. Science and Statistics, the Interface, pp. 332–334, North Holland, Amsterdam.
Selem, S.Z. and M.A. Ismail. 1984. Soft clustering of multidimensional data: a semi-fuzzy approach. Patt. Recog. 17: 559–568.
Selkow, S.M. 1974. Diagnostic keys as a representation for context in pattern recognition. IEEE Trans. Comput. C-23: 970–971.
Sen Gupta, A. 1982/83. Tests for simultaneously determining the number of clusters and their shape with multivariate data. Statist. Probab. Lett. 1: 46–50.
Shafer, E., R. Dubes and A.K. Jain. 1979. Single-link characteristics of a mode-seeking clustering algorithm. Patt. Recog. 11; 65–70.
Shanley, R.J. and M.A. Mahtab. 1976. Delineation and analysis of clusters in orientation data. Math. Geol. 8: 9–23.
Shapiro, L.G. and R.M. Haralick. 1979. Decomposition of two-dimensional shapes by graph-theoretic clustering. IEEE Trans. Patt. Anal. Mach. Intel. PAMI 1: 10–20.
Shepard, R.N. and P. Arabie. 1979. Additive clustering: representation of similarities as combinations of discrete overlapping properties. Psychol. Rev. 86: 87–123.
Siemiatycki, J. 1978. Mantel’s space-time clustering statistic. I. Computing higher moments and a comparison of various data transforms. J. Statist. Comput. Simulation 7: 13–31.
Simon, J.C. and G. Guiho. 1972. On algorithms preserving neighbourhood to file and retrieve information in a memory. Intern. J. Comput. Inform. Sci. 1: 3–15.
Smith, S.P. and R. Dubes. 1980. Stability of a hierarchical clustering. Patt. Recog. 12: 177–187.
Smith, S.P. and A.K. Jain. 1984. Testing for uniformity in multidimensional data. IEEE Trans. Patt. Anal. Mach. Intel. PAMI-6: 73–81.
Smith, W. and J.F. Grassle. 1977. Sampling properties of a family of diversity measures. Biometrics 33: 282–292.
Sneath, P.H.A. 1966. A method for curve seeking from scattered points. Comput. J. 8: 383–391.
Sneath, P.H.A. 1979. BASIC program for a significance test for 2 clusters in Euclidean space as measured by their overlap. Comput. Geosci. 5: 143–155.
Sneath, P.H.A. 1980a. Some empirical tests for significance of clusters. In: E. Diday, L. Lebart, J.P. Pagès and R. Tomassone (eds.), Data Analysis and Informatics, pp. 491–508. North Holland, Amsterdam.
Sneath, P.H.A. 1980b. The probability that distinct clusters will be unrecognised in low dimensional ordinations. Classif. Soc. Bull. 4: 22–43.
Sneath, P.H.A. 1985. DENBRAN: a BASIC program for a significance test for multivariate normality of clusters from branching points in dendrograms. Comput. Geosci. 11: 767–785.
Sneath, P.H.A. 1986. Significance tests for multivariate normality of clusters from branching patterns of dendrograms. J. Math. Geol. 18: 3–32.
Sonquist, J.A., E.L. Baker and J.N. Morgan. 1973. Searching for Structure: An Approach to Analysis of Substantial Bodies of Micro-data and Documentation for a Computer Program. Inst. Soc. Res., Univ. Michigan, Ann Arbor. 236 pp.
Stepp, R.E. and R.S. Michalski. 1986. Conceptual clustering of structured objects: a goal orientated approach. Art. Intell. 28: 43–70.
Stoddard, A.M. 1979. Standardization of measures prior to cluster analysis. Biometrics 35: 765–773.
Strauss, R.E. 1982. Statistical significance of species clusters in association analysis. Ecology 64: 634–639.
Switzer, P. 1968. Statistical techniques in pattern recognition and clustering. Proc. Amer. Statist. Assoc. 40–47.
Thurstone, L.L. 1945. A multiple group method for factoring the correlation matrix. Psychometrika 1: 73–78.
Tou, J.T. 1979. DYNOC — A dynamic optimal cluster-seeking technique. Internl. J. Comput. Inform. Sci. 8: 541–547.
Toussaint, G.T. 1974. Some properties of Matusita’s measure of affinity of several distributions. Ann. Inst. Statist. Math. 26: 389–394.
Trivedi, M.M. and J.C. Bezdek. 1986. Low-level segmentation of aerial images with fuzzy clustering. IEEE Trans. Syst., Man Cybern. SMC-16: 589–598.
Tsukamura, M. 1976. Conditions for normal distribution of matching coefficients involved in a cluster in numerical classification. Japan. J. Microbiol. 20: 357–359.
Uttley, A.M. 1970. The Informon. J. Theor. Biol. 27: 31–45.
Velasco, F.R.D. 1980. A method of analysis of Gaussian-like clusters. Patt. Recog. 12: 381–393.
Verhelst, N.D., M.G.M. Koppen and E.P. Van Essen. 1985. The exact distribution of an index of agreement between partitions. Brit. J. Math. Statist. Psychol. 38: 44–57.
Vesely, A. 1981. Logically oriented cluster-analysis. Kybernetika 17: 82–92.
Wacker, A.G. 1972. The effect of subclass numbers on maximum likelihood Gaussian classification. Proc. 8th Remote Sensing Conf., East Lansing, pp. 851–859.
Wainer, H. and S. Schacht. 1978. Gapping. Psychometrika 43: 203–212.
Wallace, C.S. and D.A. Boulton. 1968. An information measure for classification. Comput. J. 11: 185–194.
Warnekar, C.S. and G. Krishna. 1979. An algorithm to detect linearly separable clusters of binary variables. Patt. Recog. 11: 109–113.
Watanabe, S. 1969. Knowing and Guessing. J. Wiley, New York. pp. 376–379.
Whitfield, J.W. 1953. The distribution of total rank values for one particular object in m rankings of n objects. Brit. J. Statist. Psychol. 6: 35–40.
Williams, W.T. and J.S. Bunt. 1980. Studies on the analysis of data from Australian tidal forests (‘Mangroves’). II. The use of an asymmetric monothetic divisive classificatory program. Austral. J. Ecol. 5: 391–396.
Williams, W.T. and M.B. Dale. 1965. Fundamental problems in numerical taxonomy. Adv. Bot. Res. 2: 35–68.
Williams, W.T. and J.M. Lambert. 1959. Multivariate methods in plant ecology I. Association-analysis in plant communities. J. Ecol. 47: 83–101.
Williams, W.T., J.M. Lambert and G.N. Lance. 1966. Multivariate methods in plant ecology V. Similarity analyses and information analysis. J. Ecol. 54: 427–445.
Williams, W.T., G.N. Lance, L.J. Webb, J.G. Tracey and M.B. Dale. 1969. Studies in the numerical analysis of complex rain-forest communities III. The analysis of successional data. J. Ecol. 57: 515–535.
Williams, W.T. and J.G. Tracey. 1984. Network analysis of north Queensland rainforests. Austral. J. Bot. 32: 109–116.
Windham, M.P. 1985. Numerical classification of proximity data with assignment measures. J. Classif. 2: 157–172.
Wishart, D. 1969. Numerical classification method for deriving natural classes. Nature 221: 97–98.
Wong, M.A. 1984. Asymptotic properties of univariate sample K-means clusters. J. Classif. 1: 265–270.
Wong, A.K.C. and D.E. Ghahraman. 1980. Random graphs: structural-contextual dichotomy. IEEE Trans. Patt. Anal. Mach. Intel. PAMI-2: 1341–355.
Wong, A.K.C. and T.S. Liu. 1975. Typicality, diversity and feature pattern of an ensemble. IEEE Trans. Comput. C-24: 158–181.
Yamamoto, S., K. Ushio, S. Tazawa, H. Ikeda, F. Tamari and N. Hamada. 1977. Partitions of a query set into minimal number of subsets having the consecutive retrieval property. J. Statist. Planning Infer. 1: 41–51.
Yolkina, V.N. and N.G. Zagoruiko. 1978. Some classification algorithms developed at Novosibirsk. RAIRO Informatique/Computer Science 12: 37–46.
© 1991 Springer Science+Business Media Dordrecht
Dale, M.B. (1991). Knowing When to Stop: Cluster Concept — Concept Cluster. In: Feoli, E., Orlóci, L. (eds) Computer assisted vegetation analysis. Handbook of vegetation science, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-3418-7_14
DOI: https://doi.org/10.1007/978-94-011-3418-7_14
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-5512-3
Online ISBN: 978-94-011-3418-7
eBook Packages: Springer Book Archive