Abstract
This paper attempts to review and expand upon the relationship between graph theory and the clustering of a set of objects. Several graph-theoretic criteria are proposed for use within a general clustering paradigm as a means of developing procedures “in between” the extremes of complete-link and single-link hierarchical partitioning; these same ideas are then extended to include the more general problem of constructing subsets of objects with overlap. Finally, a number of related topics are surveyed within the general context of reinterpreting and justifying methods of clustering either through standard concepts in graph theory or their simple extensions.
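As a rough illustration of the two extremes named in the abstract, the sketch below uses the standard graph reading of these methods: at a given dissimilarity level, single-link clusters are the connected components of the threshold graph, while complete-link clusters must form complete subgraphs (cliques). The dissimilarity matrix `D`, the threshold, and all function names are assumptions chosen for illustration and are not taken from the paper.

```python
from itertools import combinations


def threshold_graph(d, t):
    """Adjacency sets of the graph joining i and j whenever d[i][j] <= t."""
    n = len(d)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if d[i][j] <= t:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Single-link view: objects grouped by connectedness alone."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        parts.append(sorted(comp))
    return parts


def is_clique(adj, nodes):
    """Complete-link view: every pair in the subset must be joined."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))


# Hypothetical symmetric dissimilarities among four objects.
D = [
    [0, 1, 2, 5],
    [1, 0, 4, 5],
    [2, 4, 0, 3],
    [5, 5, 3, 0],
]

adj = threshold_graph(D, 2)
components = connected_components(adj)   # single-link groups at level 2
chain_is_clique = is_clique(adj, [0, 1, 2])
pair_is_clique = is_clique(adj, [0, 1])
```

In this toy example the objects 0, 1, and 2 are connected at level 2 but do not form a clique, because objects 1 and 2 are not joined. A criterion "in between" the two extremes would ask for something stronger than connectedness but weaker than completeness on each subset.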
Cite this article
Hubert, L.J. Some applications of graph theory to clustering. Psychometrika 39, 283–309 (1974). https://doi.org/10.1007/BF02291704
DOI: https://doi.org/10.1007/BF02291704