Some applications of graph theory to clustering

Published in Psychometrika 39, 283–309 (1974)

Abstract

This paper reviews and expands upon the relationship between graph theory and the clustering of a set of objects. Several graph-theoretic criteria are proposed for use within a general clustering paradigm as a means of developing procedures “in between” the extremes of complete-link and single-link hierarchical partitioning; these same ideas are then extended to the more general problem of constructing subsets of objects with overlap. Finally, a number of related topics are surveyed within the general context of reinterpreting and justifying methods of clustering, either through standard concepts in graph theory or through their simple extensions.
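
As a rough illustration of the two extremes named above, the sketch below uses the standard threshold-graph reading: objects are joined by an edge when their dissimilarity is at most a level c, single-link clusters at that level are the connected components, and complete-link clusters must be complete subgraphs (cliques). This is not the paper's own procedure or any of its intermediate criteria; the function names and the dissimilarity values are hypothetical and chosen only for the example.

```python
from itertools import combinations


def threshold_graph(d, c):
    """Return the edges (i, j), i < j, whose dissimilarity is at most c."""
    n = len(d)
    return {(i, j) for i, j in combinations(range(n), 2) if d[i][j] <= c}


def connected_components(n, edges):
    """Components of the threshold graph: the single-link clusters at level c."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def is_clique(subset, edges):
    """True when every pair in subset is joined (the complete-link requirement)."""
    return all((i, j) in edges for i, j in combinations(sorted(subset), 2))


# Hypothetical dissimilarities among four objects (illustrative values only).
d = [
    [0, 1, 4, 6],
    [1, 0, 2, 6],
    [4, 2, 0, 6],
    [6, 6, 6, 0],
]

edges = threshold_graph(d, 2)
components = connected_components(len(d), edges)

print(components)                      # single-link clusters at level 2
print(is_clique([0, 1, 2], edges))     # is the chained group also a clique?
print(is_clique([0, 1], edges))
```

In this toy example objects 0, 1 and 2 form a chain, so they make up one connected component without being mutually adjacent. Criteria "in between" the two extremes ask for more than connectedness but less than completeness.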

Cite this article

Hubert, L.J. Some applications of graph theory to clustering. Psychometrika 39, 283–309 (1974). https://doi.org/10.1007/BF02291704
