Comparing partitions

Abstract

The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of comparing partitions is approached indirectly by assessing the congruence of two proximity matrices using a simple cross-product measure. These matrices are generated from the corresponding partitions using various scoring rules. Special cases derivable from this framework include traditionally familiar statistics as well as statistics tailored to weight certain object pairs differentially. Finally, we propose a measure based on the comparison of object triples that has a probabilistic interpretation, is corrected for chance (i.e., assumes a constant value under a reasonable null hypothesis), and is bounded between ±1.
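As a rough illustration of the first part of the abstract, the sketch below computes the pair-counting Rand index and a chance-corrected version of it in Python. The abstract does not state any formula, so the correction used here is an assumption: it is the form commonly called the adjusted Rand index, which takes the expected number of co-clustered pairs under a null model where both partitions are random with their cluster sizes held fixed. The function names (`pair_counts`, `rand_index`, `adjusted_rand_index`) are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Count object pairs placed together by both partitions, by A, and by B."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")
    joint = Counter(zip(labels_a, labels_b))  # contingency table cells
    rows = Counter(labels_a)                  # cluster sizes in partition A
    cols = Counter(labels_b)                  # cluster sizes in partition B
    same_both = sum(comb(c, 2) for c in joint.values())
    same_a = sum(comb(c, 2) for c in rows.values())
    same_b = sum(comb(c, 2) for c in cols.values())
    return same_both, same_a, same_b, comb(n, 2)


def rand_index(labels_a, labels_b):
    """Proportion of pairs on which the two partitions agree."""
    same_both, same_a, same_b, total = pair_counts(labels_a, labels_b)
    # Agreements: pairs together in both, plus pairs separated in both.
    agreements = total + 2 * same_both - same_a - same_b
    return agreements / total


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected index (assumed form): (index - expected) / (max - expected)."""
    same_both, same_a, same_b, total = pair_counts(labels_a, labels_b)
    expected = same_a * same_b / total  # expectation with cluster sizes fixed
    maximum = (same_a + same_b) / 2
    if maximum == expected:
        # Only happens when both partitions are all singletons or both are
        # a single cluster; they are then identical.
        return 1.0
    return (same_both - expected) / (maximum - expected)
```

Calling `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` and `adjusted_rand_index` on the same inputs both return 1.0, since cluster labels are arbitrary and the two partitions are identical. For partitions that agree less than expected by chance, the corrected index can become negative while the uncorrected Rand index stays between 0 and 1.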

References

  • ARABIE, P., and BOORMAN, S.A., (1973), “Multidimensional Scaling of Measures of Distance Between Partitions,” Journal of Mathematical Psychology, 10, 148–203.

  • BERRY, K.J., and MIELKE, P.W., (1985), “Goodman and Kruskal's TAU-B Statistic,” Sociological Methods & Research, 13, 543–550.

  • BRENNAN, R.L., and LIGHT, R.J., (1974), “Measuring Agreement When Two Observers Classify People into Categories not Defined in Advance,” British Journal of Mathematical and Statistical Psychology, 27, 154–163.

  • BROOK, R.J., and STIRLING, W.D., (1984), “Agreement Between Observers When the Categories are not Specified in Advance,” British Journal of Mathematical and Statistical Psychology, 37, 271–282.

  • COSTANZO, C.M., HUBERT, L.J., and GOLLEDGE, R.G., (1983), “A Higher Moment for Spatial Statistics,” Geographical Analysis, 15, 347–351.

  • DUBIEN, J.L., and WARDE, W.D., (1981), Some Distributional Results Concerning a Comparative Statistic Used in Cluster Analysis, Unpublished manuscript, Department of Mathematics, Western Michigan University, Kalamazoo, Michigan.

  • FOWLKES, E.B., and MALLOWS, C.L., (1983), “A Method for Comparing Two Hierarchical Clusterings,” Journal of the American Statistical Association, 78, 553–569.

  • FRANK, O., (1976), “Comparing Classifications by the Use of the Symmetric Class Difference,” in Proceedings in Computational Statistics, eds. J. Gordesch and P. Maeze, Würzburg: Physica Verlag, 84–96.

  • GAREY, M.R., and JOHNSON, D.S., (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco: W.H. Freeman.

  • GOODMAN, L.A., and KRUSKAL, W.H., (1954), “Measures of Association for Cross-Classifications,” Journal of the American Statistical Association, 49, 732–764.

  • GREEN, P.E., and RAO, V.R., (1969), “A Note on Proximity Measures and Cluster Analysis,” Journal of Marketing Research, 6, 359–364.

  • HARTIGAN, J.A., (1975), Clustering Algorithms, New York: Wiley.

  • HUBERT, L.J., (1977), “Nominal Scale Response Agreement as a Generalized Correlation,” British Journal of Mathematical and Statistical Psychology, 30, 98–103.

  • HUBERT, L.J., (1979), “Matching Models in the Analysis of Cross-Classifications,” Psychometrika, 44, 21–41.

  • HUBERT, L.J., (1983), “Inference Procedures for the Evaluation and Comparison of Proximity Matrices,” in Numerical Taxonomy, ed. J. Felsenstein, New York: Springer-Verlag, 209–228.

  • HUBERT, L.J., GOLLEDGE, R.G., COSTANZO, C.M., and GALE, N., (1985), “Order-Dependent Measures of Correspondence for Comparing Proximity Matrices and Related Structures,” in Measuring the Unmeasurable, eds. P. Nijkamp and H. Leitner, The Hague: Martinus Nijhoff.

  • JOHNSON, S.C., (1968), “Metric Clustering,” Unpublished manuscript, AT&T Bell Laboratories, Murray Hill, New Jersey.

  • KENDALL, M.G., (1970), Rank Correlation Methods, 4th Edition, London: Griffin.

  • KLASTORIN, T.D., (1985), “The p-Median Problem for Cluster Analysis: A Comparative Test Using the Mixture Model Approach,” Management Science, 31, 84–95.

  • LERMAN, I.C., (1973), “Etude Distributionelle de Statistiques de Proximité entre Structures Finies de Même Type; Application à la Classification Automatique,” Cahier no. 19 du Bureau Universitaire de Recherche Opérationnelle, Institut de Statistique des Universités de Paris.

  • MIELKE, P.W., (1979), “On Asymptotic Nonnormality of Null Distributions of MRPP Statistics,” Communications in Statistics - Theory and Methods, A8, 1541–1550 (errata: A10, 1981, p. 1795 and A11, 1982, p. 847).

  • MIELKE, P.W., and BERRY, K.J., (1985), “Non-Asymptotic Inference Based on the Chi-Square Statistic for r by c Contingency Tables,” Journal of Statistical Planning and Inference, 12, 41–45.

  • MIELKE, P.W., BERRY, K.J., and BRIER, G.W., (1981), “Application of Multiresponse Permutation Procedures for Examining Seasonal Changes in Monthly Sea-Level Pressure Patterns,” Monthly Weather Review, 109, 120–126.

  • MILLIGAN, G.W., and COOPER, M.C., (1985), “A Study of the Comparability of External Criteria across Hierarchy Levels,” unpublished manuscript, Ohio State University, Columbus, Ohio.

  • MILLIGAN, G.W., SOON, S.C., and SOKOL, L.M., (1983), “The Effect of Cluster Size, Dimensionality, and the Number of Clusters on Recovery of True Cluster Structure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5, 40–47.

  • MIRKIN, B.G., and CHERNYI, L.B., (1970), “Measurement of the Distance Between Distinct Partitions of a Finite Set of Objects,” Automation and Remote Control, 31, 786–792.

  • MOREY, L.C., and AGRESTI, A., (1984), “The Measurement of Classification Agreement: An Adjustment of the Rand Statistic for Chance Agreement,” Educational and Psychological Measurement, 44, 33–37.

  • PATEFIELD, W.M., (1981), “Algorithm AS159. An Efficient Method of Generating Random R × C Tables with Given Row and Column Totals,” Applied Statistics, 30, 91–97.

  • RAND, W.M., (1971), “Objective Criteria for the Evaluation of Clustering Methods,” Journal of the American Statistical Association, 66, 846–850.

  • REYNOLDS, H.T., (1977), The Analysis of Cross-Classifications, New York: Free Press.

  • ROHLF, F.J., (1974), “Methods of Comparing Classifications,” Annual Review of Ecology and Systematics, 5, 101–113.

  • ROHLF, F.J., (1982), “Consensus Indices for Comparing Classifications,” Mathematical Biosciences, 59, 131–144.

  • STAM, A.J., (1983), “Generation of a Random Partition of a Finite Set by an Urn Model,” Journal of Combinatorial Theory A, 35, 231–240.

  • WALLACE, D.L., (1983), “Comment,” Journal of the American Statistical Association, 78, 569–579.

Additional information

William H.E. Day was Acting Editor for the reviewing of this paper. We are grateful to him, Ove Frank, Charles Lewis, Glenn W. Milligan, Ivo Molenaar, Stanley S. Wasserman, and anonymous referees for helpful suggestions. Lynn Bilger and Tom Sharpe provided competent technical assistance. Partial support of Phipps Arabie's participation in this research was provided by NSF Grant SES 8310866 and ONR Contract N00014-83-K-0733.

About this article

Cite this article

Hubert, L., Arabie, P. Comparing partitions. Journal of Classification 2, 193–218 (1985). https://doi.org/10.1007/BF01908075

  • DOI: https://doi.org/10.1007/BF01908075

Keywords

  • Measures of agreement
  • Measures of association
  • Consensus indices