Comparing partitions

Abstract

The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of comparing partitions is approached indirectly by assessing the congruence of two proximity matrices using a simple cross-product measure. These matrices are generated from the corresponding partitions using various scoring rules. Special cases derivable include traditionally familiar statistics and/or ones tailored to weight certain object pairs differentially. Finally, we propose a measure based on the comparison of object triples that has the advantage of a probabilistic interpretation in addition to being corrected for chance (i.e., assuming a constant value under a reasonable null hypothesis) and bounded between ±1.
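As a rough illustration of the pair-counting idea behind the Rand (1971) index and its correction for chance, the following Python sketch computes both quantities from two label vectors. The abstract does not state any formulas, so everything here is an assumption: the chance correction uses the form commonly cited as the adjusted Rand index (expected agreement taken under fixed cluster sizes), and the function names `pair_counts`, `rand_index`, and `adjusted_rand_index` are hypothetical.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Count object pairs joined in both partitions, in A, in B, and overall."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")
    joint = Counter(zip(labels_a, labels_b))  # cells of the contingency table
    rows = Counter(labels_a)                  # cluster sizes in partition A
    cols = Counter(labels_b)                  # cluster sizes in partition B
    both = sum(comb(c, 2) for c in joint.values())
    in_a = sum(comb(c, 2) for c in rows.values())
    in_b = sum(comb(c, 2) for c in cols.values())
    return both, in_a, in_b, comb(n, 2)


def rand_index(labels_a, labels_b):
    """Proportion of object pairs on which the two partitions agree."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    # Pairs separated in both partitions, by inclusion-exclusion.
    apart_both = total - in_a - in_b + both
    return (both + apart_both) / total


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected index: (index - expected) / (maximum - expected)."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    expected = in_a * in_b / total  # assumed null: cluster sizes held fixed
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        # Only happens when both partitions are all singletons or both are
        # a single cluster; returning 1.0 is a convention assumed here.
        return 1.0
    return (both - expected) / (maximum - expected)


if __name__ == "__main__":
    a = [0, 0, 0, 1, 1, 1]
    b = [0, 0, 1, 1, 2, 2]
    print(rand_index(a, b), adjusted_rand_index(a, b))
```

In this sketch the corrected value is zero when the observed number of jointly clustered pairs equals its expectation, and it can be negative when agreement falls below that expectation, which the uncorrected index cannot show.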

References

  1. ARABIE, P., and BOORMAN, S.A., (1973), “Multidimensional Scaling of Measures of Distance Between Partitions,” Journal of Mathematical Psychology, 10, 148–203.

  2. BERRY, K.J., and MIELKE, P.W., (1985), “Goodman and Kruskal's TAU-B Statistic,” Sociological Methods & Research, 13, 543–550.

  3. BRENNAN, R.L., and LIGHT, R.J., (1974), “Measuring Agreement When Two Observers Classify People into Categories not Defined in Advance,” British Journal of Mathematical and Statistical Psychology, 27, 154–163.

  4. BROOK, R.J., and STIRLING, W.D., (1984), “Agreement Between Observers When the Categories are not Specified in Advance,” British Journal of Mathematical and Statistical Psychology, 37, 271–282.

  5. COSTANZO, C.M., HUBERT, L.J., and GOLLEDGE, R.G., (1983), “A Higher Moment for Spatial Statistics,” Geographical Analysis, 15, 347–351.

  6. DUBIEN, J.L., and WARDE, W.D., (1981), Some Distributional Results Concerning a Comparative Statistic Used in Cluster Analysis, Unpublished manuscript, Department of Mathematics, Western Michigan University, Kalamazoo, Michigan.

  7. FOWLKES, E.B., and MALLOWS, C.L., (1983), “A Method for Comparing Two Hierarchical Clusterings,” Journal of the American Statistical Association, 78, 553–569.

  8. FRANK, O., (1976), “Comparing Classifications by the Use of the Symmetric Class Difference,” in Proceedings in Computational Statistics, eds. J. Gordesch and P. Maeze, Würzburg: Physica Verlag, 84–96.

  9. GAREY, M.R., and JOHNSON, D.S., (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco: W.H. Freeman.

  10. GOODMAN, L.A., and KRUSKAL, W.H., (1954), “Measures of Association for Cross-Classifications,” Journal of the American Statistical Association, 49, 732–764.

  11. GREEN, P.E., and RAO, V.R., (1969), “A Note on Proximity Measures and Cluster Analysis,” Journal of Marketing Research, 6, 359–364.

  12. HARTIGAN, J.A., (1975), Clustering Algorithms, New York: Wiley.

  13. HUBERT, L.J., (1977), “Nominal Scale Response Agreement as a Generalized Correlation,” British Journal of Mathematical and Statistical Psychology, 30, 98–103.

  14. HUBERT, L.J., (1979), “Matching Models in the Analysis of Cross-Classifications,” Psychometrika, 44, 21–41.

  15. HUBERT, L.J., (1983), “Inference Procedures for the Evaluation and Comparison of Proximity Matrices,” in Numerical Taxonomy, ed. J. Felsenstein, New York: Springer-Verlag, 209–228.

  16. HUBERT, L.J., GOLLEDGE, R.G., COSTANZO, C.M., and GALE, N., (1985), “Order-Dependent Measures of Correspondence for Comparing Proximity Matrices and Related Structures,” in Measuring the Unmeasurable, eds. P. Nijkamp and H. Leitner, The Hague: Martinus Nijhoff.

  17. JOHNSON, S.C., (1968), “Metric Clustering,” Unpublished manuscript, AT&T Bell Laboratories, Murray Hill, New Jersey.

  18. KENDALL, M.G., (1970), Rank Correlation Methods, 4th Edition, London: Griffin.

  19. KLASTORIN, T.D., (1985), “The p-Median Problem for Cluster Analysis: A Comparative Test Using the Mixture Model Approach,” Management Science, 31, 84–95.

  20. LERMAN, I.C., (1973), “Etude Distributionelle de Statistiques de Proximité entre Structures Finies de Même Type; Application à la Classification Automatique,” Cahier no. 19 du Bureau Universitaire de Recherche Opérationnelle, Institut de Statistique des Universités de Paris.

  21. MIELKE, P.W., (1979), “On Asymptotic Nonnormality of Null Distributions of MRPP Statistics,” Communications in Statistics - Theory and Methods, A8, 1541–1550 (errata: A10, 1981, p. 1795 and A11, 1982, p. 847).

  22. MIELKE, P.W., and BERRY, K.J., (1985), “Non-Asymptotic Inference Based on the Chi-Square Statistic for r by c Contingency Tables,” Journal of Statistical Planning and Inference, 12, 41–45.

  23. MIELKE, P.W., BERRY, K.J., and BRIER, G.W., (1981), “Application of Multiresponse Permutation Procedures for Examining Seasonal Changes in Monthly Sea-Level Pressure Patterns,” Monthly Weather Review, 109, 120–126.

  24. MILLIGAN, G.W., and COOPER, M.C., (1985), “A Study of the Comparability of External Criteria across Hierarchy Levels,” Unpublished manuscript, Ohio State University, Columbus, Ohio.

  25. MILLIGAN, G.W., SOON, S.C., and SOKOL, L.M., (1983), “The Effect of Cluster Size, Dimensionality, and the Number of Clusters on Recovery of True Cluster Structure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5, 40–47.

  26. MIRKIN, B.G., and CHERNYI, L.B., (1970), “Measurement of the Distance Between Distinct Partitions of a Finite Set of Objects,” Automation and Remote Control, 31, 786–792.

  27. MOREY, L.C., and AGRESTI, A., (1984), “The Measurement of Classification Agreement: An Adjustment of the Rand Statistic for Chance Agreement,” Educational and Psychological Measurement, 44, 33–37.

  28. PATEFIELD, W.M., (1981), “Algorithm AS159. An Efficient Method of Generating Random R × C Tables with Given Row and Column Totals,” Applied Statistics, 30, 91–97.

  29. RAND, W.M., (1971), “Objective Criteria for the Evaluation of Clustering Methods,” Journal of the American Statistical Association, 66, 846–850.

  30. REYNOLDS, H.T., (1977), The Analysis of Cross-Classifications, New York: Free Press.

  31. ROHLF, F.J., (1974), “Methods of Comparing Classifications,” Annual Review of Ecology and Systematics, 5, 101–113.

  32. ROHLF, F.J., (1982), “Consensus Indices for Comparing Classifications,” Mathematical Biosciences, 59, 131–144.

  33. STAM, A.J., (1983), “Generation of a Random Partition of a Finite Set by an Urn Model,” Journal of Combinatorial Theory A, 35, 231–240.

  34. WALLACE, D.L., (1983), “Comment,” Journal of the American Statistical Association, 78, 569–579.

Additional information

William H.E. Day was Acting Editor for the reviewing of this paper. We are grateful to him, Ove Frank, Charles Lewis, Glenn W. Milligan, Ivo Molenaar, Stanley S. Wasserman, and anonymous referees for helpful suggestions. Lynn Bilger and Tom Sharpe provided competent technical assistance. Partial support of Phipps Arabie's participation in this research was provided by NSF Grant SES 8310866 and ONR Contract N00014-83-K-0733.

About this article

Cite this article

Hubert, L., Arabie, P. Comparing partitions. Journal of Classification 2, 193–218 (1985). https://doi.org/10.1007/BF01908075

Keywords

  • Measures of agreement
  • Measures of association
  • Consensus indices