Abstract
Parameters are derived for the distributions of three similarity coefficients between pairs (dyads) of operational taxonomic units for multivariate binary data (presence/absence of attributes) under statistical independence. These parameters are applied to test independence for dyadic data. Association among attributes within operational taxonomic units is allowed. The two units in a dyad may also be drawn from different populations with different attribute presence probabilities. The variance of the distribution of the similarity coefficients under statistical independence is shown to be relatively large in many empirical situations. This result implies that these coefficients must be interpreted with considerable care in practice. An application using the Jaccard index is given for the assessment of consensus between psychotherapists and their clients.
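To illustrate the point about the large variance under independence, the following is a minimal simulation sketch, not the analytical derivation of the paper. It assumes, for simplicity, that attributes are mutually independent within each unit (the paper allows them to be associated), and it estimates the null distribution of the Jaccard index by Monte Carlo rather than from derived parameters. The function names `jaccard`, `simulate_null`, and `upper_tail_p`, and the presence probabilities used, are hypothetical choices made for this example.

```python
import math
import random
import statistics


def jaccard(x, y):
    """Jaccard index: shared presences / attributes present in at least one unit."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Undefined when neither unit has any attribute present.
    return both / either if either else float("nan")


def simulate_null(p_x, p_y, n_sim=2000, seed=0):
    """Jaccard values for dyads whose two units are generated independently.

    p_x, p_y: presence probabilities per attribute for the two units;
    they may differ, as when the units come from different populations.
    Assumption of this sketch: attributes are independent within a unit.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_sim):
        x = [rng.random() < p for p in p_x]
        y = [rng.random() < p for p in p_y]
        j = jaccard(x, y)
        if not math.isnan(j):  # skip dyads where the index is undefined
            values.append(j)
    return values


def upper_tail_p(observed, null_values):
    """Share of simulated null values at least as large as the observed one."""
    return sum(v >= observed for v in null_values) / len(null_values)


if __name__ == "__main__":
    # Hypothetical setting: 20 attributes, different probabilities per population.
    p_client = [0.3] * 20
    p_therapist = [0.4] * 20
    null = simulate_null(p_client, p_therapist)
    print("null mean:", round(statistics.mean(null), 3))
    print("null sd:  ", round(statistics.stdev(null), 3))
    print("p-value for observed J = 0.5:", upper_tail_p(0.5, null))
```

Comparing an observed coefficient with the spread of the simulated values shows why a raw Jaccard value is hard to interpret without reference to its distribution under independence.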
Résumé
Les paramètres de la distribution de trois coefficients de similarité entre paires d'éléments taxinomiques opérationnels de données multivariables binaires (présence/absence) ont été dérivés dans l'hypothèse d'indépendance statistique. Ces paramètres sont utilisés dans un test d'indépendance pour les données dyadiques. L'existence est autorisée, dans la population d'éléments, d'une association entre plusieurs attributs. Il est également permis que les deux éléments de la dyade soient tirés de deux populations différentes, ayant différentes probabilités quant à la présence des attributs. Dans beaucoup de situations empiriques, la variance des coefficients de similarité peut être relativement élevée dans le cas d'indépendance statistique. Par conséquent, ces coefficients doivent être interprétés avec précaution. Un exemple est donné pour le coefficient de Jaccard, qui a été employé dans une recherche sur la concordance entre des psychothérapeutes et leurs clients.
Cite this article
Snijders, T.A.B., Dormaar, M., van Schuur, W.H. et al. Distribution of some similarity coefficients for dyadic binary data in the case of associated attributes. Journal of Classification 7, 5–31 (1990). https://doi.org/10.1007/BF01889701
DOI: https://doi.org/10.1007/BF01889701