
Distribution of some similarity coefficients for dyadic binary data in the case of associated attributes

The distribution of similarity coefficients for binary data and associated attributes

Journal of Classification

Abstract

Parameters are derived for the distributions of three coefficients of similarity between pairs (dyads) of operational taxonomic units for multivariate binary data (presence/absence of attributes) under statistical independence. These parameters are applied to test independence for dyadic data. Association among attributes within operational taxonomic units is allowed. The two units in the dyad may also be drawn from different populations with different presence probabilities of attributes. The variance of the distribution of the similarity coefficients under statistical independence is shown to be relatively large in many empirical situations. This result implies that the practical interpretation of these coefficients requires much care. An application using the Jaccard index is given for the assessment of consensus between psychotherapists and their clients.
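
To make the idea of a null distribution for a dyadic similarity coefficient concrete, the following is a minimal Python sketch. It is not the authors' method: the article derives distribution parameters analytically, whereas this sketch substitutes a simple Monte Carlo approximation. It uses the Jaccard index (the only coefficient named in the abstract) and approximates independence between the two units of a dyad by randomly pairing whole attribute profiles from two populations, which leaves any association among attributes within a unit intact. The function names (`jaccard`, `null_by_random_pairing`, `upper_tail_p`), the toy data, and the convention of returning 0 when neither unit has any attribute are all assumptions made for illustration.

```python
import random
import statistics

def jaccard(x, y):
    """Jaccard index: shared presences / attributes present in at least one unit."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Assumed convention: similarity is 0.0 when neither unit has any attribute.
    return both / either if either else 0.0

def null_by_random_pairing(pop_a, pop_b, n_draws=5000, seed=0):
    """Monte Carlo null distribution under independence between the two units.

    Whole profiles are resampled, so association among attributes within a
    unit is preserved; only the link between the two units is broken.
    """
    rng = random.Random(seed)
    return [jaccard(rng.choice(pop_a), rng.choice(pop_b)) for _ in range(n_draws)]

def upper_tail_p(observed, null_values):
    """Share of null draws at least as large as the observed coefficient."""
    return sum(v >= observed for v in null_values) / len(null_values)

# Hypothetical toy data: rows are units, columns are binary attributes.
therapists = [[1, 1, 0, 0, 1], [1, 0, 0, 1, 1], [0, 1, 1, 0, 1], [1, 1, 0, 1, 0]]
clients = [[1, 0, 0, 0, 1], [0, 0, 1, 1, 0], [1, 1, 0, 0, 0], [0, 1, 0, 1, 1]]

null = null_by_random_pairing(therapists, clients)
observed = jaccard(therapists[0], clients[0])

print("null mean:", statistics.mean(null))
print("null standard deviation:", statistics.pstdev(null))
print("upper-tail proportion:", upper_tail_p(observed, null))
```

With only a handful of attributes, the spread of the simulated null values tends to be wide relative to the range of the index, which is the kind of situation the abstract warns about when it says these coefficients require careful interpretation.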

Résumé (translated from the French)

The parameters of the distributions of three similarity coefficients between pairs of operational taxonomic units, for multivariate binary (presence/absence) data, were derived under the hypothesis of statistical independence. These parameters are used in a test of independence for dyadic data. Association among several attributes is allowed within the population of units. The two units of the dyad may also be drawn from two different populations with different probabilities of attribute presence. In many empirical situations, the variance of the similarity coefficients can be relatively large under statistical independence. Consequently, these coefficients must be interpreted with caution. An example is given for the Jaccard coefficient, which was used in a study of the agreement between psychotherapists and their clients.

Cite this article

Snijders, T.A.B., Dormaar, M., van Schuur, W.H. et al. Distribution of some similarity coefficients for dyadic binary data in the case of associated attributes. Journal of Classification 7, 5–31 (1990). https://doi.org/10.1007/BF01889701
