Journal of Classification, Volume 26, Issue 2, pp 227–245

k-Adic Similarity Coefficients for Binary (Presence/Absence) Data

  • Matthijs J. Warrens
Article

Abstract

k-Adic formulations (for groups of objects of size k) of a variety of 2-adic similarity coefficients (for pairs of objects) for binary (presence/absence) data are presented. The formulations are not functions of 2-adic similarity coefficients. Instead, the main objective of the paper is to present k-adic formulations that reflect certain basic characteristics of, and have an interpretation similar to that of, their 2-adic versions. Two major classes are distinguished. The first class is referred to as Bennani-Heiser similarity coefficients, which contains all coefficients that can be defined using just the matches, the number of attributes that are present and that are absent in k objects, and the total number of attributes. The coefficients in the second class can be formulated as functions of Dice’s association indices.
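
As an illustration of the first class, the sketch below computes two coefficients from the quantities the abstract names: the number of attributes present in all k objects, the number absent in all k objects, and the total number of attributes. The function names, the symbols a, d and m, and the specific formulas (a k-adic simple matching coefficient taken as (a + d) / m and a k-adic Jaccard coefficient taken as a / (m - d)) are assumptions made for this example; they reduce to the familiar 2-adic coefficients when k = 2, but the abstract itself does not state the formulas.

```python
from typing import Sequence, Tuple


def kadic_counts(objects: Sequence[Sequence[int]]) -> Tuple[int, int, int]:
    """Return (a, d, m) for a group of k binary objects.

    a = attributes present (1) in all k objects
    d = attributes absent (0) in all k objects
    m = total number of attributes
    These symbol names are assumptions made for this sketch.
    """
    m = len(objects[0])
    if any(len(row) != m for row in objects):
        raise ValueError("all objects must have the same number of attributes")
    columns = list(zip(*objects))  # one tuple of k values per attribute
    a = sum(1 for col in columns if all(v == 1 for v in col))
    d = sum(1 for col in columns if all(v == 0 for v in col))
    return a, d, m


def kadic_simple_matching(objects: Sequence[Sequence[int]]) -> float:
    """Assumed k-adic simple matching: share of attributes on which all k agree."""
    a, d, m = kadic_counts(objects)
    return (a + d) / m


def kadic_jaccard(objects: Sequence[Sequence[int]]) -> float:
    """Assumed k-adic Jaccard: joint presences over attributes not jointly absent."""
    a, d, m = kadic_counts(objects)
    if m == d:
        return float("nan")  # undefined when every attribute is absent everywhere
    return a / (m - d)


# Hypothetical data: k = 3 objects scored on m = 6 presence/absence attributes.
group = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
]
print(kadic_counts(group))           # (2, 2, 6)
print(kadic_simple_matching(group))  # 4/6
print(kadic_jaccard(group))          # 0.5
```

Because the counts are taken over the whole group at once, the result is not computed from pairwise coefficients, which is in line with the abstract's remark that the formulations are not functions of 2-adic coefficients.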

Keywords

Indices of association; Resemblance measures; Simple matching coefficient; Jaccard coefficient; Dice/Sørenson coefficient; Rand index; Global order equivalence

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Psychometrics and Research Methodology Group, Leiden University Institute for Psychological Research, Leiden University, Leiden, The Netherlands
