Towards a Classification of Binary Similarity Measures

  • Ivan Ramirez Mejia
  • Ildar Batyrshin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10632)


Similarity measures for binary variables are used in many problems of machine learning, pattern recognition, and classification. Dozens of such measures have been introduced, which raises the problem of their comparative analysis. One method used for such analysis is clustering of similarity measures based on the correlation between the similarity values that different measures produce on the same data. This paper proposes a method of comparative analysis based on a set-theoretic representation of the measures and a comparison of the algebraic properties of these representations. The results show a relationship between the clustering results and the classification of measures by their properties. Because the results of clustering depend on the clustering method and on the data used to measure correlation between measures, we conclude that classification based on the proposed properties is more suitable for comparative analysis of similarity measures.
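The correlation-based comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the three measures (Jaccard, Dice, simple matching), the random binary data, and all function names below are assumptions chosen for illustration. Each measure is computed from the 2 × 2 contingency counts a, b, c, d of a pair of binary vectors, and the Pearson correlation between measures is then taken over many random pairs.

```python
import math
import random


def contingency(x, y):
    """Return the 2 x 2 contingency counts (a, b, c, d) for binary vectors x, y."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # both present
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # only in x
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # only in y
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # both absent
    return a, b, c, d


def jaccard(a, b, c, d):
    # Convention assumed here: two all-zero vectors have similarity 1.
    return a / (a + b + c) if a + b + c else 1.0


def dice(a, b, c, d):
    return 2 * a / (2 * a + b + c) if a + b + c else 1.0


def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)


# Illustrative selection of measures, not the set studied in the paper.
MEASURES = {"jaccard": jaccard, "dice": dice, "simple_matching": simple_matching}


def pearson(u, v):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)


def measure_correlations(n_pairs=200, length=20, seed=0):
    """Correlate measure values over random pairs of binary vectors (assumed data)."""
    rng = random.Random(seed)
    values = {name: [] for name in MEASURES}
    for _ in range(n_pairs):
        x = [rng.randint(0, 1) for _ in range(length)]
        y = [rng.randint(0, 1) for _ in range(length)]
        table = contingency(x, y)
        for name, measure in MEASURES.items():
            values[name].append(measure(*table))
    names = list(MEASURES)
    return {
        (m, n): pearson(values[m], values[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


if __name__ == "__main__":
    for pair, r in measure_correlations().items():
        print(pair, round(r, 3))
```

A correlation table of this kind could serve as input to a clustering step. The correlations change with the data and the random seed, which is consistent with the abstract's point that clustering results depend on the data used.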


Similarity measure · Binary data · Contingency table · Clustering



The work is partially supported by the projects SIP 20171344, BEIFI of IPN and 283778 of CONACYT.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Centro de Investigación en Computación, Instituto Politécnico Nacional, Ciudad de México, Mexico
