Data Science: Similarity, Dissimilarity and Correlation Functions

  • Ildar Z. Batyrshin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11866)


The lecture presents a new, non-statistical approach to the analysis and construction of similarity, dissimilarity and correlation measures. The measures are treated as functions that are defined on an underlying set and satisfy given properties. Different functional structures, the relationships between them and methods for their construction are discussed. Particular attention is paid to functions defined on sets with an involution operation, where the class of (strong) correlation functions is introduced. General methods for constructing new correlation functions from similarity and dissimilarity functions are considered. It is shown that the classical correlation and association coefficients (Pearson’s, Spearman’s, Kendall’s, Yule’s Q, Hamann) can be obtained as particular cases.
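To make the idea of building a correlation function from a similarity function and an involution concrete, the following is a minimal sketch under stated assumptions, not the lecture's own formulation. It assumes the construction A(x, y) = S(x, y) - S(x, N(y)), where N is an involution; this formula, the function names and the sample data are illustrative choices that do not appear in the text above. With standardized real vectors, N(x) = -x and S(x, y) = 1 - ||x - y||²/4, the result coincides with Pearson's coefficient; with binary vectors, N as the complement and S as the simple matching similarity, it coincides with the Hamann coefficient.

```python
import math

def standardize(x):
    """Center x and scale it to unit Euclidean norm (assumes x is not constant)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def negate(x):
    """Involution N(x) = -x on real vectors; applying it twice returns x."""
    return [-v for v in x]

def similarity(x, y):
    """S(x, y) = 1 - ||x - y||^2 / 4, which lies in [0, 1] for unit-norm vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 - sq_dist / 4.0

def correlation(x, y):
    """Assumed construction A(x, y) = S(x, y) - S(x, N(y)) on standardized inputs."""
    xs, ys = standardize(x), standardize(y)
    return similarity(xs, ys) - similarity(xs, negate(ys))

def pearson(x, y):
    """Textbook Pearson product-moment correlation, used only for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def complement(x):
    """Involution N(x) on binary vectors: flip every bit."""
    return [1 - v for v in x]

def simple_matching(x, y):
    """Share of positions where two binary vectors agree."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

def binary_correlation(x, y):
    """Same assumed construction, applied to binary vectors."""
    return simple_matching(x, y) - simple_matching(x, complement(y))

def hamann(x, y):
    """Textbook Hamann coefficient: (matches - mismatches) / n."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return (matches - (len(x) - matches)) / len(x)

# Illustrative data, not taken from the lecture.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
u = [1, 0, 1, 1, 0, 0, 1, 0]
v = [1, 1, 1, 0, 0, 0, 1, 1]

print(correlation(x, y), pearson(x, y))
print(binary_correlation(u, v), hamann(u, v))
```

In both cases the involution supplies the "opposite" object, so the constructed function takes the value 1 for A(x, x) and -1 for A(x, N(x)), which is the behaviour one expects from a correlation coefficient.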


Keywords: Similarity measure, Pearson’s product-moment correlation, Spearman’s rank correlation, Kendall’s rank correlation, Yule’s Q



This work was partially supported by the project SIP 20196374 IPN and by the Organizing Committee of the RAAI Summer School. The author thanks all organizers of the RAAI Summer School and the editors of this book. Special thanks go to Drs. Gennady Osipov, Alexander Panov and Maria Koroleva.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Instituto Politécnico Nacional, Centro de Investigación en Computación, Ciudad de México, CDMX, Mexico
