Data Science: Similarity, Dissimilarity and Correlation Functions

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11866)

Abstract

The lecture presents a new, non-statistical approach to the analysis and construction of similarity, dissimilarity and correlation measures. The measures are treated as functions defined on an underlying set and satisfying given properties. Different functional structures, the relationships between them and methods for constructing them are discussed. Particular attention is paid to functions defined on sets with an involution operation, where the class of (strong) correlation functions is introduced. General methods for constructing new correlation functions from similarity and dissimilarity functions are considered. It is shown that the classical correlation and association coefficients (Pearson’s, Spearman’s, Kendall’s, Yule’s Q, Hamann’s) can be obtained as particular cases.
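
To make the idea of building a correlation function from a similarity function concrete, the following is a minimal sketch under stated assumptions, not the chapter's own code. It assumes the underlying set is the set of centered, unit-norm vectors, the involution is negation N(x) = -x, the similarity is S(x, y) = 1 - ||x - y||^2 / 4, and the correlation is formed as S(x, y) - S(x, N(y)). These specific formulas and all function names are illustrative choices that the abstract does not spell out. Under these assumptions the result coincides with Pearson's coefficient, which the sketch checks against a direct computation.

```python
import math


def standardize(x):
    """Center x and scale it to unit Euclidean norm."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        raise ValueError("a constant vector cannot be standardized")
    return [v / norm for v in centered]


def negate(x):
    """Assumed involution on standardized vectors: N(x) = -x."""
    return [-v for v in x]


def similarity(x, y):
    """Assumed similarity S(x, y) = 1 - ||x - y||^2 / 4 (in [0, 1] for unit vectors)."""
    return 1.0 - sum((a - b) ** 2 for a, b in zip(x, y)) / 4.0


def correlation(x, y):
    """Correlation built from similarity: S(x, y) - S(x, N(y))."""
    xs, ys = standardize(x), standardize(y)
    return similarity(xs, ys) - similarity(xs, negate(ys))


def pearson(x, y):
    """Direct Pearson coefficient, used only as a reference value."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0, 1.0, 4.0, 3.0, 5.0]
    print(correlation(x, y), pearson(x, y))
```

The agreement follows from ||x - y||^2 = 2 - 2r and ||x + y||^2 = 2 + 2r for centered unit vectors with inner product r, so the two similarity values are (1 + r)/2 and (1 - r)/2 and their difference is r.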

Acknowledgements

This work was partially supported by the project SIP 20196374 IPN and by the Organizing Committee of the RAAI Summer School. The author thanks all organizers of the RAAI Summer School and the editors of this book. Special thanks go to Drs. Gennady Osipov, Alexander Panov and Maria Koroleva.

Author information

Corresponding author

Correspondence to Ildar Z. Batyrshin.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Batyrshin, I.Z. (2019). Data Science: Similarity, Dissimilarity and Correlation Functions. In: Osipov, G., Panov, A., Yakovlev, K. (eds) Artificial Intelligence. Lecture Notes in Computer Science, vol 11866. Springer, Cham. https://doi.org/10.1007/978-3-030-33274-7_2

  • DOI: https://doi.org/10.1007/978-3-030-33274-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33273-0

  • Online ISBN: 978-3-030-33274-7

  • eBook Packages: Computer Science, Computer Science (R0)
