Data Science: Similarity, Dissimilarity and Correlation Functions
The lecture presents a new, non-statistical approach to the analysis and construction of similarity, dissimilarity and correlation measures. The measures are treated as functions defined on an underlying set and satisfying given properties. Different functional structures, the relationships between them and methods of their construction are discussed. Particular attention is paid to functions defined on sets with an involution operation, where the class of (strong) correlation functions is introduced. General methods for constructing new correlation functions from similarity and dissimilarity functions are considered. It is shown that the classical correlation and association coefficients (Pearson’s, Spearman’s, Kendall’s, Yule’s Q, Hamann) can be obtained as particular cases.
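As an illustration of how a correlation function might be built from a similarity function on a set with an involution, the following minimal Python sketch may help. It is not taken from the lecture text. It assumes one particular construction, A(x, y) = S(x, y) - S(x, N(y)), on centered unit-norm vectors, with the involution N(y) = -y and the similarity S(x, y) = 1 - ||x - y||²/4. The function names `standardize`, `negate`, `similarity`, `correlation` and `pearson` are hypothetical and chosen only for this example. Under these assumptions the constructed function coincides numerically with Pearson’s product-moment correlation.

```python
import math


def standardize(v):
    """Center v and scale it to unit Euclidean norm."""
    mean = sum(v) / len(v)
    centered = [a - mean for a in v]
    norm = math.sqrt(sum(a * a for a in centered))
    if norm == 0:
        raise ValueError("a constant vector cannot be standardized")
    return [a / norm for a in centered]


def negate(v):
    """Assumed involution N on standardized vectors: N(N(v)) == v."""
    return [-a for a in v]


def similarity(x, y):
    """Assumed similarity on unit vectors: 1 - ||x - y||^2 / 4, in [0, 1]."""
    return 1.0 - sum((a - b) ** 2 for a, b in zip(x, y)) / 4.0


def correlation(x, y):
    """Assumed construction: A(x, y) = S(x, y) - S(x, N(y))."""
    return similarity(x, y) - similarity(x, negate(y))


def pearson(u, v):
    """Textbook Pearson product-moment correlation, for comparison."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Illustrative data, not from the source.
u = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [2.0, 1.0, 4.0, 3.0, 6.0]
x, y = standardize(u), standardize(v)

print(correlation(x, y), pearson(u, v))
```

In this sketch the constructed function also satisfies A(x, x) = 1, A(x, N(x)) = -1 and A(x, N(y)) = -A(x, y), which is the kind of behavior one would expect of a (strong) correlation function on a set with involution; the exact axioms used in the lecture may differ.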
Keywords: Similarity measure, Pearson’s product-moment correlation, Spearman’s rank correlation, Kendall’s rank correlation, Yule’s Q
This work was partially supported by the project SIP 20196374 IPN and by the Organizing Committee of the RAAI Summer School. The author thanks all organizers of the RAAI Summer School and the editors of this book. Special thanks go to Drs. Gennady Osipov, Alexander Panov and Maria Koroleva.