Abstract
Recently, new methods for measuring and testing dependence have appeared in the literature. One way to evaluate and compare these measures, with each other and with classical ones, is to consider the natural axioms that any measure of dependence should satisfy. We propose four natural axioms for dependence measures and establish which of them hold, or fail to hold, for several widely applied methods. All four axioms are satisfied by distance correlation. We prove that if a dependence measure is defined for all bounded nonconstant real-valued random variables and is invariant with respect to all one-to-one measurable transformations of the real line, then it cannot be weakly continuous. This implies that the classical maximal correlation cannot be continuous, so its application is problematic. The recently introduced maximal information coefficient has the same disadvantage. The lack of weak continuity means that, as the sample size increases, the empirical values of a dependence measure need not converge to the population value.
Additional information
T. F. Móri was supported by the Hungarian National Research, Development and Innovation Office NKFIH—Grant No. K125569. Part of this research was based on work supported by the National Science Foundation, while the second author was working at the Foundation. G. J. Székely is grateful for many interesting discussions with Yakir and David Reshef, Abram M. Kagan, and Gábor Tusnády.
Appendix
Proof of Proposition 1
Without loss of generality assume that \(E[X]=0\). Let Q denote the distribution of X on the Borel sets of \(\mathbb {R}\). We have to find a one-to-one function f such that \(\int xf(x)\,dQ=0\).
By assumption, there exist real numbers \(t_1<t_2<t_3\) such that each of the intervals \((-\infty ,t_1]\), \((t_1,t_2]\), \((t_2,t_3]\), \((t_3,+\infty )\) has positive measure (w.r.t. Q). Let \(\delta \) be a suitably small positive number (the meaning of “suitably” will be made clear later). One can find \(t_0<t_1\) and \(t_4>t_3\) such that both \(Q(-\infty ,t_0]\) and \(Q(t_4,+\infty )\) are less than \(\delta \) (possibly 0).
Let the intervals \((-\infty ,t_0]\), \((t_0,t_1]\), \((t_1,t_2]\), \((t_2,t_3]\), \((t_3,t_4]\), and \((t_4,+\infty )\) be denoted by \(A_0\), \(A_1\), \(A_2\), \(A_3\), \(A_4\), and \(A_5\), respectively. Introduce
$$\mu _i=\int _{A_i}x\,dQ, \qquad \sigma _i^2=\int _{A_i}x^2\,dQ, \qquad 0\le i\le 5.$$
Then \(\mu _0+\cdots +\mu _5=0\).
It is not hard to see that there exist real constants \(a_1,a_2,a_3,a_4\), all different, such that
$$(\mu _0+\mu _1)a_1+\mu _2a_2+\mu _3a_3+(\mu _4+\mu _5)a_4=0. \qquad (1)$$
Indeed, consider the hyperplane \(\mathcal {L}\) of all vectors \((a_1,a_2,a_3,a_4) \in \mathbb {R}^4\) satisfying (1). \(\mathcal {L}\) cannot coincide with the hyperplane \(\mathcal {L}_{1,2}=\{a_1=a_2\}\), because \(\mathcal {L}_{1,2}\) is orthogonal to the vector \((1,-1,0,0)\), which is not parallel to \((\mu _0+\mu _1,\mu _2,\mu _3,\mu _4+\mu _5)\), since the latter can have at most one 0 coordinate. Thus, \(\dim (\mathcal {L}\cap \mathcal {L}_{1,2})=2\). The same holds for \(\mathcal {L}_{i,j}\), the hyperplane defined by the equality \(a_i=a_j\) \((i\ne j)\). Since \(\mathcal {L}\) cannot be covered by six of its lower-dimensional subspaces, the existence of a vector in \(\mathcal {L}\) with different coordinates follows.
Let \(K>\max _{1\le i\le 4}|a_i|\). By continuity, if \(\delta \) is small enough, one can find constants \(b_1,b_2,b_3,b_4\), all different, such that \(\max _{1\le i\le 4}|b_i|<K\), and
$$-K\mu _0+b_1\mu _1+b_2\mu _2+b_3\mu _3+b_4\mu _4+K\mu _5=0.$$
Finally, choose \(c_0,c_1,\dots ,c_5\) in such a way that none of them is equal to 0, \(c_0\) and \(c_5\) are positive, and \(\sum _{i=0}^5 c_i\sigma _i^2=0\). This can be done because at least three of the quantities \(\sigma _i^2\) are positive.
Now, let \(b_0=-K\), \(b_5=K\), and \(f(x)=b_i+\varepsilon c_i x\) for \(x\in A_i\), \(0\le i\le 5\). Then f is injective provided \(\varepsilon \) is a sufficiently small positive number, and
$$\int xf(x)\,dQ=\sum _{i=0}^5 b_i\mu _i+\varepsilon \sum _{i=0}^5 c_i\sigma _i^2=0,$$
as needed.
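As a concrete illustration of the proposition (not part of the original proof), the following sketch checks numerically that for X uniform on \(\{-2,-1,1,2\}\) an injective f with \(\mathrm{cov}(X,f(X))=0\) exists; the particular distribution and the values of f are hand-picked assumptions.

```python
from fractions import Fraction

# Illustrative assumption: X uniform on four points, so E[X] = 0
support = [-2, -1, 1, 2]
q = Fraction(1, 4)

# A hand-picked injective f with E[X f(X)] = 0:
# (-2)*1 + (-1)*2 + 1*4 + 2*0 = 0
f = {-2: 1, -1: 2, 1: 4, 2: 0}

mean_X = sum(q * x for x in support)
mean_f = sum(q * f[x] for x in support)
cov = sum(q * x * f[x] for x in support) - mean_X * mean_f

assert mean_X == 0
assert cov == 0                         # X and f(X) are uncorrelated
assert len(set(f.values())) == len(f)   # f is injective on the support
```

Exact rational arithmetic (`fractions.Fraction`) is used so the zero covariance is verified exactly rather than up to floating-point error.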
Such an f cannot exist if X can take on exactly two values, because in that case uncorrelatedness is equivalent to independence.
When the distribution of X is concentrated on exactly 3 points, and X is assumed to have mean 0, such an f exists if and only if zero is not among the possible values of X. (If \(E[X]=0\) is not assumed, the necessary and sufficient condition for f to exist is \(P(X=E[X])=0\).) Indeed, let \(x_1<x_2<x_3\) be the possible values of X, with probabilities \(q_1,q_2,q_3\), respectively. Then \(q_1x_1+q_2x_2+q_3x_3=0\), and \(x_1<0<x_3\). We are looking for real numbers \(f_1,f_2,f_3\), all different, such that \(q_1x_1f_1+q_2x_2f_2+q_3x_3f_3=0\). If \(x_2=0\), this can only be achieved with \(f_1=f_3\). In the complementary case \(f_1=-1\), \(f_3=1\), and \(f_2=(q_1x_1-q_3x_3)/(q_2x_2)\) will do, because \(f_2=1\) would imply \(-q_1x_1+q_2x_2+q_3x_3=0\), hence \(q_1x_1=0\), which is not allowed, and similarly \(f_2=-1\) would imply \(q_3x_3=0\). \(\square \)
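The three-point construction above can be checked numerically as well; the particular distribution below (equal weights on \(-3, 1, 2\), which has mean 0 and avoids the value 0) is an illustrative assumption.

```python
from fractions import Fraction

# Illustrative assumption: X takes -3, 1, 2 with equal probability; E[X] = 0
q1 = q2 = q3 = Fraction(1, 3)
x1, x2, x3 = -3, 1, 2
assert q1*x1 + q2*x2 + q3*x3 == 0

# The construction from the proof: f1 = -1, f3 = 1,
# and f2 = (q1*x1 - q3*x3)/(q2*x2)
f1, f3 = -1, 1
f2 = (q1*x1 - q3*x3) / (q2*x2)

assert q1*x1*f1 + q2*x2*f2 + q3*x3*f3 == 0  # E[X f(X)] = 0
assert len({f1, f2, f3}) == 3               # f is injective
```

Here \(f_2 = (-1 - 2/3)/(1/3) = -5\), which is indeed different from \(\pm 1\), as the proof guarantees.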
Cite this article
Móri, T.F., Székely, G.J. Four simple axioms of dependence measures. Metrika 82, 1–16 (2019). https://doi.org/10.1007/s00184-018-0670-3