Efficient computation of the Bergsma–Dassios sign covariance

Abstract

In an extension of Kendall’s \(\tau \), Bergsma and Dassios (Bernoulli 20(2):1006–1028, 2014) introduced a covariance measure \(\tau ^*\) for two ordinal random variables that vanishes if and only if the two variables are independent. For a sample of size n, a direct computation of \(t^*\), the empirical version of \(\tau ^*\), requires \(O(n^4)\) operations. We derive an algorithm that computes the statistic using only \(O \left( n^2\log (n)\right) \) operations.


Notes

  1. See https://cran.r-project.org/web/packages/TauStar/index.html.

  2. R code to reproduce the results of Tables 1, 2 and 3 can be found on the first author’s webpage: http://www.stat.washington.edu/~lucaw/public_resources/eff_comp_2015/tables.R.

References

  1. Bergsma W, Dassios A (2014) A consistent test of independence based on a sign covariance related to Kendall’s tau. Bernoulli 20(2):1006–1028

  2. Christensen D (2005) Fast algorithms for the calculation of Kendall’s \(\tau \). Comput Stat 20(1):51–62

  3. Guibas LJ, Sedgewick R (1978) A dichromatic framework for balanced trees. In: 19th annual symposium on foundations of computer science, pp 8–21

  4. Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1/2):81–93

  5. Martinian E (2005) Red-black tree C code. http://web.mit.edu/~emin/www.old/source_code/red_black_tree/index.html

  6. R Core Team (2015) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/

  7. Serfling RJ (1980) Approximation theorems of mathematical statistics, Wiley Series in Probability and Mathematical Statistics. Wiley, New York

  8. Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15:72–101

  9. Weihs L (2015) TauStar: efficient computation of the t* statistic of Bergsma and Dassios (2014). R package version 1.0.0, http://CRAN.R-project.org/package=TauStar

Author information

Corresponding author

Correspondence to Luca Weihs.

Appendices

Appendix 1: Modifications for the V-statistic

This section provides an overview of necessary modifications to Algorithm 2 in order to compute the V-statistic version of \(t^*\). Suppose, as usual, that we have reordered the pairs \((x_1,y_1),\ldots ,(x_n,y_n)\) so that \(x_1\le x_2\le \cdots \le x_n\). Then the V-statistic for \(\tau ^*\) is

$$\begin{aligned} t_V^*&= \frac{1}{n^4}\sum _{1\le i,j,k,l\le n} a(x_i,x_j,x_k,x_l)a(y_i,y_j,y_k,y_l) \\&= \frac{1}{n^4}\left( \sum _{1\le i< j< k< l\le n} b_{ijkl} + \sum _{1\le i< j< k\le n} \frac{b_{ijkk} + b_{ijjk} + b_{iijk}}{2} + \sum _{1\le i < k\le n}\frac{b_{iikk}}{4}\right) \\&= \frac{1}{n^4}\left( \sum _{1\le i< j< k< l\le n} b_{ijkl} + \sum _{1\le i< j< k\le n} \frac{b_{ijkk}+ b_{iijk}}{2} + \sum _{1\le i < k\le n}\frac{b_{iikk}}{4}\right) . \end{aligned}$$

Here, the second equality holds since \(a(x_i,x_j,x_k,x_l)a(y_i,y_j,y_k,y_l)=0\) if any three of \(i,j,k,l\) are equal. The third equality holds because \(b_{ijjk}=0\) for all \(i< j< k\); indeed, \(x_i\le x_j \le x_k\) implies that \(b_{ijjk}\) corresponds to an inseparable collection of points. The coefficients \(\frac{1}{2}\) on \(b_{ijkk},b_{iijk}\) and \(\frac{1}{4}\) on \(b_{iikk}\) are corrective factors: each \(b\) sums over all \(|S_4|=24\) permutations, whereas the number of distinct arrangements of four indices where exactly two are equal is \(|S_4|/2\), and the number where exactly two pairs of two are equal is \(|S_4|/4\). Now we may continue to rewrite \(t^*_V\) as

$$\begin{aligned} t_V^*&= \frac{1}{n^4} \left( \sum _{1\le i< j< k< l\le n} b_{ijkl} + \sum _{1\le i< j< k\le n} \frac{b_{ijkk}+ b_{iijk}}{2} + \sum _{1\le i < k\le n}\frac{b_{iikk}}{4} \right) \\&= \frac{1}{n^4} \left( \sum _{1\le i< j< k< l\le n} b_{ijkl} + \sum _{1\le i< j< k\le n} \frac{b_{ijkk}}{2}+ \sum _{1\le i< k< l\le n}\frac{b_{iikl}}{2} + \sum _{1\le i < k\le n}\frac{b_{iikk}}{4} \right) \\&= \frac{1}{n^4}\sum _{2\le k\le n} \Bigg (\sum _{k<l\le n}\left( \sum _{1\le i < j<k} b_{ijkl} + \sum _{1\le i < k}\frac{b_{iikl}}{2}\right) + \sum _{1\le i< j < k} \frac{b_{ijkk}}{2} + \sum _{1\le i < k} \frac{b_{iikk}}{4} \Bigg ). \end{aligned}$$

If \(k=n\) then \(\sum _{k<l\le n}\) is an empty sum, which we define to equal 0 (as we do for any sum over an empty index range). For fixed \(k<l\) we already know from Sect. 3 how to compute \(\sum _{1\le i < j<k} b_{ijkl}\) efficiently using a red–black tree. Since \(b_{iikl},b_{ijkk}\), and \(b_{iikk}\) can only correspond to inseparable or concordant quadruples, it follows that

$$\begin{aligned} \sum _{1\le i <k} \frac{1}{2} b_{iikl}&= 8\times \left( top(k,l) + bot(k,l)\right) , \end{aligned}$$
(15)
$$\begin{aligned} \sum _{1\le i < j <k} \frac{1}{2} b_{ijkk}&= 8\times \left( \binom{top(k,k)}{2} + \binom{bot(k,k)}{2}\right) , \end{aligned}$$
(16)
$$\begin{aligned} \sum _{1\le i <k} \frac{1}{4}b_{iikk}&= 4\times \left( top(k,k) + bot(k,k)\right) . \end{aligned}$$
(17)

Thus we may compute \(t_V^*\) by running Algorithm 2 with the following modifications:

  (i) Change line 9 to

    [code listing c not recovered]

    This corresponds to the outer sum of \(t^*_V\).

  (ii) After line 14 add the lines:

    [code listing d not recovered]

    This accounts for the effect of (16) and (17).

  (iii) Change line 23 to

    [code listing e not recovered]

    This corresponds to (15).

  (iv) Change line 42 to

    [code listing f not recovered]

Finally, note that the modified algorithm for computing \(t^*_V\) still requires only \(O(n^2\log (n))\) operations.
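The regrouping of the V-statistic used in this appendix can be sanity-checked by brute force on a small sample. The sketch below is not Algorithm 2. It assumes that the kernel \(a\) of Eq. (2), which is not reproduced in this appendix, is the Bergsma–Dassios sign function \(\mathrm{sign}(|z_1-z_2|+|z_3-z_4|-|z_1-z_3|-|z_2-z_4|)\), and that \(b_{ijkl}\) is the sum of \(a(x)a(y)\) over all 24 permutations of the index tuple. The function names are hypothetical. It compares the direct \(O(n^4)\) sum with the regrouped sum in which \(b_{ijjk}\) has been dropped, for data sorted by \(x\).

```python
from itertools import combinations, permutations, product


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def a(z1, z2, z3, z4):
    # Assumed form of the kernel in Eq. (2), which is not shown in this appendix.
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def b(x, y, idx):
    """Sum a(x)a(y) over all 24 position permutations of the index tuple idx."""
    return sum(
        a(*(x[p] for p in perm)) * a(*(y[p] for p in perm))
        for perm in permutations(idx)
    )


def v_sum_direct(x, y):
    """n^4 * t_V^*, summing the kernel over every ordered 4-tuple of indices."""
    n = len(x)
    return sum(
        a(x[i], x[j], x[k], x[l]) * a(y[i], y[j], y[k], y[l])
        for i, j, k, l in product(range(n), repeat=4)
    )


def v_sum_regrouped(x, y):
    """n^4 * t_V^* via the regrouped sums; x must be sorted in increasing order."""
    n = len(x)
    total = sum(b(x, y, q) for q in combinations(range(n), 4))
    for i, j, k in combinations(range(n), 3):
        # b_{ijjk} is omitted because it vanishes when x is sorted.
        # Both terms are even, so the integer division is exact.
        total += (b(x, y, (i, j, k, k)) + b(x, y, (i, i, j, k))) // 2
    for i, k in combinations(range(n), 2):
        # b_{iikk} counts each distinct arrangement four times.
        total += b(x, y, (i, i, k, k)) // 4
    return total


x = [1, 2, 2, 3, 5, 5]  # already sorted by x, with ties
y = [3, 1, 4, 1, 5, 2]
print(v_sum_direct(x, y) == v_sum_regrouped(x, y))
print(v_sum_direct(x, y) / len(x) ** 4)  # the V-statistic t_V^*
```

Both functions are far slower than the \(O(n^2\log (n))\) algorithm and are only meant as a reference against which a fast implementation could be tested.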

Appendix 2: Proof of Lemma 1

By permutation invariance, we may relabel the points so that \(x_1\le x_2\le x_3\le x_4\). There are three cases:

  (i) The points in A are inseparable. The fact that \(b_{1234}=0\) is an immediate consequence of Eq. (2).

  (ii) The points in A are concordant. In this case we must have that \(x_2<x_3\) and either \(\max (y_1,y_2) < \min (y_3,y_4)\) or \(\min (y_1,y_2) > \max (y_3,y_4)\). By symmetry we need only consider the case when \(\max (y_1,y_2) < \min (y_3,y_4)\). By Eq. (2) it follows, with some thought, that \(a(x_{\pi (1,2,3,4)}) = a(y_{\pi (1,2,3,4)})\) for all permutations \(\pi \in S_4\) and thus, for any \(\pi \in S_4\) we have \(a(x_{\pi (1,2,3,4)})a(y_{\pi (1,2,3,4)}) = a(x_{\pi (1,2,3,4)})^2\) with

    $$\begin{aligned} a(x_{\pi (1,2,3,4)})^2&=\left\{ \begin{array}{lll} 1 &{} \quad \text{ if } \max (x_{\pi (1)}, x_{\pi (2)}) < \min (x_{\pi (3)}, x_{\pi (4)})\text { or} \\ &{} \min (x_{\pi (1)}, x_{\pi (2)}) > \max (x_{\pi (3)}, x_{\pi (4)})\text { or}\\ &{} \max (x_{\pi (1)}, x_{\pi (3)}) < \min (x_{\pi (2)}, x_{\pi (4)})\text { or}\\ &{} \min (x_{\pi (1)}, x_{\pi (3)}) > \max (x_{\pi (2)}, x_{\pi (4)}),\\ 0 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$

    But since \(x_1\le x_2 < x_3\le x_4\) we have that \(a(x_{\pi (1,2,3,4)})a(y_{\pi (1,2,3,4)})=1\) if and only if \(\{\pi (1),\pi (2)\} \in \{\{1,2\}, \{3,4\}\}\) or \(\{\pi (1),\pi (3)\} \in \{\{1,2\}, \{3,4\}\}\). There are exactly \(2^4=16\) such permutations and thus \(b_{1234}=16\).

  (iii) The points in A are discordant. Once again we must have that \(x_2<x_3\). It then follows, by the definition of discordant, that \(y_1\not =y_2\) and \(y_3\not =y_4\). We prove an intermediary lemma:

Lemma 2

Suppose that \((x_1,y_1),\ldots ,(x_4,y_4)\) are discordant and \(x_1\le x_2< x_3\le x_4\). Let

$$\begin{aligned} (x_5,y_5)&=\! (x_1,y_2),&(x_6,y_6)&=\! (x_2,y_1),&(x_7,y_7)&=\! (x_3,y_3),&(x_8,y_8)&=\! (x_4,y_4), \end{aligned}$$

so that \((x_5,y_5),\ldots ,(x_8,y_8)\) are simply \((x_1,y_1),\ldots ,(x_4,y_4)\) with \(y_1,y_2\) switched. Then \(b_{1234} = b_{5678}\). Moreover, the same result holds if we switch \(y_3,y_4\) instead of \(y_1,y_2\).

Proof

First note that, trivially, \(a(x_{\pi (1,2,3,4)}) = a(x_{\pi (5,6,7,8)})\) for any \(\pi \in S_4\). Let \(\pi \) be any permutation so that \(a(x_{\pi (1,2,3,4)})^2=1\). From case (ii) we know that we must have \(\{\pi (1),\pi (2)\} \in \{\{1,2\}, \{3,4\}\}\) or \(\{\pi (1),\pi (3)\} \in \{\{1,2\}, \{3,4\}\}\). Suppose that \(\{\pi (1),\pi (2)\}=\{1,2\}\), and let \(\pi '\in S_4\) be the permutation where

$$\begin{aligned} \pi '(1) = \pi (2),\ \ \pi '(2) = \pi (1),\ \ \pi '(3) = \pi (3),\ \ \pi '(4) = \pi (4). \end{aligned}$$

Then clearly \( a(x_{\pi (1,2,3,4)}) = a(x_{\pi '(1,2,3,4)}) = a(x_{\pi (5,6,7,8)}) = a(x_{\pi '(5,6,7,8)})\) but

$$\begin{aligned} a(y_{\pi (1,2,3,4)})&= a(y_{\pi '(5,6,7,8)}),&a(y_{\pi '(1,2,3,4)})&= a(y_{\pi (5,6,7,8)}), \end{aligned}$$

and thus

$$\begin{aligned}&a(x_{\pi (1,2,3,4)})a(y_{\pi (1,2,3,4)}) + a(x_{\pi '(1,2,3,4)})a(y_{\pi '(1,2,3,4)}) \\&\quad = a(x_{\pi (5,6,7,8)})a(y_{\pi (5,6,7,8)}) + a(x_{\pi '(5,6,7,8)})a(y_{\pi '(5,6,7,8)}). \end{aligned}$$

As we may perform a similar procedure to all \(\pi \in S_4\) with \(a(x_{\pi (1,2,3,4)})^2=1\) (changing the choice of \(\pi '\)), we see that \(b_{1234} = b_{5678}\) as claimed.

Finally, pairing \(\pi \) with \(\pi '\) given by

$$\begin{aligned} \pi '(1) = \pi (1),\ \ \pi '(2) = \pi (2),\ \ \pi '(3) = \pi (4),\ \ \pi '(4) = \pi (3) \end{aligned}$$

shows that this result still holds if we had flipped \(y_3,y_4\) instead of \(y_1,y_2\). \(\square \)
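The swap invariance of Lemma 2 can be checked exhaustively on a small grid. The following sketch assumes that the kernel \(a\) of Eq. (2) is the Bergsma–Dassios sign function \(\mathrm{sign}(|z_1-z_2|+|z_3-z_4|-|z_1-z_3|-|z_2-z_4|)\) and that \(b_{1234}\) sums \(a(x_\pi )a(y_\pi )\) over all \(\pi \in S_4\); the function names are hypothetical. It loops over every \(y\) configuration on the grid, which is a superset of the discordant case treated in the lemma.

```python
from itertools import permutations, product


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def a(z1, z2, z3, z4):
    # Assumed form of the kernel in Eq. (2), which is not shown in this appendix.
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def b(x, y):
    """b_{1234}: sum of a(x_pi) a(y_pi) over all permutations pi in S_4."""
    return sum(
        a(*(x[p] for p in pi)) * a(*(y[p] for p in pi))
        for pi in permutations(range(4))
    )


def swap_invariant(levels=3):
    """Check that b is unchanged by swapping y1,y2 or y3,y4 on an integer grid."""
    for x in product(range(levels), repeat=4):
        # Keep only x configurations with x1 <= x2 < x3 <= x4.
        if not (x[0] <= x[1] < x[2] <= x[3]):
            continue
        for y in product(range(levels), repeat=4):
            y12 = (y[1], y[0], y[2], y[3])  # y1 and y2 switched
            y34 = (y[0], y[1], y[3], y[2])  # y3 and y4 switched
            if not (b(x, y) == b(x, y12) == b(x, y34)):
                return False
    return True


print(swap_invariant())
```

A grid check of this kind is no substitute for the proof, but it catches sign or indexing mistakes when implementing \(a\) and \(b\).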

By Lemma 2, we may assume that \(x_1\le x_2<x_3\le x_4\) and \(y_1<y_2\) and \(y_3<y_4\). Note that, by the definition of discordant, we must have that \(y_2 > y_3\) and \(y_1<y_4\). From case (ii) we know that there are only 16 permutations \(\pi \) for which \(a(x_{\pi (1,2,3,4)}) \not = 0\) and they satisfy

$$\begin{aligned} \{\pi (1),\pi (2)\} \in \{\{1,2\}, \{3,4\}\}\text { or }\{\pi (1),\pi (3)\} \in \{\{1,2\}, \{3,4\}\}. \end{aligned}$$

If \(\{\pi (1),\pi (2)\} \in \{\{1,2\}, \{3,4\}\}\) and \(\{\pi (1),\pi (3)\} \in \{\{1,4\},\{2,3\}\}\), then we have \(a(y_{\pi (1,2,3,4)}) = 0\). Similarly, \(a(y_{\pi (1,2,3,4)}) = 0\) if \(\{\pi (1),\pi (3)\} \in \{\{1,2\}, \{3,4\}\}\) and \(\{\pi (1),\pi (2)\} \in \{\{1,4\},\{2,3\}\}\). This leaves only 8 permutations \(\pi \in S_4\) for which \(a(x_{\pi (1,2,3,4)})a(y_{\pi (1,2,3,4)})\) may be non-zero, and we check these explicitly:

$$\begin{aligned} a(x_{1,2,3,4})a(y_{1,2,3,4})&= -1\times 1 = -1,&a(x_{2,1,4,3})a(y_{2,1,4,3})&= -1\times 1 = -1, \\ a(x_{3,4,1,2})a(y_{3,4,1,2})&= -1\times 1 = -1,&a(x_{4,3,2,1})a(y_{4,3,2,1})&= -1\times 1 = -1, \\ a(x_{1,3,2,4})a(y_{1,3,2,4})&= 1\times -1 = -1,&a(x_{2,4,1,3})a(y_{2,4,1,3})&= 1\times -1 = -1, \\ a(x_{3,1,4,2})a(y_{3,1,4,2})&= 1\times -1 = -1,&a(x_{4,2,3,1})a(y_{4,2,3,1})&= 1\times -1 = -1. \end{aligned}$$

We conclude that \(b_{1234} = -8\) as claimed.
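The three values in Lemma 1 can be reproduced by direct enumeration over \(S_4\). This is a minimal sketch, assuming that the kernel \(a\) of Eq. (2) is the Bergsma–Dassios sign function \(\mathrm{sign}(|z_1-z_2|+|z_3-z_4|-|z_1-z_3|-|z_2-z_4|)\); the function names and the example points are hypothetical and chosen only for illustration.

```python
from itertools import permutations, product


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def a(z1, z2, z3, z4):
    # Assumed form of the kernel in Eq. (2), which is not shown in this appendix.
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def b(x, y):
    """b_{1234}: sum of a(x_pi) a(y_pi) over all permutations pi in S_4."""
    return sum(
        a(*(x[p] for p in pi)) * a(*(y[p] for p in pi))
        for pi in permutations(range(4))
    )


def count_nonzero(z):
    """Number of permutations pi in S_4 with a(z_pi) != 0."""
    return sum(a(*(z[p] for p in pi)) != 0 for pi in permutations(range(4)))


x = (1, 2, 3, 4)               # satisfies x1 <= x2 < x3 <= x4
print(count_nonzero(x))        # 16 permutations with a(x_pi) != 0
print(b(x, (1, 2, 3, 4)))      # concordant example: 16
print(b(x, (1, 3, 2, 4)))      # discordant example: -8
print(b(x, (1, 2, 2, 3)))      # tie y2 = y3 makes a(y_pi) vanish: 0

# Every configuration on a small grid yields one of the three values.
grid = list(product(range(3), repeat=4))
values = {b(u, v) for u in grid for v in grid}
print(sorted(values))
```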

About this article

Cite this article

Weihs, L., Drton, M. & Leung, D. Efficient computation of the Bergsma–Dassios sign covariance. Comput Stat 31, 315–328 (2016). https://doi.org/10.1007/s00180-015-0639-x

Keywords

  • Binary tree
  • Kendall’s tau
  • Nonparametric correlation
  • Spearman’s rho
  • Rank correlation
  • Test of independence