## Summary

Traditional algorithms for the calculation of Kendall’s τ between two datasets of n samples have a calculation time of O(n^{2}). This paper presents a suite of algorithms with expected calculation time of O(n log n) or better, using a combination of sorting and balanced tree data structures. The literature, e.g. Dwork et al. (2001), has alluded to the existence of O(n log n) algorithms without any analysis: this paper gives explicit descriptions of such algorithms for general use, covering the cases both with and without duplicate values in the data. Execution times for sample data are reduced from 3.8 hours to around 1–2 seconds for one million data pairs.

## Notes

1. Whilst samples from continuous distributions may contain duplicates to the accuracy with which they are represented numerically, the algorithm presented will treat them as logically unequal, with (effectively) a random ordering being selected for them. The chance of this occurring in practice at any sensible accuracy of machine representation is sufficiently slight that further analysis of this case has not been performed.

2. The descriptions throughout this document assume ascending sorts are used. Obviously, if descending-sorted data are available, the algorithms can be modified accordingly rather than re-sorting.

## References

Adel’son-Vel’skii, G. M. and Landis, E. M. (1962), An Algorithm for the Organization of Information. *Soviet Mathematics Doklady*, 3, 1259–1262.

Dwork, C., Kumar, R., Naor, M. and Sivakumar, D. (2001), Rank Aggregation Revisited. *Proc. 10th International World Wide Web Conference*, 613–622.

Knuth, D.E. (1998), *The Art of Computer Programming, Volume 3: Sorting and Searching*, Addison-Wesley, 2nd edition.

Lindskog, F., McNeil, A. and Schmock, U. (2001), Kendall’s τ for Elliptical Distributions. *Working paper, http://www.math.ethz.ch/~mcneil/pub_list.html*

Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. (1993), *Numerical Recipes*, Cambridge University Press.

## Appendices

### Appendix 1: Use of binary trees

*This appendix is provided for those unfamiliar with the use of binary trees to provide O(log n) lookup of data. It presents no new results and can be safely skipped by those familiar with the use of such data structures.*

At several locations within the algorithms in the main paper, we wish to find a value of Y_{i} in a list of those that have already been encountered and, if it is not there, add it in. Furthermore, we wish to maintain a count of the number of values occurring in the sequence so far which have Y < Y_{i}.

A simple approach to this would maintain a list or array of values found so far. In the list case the search is O(n) and the insertion is in constant time. In the array case the search can be O(log n) but the insertion is O(n). Either way, the overall search and insert process is O(n). Given that we are performing this n times in the original algorithm, this produces an overall execution time of O(n^{2}) — precisely what we are trying to avoid.
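The sorted-array variant of this naive approach can be sketched as follows (an illustrative sketch, not code from the paper; the function name is invented here). The binary search is O(log n), but the insertion shifts elements, so the loop as a whole remains O(n^{2}):

```python
from bisect import bisect_left, insort

def count_smaller_naive(ys):
    """For each y, count the previously seen values < y using a sorted array.
    bisect_left gives an O(log n) search, but insort shifts elements,
    making each insertion O(n) and the whole loop O(n^2)."""
    seen, counts = [], []
    for y in ys:
        counts.append(bisect_left(seen, y))  # values already seen that are < y
        insort(seen, y)                      # O(n) shift: the bottleneck
    return counts
```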

The standard computer science solution to this problem is to use a binary tree. In this data structure, each item is stored in a “node” which, as well as storing the item’s value, also refers to (up to) two other items, a “left node” and a “right node”. These nodes themselves refer to other nodes, so that each node has two “sub trees” under it. The rule for organising such a tree is that the value at a node is greater than the values at all nodes in its left sub tree, and less than the values in all nodes in its right sub tree.

The efficiency of the data structure comes from the fact that the depth of the tree is at most 1+log_{2}(n) *provided that the tree is balanced*, i.e. that the routes from root to leaf nodes are all approximately the same length. Ensuring that the tree remains perfectly balanced is not a trivial process, and the normal solution is to use the AVL algorithm described in Adel’son-Vel’skii and Landis (1962) which keeps the tree “near enough balanced” and maintains O(log n) performance for both searching and insertion.
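The search-and-count operation can be sketched by caching a subtree size in each node (a minimal illustration under the assumption of distinct values; the node layout and function name are inventions of this sketch, and the AVL rebalancing rotations described in Adel’son-Vel’skii and Landis (1962) are omitted for brevity, so the worst case here degrades to O(n) per operation):

```python
class Node:
    __slots__ = ("value", "count", "left", "right")

    def __init__(self, value):
        self.value = value
        self.count = 1      # number of nodes in this subtree, including self
        self.left = None
        self.right = None

def insert_and_count_smaller(root, value):
    """Insert value into the tree rooted at root (assumed to hold distinct
    values) and return (new_root, number of stored values < value).
    The count is accumulated while descending: whenever we branch right,
    the current node and its whole left subtree are smaller than value."""
    if root is None:
        return Node(value), 0
    root.count += 1
    if value < root.value:
        root.left, smaller = insert_and_count_smaller(root.left, value)
        return root, smaller
    else:
        left_size = root.left.count if root.left else 0
        root.right, smaller = insert_and_count_smaller(root.right, value)
        return root, smaller + left_size + 1
```

Maintaining these subtree counts through AVL rotations is straightforward, since a rotation only changes the counts of the two nodes involved.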

### Appendix 2: Code listings

Since SDTau seems to be the best all-round performer, only that algorithm is included here. It also assumes the availability of Quicksort and AVLTree implementations. However, all of the algorithms described and an implementation of AVLTree are available on request from the author at d.christensen@emb.co.uk.
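For readers without access to the original listing, the overall shape of an O(n log n) Kendall’s τ calculation can be sketched with a merge-sort inversion count (a generic illustration in the spirit of the paper’s algorithms, not the SDTau listing itself; it assumes no tied values, so it computes τ-a):

```python
def kendall_tau(xs, ys):
    """O(n log n) Kendall tau-a for paired data with no ties.
    Sort the pairs by x, then count inversions in the resulting y
    sequence by merge sort: each inversion is a discordant pair."""
    n = len(xs)
    ys_sorted = [y for _, y in sorted(zip(xs, ys))]

    def count_inversions(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_l = count_inversions(a[:mid])
        right, inv_r = count_inversions(a[mid:])
        merged, inv = [], inv_l + inv_r
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv += len(left) - i    # all of left[i:] exceed right[j]
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv

    _, discordant = count_inversions(ys_sorted)
    total = n * (n - 1) // 2                 # number of pairs
    return (total - 2 * discordant) / total  # (concordant - discordant) / total
```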

## About this article

### Cite this article

Christensen, D. Fast algorithms for the calculation of Kendall’s τ.
*Computational Statistics* **20**, 51–62 (2005). https://doi.org/10.1007/BF02736122
