Computational Statistics, 24:605

Reliability and efficiency of algorithms for computing the significance of the Mann–Whitney test

Original Paper


Motivated by recent applications of the Mann–Whitney U test to large data sets, we took a critical look at current methods for computing its significance. Surprisingly, we found that the two fastest and most popular tools for exact computation of the test significance, Dinneen and Blakesley's and Harding's, can exhibit large numerical errors even on moderately large data sets. Another method, proposed by Pagano and Tritchler, suffers from a similar numerical instability and can also produce inaccurate results. This motivated our development of a new algorithm, mw-sFFT, for the exact computation of the Mann–Whitney test with no ties. Among the class of exact algorithms that are numerically stable, mw-sFFT has the best complexity: O(m²n) versus O(m²n²) for the others, where m and n are the two sample sizes. This asymptotic efficiency is also reflected in the practical runtime of the algorithm. We also present a rigorous analysis of the propagation of numerical errors in mw-sFFT and derive an error guarantee for the values it computes. The reliability and efficiency of mw-sFFT make it a valuable tool in computational applications, and we plan to provide open-source libraries for it in C++ and Matlab.
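To make concrete what "exact computation with no ties" refers to, the following is a minimal sketch of the classical counting recurrence for the null distribution of U. It is not mw-sFFT, nor any of the algorithms named above. The function names u_counts and upper_tail_p are hypothetical, chosen for illustration. The sketch assumes U counts the pairs (x, y) with x > y and relies on Python's arbitrary-precision integers, so it sidesteps floating-point error entirely at the cost of speed: it performs on the order of m²n² additions.

```python
from fractions import Fraction
from math import comb


def u_counts(m, n):
    """Return counts where counts[u] is the number of orderings of
    m X's and n Y's (no ties) whose Mann-Whitney statistic equals u.

    Illustrative sketch only: uses the recurrence
        c(i, j, u) = c(i, j-1, u) + c(i-1, j, u-j),
    obtained by conditioning on whether the largest observation
    is a Y (U unchanged) or an X (U increases by j).
    """
    # row[j] is the count vector for sample sizes (i, j); start at i = 0,
    # where U is always 0 because there are no X's.
    row = [[1] for _ in range(n + 1)]
    for i in range(1, m + 1):
        new_row = [[1]]  # j = 0: no Y's, so U is always 0
        for j in range(1, n + 1):
            vec = [0] * (i * j + 1)  # U ranges over 0..i*j
            # Largest observation is a Y: sizes (i, j-1), U unchanged.
            for u, c in enumerate(new_row[j - 1]):
                vec[u] += c
            # Largest observation is an X: sizes (i-1, j), U grows by j.
            for u, c in enumerate(row[j]):
                vec[u + j] += c
            new_row.append(vec)
        row = new_row
    return row[n]


def upper_tail_p(m, n, u_obs):
    """Exact P(U >= u_obs) under the null hypothesis, as a Fraction."""
    counts = u_counts(m, n)
    return Fraction(sum(counts[u_obs:]), comb(m + n, m))
```

For example, upper_tail_p(2, 2, 4) evaluates to Fraction(1, 6), since exactly one of the six orderings of two X's and two Y's places both X's above both Y's.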


Numerical error, Exact computation, FFT


  1. Bickel DR (2004) Degrees of differential gene expression: detecting biologically significant expression differences and estimating their magnitudes. Bioinformatics 20(5): 682–688
  2. Buckle N, Kraft C, van Eeden C (1969) An approximation to the Wilcoxon–Mann–Whitney distribution. J Am Stat Assoc 64: 591–599
  3. Dembo A, Zeitouni O (1998) Large deviation techniques and applications. Springer, New York
  4. Di Bucchianico A (1999) Combinatorics, computer-algebra and the Wilcoxon–Mann–Whitney test. J Stat Plann Infer 79: 349–364
  5. Dinneen LC, Blakesley BC (1973) Algorithm AS 62: a generator for the sampling distribution of the Mann–Whitney U statistic. Appl Stat 22(2): 269–273
  6. Fix E, Hodges JL (1955) Significance probabilities of the Wilcoxon test. Ann Math Stat 26(2): 301–312
  7. Froda S, van Eeden C (2000) A uniform saddlepoint expansion for the null-distribution of the Wilcoxon–Mann–Whitney statistic. Can J Stat 1: 137–149
  8. Harding EF (1984) An efficient, minimal-storage procedure for calculating the Mann–Whitney U, generalized U and similar distributions. Appl Stat 33(1): 1–6
  9. Hodges JL, Ramsey P, Wechsler S (1990) Improved significance probabilities of the Wilcoxon test. J Educ Stat 15(3): 249–265
  10. Jin R, Robinson J (1999) Saddlepoint approximation near the endpoints of the support. Stat Prob Lett 45(4): 295–303
  11. Jin R, Robinson J (2003) Saddlepoint approximations of the two-sample Wilcoxon statistic. Institute of Mathematical Statistics, pp 149–158
  12. Karanam S, Moreno CS (2004) CONFAC: automated application of comparative genomic promoter analysis to DNA microarray datasets. Nucleic Acids Res 32(Web Server issue): W475–W484
  13. Keich U (2005) Efficiently computing the p-value of the entropy score. J Comput Biol 12(4): 416–430
  14. Keich U, Nagarajan N (2006) A fast and numerically robust method for exact multinomial goodness-of-fit test. J Comput Graph Stat 15(4): 779–802
  15. Kifer D, Ben-David S, Gehrke J (2004) Detecting change in data streams. In: Proceedings of the 30th VLDB conference, pp 180–191
  16. Mann H, Whitney D (1947) On a test whether one of two random variables is stochastically larger than the other. Ann Math Stat 18: 50–60
  17. Mehta CR, Patel NR, Tsiatis AA (1984) Exact significance testing to establish treatment equivalence with ordered categorical data. Biometrics 40(3): 819–825
  18. Nagarajan N, Jones N, Keich U (2005) Computing the p-value of the information content from an alignment of multiple sequences. In: Proceedings of the 13th ISMB conference, pp 311–318
  19. Pagano M, Tritchler D (1983) On obtaining permutation distribution in polynomial time. Am Stat Assoc 83: 435–440
  20. Press W, Teukolsky S, Vetterling W, Flannery B (1992) Numerical recipes in C. The art of scientific computing, 2nd edn. Cambridge University Press, New York
  21. Streitberg B, Rohmel J (1984) Exact nonparametrics in APL. In: Proceedings of the APL conference, ACM, New York, pp 313–325
  22. Tasche M, Zeuner H (2002) Improved roundoff error analysis for precomputed twiddle factors. J Comput Anal Appl 4(1): 1–18
  23. Troyanskaya OG, Garber ME, Brown PO, Botstein D, Altman RB (2002) Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 18(11): 1454–1461
  24. van Dantzig D (1947–1950) Kader cursus Mathematische Statistiek, Mathematisch Centrum, pp 301–304
  25. Van de Wiel MA, Smeets SJ, Brakenhoff RH, Ylstra B (2005) CGHMultiArray: exact p-values for multi-array comparative genomic hybridization data. Bioinformatics 21: 3193–3194

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. CBCB, UMIACS, University of Maryland, College Park, USA
  2. Department of Computer Science, Cornell University, Ithaca, USA
  3. School of Mathematics and Statistics, University of Sydney, Sydney, Australia
