Computational Statistics

, 24:605

Reliability and efficiency of algorithms for computing the significance of the Mann–Whitney test

Original Paper

DOI: 10.1007/s00180-009-0148-x

Cite this article as:
Nagarajan, N. & Keich, U. Comput Stat (2009) 24: 605. doi:10.1007/s00180-009-0148-x

Abstract

Motivated by recent applications of the Mann–Whitney U test to large data sets we took a critical look at current methods for computing its significance. Surprisingly, we found that the two fastest and most popular tools for exact computation of the test significance, Dinneen and Blakesley’s and Harding’s, can exhibit large numerical errors even in moderately large datasets. In addition, another method proposed by Pagano and Tritchler also suffers from a similar numerical instability and can produce inaccurate results. This motivated our development of a new algorithm, mw-sFFT, for the exact computation of the Mann–Whitney test with no ties. Among the class of exact algorithms that are numerically stable, mw-sFFT has the best complexity: O(m2n) versus O(m2n2) for others, where m and n are the two sample sizes. This asymptotic efficiency is also reflected in the practical runtime of the algorithm. In addition, we also present a rigorous analysis of the propagation of numerical errors in mw-sFFT to derive an error guarantee for the values computed by the algorithm. The reliability and efficiency of mw-sFFT make it a valuable tool in compuational applications and we plan to provide open-source libraries for it in C++ and Matlab.

Keywords

Numerical error Exact computation FFT 

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. 1.CBCB, UMIACSUniversity of MarylandCollege ParkUSA
  2. 2.Department of Computer ScienceCornell UniversityIthacaUSA
  3. 3.School of Mathematics and StatisticsUniversity of SydneySydneyAustralia

Personalised recommendations