Randomization tests are often recommended when parametric assumptions may be violated because they require no distributional or random sampling assumptions to be valid. In addition to being exact, a randomization test may also be more powerful than its parametric counterpart. This was demonstrated in a simulation study that examined the conditional power of three nondirectional tests: the randomization t test, the Wilcoxon–Mann–Whitney (WMW) test, and the parametric t test. When the treatment effect was skewed, with the degree of skewness correlated with the size of the effect, the randomization t test was systematically more powerful than the parametric t test. The relative power of the WMW test under the skewed treatment effect condition depended on the sample size ratio.
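For readers unfamiliar with the procedure, the following is a minimal sketch of an exact, nondirectional two-sample randomization t test. It is an illustration under stated assumptions, not the code used in the simulation study: the function names `pooled_t` and `randomization_t_test` and the example scores are hypothetical, and the sketch assumes a completely randomized two-group design with at least two units per group. The p value is the proportion of all possible treatment assignments whose absolute t statistic is at least as large as the observed one.

```python
from itertools import combinations
from math import copysign, inf, sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Pooled-variance two-sample t statistic (each group needs >= 2 scores)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    diff = mean(x) - mean(y)
    if sp2 == 0:
        # Degenerate case: no within-group variation at all.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(sp2 * (1 / nx + 1 / ny))


def randomization_t_test(treated, control):
    """Exact two-sided p value obtained by enumerating every assignment.

    Hypothetical helper for illustration; feasible only for small samples,
    because the number of assignments grows combinatorially.
    """
    scores = list(treated) + list(control)
    n, n_treated = len(scores), len(treated)
    observed = abs(pooled_t(treated, control))
    extreme = total = 0
    for idx in combinations(range(n), n_treated):
        chosen = set(idx)
        x = [scores[i] for i in idx]
        y = [scores[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(pooled_t(x, y)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical scores: every treated unit exceeds every control unit.
treated = [12.1, 14.3, 15.8, 19.6]
control = [8.2, 9.9, 10.4, 11.7]
p_value = randomization_t_test(treated, control)
print(p_value)  # 2 of the 70 possible assignments are this extreme: 2/70
```

With four units per group there are 70 possible assignments, so the smallest attainable nondirectional p value is 2/70. For larger samples, full enumeration becomes impractical and a random subset of assignments is commonly used instead; that variant is not shown here.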
The author is tremendously grateful to Professors Jee-Seon Kim, Ronald Serlin, and Peter Steiner for helpful discussions.
Keller, B. Detecting Treatment Effects with Small Samples: The Power of Some Tests Under the Randomization Model. Psychometrika 77, 324–338 (2012). https://doi.org/10.1007/s11336-012-9249-5
- randomization test
- permutation test
- Wilcoxon–Mann–Whitney test
- exact Type I error rate
- conditional power