Normal and Non-normal Data Simulations for the Evaluation of Two-Sample Location Tests

Part of the ICSA Book Series in Statistics book series (ICSABSS)


Two-sample location tests are the family of statistical tests that compare two independent distributions via measures of central tendency, most commonly means or medians. The t-test is the most widely recognized parametric option for two-sample mean comparisons. The pooled t-test assumes that the two population variances are equal; when they are unequal, Welch’s t-test is more appropriate. Both t-tests assume that the data are normally distributed. If the normality assumption is violated, a non-parametric alternative such as the Wilcoxon rank-sum test has the potential to maintain adequate type I error and appreciable power. Although sometimes considered controversial, pretesting for normality, followed by the F-test for equality of variances, may be applied before selecting a two-sample location test. This option yields multi-stage tests as another alternative for two-sample location comparisons, starting with a normality test, followed by either Welch’s t-test or the Wilcoxon rank-sum test. Less commonly used alternatives for two-sample location comparisons include permutation tests, which evaluate statistical significance based on empirical distributions of test statistics. Overall, a variety of statistical tests are available for two-sample location comparisons. Which tests perform best in terms of type I error and power depends on the data distribution, the population variances, and the sample size. One way to evaluate these tests is to simulate data that mimic what might be encountered in practice. In this chapter, the use of Monte Carlo techniques is demonstrated to simulate normal and non-normal data for the evaluation of two-sample location tests.
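
As a rough illustration of the kind of Monte Carlo evaluation described above, the following sketch estimates the rejection rate of a permutation test for a difference in means. It is not the chapter's own code: it uses only the Python standard library, and every name in it (`permutation_p_value`, `rejection_rate`, the data generators, the sample size, the number of replicates, and the shift of 1.5) is a hypothetical choice made for this example. When both groups are drawn from the same distribution, the rejection rate estimates the type I error; when one group is shifted, it estimates power.

```python
import random
import statistics


def permutation_p_value(x, y, n_perm, rng):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


def rejection_rate(draw_x, draw_y, n, n_sim, n_perm, alpha, seed):
    """Fraction of simulated data sets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [draw_x(rng) for _ in range(n)]
        y = [draw_y(rng) for _ in range(n)]
        if permutation_p_value(x, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


# Hypothetical data generators (assumptions made for this sketch).
def normal(rng):
    return rng.gauss(0.0, 1.0)       # normal data, mean 0, SD 1


def skewed(rng):
    return rng.expovariate(1.0)      # right-skewed, non-normal data


def shifted_normal(rng):
    return rng.gauss(1.5, 1.0)       # normal data with a location shift


settings = dict(n=10, n_sim=200, n_perm=199, alpha=0.05, seed=1)

# Same distribution in both groups: the rejection rate estimates type I error.
type1_normal = rejection_rate(normal, normal, **settings)
type1_skewed = rejection_rate(skewed, skewed, **settings)

# Shifted second group: the rejection rate estimates power.
power_normal = rejection_rate(normal, shifted_normal, **settings)

print("type I error, normal data:", type1_normal)
print("type I error, skewed data:", type1_skewed)
print("power, normal data with shift:", power_normal)
```

With only 200 replicates the estimates are noisy; a real study would use many more replicates and would vary the distributions, variances, and sample sizes across scenarios, as the abstract suggests.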





Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  1. Department of Community Medicine and Health Care, Connecticut Institute for Clinical and Translational Science, University of Connecticut Health Center, Farmington, USA
