Evaluating the robustness of repeated measures analyses: The case of small sample sizes and nonnormal data
 Daniel Oberfeld,
 Thomas Franke
Abstract
Repeated measures analyses of variance are the method of choice in many studies from experimental psychology and the neurosciences. Data from these fields are often characterized by small sample sizes, high numbers of factor levels of the within-subjects factor(s), and nonnormally distributed response variables such as response times. For a design with a single within-subjects factor, we investigated Type I error control in univariate tests with corrected degrees of freedom, the multivariate approach, and a mixed-model (multilevel) approach (SAS PROC MIXED) with Kenward–Roger’s adjusted degrees of freedom. We simulated multivariate normal and nonnormal distributions with varied population variance–covariance structures (spherical and nonspherical), sample sizes (N), and numbers of factor levels (K). For normally distributed data, as expected, the univariate approach with Huynh–Feldt correction controlled the Type I error rate with only very few exceptions, even if sample sizes as low as three were combined with high numbers of factor levels. The multivariate approach also controlled the Type I error rate, but it requires N ≥ K. PROC MIXED often showed acceptable control of the Type I error rate for normal data, but it also produced several liberal or conservative results. For nonnormal data, all of the procedures showed clear deviations from the nominal Type I error rate in many conditions, even for sample sizes greater than 50. Thus, none of these approaches can be considered robust if the response variable is nonnormally distributed. The results indicate that both the variance heterogeneity and covariance heterogeneity of the population covariance matrices affect the error rates.
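The kind of simulation summarized in the abstract can be illustrated with a minimal Monte Carlo sketch. It is not the authors' code (the study used SAS), and it covers only one of the procedures: the univariate F test with Huynh–Feldt corrected degrees of freedom, applied to multivariate normal data generated under the null hypothesis of equal means. The function names (`gg_epsilon`, `hf_epsilon`, `hf_p_value`, `type1_rate`), the number of replications, and the seed are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def gg_epsilon(S):
    """Greenhouse-Geisser (Box) epsilon estimate from a K x K covariance matrix."""
    K = S.shape[0]
    P = np.eye(K) - np.ones((K, K)) / K   # centering projector
    M = P @ S @ P                          # double-centered covariance matrix
    return np.trace(M) ** 2 / ((K - 1) * np.trace(M @ M))


def hf_epsilon(eps_gg, N, K):
    """Huynh-Feldt estimate for a single group of N subjects, capped at 1."""
    denom = (K - 1) * (N - 1 - (K - 1) * eps_gg)
    if denom <= 0:                         # degenerate case: fall back to the cap
        return 1.0
    return min(1.0, (N * (K - 1) * eps_gg - 2) / denom)


def hf_p_value(Y):
    """p value of the Huynh-Feldt corrected F test for an N x K data matrix."""
    N, K = Y.shape
    grand = Y.mean()
    subj = Y.mean(axis=1, keepdims=True)   # subject means
    cond = Y.mean(axis=0, keepdims=True)   # factor-level means
    ss_cond = N * np.sum((cond - grand) ** 2)
    ss_err = np.sum((Y - subj - cond + grand) ** 2)
    df1, df2 = K - 1, (N - 1) * (K - 1)
    F = (ss_cond / df1) / (ss_err / df2)
    eps = hf_epsilon(gg_epsilon(np.cov(Y, rowvar=False)), N, K)
    return stats.f.sf(F, eps * df1, eps * df2)


def type1_rate(cov, N, n_sim=2000, alpha=0.05, seed=0):
    """Empirical rejection rate when all K population means are equal."""
    rng = np.random.default_rng(seed)
    K = cov.shape[0]
    rejections = 0
    for _ in range(n_sim):
        Y = rng.multivariate_normal(np.zeros(K), cov, size=N)
        rejections += int(hf_p_value(Y) < alpha)
    return rejections / n_sim


# Spherical population covariance, N = 10 subjects, K = 4 factor levels
print(type1_rate(np.eye(4), N=10))
```

Passing a nonspherical covariance matrix as `cov` gives the corresponding rate for that structure; the nonnormal conditions of the study would require a different data generator and are not covered by this sketch.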
 Supplementary material: 13428_2012_281_MOESM1_ESM.pdf (333 KB)
 Journal

Behavior Research Methods
Volume 45, Issue 3, pp 792–812
 Cover Date
 2013-09-01
 DOI
 10.3758/s13428-012-0281-2
 Online ISSN
 1554-3528
 Publisher
 Springer US
 Keywords

 Analysis of variance
 Robustness
 Nonnormality
 Small sample settings
 Repeated measurements
 Correlated data
 Multivariate
 Mixed model analyses
 Multilevel model
 Simulation study
 Type I error rate
 Central limit theorem
 Monte Carlo
 Authors

 Daniel Oberfeld (1)
 Thomas Franke (1)
 Author Affiliations

 1. Department of Psychology, Johannes Gutenberg-Universität, 55099, Mainz, Germany