# Prospective evaluation of designs for analysis of variance without knowledge of effect sizes


## Abstract

Estimation of design power requires knowledge of treatment effect size and error variance, which are often unavailable for ecological studies. In the absence of prior information on these parameters, investigators can compare an alternative to a reference design for the same treatment(s) in terms of its precision at equal sensitivity. This measure of relative performance calculates the fractional error variance allowed of the alternative for it to just match the power of the reference. Although first suggested as a design tool in the 1950s, it has received little analysis and no uptake by environmental scientists or ecologists. We calibrate relative performance against the better known criterion of relative efficiency, in order to reveal its unique advantage in controlling sensitivity when considering the precision of estimates. The two measures differ strongly for designs with low replication. For any given design, relative performance at least doubles with each doubling of effective sample size. We show that relative performance is robustly approximated by the ratio of reference to alternative \(\alpha \) quantiles of the \(F\) distribution, multiplied by the ratio of alternative to reference effective sample sizes. The proxy is easy to calculate, and consistent with exact measures. Approximate or exact measurement of relative performance serves a useful purpose in enumerating trade-offs between error variance and error degrees of freedom when considering whether to block random variation or to sample from a more or less restricted domain.
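The proxy described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes SciPy for the F quantiles, and the function name, arguments, and example designs (a completely randomized reference with 3 replicates per level versus an alternative with 6) are hypothetical choices for demonstration.

```python
from scipy.stats import f

def relative_performance_proxy(alpha, df_treat, df_err_ref, df_err_alt,
                               n_ref, n_alt):
    """Approximate relative performance of an alternative design versus a
    reference design for the same treatment: the ratio of reference to
    alternative alpha quantiles of the F distribution, multiplied by the
    ratio of alternative to reference effective sample sizes."""
    f_crit_ref = f.ppf(1 - alpha, df_treat, df_err_ref)  # critical F, reference
    f_crit_alt = f.ppf(1 - alpha, df_treat, df_err_alt)  # critical F, alternative
    return (f_crit_ref / f_crit_alt) * (n_alt / n_ref)

# Hypothetical example: one factor with 4 levels (3 treatment d.f.);
# reference has 3 replicates per level (8 error d.f.), alternative has
# 6 replicates per level (20 error d.f.).
proxy = relative_performance_proxy(0.05, 3, 8, 20, 3, 6)
```

A proxy value above 1 indicates that the alternative design can tolerate proportionally more error variance than the reference while still matching its power; here the extra replication more than doubles the allowance, consistent with the abstract's observation that relative performance at least doubles with each doubling of effective sample size.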

## Keywords

ANOVA mixed models · Experimental design · Power analysis · Sensitivity analysis · Significance test · Statistical power

## Notes

### Acknowledgments

This work was supported by grant NE/C003705/1 to CPD from the UK Natural Environment Research Council. Valuable comments and correctives were offered by K. E. Muller, R. W. Payne, W. J. Resetarits Jr, M. S. Ridout, and A. J. Vickers. Two anonymous reviewers made helpful suggestions for improving clarity.
