Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse
 Wolfgang Forstmeier,
 Holger Schielzeth
Abstract
Fitting generalised linear models (GLMs) with more than one predictor has become the standard method of analysis in evolutionary and behavioural research. Often, GLMs are used for exploratory data analysis, where one starts with a complex full model including interaction terms and then simplifies it by removing nonsignificant terms. While this approach can be useful, it is problematic if significant effects are interpreted as if they arose from a single a priori hypothesis test. This is because model selection involves cryptic multiple hypothesis testing, a fact that has only rarely been acknowledged or quantified. We show that the probability of finding at least one ‘significant’ effect is high, even if all null hypotheses are true (e.g. 40% when starting with four predictors and their two-way interactions). This probability is close to theoretical expectations when the sample size (N) is large relative to the number of predictors including interactions (k). In contrast, type I error rates strongly exceed even those expectations when model simplification is applied to models that are overfitted before simplification (low N/k ratio). The increase in false-positive results arises primarily from an overestimation of effect sizes among significant predictors, leading to upward-biased effect sizes that often cannot be reproduced in follow-up studies (‘the winner's curse’). Despite having their own problems, full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone. We favour the presentation of full models, since they best reflect the range of predictors investigated and also ensure a balanced representation of nonsignificant results.
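The 40% figure for four predictors and their two-way interactions (k = 4 + 6 = 10 terms) is consistent with the simple theoretical expectation 1 - (1 - α)^k for k independent tests at α = 0.05. The Python sketch below is an illustration only, not the authors' simulation code. It computes that expectation, the corresponding Bonferroni per-test threshold α/k, and a small Monte Carlo of the winner's curse. It assumes independent tests and an idealised normal sampling distribution for each estimate (no actual GLM is fitted), and the helper names are hypothetical. It therefore does not reproduce the additional inflation that arises when overfitted models (low N/k) are simplified.

```python
import math
import random
from statistics import NormalDist


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """P(at least one P < alpha) among k independent tests whose nulls are all true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / k


def winners_curse(true_effect: float, se: float,
                  n_sim: int = 100_000, seed: int = 1) -> float:
    """Mean estimate among replicates that pass a two-sided z-test at alpha = 0.05.

    Assumption: each replicate's estimate is drawn from Normal(true_effect, se),
    an idealised large-sample sampling distribution rather than a fitted GLM.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
    significant = []
    for _ in range(n_sim):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate / se) > z_crit:  # keep only 'significant' replicates
            significant.append(estimate)
    return sum(significant) / len(significant)


if __name__ == "__main__":
    k = 4 + math.comb(4, 2)  # four main effects plus six two-way interactions
    print(round(familywise_error_rate(k), 3))  # 0.401
    print(bonferroni_threshold(k))  # per-test alpha after Bonferroni correction
    # True effect 0.2 with SE 0.1 (power roughly 50%): significant estimates
    # average well above the true value.
    print(round(winners_curse(0.2, 0.1), 3))
```

With these settings the mean of the significant estimates comes out at about 0.28 rather than the true 0.2, which is the upward bias the abstract describes. Estimates that only just clear the threshold are, on average, those that sampling error pushed upward.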
Behavioral Ecology and Sociobiology
Volume 65, Issue 1, pp 47–55
2011-01-01
10.1007/s00265-010-1038-5
 Bonferroni correction
 Effect size estimation
 Generalised linear models
 Model selection
 Multiple regression
 Multiple testing
 Parameter estimation
 Publication bias
 1. Max Planck Institute for Ornithology, Eberhard-Gwinner-Str., 82319, Seewiesen, Germany
 2. Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36, Uppsala, Sweden