A latent variable model approach to estimating systematic bias in the oversampling method
Oversampling data from a preselected range of a variable’s distribution is often used by researchers who wish to study rare outcomes without substantially increasing sample size. Despite its frequent use, however, it is not known whether this method introduces statistical bias through the disproportionate representation of a particular range of data. The present study used simulated data sets to examine whether oversampling introduces systematic bias in effect size estimates (of the relationship between oversampled predictor variables and the outcome variable), as compared with estimates based on a random sample. In general, results indicated that increased oversampling was associated with a decrease in the absolute value of effect size estimates. Critically, however, the magnitude of this decrease was nominal. This finding thus provides the first evidence that the oversampling method does not systematically bias effect size estimates to a degree that would typically affect conclusions in behavioral research. Examining the effect of sample size on oversampling yielded an additional important finding: For smaller samples, oversampling may be necessary to avoid spuriously inflated effect sizes, which can arise when the number of predictor variables is comparable to the number of rare outcomes.
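The comparison described above, between effect size estimates from an oversampled sample and from a random sample, can be illustrated with a minimal simulation sketch. It is not the study's latent variable model: the logistic link between predictor and outcome, the 90th-percentile cutoff, the parameter values, and all function names (`simulate_population`, `draw_sample`, `effect_size`) are assumptions chosen for illustration, and the sketch is not expected to reproduce the reported direction or magnitude of the bias.

```python
import numpy as np

def simulate_population(n, intercept, slope, rng):
    """Continuous predictor x and a rare binary outcome y (assumed logistic link)."""
    x = rng.standard_normal(n)
    p = 1.0 / (1.0 + np.exp(-(intercept + slope * x)))
    y = (rng.random(n) < p).astype(float)
    return x, y

def draw_sample(x, y, n, oversample_frac, cutoff, rng):
    """Draw n cases: a fraction only from x >= cutoff, the rest at random."""
    n_over = int(round(n * oversample_frac))
    tail = np.flatnonzero(x >= cutoff)            # indices in the preselected range
    over = rng.choice(tail, size=n_over, replace=False)
    remaining = np.setdiff1d(np.arange(x.size), over)  # avoid drawing a case twice
    rest = rng.choice(remaining, size=n - n_over, replace=False)
    idx = np.concatenate([over, rest])
    return x[idx], y[idx]

def effect_size(x, y):
    """Pearson (point-biserial) correlation between predictor and outcome."""
    return np.corrcoef(x, y)[0, 1]

rng = np.random.default_rng(0)
# Hypothetical parameter values; the outcome is rare in this population.
x_pop, y_pop = simulate_population(100_000, intercept=-3.0, slope=0.8, rng=rng)
cutoff = np.quantile(x_pop, 0.90)  # assumed range: top 10% of the predictor

results = {}
for frac in (0.0, 0.3, 0.6):  # 0.0 corresponds to a purely random sample
    xs, ys = draw_sample(x_pop, y_pop, n=500, oversample_frac=frac,
                         cutoff=cutoff, rng=rng)
    results[frac] = effect_size(xs, ys)
    print(f"oversampled fraction {frac:.1f}: r = {results[frac]:.3f}")
```

A single draw per condition is noisy; a closer analogue of a simulation study would repeat each condition many times and compare the average estimates across oversampling fractions and sample sizes.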