A latent variable model approach to estimating systematic bias in the oversampling method
The method of oversampling data from a preselected range of a variable’s distribution is often applied by researchers who wish to study rare outcomes without substantially increasing sample size. Despite frequent use, however, it is not known whether this method introduces statistical bias due to disproportionate representation of a particular range of data. The present study employed simulated data sets to examine how oversampling introduces systematic bias in effect size estimates (of the relationship between oversampled predictor variables and the outcome variable), as compared with estimates based on a random sample. In general, results indicated that increased oversampling was associated with a decrease in the absolute value of effect size estimates. Critically, however, the actual magnitude of this decrease in effect size estimates was nominal. This finding thus provides the first evidence that the use of the oversampling method does not systematically bias results to a degree that would typically impact results in behavioral research. Examining the effect of sample size on oversampling yielded an additional important finding: For smaller samples, the use of oversampling may be necessary to avoid spuriously inflated effect sizes, which can arise when the number of predictor variables and rare outcomes is comparable.
Keywords: Sampling, statistical bias, latent variable modeling
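The comparison described in the abstract, effect size estimates from oversampled versus random samples, can be illustrated with a minimal simulation sketch. Everything specific here is an assumption made for illustration and is not taken from the study: a bivariate normal population, a continuous outcome, Pearson's r as the effect size, a 90th-percentile cutoff defining the oversampled range, and the hypothetical helper names `make_population`, `draw_sample`, and `simulate`. The study itself used a latent variable model, so this toy setup should not be expected to reproduce the direction or magnitude of the reported effects.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def make_population(size, rho, rng):
    """Simulate (x, y) pairs whose population correlation is rho (assumed model)."""
    pop = []
    for _ in range(size):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pop.append((x, y))
    return pop


def draw_sample(pop, upper, n, oversample_frac, rng):
    """Draw n cases: a fraction only from the preselected range, the rest at random."""
    n_over = round(n * oversample_frac)
    # Oversampled cases come only from the upper range of x; the remaining
    # cases come from the whole population, so a case may appear in both parts.
    return rng.sample(upper, n_over) + rng.sample(pop, n - n_over)


def simulate(n_reps=200, n=300, rho=0.3, cutoff_quantile=0.9,
             pop_size=20_000, seed=0):
    """Mean sample correlation for several oversampling fractions."""
    rng = random.Random(seed)
    pop = make_population(pop_size, rho, rng)
    cutoff = sorted(x for x, _ in pop)[int(cutoff_quantile * pop_size)]
    upper = [case for case in pop if case[0] > cutoff]
    results = {}
    for frac in (0.0, 0.25, 0.5):  # 0.0 is the purely random sample
        rs = []
        for _ in range(n_reps):
            xs, ys = zip(*draw_sample(pop, upper, n, frac, rng))
            rs.append(pearson_r(xs, ys))
        results[frac] = sum(rs) / n_reps
    return results


if __name__ == "__main__":
    for frac, mean_r in simulate().items():
        print(f"oversampled fraction {frac:.2f}: mean r = {mean_r:.3f}")
```

Comparing the mean estimate at each oversampling fraction with the estimate at fraction 0.0 gives a rough picture of how far the sampling scheme moves the effect size under these assumed conditions. Reducing `n` shows how the comparison behaves in smaller samples.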