Prevention Science, Volume 19, Issue 3, pp 274–283

Pretest Measures of the Study Outcome and the Elimination of Selection Bias: Evidence from Three Within Study Comparisons

  • Kelly Hallberg
  • Thomas D. Cook
  • Peter M. Steiner
  • M. H. Clark


This paper examines how pretest measures of a study outcome reduce selection bias in observational studies in education. The theoretical rationale for privileging pretests in bias control is that they are often highly correlated with the outcome, and in many contexts, they are also highly correlated with the selection process. To examine the pretest’s role in bias reduction, we use the data from two within study comparisons and an especially strong quasi-experiment, each with an educational intervention that seeks to improve achievement. In each study, the pretest measures are consistently highly correlated with post-intervention measures of themselves, but the studies vary the correlation between the pretest and the process of selection into treatment. Across the three datasets with two outcomes each, there are three cases where this correlation is low and three where it is high. A single wave of pretest always reduces bias across the six instances examined, and it eliminates bias in three of them. Adding a second pretest wave eliminates bias in two more instances. However, the pattern of bias elimination does not follow the predicted pattern—that more bias reduction ensues as a function of how highly the pretest is correlated with selection. The findings show that bias is more complexly related to the pretest’s correlation with selection than we hypothesized, and we seek to explain why.
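The rationale described above can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' analysis or data. It assumes a hypothetical data-generating process in which a latent ability drives both pretest and posttest, and selection into treatment depends only on the observed pretest plus independent noise. It also swaps in simple linear regression adjustment in place of propensity score matching. All function names, parameter values, and the true effect of 0.5 are illustrative assumptions. In this deliberately favorable setup, conditioning on the pretest should remove nearly all of the bias in the naive comparison. The paper's point is that real data follow a less tidy pattern.

```python
import random


def simulate(n=20000, true_effect=0.5, seed=0):
    """Generate (pretest, treated, posttest) rows from an assumed toy model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.gauss(0, 1)                 # latent, unobserved
        pretest = ability + rng.gauss(0, 0.5)     # noisy measure of ability
        # Assumption: selection depends only on the pretest plus independent noise.
        treated = 1.0 if pretest + rng.gauss(0, 1) > 0 else 0.0
        posttest = ability + true_effect * treated + rng.gauss(0, 0.5)
        rows.append((pretest, treated, posttest))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after a simple linear regression on x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def naive_estimate(rows):
    """Unadjusted difference in mean posttest, treated minus comparison."""
    treated = [y for _, t, y in rows if t == 1.0]
    control = [y for _, t, y in rows if t == 0.0]
    return mean(treated) - mean(control)


def pretest_adjusted_estimate(rows):
    """Treatment coefficient from regressing posttest on treatment and pretest.

    Computed by partialling the pretest out of both treatment and posttest
    (the Frisch-Waugh-Lovell result), then regressing residual on residual.
    """
    pretest = [r[0] for r in rows]
    treated = [r[1] for r in rows]
    posttest = [r[2] for r in rows]
    return slope(residuals(pretest, treated), residuals(pretest, posttest))


if __name__ == "__main__":
    data = simulate()
    print("naive estimate:           ", round(naive_estimate(data), 3))
    print("pretest-adjusted estimate:", round(pretest_adjusted_estimate(data), 3))
```

Changing the selection rule so that it also depends on `ability` directly, rather than only on `pretest`, is one way to explore cases where a single pretest no longer removes all of the bias.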


Keywords

Within-study comparison · Propensity score matching · Randomized experiment · Causal inference


Compliance with Ethical Standards


This work was supported by the National Science Foundation Grant DRL-1228866.

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. For this type of study, formal consent is not required. This article does not contain any studies with animals performed by any of the authors.

Informed Consent

This study only included de-identified, secondary data analysis. For this type of study, formal consent is not required.



Copyright information

© Society for Prevention Research 2016

Authors and Affiliations

  • Kelly Hallberg (1)
  • Thomas D. Cook (2, 3)
  • Peter M. Steiner (4)
  • M. H. Clark (5)

  1. University of Chicago, Chicago, USA
  2. Mathematica Policy Research, Chicago, USA
  3. Northwestern University, Evanston, USA
  4. University of Wisconsin, Madison, USA
  5. University of Central Florida, Orlando, USA
