Propensity score analysis: promise, reality and irrational exuberance
- William R. Shadish
The aim of this work is to examine the promise that propensity scores can yield accurate effect estimates in nonrandomized experiments, review research on the realities of the conditions needed to meet this promise, and caution against irrational exuberance about their capacity to meet this promise.
The paper reviews selected experimental work that illustrates both the promise and the realities of propensity score analysis.
Propensity score analysis of nonrandomized experiments can yield the same results as randomized experiments. Those estimates depend on meeting the strong ignorability assumption that the available covariates well describe selection processes and on use of comparison groups that are from the same location with very similar focal characteristics. When those assumptions are not met, propensity scores may not yield accurate estimates.
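To make the dependence on measured covariates concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the analysis used in the work reviewed: the data are a small, deterministic, made-up population, the names `make_units`, `propensity_by_covariate`, `naive_estimate`, and `stratified_estimate` are hypothetical, and the propensity score is estimated simply as the treated share at each value of one binary covariate rather than with a fitted model. Selection depends on the covariate `x`, which also raises the outcome, so the unadjusted comparison is biased, while stratifying on the estimated propensity score recovers the built-in effect of 2.0.

```python
from collections import defaultdict


def make_units():
    """Build a deterministic toy population (hypothetical, for illustration only).

    x: binary covariate that drives selection into treatment
    z: treatment indicator
    y: outcome, with a true treatment effect of 2.0 and a covariate effect of 3.0
    """
    cells = [  # (x, z, number of units)
        (0, 1, 100), (0, 0, 300),   # x = 0: 25% treated
        (1, 1, 300), (1, 0, 100),   # x = 1: 75% treated
    ]
    units = []
    for x, z, count in cells:
        units.extend({"x": x, "z": z, "y": 2.0 * z + 3.0 * x} for _ in range(count))
    return units


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_estimate(units):
    """Unadjusted difference in mean outcomes, treated minus control."""
    treated = mean(u["y"] for u in units if u["z"] == 1)
    control = mean(u["y"] for u in units if u["z"] == 0)
    return treated - control


def propensity_by_covariate(units):
    """Estimate e(x) = P(z = 1 | x) as the treated share at each covariate value."""
    groups = defaultdict(list)
    for u in units:
        groups[u["x"]].append(u["z"])
    return {x: mean(zs) for x, zs in groups.items()}


def stratified_estimate(units, propensity):
    """Average the within-stratum differences, weighting each stratum by its size."""
    strata = defaultdict(list)
    for u in units:
        strata[propensity[u["x"]]].append(u)
    total = 0.0
    for members in strata.values():
        treated = mean(u["y"] for u in members if u["z"] == 1)
        control = mean(u["y"] for u in members if u["z"] == 0)
        total += (treated - control) * len(members)
    return total / len(units)


units = make_units()
propensity = propensity_by_covariate(units)
print(naive_estimate(units))                   # 3.5 (biased by selection on x)
print(stratified_estimate(units, propensity))  # 2.0 (the built-in effect)
```

In this toy setup the adjustment works only because `x`, the sole driver of selection, is observed. If `x` were unmeasured, the analyst could compute nothing better than the naive figure, which is the situation the ignorability assumption rules out.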
The use of propensity score analysis has proliferated exponentially, especially in the last decade, but careful attention to its assumptions seems to be very rare in practice. Researchers and policymakers who rely on these extensive propensity score applications may be using evidence of largely unknown validity. All stakeholders should devote far more empirical attention to justifying that each study has met these assumptions.
Journal of Experimental Criminology
Volume 9, Issue 2, pp 129–144
- Springer Netherlands
- Propensity score
- Nonrandomized experiment
- Author Affiliations
- 1. School of Social Sciences, Humanities and Arts, University of California, Merced, 5200 North Lake Rd, Merced, CA, 95343, USA