Abstract
The propensity score methodology has become quite common in applied research in the last 10 years, and criminology is no exception to this growing trend. It offers a potentially powerful way to estimate the treatment effect of some intervention on behavior when the receipt of treatment arises in a nonrandom way, which is the selection problem. It does so by creating synthetic “experimental” and “control” groups that are equivalent on a large number of potential confounding variables. In this chapter, we first introduce the counterfactual framework on which the propensity score method is based and define the average treatment effect. We then outline technical issues that must be addressed when the propensity score method is used in practice, including estimation of the propensity score, demonstration of covariate balance, and estimation of the treatment effect of interest. To provide a step-by-step example of the method, we appeal to the relationship between employment and substance use in adolescence. Following a brief review of research in criminology and related disciplines that employs the propensity score methodology, we offer a number of guidelines for use of the technique.
Notes
- 1.
- 2.
Notice that the counterfactual definition of causality requires that the individual occupy two states at the same time, not two different states at two different times. If the latter condition held, panel data with a time-varying treatment condition would suffice to estimate a causal effect of treatment. In the marriage example, the period(s) in which the individual is not married would be the counterfactual for the period(s) in which the same individual is married.
- 3.
In this chapter, we will be mostly concerned with estimation of ATE rather than its constituents, ATT and ATU.
- 4.
Because it renders treatment ignorable, randomization is sufficient to identify the average treatment effect in the following manner:
$$\begin{array}{l} \mathrm{ATE} = \mathrm{E}\left (\left.{Y }_{i}^{1}\right \vert {T}_{i} = 1\right ) -\mathrm{E}\left (\left.{Y }_{i}^{0}\right \vert {T}_{i} = 0\right ) \\ \qquad \ = \mathrm{E}\left (\left.{Y }_{i}\right \vert {T}_{i} = 1\right ) -\mathrm{E}\left (\left.{Y }_{i}\right \vert {T}_{i} = 0\right ) \end{array}$$
Notice that this is simply the mean difference in the outcome for treated and untreated individuals in the target population, as the potential outcomes notation in the first equality can be removed. The second equality necessarily follows because treatment assignment independent of potential outcomes ensures that:
$$\mathrm{E}\left (\left.{Y }_{i}^{1}\right \vert {T}_{i} = 1\right ) = \mathrm{E}\left (\left.{Y }_{i}^{1}\right \vert {T}_{i} = 0\right ) = \mathrm{E}\left (\left.{Y }_{i}\right \vert {T}_{i} = 1\right )$$
and
$$\mathrm{E}\left (\left.{Y }_{i}^{0}\right \vert {T}_{i} = 1\right ) = \mathrm{E}\left (\left.{Y }_{i}^{0}\right \vert {T}_{i} = 0\right ) = \mathrm{E}\left (\left.{Y }_{i}\right \vert {T}_{i} = 0\right )$$
As an interesting aside, in the case of a randomized experiment, it is also the case that ATT and ATU are equivalent to ATE by virtue of these equalities.
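The identification argument in this note can be checked with a small simulation. The following is only an illustrative sketch: the data-generating process, the effect size of 0.5, and the function name `simulate_randomized_experiment` are assumptions made for the example, not part of the chapter.

```python
import random
import statistics

def simulate_randomized_experiment(n=20000, seed=0):
    """Simulate both potential outcomes and a coin-flip treatment assignment."""
    rng = random.Random(seed)
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]        # outcome if untreated
    y1 = [a + 0.5 + rng.gauss(0.0, 0.5) for a in y0]    # outcome if treated
    t = [rng.random() < 0.5 for _ in range(n)]          # randomized assignment
    # Only one potential outcome is observed for each individual.
    y = [b if ti else a for a, b, ti in zip(y0, y1, t)]
    # The "true" ATE uses both potential outcomes (never available in practice).
    true_ate = statistics.fmean(b - a for a, b in zip(y0, y1))
    # The observable quantity: difference in mean outcomes by treatment status.
    diff_in_means = (statistics.fmean(yi for yi, ti in zip(y, t) if ti)
                     - statistics.fmean(yi for yi, ti in zip(y, t) if not ti))
    return true_ate, diff_in_means

true_ate, diff_in_means = simulate_randomized_experiment()
print(round(true_ate, 3), round(diff_in_means, 3))
```

Under randomization the two printed numbers should agree up to sampling error, which is the content of the second equality above.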
- 5.
To be perfectly accurate, randomization may in fact produce imbalance, but the imbalance is attributable entirely to chance. However, asymptotically (i.e., as the sample size tends toward infinity) the expected imbalance approaches zero.
- 6.
Aside from ethical and practical concerns, this experiment would be unable to assess the effect of marriage as we know it, as marriages entered into on the basis of a coin flip would likely have very different qualities than those freely chosen.
- 7.
Researchers differ in their preferences for how exhaustive the treatment status model should be. In a theoretically informed model, the researcher includes only a vector of variables that are specified a priori in the theory or theories of choice. In a kitchen sink model, the researcher includes as many variables as are available in the dataset. In our view, a theoretically informed model is appealing only to the extent that it achieves balance on confounders that are excluded from the treatment status model but would have been included in a kitchen sink model.
- 8.
Some researchers also include functions of the confounders in the treatment status model, for example, quadratic and interaction terms.
- 9.
A useful sensitivity exercise is to estimate treatment effects using a number of different bandwidths to determine stability of the estimates. With smaller bandwidths, common support shrinks and fewer cases are retained. This alters the nature of the estimated treatment effect, particularly if a large number of cases are excluded. This can be dealt with by simply acknowledging that the estimated effect excludes certain kinds of cases, and these can be clearly described since the dropped cases are observed.
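The bandwidth exercise can be sketched as follows. This is a minimal illustration, assuming an Epanechnikov kernel and a treated-on-untreated comparison; the function name `kernel_match_att` and the toy data are hypothetical and do not come from the chapter's analysis.

```python
def kernel_match_att(treated, untreated, bandwidth):
    """Kernel-weighted matching of treated cases to untreated cases.

    treated, untreated: lists of (outcome, propensity_score) pairs.
    Returns (mean treated-minus-counterfactual difference, treated cases retained).
    """
    effects = []
    for y, p in treated:
        # Epanechnikov weights: zero for untreated cases beyond the bandwidth.
        weights = [max(0.0, 0.75 * (1.0 - ((p - q) / bandwidth) ** 2))
                   for _, q in untreated]
        total = sum(weights)
        if total == 0.0:
            continue  # no untreated case within the bandwidth: off common support
        counterfactual = sum(w * yu for w, (yu, _) in zip(weights, untreated)) / total
        effects.append(y - counterfactual)
    estimate = sum(effects) / len(effects) if effects else float("nan")
    return estimate, len(effects)

# Toy data (outcome, propensity score); values are made up for illustration.
treated = [(1.0, 0.90), (0.8, 0.60), (0.7, 0.55)]
untreated = [(0.5, 0.58), (0.4, 0.50), (0.2, 0.20)]

for h in (0.5, 0.1):
    estimate, retained = kernel_match_att(treated, untreated, h)
    print(h, retained, round(estimate, 3))
```

With the narrower bandwidth, the treated case with a propensity score of 0.90 has no untreated case nearby and is dropped, so the estimate then describes a different set of cases.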
- 10.
Where substantive significance is as important as statistical significance, the standardized bias formula can also be used to estimate an effect size for the treatment effect estimate (see Cohen, 1988).
- 11.
In practice, Wooldridge (2002) recommends augmenting the regression model in the following way:
$${Y }_{i} = {\alpha }^{{\prime}} + {\beta }^{{\prime}}{T}_{i} + {\gamma }^{{\prime}}P({x}_{i}) + {\delta }^{{\prime}}{T}_{i}\left [P({x}_{i}) -\bar{ P}({x}_{i})\right ] + {e}_{i}^{{\prime}}$$
where \(\bar{P}({x}_{i})\) represents the mean propensity score for the target population and ATE is estimated the same way, but by using β′ in place of β.
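A rough sketch of this augmented regression, using only the Python standard library, might look like the following. The helper `ols`, the function `propensity_regression_ate`, and the simulated data are assumptions for illustration; in particular, the example uses a known propensity score, whereas in practice it would be estimated at a first stage.

```python
import math
import random

def ols(rows, y):
    """Least-squares coefficients via the normal equations (small problems only)."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def propensity_regression_ate(y, t, p):
    """Regress Y on T, P(x), and T * [P(x) - mean P(x)]; return the coefficient on T."""
    p_bar = sum(p) / len(p)
    rows = [[1.0, ti, pi, ti * (pi - p_bar)] for ti, pi in zip(t, p)]
    alpha, beta, gamma, delta = ols(rows, y)
    return beta

# Simulated data in which the true ATE is 2.0 (an assumption of this example).
rng = random.Random(1)
n = 5000
x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
p = [1.0 / (1.0 + math.exp(-xi)) for xi in x]            # known propensity score
t = [1.0 if rng.random() < pi else 0.0 for pi in p]
y = [1.0 + 2.0 * ti + 3.0 * pi + 1.5 * ti * (pi - 0.5) + rng.gauss(0.0, 1.0)
     for ti, pi in zip(t, p)]

beta_hat = propensity_regression_ate(y, t, p)
print(round(beta_hat, 2))
```

Centering the interaction at the mean propensity score is what lets the coefficient on the treatment indicator be read as the average effect rather than the effect at a propensity score of zero.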
- 12.
Nearest neighbor matching can be done with or without replacement. Matching without replacement means that once an untreated case has been matched to a treated case, it is removed from the candidates for matching. This may lead to poor matches when the distribution of propensity scores is quite different for the treated and untreated groups. Matching without replacement also requires that cases be randomly sorted prior to matching, as sort order can affect matches when there are cases with equal propensity scores. Matching with replacement allows an untreated individual to serve as the counterfactual for multiple treated individuals. This allows for better matches, but reduces the number of untreated cases used to create the treatment effect estimate, which increases the variance of the estimate (Smith and Todd 2005). As with the choice of the number of neighbors, one has to balance concerns of bias and efficiency.
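The difference between the two matching schemes can be illustrated with a short sketch. The function `nearest_neighbor_match` and the toy propensity scores are hypothetical; this is not the algorithm of any particular software package.

```python
import random

def nearest_neighbor_match(p_treated, p_untreated, replace=True, seed=0):
    """Map each treated index to the index of its nearest untreated case."""
    rng = random.Random(seed)
    order = list(range(len(p_treated)))
    if not replace:
        rng.shuffle(order)  # random sort order, since order affects the matches
    available = list(range(len(p_untreated)))
    matches = {}
    for i in order:
        if not available:
            break  # no untreated candidates left
        j = min(available, key=lambda c: abs(p_untreated[c] - p_treated[i]))
        matches[i] = j
        if not replace:
            available.remove(j)  # a matched case cannot be reused
    return matches

def mean_distance(matches, p_treated, p_untreated):
    """Average absolute propensity score gap across matched pairs."""
    return sum(abs(p_treated[i] - p_untreated[j]) for i, j in matches.items()) / len(matches)

# Treated cases cluster at high scores; only one untreated case is close to them.
p_treated = [0.80, 0.82, 0.85]
p_untreated = [0.30, 0.50, 0.81]

with_repl = nearest_neighbor_match(p_treated, p_untreated, replace=True)
without_repl = nearest_neighbor_match(p_treated, p_untreated, replace=False)
print(with_repl, round(mean_distance(with_repl, p_treated, p_untreated), 3))
print(without_repl, round(mean_distance(without_repl, p_treated, p_untreated), 3))
```

With replacement, all three treated cases share the single close untreated case; without replacement, two of them are forced onto distant matches, which is the bias and efficiency tradeoff described in the note.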
- 13.
When there are many cases at the boundaries of the propensity score distribution, it may be useful to generalize kernel matching to include a linear term; this is called local linear matching. Its main advantage over kernel matching is that it yields more accurate estimates at boundary points in the distribution of propensity scores and it deals better with different data densities (Smith and Todd 2005).
- 14.
Apel et al. (2006, 2007, 2008); Bachman et al. (1981, 2003); Bachman and Schulenberg (1993); Gottfredson (1985); Greenberger et al. (1981); Johnson (2004); McMorris and Uggen (2000); Mihalic and Elliott (1997); Mortimer (2003); Mortimer et al. (1996); Paternoster et al. (2003); Ploeger (1997); Resnick et al. (1997); Safron et al. (2001); Staff and Uggen (2003); Steinberg and Dornbusch (1991); Steinberg et al. (1982, 1993); Tanner and Krahn (1991).
- 15.
If we select the sample treatment probability as the classification threshold, 71.8 percent of the sample is correctly classified from the model shown in Table 26.1.
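The classification rule described here can be sketched as below. The function name and the toy data are made up, and treating a score exactly equal to the threshold as a predicted treated case is an assumption, since the note does not say how ties are handled.

```python
def percent_correctly_classified(t, p):
    """Share of cases whose predicted treatment status matches the observed one.

    t: observed treatment indicators (0/1); p: estimated propensity scores.
    The threshold is the sample treatment probability, i.e. the mean of t.
    """
    threshold = sum(t) / len(t)
    hits = sum((pi >= threshold) == bool(ti) for ti, pi in zip(t, p))
    return 100.0 * hits / len(t)

# Toy example with a sample treatment probability of 0.2.
t = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
p = [0.60, 0.10, 0.30, 0.10, 0.15, 0.05, 0.10, 0.25, 0.10, 0.10]
print(percent_correctly_classified(t, p))
```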
- 16.
The sign of the standardized bias is informative. If positive, it signifies that treated youth (i.e., youth who work intensively during the school year) exhibit more of the characteristic being measured than untreated youth. Conversely, if negative, it means that treated youth have less of the measured quality than untreated youth.
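The sign convention can be seen in a short sketch. The formula used here is the commonly cited form from Rosenbaum and Rubin (1985), 100 times the mean difference divided by the square root of the average of the two group variances; since the chapter's own formula is not reproduced in these notes, treating it as identical is an assumption, and the data are invented.

```python
import math
import statistics

def standardized_bias(treated, untreated):
    """100 * (mean_treated - mean_untreated) / sqrt((var_treated + var_untreated) / 2)."""
    diff = statistics.fmean(treated) - statistics.fmean(untreated)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(untreated)) / 2.0)
    return 100.0 * diff / pooled_sd

# Hypothetical binary characteristic, more common among treated youth.
treated = [1, 1, 0, 1, 1, 0, 1, 1]
untreated = [0, 1, 0, 0, 1, 0, 0, 0]

sb = standardized_bias(treated, untreated)
print(round(sb, 1))  # positive: treated youth exhibit more of the characteristic
```

Swapping the two groups flips the sign and leaves the magnitude unchanged.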
- 17.
If a logistic regression model of substance use is estimated instead, the coefficient for intensive work with no control variables is 0.77 (odds ratio = 2.16), and with control variables is 0.28 (odds ratio = 1.33). Both coefficients are statistically significant at a five-percent level.
- 18.
Notice that the ATE from standard regression in panel A (b=0.051) is very similar to the ATE from propensity score regression with no trimming in panel B (b=0.054). The similarity is not coincidental. The discrepancy is only due to the fact that the propensity score was estimated from a logistic regression model at the first stage. Had a linear regression model been used instead, the two coefficients would be identical, although the standard errors would differ.
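The claim about a linear first stage can be verified numerically. In the sketch below, which relies on simulated data and a hypothetical `ols` helper, the propensity score is the fitted value from a linear regression of treatment on the covariates; the residual of treatment is then the same in both second-stage models, so the treatment coefficients coincide.

```python
import math
import random

def ols(rows, y):
    """Least-squares coefficients via the normal equations (small problems only)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Simulated data; all parameter values are assumptions of this example.
rng = random.Random(2)
n = 2000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
t = [1.0 if rng.random() < 1.0 / (1.0 + math.exp(-(0.8 * a - 0.5 * b))) else 0.0
     for a, b in zip(x1, x2)]
y = [0.3 * ti + 0.5 * a + 0.2 * b + rng.gauss(0.0, 1.0)
     for ti, a, b in zip(t, x1, x2)]

# Standard regression: Y on T and the covariates.
b_full = ols([[1.0, ti, a, b] for ti, a, b in zip(t, x1, x2)], y)[1]

# Propensity score regression with a LINEAR first stage.
c = ols([[1.0, a, b] for a, b in zip(x1, x2)], t)
p_lin = [c[0] + c[1] * a + c[2] * b for a, b in zip(x1, x2)]
b_ps = ols([[1.0, ti, pi] for ti, pi in zip(t, p_lin)], y)[1]

print(round(b_full, 6), round(b_ps, 6))
```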
- 19.
We employ the user-written Stata protocol -psmatch2- to estimate average treatment effects from the matching models (see Leuven and Sianesi 2003). To obtain the standard error of the ATE, we perform a bootstrap procedure with 100 replications.
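The bootstrap step can be sketched generically in Python. This is not -psmatch2- and does not reproduce its options: `matching_ate` is a hypothetical single-nearest-neighbor estimator, the data are simulated, and the propensity score is taken as known, whereas a fuller procedure would re-estimate it within each replication.

```python
import random
import statistics

def matching_ate(data):
    """Single-nearest-neighbor matching estimate of the ATE.

    data: list of (outcome, treated_flag, propensity_score) tuples. Each case
    is matched, with replacement, to the nearest case in the opposite group.
    """
    treated = [(y, p) for y, t, p in data if t]
    untreated = [(y, p) for y, t, p in data if not t]
    if not treated or not untreated:
        raise ValueError("both groups must be non-empty")
    effects = []
    for y, t, p in data:
        pool = untreated if t else treated
        y_match = min(pool, key=lambda case: abs(case[1] - p))[0]
        effects.append(y - y_match if t else y_match - y)
    return statistics.fmean(effects)

def bootstrap_se(data, estimator, reps=100, seed=0):
    """Standard deviation of the estimator across resamples of the cases."""
    rng = random.Random(seed)
    draws = [estimator(rng.choices(data, k=len(data))) for _ in range(reps)]
    return statistics.stdev(draws)

# Simulated data with a true effect of 0.05 (an assumption of this example).
rng = random.Random(3)
data = []
for _ in range(200):
    p = rng.uniform(0.2, 0.8)
    t = rng.random() < p
    y = 0.05 * t + p + rng.gauss(0.0, 0.3)
    data.append((y, t, p))

ate_hat = matching_ate(data)
se_hat = bootstrap_se(data, matching_ate, reps=100)
print(round(ate_hat, 3), round(se_hat, 3))
```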
- 20.
As a further test of sensitivity, we estimated the ATE of intensive employment on substance use for subsamples with different substance use histories. For this test, we employed single-nearest-neighbor matching with no caliper, although the findings were not sensitive to this choice. Among the 2,740 youth who, at the initial interview, reported never having used illicit substances, ATE=0.084 (S.E.=0.060). Among the 1,927 youth who reported having used at least one type of illicit substance prior to the initial interview, ATE=−0.019 (S.E.=0.046).
References
Apel R, Bushway SD, Brame R, Haviland AM, Nagin DS, Paternoster R (2007) Unpacking the relationship between adolescent employment and antisocial behavior: a matched samples comparison. Criminology 45:67–97
Apel R, Bushway SD, Paternoster R, Brame R, Sweeten G (2008) Using state child labor laws to identify the causal effect of youth employment on deviant behavior and academic achievement. J Quant Criminol 24:337–362
Apel R, Paternoster R, Bushway SD, Brame R (2006) A job isn’t just a job: the differential impact of formal versus informal work on adolescent problem behavior. Crime Delinq 52:333–369
Bachman JG, Johnston LD, O’Malley PM (1981) Smoking, drinking, and drug use among American high school students: correlates and trends, 1975–1979. Am J Public Health 71:59–69
Bachman JG, Safron DJ, Sy SR, Schulenberg JE (2003) Wishing to work: new perspectives on how adolescents’ part-time work intensity is linked to educational disengagement, substance use, and other problem behaviors. Int J Behav Dev 27:301–315
Bachman JG, Schulenberg JE (1993) How part-time work intensity relates to drug use, problem behavior, time use, and satisfaction among high school seniors: are these consequences or merely correlates? Dev Psychol 29:220–235
Banks D, Gottfredson DC (2003) The effects of drug treatment and supervision on time to rearrest among drug treatment court participants. J Drug Issues 33:385–412
Berk RA, Newton PJ (1985) Does arrest really deter wife battery? An effort to replicate the findings of the Minneapolis spouse abuse experiment. Am Sociol Rev 50:253–262
Bingenheimer JB, Brennan RT, Earls FJ (2005) Firearm violence exposure and serious violent behavior. Science 308:1323–1326
Blechman EA, Maurice A, Bueckner B, Helberg C (2000) Can mentoring or skill training reduce recidivism? Observational study with propensity analysis. Prev Sci 1:139–155
Brame R, Bushway SD, Paternoster R, Apel R (2004) Assessing the effect of adolescent employment on involvement in criminal activity. J Contemp Crim Justice 20:236–256
Caldwell M, Skeem J, Salekin R, Rybroek GV (2006) Treatment response of adolescent offenders with psychopathy features: a 2-year follow-up. Crim Justice Behav 33:571–596
Cameron AC, Trivedi PK (2005) Microeconometrics: methods and applications. Cambridge University Press, New York
Cochran WG (1968) The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 24:295–313
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum, Hillsdale, NJ
Dehejia RH, Wahba S (1999) Causal effects in nonexperimental settings: reevaluating the evaluation of training programs. J Am Stat Assoc 94:1053–1062
Dehejia RH, Wahba S (2002) Propensity score-matching methods for nonexperimental causal studies. Rev Econ Stat 84:151–161
Glueck S, Glueck E (1950) Unraveling juvenile delinquency. The Commonwealth Fund, Cambridge, MA
Gottfredson DC (1985) Youth employment, crime, and schooling: a longitudinal study of a national sample. Dev Psychol 21:419–432
Greenberger E, Steinberg LD, Vaux A (1981) Adolescents who work: health and behavioral consequences of job stress. Dev Psychol 17:691–703
Haviland AM, Nagin DS (2005) Causal inferences with group based trajectory models. Psychometrika 70:1–22
Haviland AM, Nagin DS (2007) Using group-based trajectory modeling in conjunction with propensity scores to improve balance. J Exp Criminol 3:65–82
Haviland AM, Nagin DS, Rosenbaum PR (2007) Combining propensity score matching and group-based trajectory analysis in an observational study. Psychol Methods 12:247–267
Heckman JJ, Hotz VJ (1989) Choosing among alternative nonexperimental methods for estimating the impact of social programs: the case of manpower training. J Am Stat Assoc 84:862–874
Hirano K, Imbens GW, Ridder G (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71:1161–1189
Holland PW (1986) Statistics and causal inference. J Am Stat Assoc 81:945–960
Imbens GW (2004) Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econ Stat 86:4–29
Johnson MK (2004) Further evidence on adolescent employment and substance use: differences by race and ethnicity. J Health Soc Behav 45:187–197
King RD, Massoglia M, MacMillan R (2007) The context of marriage and crime: gender, the propensity to marry, and offending in early adulthood. Criminology 45:33–65
Krebs CP, Strom KJ, Koetse WH, Lattimore PK (2009) The impact of residential and nonresidential drug treatment on recidivism among drug-involved probationers. Crime Delinq 55:442–471
Leeb RT, Barker LE, Strine TW (2007) The effect of childhood physical and sexual abuse on adolescent weapon carrying. J Adolesc Health 40:551–558
Leuven E, Sianesi B (2003) PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Available online: http://ideas.repec.org/c/boc/bocode/s432001.html
Li YP, Propert KJ, Rosenbaum PR (2001) Balanced risk set matching. J Am Stat Assoc 96:870–882
Lu B (2005) Propensity score matching with time-dependent covariates. Biometrics 61:721–728
McCaffrey DF, Ridgeway G, Morral AR (2004) Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods 9:403–425
McMorris BJ, Uggen C (2000) Alcohol and employment in the transition to adulthood. J Health Soc Behav 41:276–294
McNeil DE, Binder RL (2007) Effectiveness of a mental health court in reducing criminal recidivism and violence. Am J Psychiatry 164:1395–1403
Mihalic SW, Elliott DS (1997) Short- and long-term consequences of adolescent work. Youth Soc 28:464–498
Mocan NH, Tekin E (2006) Catholic schools and bad behavior: a propensity score matching analysis. J Econom Anal Policy 5:1–34
Molnar BE, Browne A, Cerda M, Buka SL (2005) Violent behavior by girls reporting violent victimization: a prospective study. Arch Pediatr Adolesc Med 159:731–739
Morgan SL, Harding DJ (2006) Matching estimators of causal effects: prospects and pitfalls in theory and practice. Sociol Methods Res 35:3–60
Mortimer JT (2003) Working and growing up in America. Harvard University Press, Cambridge, MA
Mortimer JT, Finch MD, Ryu S, Shanahan MJ, Call KT (1996) The effects of work intensity on adolescent mental health, achievement, and behavioral adjustment: new evidence from a prospective study. Child Dev 67:1243–1261
Nagin DS (2005) Group-based modeling of development. Harvard University Press, Cambridge, MA
Nieuwbeerta P, Nagin DS, Blokland AAJ (2009) Assessing the impact of first-time imprisonment on offenders’ subsequent criminal career development: a matched samples comparison. J Quant Criminol 25:227–257
Paternoster R, Bushway S, Brame R, Apel R (2003) The effect of teenage employment on delinquency and problem behaviors. Soc Forces 82:297–335
Ploeger M (1997) Youth employment and delinquency: reconsidering a problematic relationship. Criminology 35:659–675
Resnick MD, Bearman PS, Blum RW, Bauman KE, Harris KM, Jones J, Tabor J, Beuhring T, Sieving RE, Shew M, Ireland M, Bearinger LH, Udry JR (1997) Protecting adolescents from harm: findings from the national longitudinal study of adolescent health. J Am Med Assoc 278:823–832
Ridgeway G (2006) Assessing the effect of race bias in post-traffic stop outcomes using propensity scores. J Quant Criminol 22:1–29
Robins JM (1999) Association, causation, and marginal structural models. Synthese 121:151–179
Robins JM, Rotnitzky A (1995) Semiparametric efficiency in multivariate regression models with missing data. J Am Stat Assoc 90:122–129
Robins JM, Mark SD, Newey WK (1992) Estimating exposure effects by modeling the expectation of exposure conditional on confounders. Biometrics 48:479–495
Robins JM, Hernán MÁ, Brumback B (2000) Marginal structural models and causal inference in epidemiology. Epidemiology 11:550–560
Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55
Rosenbaum PR, Rubin DB (1984) Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 79:516–524
Rosenbaum PR, Rubin DB (1985) Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 39:33–38
Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688–701
Rubin DB (1977) Assignment of treatment group on the basis of a covariate. J Educ Stat 2:1–26
Safron DJ, Schulenberg JE, Bachman JG (2001) Part-time work and hurried adolescence: the links among work intensity, social activities, health behaviors, and substance use. J Health Soc Behav 42:425–449
Sampson RJ, Laub JH, Wimer C (2006) Does marriage reduce crime? A counterfactual approach to within-individual causal effects. Criminology 44:465–508
Smith JA, Todd PE (2005) Does matching overcome LaLonde’s critique of nonexperimental estimators? J Econom 125:305–353
Staff J, Uggen C (2003) The fruits of good work: early work experiences and adolescent deviance. J Res Crime Delinq 40:263–290
Steinberg L, Dornbusch S (1991) Negative correlates of part-time work in adolescence: replication and elaboration. Dev Psychol 27:304–313
Steinberg L, Fegley S, Dornbusch S (1993) Negative impact of part-time work on adolescent adjustment: evidence from a longitudinal study. Dev Psychol 29:171–180
Steinberg LD, Greenberger E, Garduque L, Ruggiero M, Vaux A (1982) Effects of working on adolescent development. Dev Psychol 18:385–395
Sweeten G, Apel R (2007) Incapacitation: revisiting an old question with a new method and new data. J Quant Criminol 23:303–326
Tanner J, Krahn H (1991) Part-time work and deviance among high-school seniors. Can J Sociol 16:281–302
Tita G, Ridgeway G (2007) The impact of gang formation on local patterns of crime. J Res Crime Delinq 44:208–237
Widom CS (1989) The cycle of violence. Science 244:160–166
Wooldridge JM (2002) Econometric analysis of cross section and panel data. MIT Press, Cambridge, MA
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Apel, R.J., Sweeten, G. (2010). Propensity Score Matching in Criminology and Criminal Justice. In: Piquero, A., Weisburd, D. (eds) Handbook of Quantitative Criminology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-77650-7_26
DOI: https://doi.org/10.1007/978-0-387-77650-7_26
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-77649-1
Online ISBN: 978-0-387-77650-7