
Propensity Score Matching in Criminology and Criminal Justice

Handbook of Quantitative Criminology

Abstract

The propensity score methodology has become quite common in applied research in the last 10 years, and criminology is no exception to this growing trend. It offers a potentially powerful way to estimate the effect of an intervention on behavior when receipt of treatment arises in a nonrandom way, which is the selection problem. It does so by creating synthetic “experimental” and “control” groups that are equivalent on a large number of potential confounding variables. In this chapter, we first introduce the counterfactual framework on which the propensity score method is based and define the average treatment effect. We then outline technical issues that must be addressed when the propensity score method is used in practice, including estimation of the propensity score, demonstration of covariate balance, and estimation of the treatment effect of interest. To provide a step-by-step example of the method, we appeal to the relationship between employment and substance use in adolescence. Following a brief review of research in criminology and related disciplines that employs the propensity score methodology, we offer a number of guidelines for use of the technique.


Notes

  1.

    Readers desiring a more thorough survey of the counterfactual framework generally and the propensity score method specifically are referred to Cameron and Trivedi (2005: Chap. 25), Imbens (2004), Morgan and Harding (2006), and Wooldridge (2002: Chap. 18).

  2.

    Notice that the counterfactual definition of causality requires that the individual occupy two states at the same time, not two different states at two different times. If the latter condition held, panel data with a time-varying treatment condition would suffice to estimate a causal effect of treatment. In the marriage example, the period(s) in which the individual is not married would be the counterfactual for the period(s) in which the same individual is married.

  3.

    In this chapter, we will be mostly concerned with estimation of ATE rather than its constituents, ATT and ATU.

  4.

    Because it renders treatment ignorable, randomization is sufficient to identify the average treatment effect in the following manner:

    $$\begin{aligned} \mathrm{ATE} &= \mathrm{E}(Y_i^1 \mid T_i = 1) - \mathrm{E}(Y_i^0 \mid T_i = 0) \\ &= \mathrm{E}(Y_i \mid T_i = 1) - \mathrm{E}(Y_i \mid T_i = 0) \end{aligned}$$

    Notice that this is simply the mean difference in the outcome between treated and untreated individuals in the target population, because the potential-outcomes superscripts in the first line can be dropped. The second equality follows because treatment assignment that is independent of potential outcomes ensures that:

    $$\mathrm{E}(Y_i^1 \mid T_i = 1) = \mathrm{E}(Y_i^1 \mid T_i = 0) = \mathrm{E}(Y_i \mid T_i = 1)$$

    and

    $$\mathrm{E}(Y_i^0 \mid T_i = 1) = \mathrm{E}(Y_i^0 \mid T_i = 0) = \mathrm{E}(Y_i \mid T_i = 0)$$

    As an aside, in a randomized experiment ATT and ATU are also equal to ATE by virtue of these equalities.

  5.

    To be precise, randomization may in fact produce imbalance, but any such imbalance is attributable entirely to chance, and it tends toward zero asymptotically (i.e., as the sample size tends toward infinity).

  6.

    Aside from ethical and practical concerns, this experiment would be unable to assess the effect of marriage as we know it, as marriages entered into on the basis of a coin flip would likely have very different qualities than those freely chosen.

  7.

    Researchers differ in their preferences for how exhaustive the treatment status model should be. In a theoretically informed model, the researcher includes only a vector of variables that are specified a priori in the theory or theories of choice. In a kitchen sink model, the researcher includes as many variables as are available in the dataset. In our view, a theoretically informed model is appealing only to the extent that it achieves balance on confounders that are excluded from the treatment status model but would have been included in a kitchen sink model.

  8.

    Some researchers also include functions of the confounders in the treatment status model, for example, quadratic and interaction terms.

  9.

    A useful sensitivity exercise is to estimate treatment effects using a number of different bandwidths to determine stability of the estimates. With smaller bandwidths, common support shrinks and fewer cases are retained. This alters the nature of the estimated treatment effect, particularly if a large number of cases are excluded. This can be dealt with by simply acknowledging that the estimated effect excludes certain kinds of cases, and these can be clearly described since the dropped cases are observed.

  10.

    Where substantive significance is as important as statistical significance, the standardized bias formula can also be used to estimate an effect size for the treatment effect estimate (see Cohen, 1988).

  11.

    In practice, Wooldridge (2002) recommends augmenting the regression model in the following way:

    $$Y_i = \alpha' + \beta' T_i + \gamma' P(x_i) + \delta' T_i \left[ P(x_i) - \bar{P}(x_i) \right] + e_i'$$

    where \(\bar{P}(x_i)\) represents the mean propensity score for the target population, and ATE is estimated in the same way but using \(\beta'\) in place of \(\beta\).

  12.

    Nearest neighbor matching can be done with or without replacement. Matching without replacement means that once an untreated case has been matched to a treated case, it is removed from the candidates for matching. This may lead to poor matches when the distribution of propensity scores is quite different for the treated and untreated groups. Matching without replacement also requires that cases be randomly sorted prior to matching, as sort order can affect matches when there are cases with equal propensity scores. Matching with replacement allows an untreated individual to serve as the counterfactual for multiple treated individuals. This allows for better matches, but reduces the number of untreated cases used to create the treatment effect estimate, which increases the variance of the estimate (Smith and Todd 2005). As with the choice of the number of neighbors, one has to balance concerns of bias and efficiency.

  13.

    When there are many cases at the boundaries of the propensity score distribution, it may be useful to generalize kernel matching to include a linear term; this is called local linear matching. Its main advantage over kernel matching is that it yields more accurate estimates at boundary points in the distribution of propensity scores and it deals better with different data densities (Smith and Todd 2005).

  14.

    Apel et al. (2006, 2007, 2008); Bachman et al. (1981, 2003); Bachman and Schulenberg (1993); Gottfredson (1985); Greenberger et al. (1981); Johnson (2004); McMorris and Uggen (2000); Mihalic and Elliott (1997); Mortimer (2003); Mortimer et al. (1996); Paternoster et al. (2003); Ploeger (1997); Resnick et al. (1997); Safron et al. (2001); Staff and Uggen (2003); Steinberg and Dornbusch (1991); Steinberg et al. (1982, 1993); Tanner and Krahn (1991).

  15.

    If we select the sample treatment probability as the classification threshold, 71.8 percent of the sample is correctly classified from the model shown in Table 26.1.

  16.

    The sign of the standardized bias is informative. If positive, it signifies that treated youth (i.e., youth who work intensively during the school year) exhibit more of the characteristic being measured than untreated youth. Conversely, if negative, it means that treated youth have less of the measured quality than untreated youth.

  17.

    If a logistic regression model of substance use is estimated instead, the coefficient for intensive work with no control variables is 0.77 (odds ratio = 2.16), and with control variables is 0.28 (odds ratio = 1.33). Both coefficients are statistically significant at the five-percent level.

  18.

    Notice that the ATE from standard regression in panel A (b=0.051) is very similar to the ATE from propensity score regression with no trimming in panel B (b=0.054). The similarity is not coincidental: the small discrepancy arises only because the propensity score was estimated from a logistic regression model at the first stage. Had a linear regression model been used instead, the two coefficients would be identical, although the standard errors would differ.

  19.

    We employ the user-written Stata command -psmatch2- to estimate average treatment effects from the matching models (see Leuven and Barbara 2003). To obtain the standard error of the ATE, we perform a bootstrap procedure with 100 replications.

  20.

    As a further test of sensitivity, we estimated the ATE of intensive employment on substance use for subsamples with different substance use histories. For this test, we employed single-nearest-neighbor matching with no caliper, although the findings were not sensitive to this choice. Among the 2,740 youth who, at the initial interview, reported never having used illicit substances, ATE=0.084 (S.E.=0.060). Among the 1,927 youth who reported having used at least one type of illicit substance prior to the initial interview, ATE=−0.019 (S.E.=0.046).
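
Note 5's point that randomization can leave chance imbalance, which shrinks as the sample grows, can be checked with a small simulation. This is a minimal sketch under hypothetical conditions (one standard-normal covariate, fair coin-flip assignment); the function name and settings are illustrative only. The average absolute gap in covariate means between arms is nonzero in any finite sample but falls roughly in proportion to one over the square root of n.

```python
import random
import statistics


def mean_abs_imbalance(n, reps, seed=0):
    """Average |mean(x | treated) - mean(x | control)| over `reps` randomizations.

    x is a hypothetical standard-normal covariate; each of the n units is
    assigned to treatment by an independent fair coin flip.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t = [rng.random() < 0.5 for _ in range(n)]
        treated = [xi for xi, ti in zip(x, t) if ti]
        control = [xi for xi, ti in zip(x, t) if not ti]
        if treated and control:  # skip the rare draw in which one arm is empty
            gaps.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(gaps)


for n in (20, 200, 2000):
    print(n, round(mean_abs_imbalance(n, reps=100, seed=1), 4))
```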
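
Note 8 mentions adding functions of the confounders, such as quadratic and interaction terms, to the treatment status model. The following is a minimal pure-Python sketch, assuming a logistic treatment status model fit by Newton-Raphson; the covariates x1 and x2, their coefficients, and all helper names are hypothetical and not taken from the chapter's analysis. At the maximum-likelihood solution the fitted propensity scores sum to the number of treated cases, which gives a simple convergence check.

```python
import math
import random


def design_row(x1, x2):
    # Intercept, main effects, a quadratic term, and an interaction term.
    return [1.0, x1, x2, x1 * x1, x1 * x2]


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_logit(X, t, iters=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                      # score vector X'(t - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for row, ti in zip(X, t):
            p = logistic(sum(b * x for b, x in zip(beta, row)))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (ti - p) * row[j]
                for c in range(k):
                    info[j][c] += w * row[j] * row[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical data: treatment depends nonlinearly on two confounders.
rng = random.Random(7)
X, t = [], []
for _ in range(2000):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    lin = -0.5 + 0.8 * x1 - 0.4 * x2 + 0.3 * x1 * x1 + 0.5 * x1 * x2
    X.append(design_row(x1, x2))
    t.append(1 if rng.random() < logistic(lin) else 0)

beta = fit_logit(X, t)
p_hat = [logistic(sum(b * x for b, x in zip(beta, row))) for row in X]
```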
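
The sensitivity exercise in note 9 can be illustrated by checking which treated cases still find a match as the bandwidth narrows. This sketch assumes the bandwidth acts as a caliper on the absolute propensity score distance to the nearest untreated case, which is one common convention rather than necessarily the chapter's; the scores are hypothetical. Returning the indices of dropped cases supports describing them, as the note recommends.

```python
def split_by_caliper(p_treated, p_untreated, caliper):
    """Split treated cases by whether their nearest untreated propensity score
    lies within `caliper`; returns (kept_indices, dropped_indices)."""
    kept, dropped = [], []
    for i, p in enumerate(p_treated):
        nearest = min(abs(p - q) for q in p_untreated)
        (kept if nearest <= caliper else dropped).append(i)
    return kept, dropped


# Hypothetical propensity scores.
p_treated = [0.20, 0.35, 0.50, 0.82, 0.95]
p_untreated = [0.10, 0.22, 0.33, 0.48, 0.60]
for caliper in (0.5, 0.25, 0.1, 0.01):
    kept, dropped = split_by_caliper(p_treated, p_untreated, caliper)
    print(caliper, len(kept), "kept; dropped treated cases:", dropped)
```

Narrowing the caliper drops the treated cases with the highest scores first, which changes the population to which the estimated effect applies.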
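
Notes 10 and 16 rely on the standardized bias. A minimal sketch, assuming the common form associated with Rosenbaum and Rubin (1985): 100 times the difference in group means divided by the square root of the average of the two group variances. Whether the chapter uses exactly this variant is an assumption. The sign follows note 16 (positive when treated cases have more of the characteristic), and dividing by 100 gives a standardized mean difference that can be read on Cohen's (1988) effect-size scale.

```python
import statistics


def standardized_bias(x_treated, x_untreated):
    """Standardized bias in percent: 100 * (mean_T - mean_U) / sqrt((var_T + var_U) / 2).

    Sample variances are used; a positive value means the treated group has
    more of the characteristic than the untreated group.
    """
    diff = statistics.fmean(x_treated) - statistics.fmean(x_untreated)
    pooled_sd = ((statistics.variance(x_treated) + statistics.variance(x_untreated)) / 2) ** 0.5
    return 100.0 * diff / pooled_sd


# Hypothetical covariate values for treated and untreated youth.
sb = standardized_bias([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(sb, "effect size:", sb / 100)
```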
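
The augmented regression in note 11 can be sketched with ordinary least squares. Because the interaction is centered at the mean propensity score, the treatment effect at a given score is \(\beta' + \delta'[P(x_i) - \bar{P}(x_i)]\), and its sample average is \(\beta'\), which is why the coefficient on \(T_i\) can be read as the ATE. The data below are hypothetical and noiseless so that the recovered coefficient can be checked exactly; in practice \(P(x_i)\) would be the estimated propensity score, and the function names are illustrative.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    m = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (m[r][k] - sum(m[r][c] * b[c] for c in range(r + 1, k))) / m[r][r]
    return b


def augmented_regression_ate(y, t, p):
    """Coefficient on T in Y = a' + b'T + g'P + d'T(P - mean(P)) + e."""
    p_bar = sum(p) / len(p)
    X = [[1.0, ti, pi, ti * (pi - p_bar)] for ti, pi in zip(t, p)]
    return ols(X, y)[1]


# Hypothetical noiseless data with a true average effect of 2.0.
t = [0, 1, 0, 1, 0, 1, 1, 0]
p = [0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.35, 0.45]
p_bar = sum(p) / len(p)
y = [1.0 + 2.0 * ti + 3.0 * pi + 0.5 * ti * (pi - p_bar) for ti, pi in zip(t, p)]
print(round(augmented_regression_ate(y, t, p), 6))
```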
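
The trade-off in note 12 between matching with and without replacement shows up even in a three-case example. This minimal sketch performs single-nearest-neighbor matching on hypothetical propensity scores and estimates the effect of treatment on the treated; the function names are illustrative. Without replacement, untreated cases must be at least as numerous as treated ones, and `min` breaks ties by list order, which is one way sort order can affect matches.

```python
def nn_match(p_treated, p_untreated, replacement=True):
    """Single-nearest-neighbour matching on the propensity score.

    Returns, for each treated case in input order, the index of its matched
    untreated case. Without replacement, a used untreated case leaves the pool.
    """
    available = list(range(len(p_untreated)))
    matches = []
    for p in p_treated:
        j = min(available, key=lambda k: abs(p - p_untreated[k]))
        matches.append(j)
        if not replacement:
            available.remove(j)
    return matches


def att(y_treated, y_untreated, matches):
    """Mean of treated outcome minus matched untreated outcome."""
    diffs = [yt - y_untreated[j] for yt, j in zip(y_treated, matches)]
    return sum(diffs) / len(diffs)


# Hypothetical scores and outcomes.
p_treated, y_treated = [0.60, 0.62, 0.30], [5.0, 6.0, 2.0]
p_untreated, y_untreated = [0.61, 0.35, 0.10], [4.0, 3.0, 1.0]

with_repl = nn_match(p_treated, p_untreated, replacement=True)
without_repl = nn_match(p_treated, p_untreated, replacement=False)
print(with_repl, att(y_treated, y_untreated, with_repl))
print(without_repl, att(y_treated, y_untreated, without_repl))
```

With replacement, the untreated case at 0.61 serves two treated cases (closer matches, fewer distinct controls); without replacement, the treated case at 0.62 is forced onto a control 0.27 away.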
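
Note 13's boundary argument can be seen by comparing kernel and local linear estimates of a counterfactual outcome at the edge of the propensity score range. This is a minimal sketch assuming an Epanechnikov kernel and a hypothetical bandwidth; the untreated outcomes are constructed to be exactly linear in the score. At the boundary point 0.10 all neighbors lie to the right, so the kernel-weighted mean is pulled upward, whereas the local linear fit (a kernel-weighted regression on the distance from the evaluation point, whose intercept is the estimate) recovers the true value.

```python
def epanechnikov(u):
    """Epanechnikov kernel, zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_estimate(p0, p_untreated, y_untreated, h):
    """Kernel-weighted mean of untreated outcomes around score p0 (needs a neighbour within h)."""
    w = [epanechnikov((q - p0) / h) for q in p_untreated]
    return sum(wi * yi for wi, yi in zip(w, y_untreated)) / sum(w)


def local_linear_estimate(p0, p_untreated, y_untreated, h):
    """Intercept of a kernel-weighted least-squares fit of y on (p - p0)."""
    w = [epanechnikov((q - p0) / h) for q in p_untreated]
    d = [q - p0 for q in p_untreated]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * yi for wi, yi in zip(w, y_untreated))
    t1 = sum(wi * di * yi for wi, di, yi in zip(w, d, y_untreated))
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


# Hypothetical untreated cases whose outcome is exactly 2 * score.
p_u = [0.10, 0.20, 0.30, 0.40]
y_u = [2.0 * q for q in p_u]
print(kernel_estimate(0.10, p_u, y_u, 0.35), local_linear_estimate(0.10, p_u, y_u, 0.35))
```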
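
The classification check in note 15 can be sketched as follows, assuming cases whose predicted probability is at or above the threshold are classified as treated (whether the chapter uses "at or above" or "strictly above" is an assumption). The predicted probabilities and observed statuses are hypothetical.

```python
def percent_correct(p_hat, t, threshold=None):
    """Percent of cases whose predicted treatment status matches observed status.

    A case is predicted treated when p_hat >= threshold; by default the threshold
    is the sample treatment proportion.
    """
    if threshold is None:
        threshold = sum(t) / len(t)
    correct = sum((p >= threshold) == bool(ti) for p, ti in zip(p_hat, t))
    return 100.0 * correct / len(t)


# Hypothetical fitted probabilities and observed treatment (sample proportion 0.6).
p_hat = [0.10, 0.30, 0.65, 0.80, 0.20]
t = [0, 1, 1, 1, 0]
print(percent_correct(p_hat, t))
```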
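
The claim in note 18, that a linear first stage would make the two coefficients identical, is an instance of the Frisch-Waugh-Lovell theorem: the linear fitted value of treatment lies in the span of the confounders, so the residual from regressing treatment on that fitted value equals the residual from regressing treatment on the confounders themselves. The sketch below checks this on hypothetical simulated data; the variables and coefficients are illustrative, not the chapter's.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    m = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (m[r][k] - sum(m[r][c] * b[c] for c in range(r + 1, k))) / m[r][r]
    return b


# Hypothetical data: binary treatment related to two confounders.
rng = random.Random(3)
x1s, x2s, ts, ys = [], [], [], []
for _ in range(500):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    treat = 1.0 if rng.random() < 0.3 + 0.2 * (x1 > 0) + 0.1 * (x2 > 0) else 0.0
    x1s.append(x1)
    x2s.append(x2)
    ts.append(treat)
    ys.append(0.05 * treat + 0.3 * x1 - 0.2 * x2 + rng.gauss(0, 1))

# (a) Standard regression: Y on T and the confounders.
b_standard = ols([[1.0, tr, a, c] for tr, a, c in zip(ts, x1s, x2s)], ys)[1]

# (b) Linear first stage: T on the confounders, giving a linear "propensity score".
first_stage = ols([[1.0, a, c] for a, c in zip(x1s, x2s)], ts)
p_lin = [first_stage[0] + first_stage[1] * a + first_stage[2] * c for a, c in zip(x1s, x2s)]

# (c) Second stage: Y on T and the linear propensity score.
b_pscore = ols([[1.0, tr, p] for tr, p in zip(ts, p_lin)], ys)[1]
print(b_standard, b_pscore)
```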
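
Note 19 obtains the standard error of the ATE by bootstrapping with 100 replications. The sketch below illustrates the idea for a single-nearest-neighbor matching estimator of the ATE (with replacement: each treated case is matched to its nearest untreated case and vice versa). It is not a reimplementation of -psmatch2-; the data, function names, and the whole-sample resampling scheme are assumptions.

```python
import random
import statistics


def nn_ate(units):
    """Single-nearest-neighbour matching ATE, with replacement.

    `units` holds (propensity, treated_flag, outcome) tuples. Each case's missing
    potential outcome is imputed from its nearest case in the other group, and the
    ATE is the mean of the resulting unit-level effects.
    """
    treated = [(p, y) for p, t, y in units if t == 1]
    untreated = [(p, y) for p, t, y in units if t == 0]
    effects = []
    for p, t, y in units:
        pool = untreated if t == 1 else treated
        _, y_match = min(pool, key=lambda u: abs(u[0] - p))
        effects.append(y - y_match if t == 1 else y_match - y)
    return statistics.fmean(effects)


def bootstrap_se(units, reps=100, seed=0):
    """Standard deviation of nn_ate across `reps` resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < reps:
        sample = rng.choices(units, k=len(units))
        # Skip resamples that lack either treated or untreated cases.
        if any(t == 1 for _, t, _ in sample) and any(t == 0 for _, t, _ in sample):
            estimates.append(nn_ate(sample))
    return statistics.stdev(estimates)


# Hypothetical data: (propensity score, treatment flag, outcome).
rng = random.Random(11)
units = []
for _ in range(200):
    p = rng.uniform(0.1, 0.9)
    t = 1 if rng.random() < p else 0
    units.append((p, t, 0.05 * t + 0.5 * p + rng.gauss(0, 0.3)))

print(round(nn_ate(units), 3), round(bootstrap_se(units, reps=100), 3))
```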

References

  • Apel R, Bushway SD, Brame R, Haviland AM, Nagin DS, Paternoster R (2007) Unpacking the relationship between adolescent employment and antisocial behavior: a matched samples comparison. Criminology 45:67–97

  • Apel R, Bushway SD, Paternoster R, Brame R, Sweeten G (2008) Using state child labor laws to identify the causal effect of youth employment on deviant behavior and academic achievement. J Quant Criminol 24:337–362

  • Apel R, Paternoster R, Bushway SD, Brame R (2006) A job isn’t just a job: the differential impact of formal versus informal work on adolescent problem behavior. Crime Delinq 52:333–369

  • Bachman JG, Johnston LD, O’Malley PM (1981) Smoking, drinking, and drug use among American high school students: correlates and trends, 1975–1979. Am J Public Health 71:59–69

  • Bachman JG, Safron DJ, Sy SR, Schulenberg JE (2003) Wishing to work: new perspectives on how adolescents’ part-time work intensity is linked to educational disengagement, substance use, and other problem behaviors. Int J Behav Dev 27:301–315

  • Bachman JG, Schulenberg JE (1993) How part-time work intensity relates to drug use, problem behavior, time use, and satisfaction among high school seniors: are these consequences or merely correlates? Dev Psychol 29:220–235

  • Banks D, Gottfredson DC (2003) The effects of drug treatment and supervision on time to rearrest among drug treatment court participants. J Drug Issues 33:385–412

  • Berk RA, Newton PJ (1985) Does arrest really deter wife battery? An effort to replicate the findings of the Minneapolis spouse abuse experiment. Am Sociol Rev 50:253–262

  • Bingenheimer JB, Brennan RT, Earls FJ (2005) Firearm violence exposure and serious violent behavior. Science 308:1323–1326

  • Blechman EA, Maurice A, Bueckner B, Helberg C (2000) Can mentoring or skill training reduce recidivism? Observational study with propensity analysis. Prev Sci 1:139–155

  • Brame R, Bushway SD, Paternoster R, Apel R (2004) Assessing the effect of adolescent employment on involvement in criminal activity. J Contemp Crim Justice 20:236–256

  • Caldwell M, Skeem J, Salekin R, Van Rybroek G (2006) Treatment response of adolescent offenders with psychopathy features: a 2-year follow-up. Crim Justice Behav 33:571–596

  • Cameron AC, Trivedi PK (2005) Microeconometrics: methods and applications. Cambridge University Press, New York

  • Cochran WG (1968) The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 24:295–313

  • Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum, Hillsdale, NJ

  • Dehejia RH, Wahba S (1999) Causal effects in nonexperimental settings: reevaluating the evaluation of training programs. J Am Stat Assoc 94:1053–1062

  • Dehejia RH, Wahba S (2002) Propensity score-matching methods for nonexperimental causal studies. Rev Econ Stat 84:151–161

  • Glueck S, Glueck E (1950) Unraveling juvenile delinquency. The Commonwealth Fund, Cambridge, MA

  • Gottfredson DC (1985) Youth employment, crime, and schooling: a longitudinal study of a national sample. Dev Psychol 21:419–432

  • Greenberger E, Steinberg LD, Vaux A (1981) Adolescents who work: health and behavioral consequences of job stress. Dev Psychol 17:691–703

  • Haviland AM, Nagin DS (2005) Causal inferences with group based trajectory models. Psychometrika 70:1–22

  • Haviland AM, Nagin DS (2007) Using group-based trajectory modeling in conjunction with propensity scores to improve balance. J Exp Criminol 3:65–82

  • Haviland AM, Nagin DS, Rosenbaum PR (2007) Combining propensity score matching and group-based trajectory analysis in an observational study. Psychol Methods 12:247–267

  • Heckman JJ, Hotz VJ (1989) Choosing among alternative nonexperimental methods for estimating the impact of social programs: the case of manpower training. J Am Stat Assoc 84:862–874

  • Hirano K, Imbens GW, Ridder G (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71:1161–1189

  • Holland PW (1986) Statistics and causal inference. J Am Stat Assoc 81:945–960

  • Imbens GW (2004) Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econ Stat 86:4–29

  • Johnson MK (2004) Further evidence on adolescent employment and substance use: differences by race and ethnicity. J Health Soc Behav 45:187–197

  • King RD, Massoglia M, MacMillan R (2007) The context of marriage and crime: gender, the propensity to marry, and offending in early adulthood. Criminology 45:33–65

  • Krebs CP, Strom KJ, Koetse WH, Lattimore PK (2009) The impact of residential and nonresidential drug treatment on recidivism among drug-involved probationers. Crime Delinq 55:442–471

  • Leeb RT, Barker LE, Strine TW (2007) The effect of childhood physical and sexual abuse on adolescent weapon carrying. J Adolesc Health 40:551–558

  • Leuven E, Barbara S (2003) PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Available online: http://ideas.repec.org/c/boc/bocode/s432001.html

  • Li YP, Propert KJ, Rosenbaum PR (2001) Balanced risk set matching. J Am Stat Assoc 96:870–882

  • Lu B (2005) Propensity score matching with time-dependent covariates. Biometrics 61:721–728

  • McCaffrey DF, Ridgeway G, Morral AR (2004) Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods 9:403–425

  • McMorris BJ, Uggen C (2000) Alcohol and employment in the transition to adulthood. J Health Soc Behav 41:276–294

  • McNeil DE, Binder RL (2007) Effectiveness of a mental health court in reducing criminal recidivism and violence. Am J Psychiatry 164:1395–1403

  • Mihalic SW, Elliott DS (1997) Short- and long-term consequences of adolescent work. Youth Soc 28:464–498

  • Mocan NH, Tekin E (2006) Catholic schools and bad behavior: a propensity score matching analysis. J Econom Anal Policy 5:1–34

  • Molnar BE, Browne A, Cerda M, Buka SL (2005) Violent behavior by girls reporting violent victimization: a prospective study. Arch Pediatr Adolesc Med 159:731–739

  • Morgan SL, Harding DJ (2006) Matching estimators of causal effects: prospects and pitfalls in theory and practice. Sociol Methods Res 35:3–60

  • Mortimer JT (2003) Working and growing up in America. Harvard University Press, Cambridge, MA

  • Mortimer JT, Finch MD, Ryu S, Shanahan MJ, Call KT (1996) The effects of work intensity on adolescent mental health, achievement, and behavioral adjustment: new evidence from a prospective study. Child Dev 67:1243–1261

  • Nagin DS (2005) Group-based modeling of development. Harvard University Press, Cambridge, MA

  • Nieuwbeerta P, Nagin DS, Blokland AAJ (2009) Assessing the impact of first-time imprisonment on offenders’ subsequent criminal career development: a matched samples comparison. J Quant Criminol 25:227–257

  • Paternoster R, Bushway S, Brame R, Apel R (2003) The effect of teenage employment on delinquency and problem behaviors. Soc Forces 82:297–335

  • Ploeger M (1997) Youth employment and delinquency: reconsidering a problematic relationship. Criminology 35:659–675

  • Resnick MD, Bearman PS, Blum RW, Bauman KE, Harris KM, Jo J, Tabor J, Beuhring T, Sieving RE, Shew M, Ireland M, Bearinger LH, Udry JR (1997) Protecting adolescents from harm: findings from the national longitudinal study of adolescent health. J Am Med Assoc 278:823–832

  • Ridgeway G (2006) Assessing the effect of race bias in post-traffic stop outcomes using propensity scores. J Quant Criminol 22:1–29

  • Robins JM (1999) Association, causation, and marginal structural models. Synthese 121:151–179

  • Robins JM, Rotnitzky A (1995) Semiparametric efficiency in multivariate regression models with missing data. J Am Stat Assoc 90:122–129

  • Robins JM, Mark SD, Newey WK (1992) Estimating exposure effects by modeling the expectation of exposure conditional on confounders. Biometrics 48:479–495

  • Robins JM, Hernán MÁ, Brumback B (2000) Marginal structural models and causal inference in epidemiology. Epidemiology 11:550–560

  • Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55

  • Rosenbaum PR, Rubin DB (1984) Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 79:516–524

  • Rosenbaum PR, Rubin DB (1985) Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 39:33–38

  • Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688–701

  • Rubin DB (1977) Assignment of treatment group on the basis of a covariate. J Educ Stat 2:1–26

  • Safron DJ, Schulenberg JE, Bachman JG (2001) Part-time work and hurried adolescence: the links among work intensity, social activities, health behaviors, and substance use. J Health Soc Behav 42:425–449

  • Sampson RJ, Laub JH, Wimer C (2006) Does marriage reduce crime? A counterfactual approach to within-individual causal effects. Criminology 44:465–508

  • Smith JA, Todd PE (2005) Does matching overcome LaLonde’s critique of nonexperimental estimators? J Econom 125:305–353

  • Staff J, Uggen C (2003) The fruits of good work: early work experiences and adolescent deviance. J Res Crime Delinq 40:263–290

  • Steinberg L, Dornbusch S (1991) Negative correlates of part-time work in adolescence: replication and elaboration. Dev Psychol 27:304–313

  • Steinberg L, Fegley S, Dornbusch S (1993) Negative impact of part-time work on adolescent adjustment: evidence from a longitudinal study. Dev Psychol 29:171–180

  • Steinberg LD, Greenberger E, Garduque L, Ruggiero M, Vaux A (1982) Effects of working on adolescent development. Dev Psychol 18:385–395

  • Sweeten G, Apel R (2007) Incapacitation: revisiting an old question with a new method and new data. J Quant Criminol 23:303–326

  • Tanner J, Krahn H (1991) Part-time work and deviance among high-school seniors. Can J Sociol 16:281–302

  • Tita G, Ridgeway G (2007) The impact of gang formation on local patterns of crime. J Res Crime Delinq 44:208–237

  • Widom CS (1989) The cycle of violence. Science 244:160–166

  • Wooldridge JM (2002) Econometric analysis of cross section and panel data. MIT Press, Cambridge, MA


Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Apel, R.J., Sweeten, G. (2010). Propensity Score Matching in Criminology and Criminal Justice. In: Piquero, A., Weisburd, D. (eds) Handbook of Quantitative Criminology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-77650-7_26
