Estimating Treatment Effects and Predicting Recidivism for Community Supervision Using Survival Analysis with Instrumental Variables

  • Original Paper
  • Journal of Quantitative Criminology

Abstract

Criminal justice researchers often seek to predict criminal recidivism and to estimate treatment effects for community corrections programs. Although random assignment provides a desirable avenue to estimating treatment effects, estimation must often be based on observational data from operating corrections programs. Using observational data raises the risk of selection bias. In the community corrections context, researchers can sometimes use judges as instrumental variables. However, instrumental variable estimation is complicated for nonlinear models, and when studying criminal recidivism, researchers often choose survival models, which are nonlinear given right censoring or competing events. This paper discusses a procedure for estimating survival models with judges as instruments. It discusses strengths and weaknesses of this approach and demonstrates some of the estimation properties with a computer simulation. Although this paper’s focus is narrow, its implications are broad. A conclusion argues that instrumental variable estimation is valuable for a broad range of topics both within and outside of criminal justice.


Fig. 1


Notes

  1. Estimation of the treatment effect and estimation of its standard error are both biased. The bias in the treatment effect is such that, at the extreme, the IV estimate equals the biased OLS estimate. The estimated standard error is biased downward. Hahn and Hausman (2003) discuss these issues and note that the frequently used test (Cameron and Trivedi 2005, p. 271) of the difference between the OLS and 2SLS estimates may be misleading as a test for endogeneity. The bias is especially serious in models that explain very little of the variance in the dependent variable, and the problem is compounded by the use of multiple instruments, an issue discussed later in this paper. Also see Stock et al. (2002).
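A small pure-Python simulation can illustrate the weak-instrument problem described in this note. All parameter values (true effect, instrument strength, sample size) are made up for illustration; this is not the paper's own simulation. With a strong instrument the Wald/2SLS estimate recovers the true effect where OLS is biased; as the instrument weakens, the IV estimate becomes erratic.

```python
import random

random.seed(1)
n = 20000
beta = 1.0  # true treatment effect (illustrative value)

def cov(a, b):
    """Sample covariance, avoiding the Python 3.10+ statistics.covariance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(pi):
    """DGP with an endogenous regressor: the shock u enters both x and y."""
    z = [float(random.random() < 0.5) for _ in range(n)]  # binary instrument
    u = [random.gauss(0, 1) for _ in range(n)]            # unobserved confounder
    x = [pi * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]
    ols = cov(x, y) / cov(x, x)  # biased upward by cov(x, u) / var(x)
    iv = cov(z, y) / cov(z, x)   # Wald estimate, equivalent to 2SLS here
    return ols, iv

ols_strong, iv_strong = simulate(pi=1.0)  # strong first stage
ols_weak, iv_weak = simulate(pi=0.05)     # weak first stage: iv_weak is unreliable
```

With the strong instrument the IV estimate sits near the true effect while OLS does not; with the weak instrument the tiny first-stage covariance in the denominator lets sampling noise dominate.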

  2. Although the instruments are binary, exhaustive and mutually exclusive for the problem considered here, this is not a requirement for multiple instruments. The proposed estimator offered later in this paper does require the instruments to be binary and mutually exclusive, but as Angrist and Pischke (2009) note, overlapping instruments can be converted to mutually exclusive instruments. For example let A and B be binary but overlapping instruments. These can be converted to a, b and c binary and mutually exclusive instruments. Furthermore, a continuous instrument can be divided into binary and mutually exclusive segments.
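The recoding that this note attributes to Angrist and Pischke can be sketched in a few lines. The names A, B, a, b, and c follow the note; the toy data are made up.

```python
def split_overlapping(A, B):
    """Recode two overlapping binary instruments A and B into three
    mutually exclusive binary instruments: A only, B only, and both.
    Observations with neither form the omitted category."""
    a = [int(ai and not bi) for ai, bi in zip(A, B)]
    b = [int(bi and not ai) for ai, bi in zip(A, B)]
    c = [int(ai and bi) for ai, bi in zip(A, B)]
    return a, b, c

A = [1, 1, 0, 0]  # toy data covering the four possible (A, B) patterns
B = [1, 0, 1, 0]
a, b, c = split_overlapping(A, B)  # a=[0,1,0,0], b=[0,0,1,0], c=[1,0,0,0]
```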

  3. Monotonicity is not verifiable because offender i is sentenced by a single judge. The assumption is stated in terms of probabilities, so it does not say that judge n would sentence offender i to treatment if judge m sentences offender i to treatment given n > m in the judge ranking. Some ranking is required, or else the judges would not be instruments, but note that strict inequality need not hold. For example, the rankings might cluster so that \( P_{i1} = P_{i2} < P_{i3} = P_{i4} \). The inequality must hold at some point in the sequence. Readers who are familiar with fuzzy regression discontinuity design estimators will see a parallel between this instrumental variable approach and the fuzzy RDD approach, a point made by Angrist and Pischke (2009).

  4. The condition can be relaxed to allow for random parameters. In general, however, if the treatment effect is heterogeneous, the parameter is a weighted average of individual treatment effects, where the weights are proportional to the variance of the treatment variable conditional on W. The OLS estimate is said to be a conditional-variance-weighted estimate of the underlying treatment effect (Angrist 1998; Angrist and Krueger 1999; Morgan and Winship 2007; Angrist and Pischke 2009; Rhodes 2009a).

  5. In general a nonlinear model can be written so that the outcome Y is a function of covariates X, a treatment indicator T, and a set of parameters α: \( Y = F(X,T,\alpha ) \). The treatment effect is \( \partial Y /\partial T = \partial F(X,T,\alpha ) /\partial T \). The treatment effect will depend on X and all the α.
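A numeric illustration of this dependence, with a logistic functional form and parameter values chosen arbitrarily (not taken from the paper): the effect of switching T from 0 to 1 shrinks as X moves into the tail of the response curve.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def treatment_effect(x, alpha=1.0, delta=0.5):
    """Effect of T in the nonlinear model P(Y=1 | X, T) = logistic(alpha*X + delta*T)."""
    return logistic(alpha * x + delta) - logistic(alpha * x)

effect_mid = treatment_effect(0.0)   # about 0.122, near the steep part of the curve
effect_tail = treatment_effect(3.0)  # about 0.018, far smaller in the tail
```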

  6. Assuming no selection bias may seem strange, but our motivation here is to compare the efficiency of the IV estimator with the OLS estimator. This is most easily done when there is no selection bias, allowing us to estimate the cost of IV estimation compared with OLS estimation.

  7. Although the equation does not show why, the presence of Z in the DGP is essential to identification whether one uses IV or some other estimator. The issue is that if there were no Z in the DGP, then T would be collinear with W, and one could not estimate the treatment effect from regressing Y onto W and T even given selection on the observables. (Identification might come from nonlinearities, but this is not a very promising identification condition.) Z must always appear in the DGP. Estimation techniques that rely on selection on the observables do not need to know Z and should not use Z in the regression when Z is known. In contrast, IV estimation requires the analyst to identify and measure one or more Z variables.

  8. The evaluation methodology discussed in this paper was developed for a specific application: Evaluating the effectiveness of correctional programming for federal offenders serving terms of probation and terms of supervised release (i.e. community supervision following a prison term). In the federal system, senior judges have discretion regarding their caseloads, so senior judges should be excluded from the analysis. There does not appear to be heavy judge-shopping in the federal system, perhaps because of federal sentencing guidelines. Judge shopping may be so prominent in other settings that an evaluator would feel uncomfortable using judges as instruments even after controlling for observables.

  9. The ordering would be based on predictions where one of the Zs would be set equal to 1 while all other Zs are set equal to 0. The choice of the Z will not change the ordering.

  10. More accurately, judges must receive a random sample of offenders conditional on the \( X_O \) included in the regression. This is what allows the judges to serve as instruments.

  11. The discussion uses the subjunctive mood. This is appropriate because the discussion concerns what would have happened under a hypothetical situation where each judge sentences the same offenders. Obviously this cannot happen, and instead the statistical analysis assumes that judges receive a random sample of offenders, so that they sentence the same offenders in expectation.

  12. Definitions of treatment effects come in what might be seen as an uncomfortably wide variety of flavors. The multiple treatment effects identified by using multiple instruments might be seen as approximating marginal treatment effects (Moffitt 1999). Integrating over these marginal treatment effects leads to the average treatment effect for a specific population, which may be the average treatment effect for the treated if the data provide sufficient support. Interpreting LATE is a lively current topic in econometric research, but not one that can be pursued within the scope of this paper.

  13. There is an interesting parallel here between the local average treatment estimator and a fuzzy regression discontinuity design (Hahn et al. 2001). Within a small bandwidth about the jth judge, offenders just to the left of an implicit threshold are treated and offenders just to the right of that threshold are not treated. Thus each judge provides a discontinuity. From the RDD perspective, judges need not follow a strict rule (Lee and Lemieux 2009). In fact, Angrist and Pischke (2009) assert that a fuzzy regression discontinuity design is an instrumental variable estimator. This relaxes the strict assumption of monotonicity.

  14. If the simulation randomly assigns 25 (50) or fewer offenders to a judge, the simulation overrides that random assignment to set the number to 25 (50). This results in somewhat more offenders per judge than is implied by the table.

References

  • Abbring J, van den Berg G (2005) Social experiments and instrumental variables with duration outcomes. Tinbergen Institute discussion paper 2005-047/3

  • Angrist J (1998) Estimating the labor market impact of voluntary military service using social security data on military applicants. Econometrica 66(2):249–288

  • Angrist J (2001) Estimation of limited dependent variable models with dummy endogenous regressors: simple strategies for empirical practice. J Bus Econ Stat 19(1):2–28

  • Angrist J (2006) Instrumental variable methods in criminological work: what, why and how. J Exp Criminol 1–22

  • Angrist J, Evans W (1998) Children and their parents’ labor supply: evidence from exogenous variation in family size. Am Econ Rev 450–477

  • Angrist J, Krueger A (1999) Empirical strategies in labor economics. In: Ashenfelter O, Card D (eds) Handbook of labor economics. Elsevier, Amsterdam, pp 1277–1366

  • Angrist J, Lavy V (1999) Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Q J Econ 114(2):533–575

  • Angrist J, Pischke J (2009) Mostly harmless econometrics: an empiricist’s companion. Princeton University Press, Princeton

  • Baum C (2009) An introduction to Stata programming. Stata Press, College Station

  • Berk R, de Leeuw J (1999) An evaluation of California’s inmate classification system using generalized regression discontinuity design. J Am Stat Assoc 94(448):1045–1052


  • Bloom H (1984) Accounting for no-shows in experimental evaluation designs. Eval Rev 8(2):225–246


  • Bogue B, Campbell M, Carey M, Clawson D, Faust K, Florio K et al (2004) Implementing evidence-based practices in community corrections: the principles of effective intervention. National Institute of Corrections, Washington, DC

  • Bowden R, Turkington D (1984) Instrumental variables. Cambridge University Press, Cambridge


  • Bushway S, Smith J (2007) Sentencing using statistical treatment rules: what we don’t know can hurt us. J Quant Criminol 377–387

  • Cameron A, Trivedi P (2005) Microeconometrics: methods and applications. Cambridge University Press, Cambridge


  • Cameron A, Trivedi P (2009) Microeconometrics using Stata. Stata Press, College Station

  • Cleves M, Gould W, Gutierrez R, Marchenko Y (2008) An introduction to survival analysis using Stata, 2nd edn. Stata Press, College Station

  • Davidson R, MacKinnon J (1993) Estimation and inference in econometrics. Oxford University Press, Oxford


  • Farrington D, Welsh B (2005) Randomized experiments in criminology: what we have learned in the last two decades. J Exp Criminol 1(1):9–38


  • Gennetian L, Morris P, Bos J, Bloom H (2005) Constructing instrumental variables from experimental data to explore how treatment produces effects. In: Bloom H (ed) Learning more from experiments: evolving analytic approaches. Russell Sage Foundation, New York


  • Gottfredson S, Moriarty L (2006) Statistical risk assessment: old problems and new applications. Crime Delinq 52(1):178–200


  • Greene W (2008) Econometric analysis, 6th edn. Prentice Hall, Upper Saddle River


  • Hahn J, Hausman J (2003) Weak instruments: diagnosis and cures in empirical econometrics. Am Econ Rev 93(2):118–125


  • Hahn J, Todd P, van der Klaauw W (2001) Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica 69(1):201–209

  • Heckman J (1979) Sample selection bias as a specification error. Econometrica 47(1):153–161


  • Heckman J, Leamer E (2007) Handbook of econometrics, vol 6B. North-Holland Press, Amsterdam


  • Heckman J, Singer B (1984) The identifiability of the proportional hazard model. Rev Econ Stud 231–241

  • Heckman J, Vytlacil E (2007) Econometric evaluation of social programs, Part I. In: Heckman J, Leamer E (eds) Handbook of econometrics, vol 6B. North Holland Press, Amsterdam


  • Hughes J (2008) Results-based management in federal probation and pretrial services. Fed Probat 4–14

  • Imbens G (2009) Better LATE than nothing: some comments on Deaton (2009) and Heckman and Urzua (2009), unpublished paper downloaded from http://www.economics.harvard.edu/faculty/imbens/files/bltn_09apr10.pdf on Dec 2, 2009

  • Imbens G, Angrist J (1994) Identification and estimation of local average treatment effects. Econometrica 62(2):467–475


  • Kalbfleisch J, Prentice R (1980) The statistical analysis of failure time data. John Wiley, New York


  • Kilmer B (2008) Does parolee drug testing influence employment and education outcomes? Evidence from a randomized experiment with noncompliance. J Quant Criminol 93–123

  • Kling J (2006) Incarceration length, employment and earnings. Am Econ Rev 96(3):863–876


  • Lancaster T (1990) The econometric analysis of transition data. Cambridge University Press, Cambridge


  • Lee M (2005) Micro-econometrics for policy, program and treatment effects. Oxford University Press, Oxford


  • Lee D, Lemieux T (2009) Regression discontinuity designs in economics. Retrieved Jul 27, 2009, from NBER Working Paper Series: http://www.nber.org/papers/w14723

  • Lipsey M, Petrie C, Weisburd D, Gottfredson D (2006) Improving evaluations of anti-crime programs. J Exp Criminol

  • Manski C (2007) Identification for prediction and decision. Harvard University Press, Cambridge


  • Moffitt R (1999) Commentary: models of treatment effects when responses are heterogeneous. Proc Natl Acad Sci USA 96:6575–6576

  • Morgan S, Winship C (2007) Counterfactuals and causal inference: methods and principles for social research. Cambridge University Press, Cambridge

  • Rhodes W (1985) The adequacy of statistically derived prediction instruments in the face of sample selectivity: criminal justice as an example. Eval Rev 9(3):369–382

  • Rhodes W (2009a) Estimating treatment effects: a primer for evaluators. Cambridge, unpublished manuscript available from the author

  • Rhodes W (2009b) Predicting criminal recidivism: a research note (under review)

  • Rhodes W, Pelissier B, Gaes G, Saylor B, Camp S, Wallace S (2001) Alternative solutions to the problem of selection bias in an analysis of federal residential drug treatment programs. Eval Rev 331–369

  • Rhodes W, Jalbert S, Flygare C (2009) Regression discontinuity design: a review and illustration (submitted for publication)

  • Rosenbaum P (2002) Observational studies, 2nd edn. Springer, New York


  • Smith H (1997) Matching with multiple controls to estimate treatment effects in observational studies. In: Raftery A (ed) Sociological methodology, vol 27. Blackwell, Boston, pp 325–354


  • Stock J, Wright J, Yogo M (2002) A survey of weak instruments and weak identification in generalized method of moments. J Bus Econ Stat 518–529

  • Truitt L, Rhodes W, Seeherman A, Carrigan K, Finn P (1999) Process and impact evaluation of the Escambia County, Florida, and Jackson County, Missouri drug courts. National Institute of Justice, Washington, DC


  • United States Government Accountability Office (2009) Program evaluation: a variety of rigorous methods can help identify effective interventions. United States Government Accountability Office, Washington, DC


Acknowledgments

The author thanks Gerry Gaes, Stephen Kennedy, Jacob Klerman, Fatih Unlu, members of the Abt Associates Journal Authors Support Group, and three JQC reviewers for their comments.

Author information


Correspondence to William Rhodes.

Appendix

The appendix shows why the two-step estimator does not work in a survival model. The DGP for commonly used proportional hazard survival models (Lancaster 1990, p. 35) can be written in a linear logarithmic form.

$$ \ln \left( Y \right) = X\alpha_{X} + T\delta + e $$
(A1)

Y is the failure time, X are risk factors, and T is a dichotomous variable representing treatment. The distribution of e determines the parametric model. For example, if exp(e) follows a type one extreme value distribution, then the failure time has an exponential distribution conditional on X and T. As before, assume that the probability of treatment is a linear function of X and Z, so:

$$ T = 1\quad {\text{if}}\;X\beta_{X} + Z\beta_{Z} + u \ge 0\;{\text{and}}\;T = 0\;{\text{otherwise}} $$
(A2)

Both Eqs. A1 and A2 are parts of the DGP. The first equation is the outcome equation. The second is the selection equation.
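To make the selection problem concrete, the DGP in Eqs. A1 and A2 can be simulated directly. This is an illustrative sketch, not the paper's simulation: all parameter values are arbitrary, a single unobserved risk factor plays the role of \( X_U \), and one binary instrument stands in for the judges.

```python
import random

random.seed(7)
n = 30000
delta = -0.5   # true treatment effect on log failure time (illustrative)
alpha_u = 1.0  # coefficient on the unobserved risk factor X_U (illustrative)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

x_u = [random.gauss(0, 1) for _ in range(n)]           # unobserved X_U
z = [float(random.random() < 0.5) for _ in range(n)]   # binary instrument Z
# Selection equation (A2)/(A4): T depends on the unobserved X_U and on Z
t = [float(xu + zi + random.gauss(0, 1) >= 0.5) for xu, zi in zip(x_u, z)]
# Outcome equation (A1)/(A3): the omitted X_U ends up in the error term e*
ln_y = [delta * ti + alpha_u * xu + random.gauss(0, 1) for ti, xu in zip(t, x_u)]

naive = cov(t, ln_y) / cov(t, t)  # biased: T is correlated with the omitted X_U
wald = cov(z, ln_y) / cov(z, t)   # Wald/IV estimate: Z is independent of e*
```

The naive regression coefficient is badly biased upward by the omitted variable, while the instrument recovers a value near the true δ.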

Suppose that Eqs. A1 and A2 are the DGP, but that the model includes some of the X variables (the \( X_O \) variables) and omits other X variables (the \( X_U \) variables). The Z are always observed. Placing the omitted variables in the error term, rewrite the same DGP for the outcome equation as:

$$ \ln \left( Y \right) = X_{O} \alpha_{{X_{O} }} + T\delta + e^{*}\quad {\text{where}}\;e^{*} = X_{U} \alpha_{{X_{U} }} + e $$
(A3)

Likewise, the same DGP for the selection equation is written as:

$$ T = 1\quad {\text{if}}\;X_{O} \beta_{{X_{O} }} + Z\beta_{Z} + u^{*} \ge 0\;{\text{and}}\;T = 0\;{\text{otherwise}}\;{\text{where}}\;u^{*} = X_{U} \beta_{{X_{U} }} + u $$
(A4)

This new error term changes the survival distribution (Heckman and Singer 1984), but we ignore that fact in this discussion. Generally, estimates of δ will be biased and inconsistent because T is not orthogonal to e*. The survival model will suffer from selection bias, but can a two-step estimator overcome this problem? In the two-step estimator, the evaluator estimates T based on a model of the DGP for the selection equation. The model will not represent the β consistently, but it will nevertheless provide a consistent estimate of T conditional on \( X_O \) and Z. Replacing the treatment variable T with its estimate leads to another representation of the DGP for the outcome equation as:

$$ \ln \left( Y \right) = X_{O} \alpha_{{X_{O} }} + \hat{T}\delta + e^{**} $$
(A5)

The error term is now \( e^{**} = \left( {T - \hat{T}} \right)\delta + X_{U} \alpha_{{X_{U} }} + e \). Conditional on \( X_O \), \( \hat{T} \) will be orthogonal to e** by construction, but \( X_O \) will not be orthogonal to e** if the observed variables are correlated with the unobserved variables.

Holding \( X_O \) constant, letting P be the probability of treatment conditional on \( X_O \) and Z, and using a least squares regression to estimate the parameters, the expected value of the outcome Y can be written:

$$ \begin{aligned} E\left[ {\ln \left( Y \right)} \right] &= P\left\{ {X_{O} a_{{X_{O} }} + \delta + E\left[ {v|T = 1} \right]} \right\} + \left( {1 - P} \right)\left\{ {X_{O} a_{{X_{O} }} + E\left[ {v|T = 0} \right]} \right\} \\ &= X_{O} a_{{X_{O} }} + P\delta \end{aligned} $$
(A6)

\( E\left[ {v|T = 1} \right] \) is the expected value of the residual (not the error term) conditional on treatment. The expected value is not zero because of the presence of the omitted variable \( X_U \). Likewise, \( E\left[ {v|T = 0} \right] \) is the expected value of the residual conditional on no treatment. If there is a constant in the model, so that \( E\left[ v \right] = 0 \), then \( PE\left[ {v|T = 1} \right] + \left( {1 - P} \right)E\left[ {v|T = 0} \right] = 0 \). The second line shows that, conditional on \( X_O \) and the probability of treatment, the expectation is exactly what the two-step estimator fits. The estimator works provided there is no censoring.
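The no-censoring result can be checked with a small simulation in which three hypothetical "judges" of varying leniency serve as instruments. All values are illustrative and this is not the paper's simulation: regressing ln(Y) on the judge-level treatment share recovers the treatment effect even though the naive regression on T is biased by an omitted variable.

```python
import random

random.seed(3)
n = 40000
delta = -0.5  # true treatment effect on log failure time (illustrative)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

judge = [random.randrange(3) for _ in range(n)]  # three judges as instruments
x_u = [random.gauss(0, 1) for _ in range(n)]     # unobserved risk factor
cut = [0.8, 0.0, -0.8]                           # judges differ in leniency
t = [float(xu + random.gauss(0, 1) >= cut[j]) for xu, j in zip(x_u, judge)]
ln_y = [delta * ti + xu + random.gauss(0, 1) for ti, xu in zip(t, x_u)]

# First step: estimate P(T=1 | judge) as each judge's treatment share
share = {j: sum(ti for ti, ji in zip(t, judge) if ji == j) /
            judge.count(j) for j in range(3)}
p_hat = [share[j] for j in judge]

# Second step: replace T with p_hat in the outcome regression
two_step = cov(p_hat, ln_y) / cov(p_hat, p_hat)  # close to delta
naive = cov(t, ln_y) / cov(t, t)                 # biased by the omitted x_u
```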

Now introduce censoring at time C = ln(censoring time). Let \( \varphi \left( v \right) \) represent the density for v; let \( \varphi \left( {v|T = 1} \right) \) represent the density of v conditional on treatment and let \( \varphi \left( {v|T = 0} \right) \) represent the density of v conditional on no treatment. The probability of surviving to time C implies that:

$$ v \ge C - X_{O} a_{{X_{O} }} - \hat{P}\delta $$
(A7)

The probability of survival as of time C can be written as:

$$ \begin{aligned} {\text{PROB}}_{\text{survival}} \left( {C|X_{O} ,\hat{P}} \right) &= P\left\{ {\int\limits_{{C - X_{O} a_{{X_{O} }} - \delta }}^{\infty } {\varphi (v|T = 1)} } \right\} + (1 - P)\left\{ {\int\limits_{{C - X_{O} a_{{X_{O} }} }}^{\infty } {\varphi (v|T = 0)} } \right\} \\ &\ne \int\limits_{{C - X_{O} a_{{X_{O} }} - P\delta }}^{\infty } {\varphi (v)} \end{aligned} $$
(A8)

The inequality holds because of Jensen’s inequality. This shows why the two-step estimator will not work: it cannot account for nonlinearities due to censoring. One might solve this problem by making assumptions about the conditional distribution of v. This leads to what are known as Heckman-type adjustment models, but making assumptions about the distribution of v is inherently risky, and getting the assumptions wrong will lead to biased and inconsistent parameter estimates.

Possibly the density function is approximately linear over the range of interest to the estimation, namely between \( C - X_{O} a_{{X_{O} }} - \delta \) and \( C - X_{O} a_{{X_{O} }} \). If so, the two-step estimator would likely improve on the approach that relies on selection on the observables. The approximation is especially plausible over ranges where the density is increasing or decreasing monotonically. This will always be true of an exponential survival model, and it will be true over a range for other survival models.
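Both points — the strict inequality in Eq. A8 and the near-linearity escape hatch — can be checked numerically. The sketch below uses a single standard normal density for v purely for illustration, ignoring the distinction between the conditional densities \( \varphi(v|T=1) \) and \( \varphi(v|T=0) \); all numbers are arbitrary.

```python
import math

def surv(x):
    """P(v >= x) for a standard normal v (an illustrative density choice)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def gap(c, p, delta):
    """Mixture of survival curves (left side of Eq. A8) minus survival at the
    averaged threshold (right side). Nonzero by Jensen's inequality."""
    mixture = p * surv(c - delta) + (1.0 - p) * surv(c)
    averaged = surv(c - p * delta)
    return mixture - averaged

big = gap(c=0.0, p=0.4, delta=1.0)    # clearly nonzero
small = gap(c=0.0, p=0.4, delta=0.1)  # tiny: density is near-linear over a short range
```

The gap is material when δ spans a curved stretch of the density and nearly vanishes when the relevant range is short, matching the linear-approximation argument above.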


Cite this article

Rhodes, W. Estimating Treatment Effects and Predicting Recidivism for Community Supervision Using Survival Analysis with Instrumental Variables. J Quant Criminol 26, 391–413 (2010). https://doi.org/10.1007/s10940-010-9090-x
