An impact assessment of machine learning risk forecasts on parole board decisions and recidivism
The Pennsylvania Board of Probation and Parole has begun using machine learning forecasts to help inform parole release decisions. In this paper, we evaluate the impact of the forecasts on those decisions and subsequent recidivism.
A close approximation to a natural randomized experiment is used to evaluate the impact of the forecasts on parole release decisions, and a generalized regression discontinuity design is used to evaluate their impact on recidivism.
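To make the regression discontinuity logic concrete, the following is a minimal simulated sketch, not the paper's actual analysis: a hypothetical risk-score cutoff shifts the probability of release without fully determining it (a "fuzzy" design), and the Wald ratio of the outcome jump to the take-up jump near the cutoff recovers the local effect of release on re-arrest. All scores, cutoffs, and effect sizes below are invented for illustration.

```python
import random

random.seed(42)

CUTOFF = 0.5   # hypothetical risk-score cutoff
N = 50_000

def simulate_case():
    score = random.random()                      # forecast risk score in [0, 1)
    p_release = 0.8 if score < CUTOFF else 0.2   # cutoff shifts release odds
    released = random.random() < p_release
    # assumed true effect: release lowers re-arrest probability by 0.15
    p_rearrest = 0.40 - (0.15 if released else 0.0)
    rearrest = random.random() < p_rearrest
    return score, released, rearrest

cases = [simulate_case() for _ in range(N)]

h = 0.2  # bandwidth around the cutoff
below = [(r, y) for s, r, y in cases if CUTOFF - h <= s < CUTOFF]
above = [(r, y) for s, r, y in cases if CUTOFF <= s < CUTOFF + h]

def mean(xs):
    return sum(xs) / len(xs)

# Wald (fuzzy-RD) estimate: jump in outcome over jump in treatment take-up
jump_y = mean([y for _, y in below]) - mean([y for _, y in above])
jump_t = mean([r for r, _ in below]) - mean([r for r, _ in above])
late = jump_y / jump_t
print(round(late, 3))  # should land near the assumed effect, -0.15
```

Because release is only probabilistically tied to the cutoff, a simple difference in re-arrest rates would understate the effect; dividing by the jump in release rates rescales it to the effect among those whose release hinged on the cutoff.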
The forecasts apparently had no effect on the overall parole release rate but did appear to alter the mix of inmates released. Important distinctions were made between offenders forecast to be re-arrested for nonviolent crime and those forecast to be re-arrested for violent crime. The balance of evidence indicates that the forecasts led to reductions in re-arrests for both nonviolent and violent crimes.
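The three-way outcome distinction described above (no re-arrest, nonviolent re-arrest, violent re-arrest) can be sketched as a multinomial logit over made-up features. This is a hypothetical illustration only: the features, coefficients, and function names are invented, and the Board's actual instrument is a machine learning forecast rather than this toy model.

```python
import math

CLASSES = ["none", "nonviolent", "violent"]

# One invented coefficient vector per outcome class:
# (intercept, prior_arrests, age)
COEFS = {
    "none":       ( 1.0, -0.30,  0.02),
    "nonviolent": ( 0.0,  0.25, -0.01),
    "violent":    (-1.0,  0.40, -0.03),
}

def forecast(prior_arrests, age):
    """Return class probabilities via a softmax over linear scores."""
    scores = {c: b0 + b1 * prior_arrests + b2 * age
              for c, (b0, b1, b2) in COEFS.items()}
    m = max(scores.values())                      # stabilize the exponentials
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: exps[c] / total for c in CLASSES}

# A young offender with many priors gets a high re-arrest forecast
probs = forecast(prior_arrests=6, age=22)
print(max(probs, key=probs.get))  # prints "nonviolent" for these toy inputs
```

The practical point is that a three-category forecast lets a parole board treat a predicted violent re-arrest differently from a predicted nonviolent one, rather than collapsing both into a single "recidivist" label.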
Risk assessments based on machine learning forecasts can improve parole release decisions, especially when distinctions are made between re-arrests for violent and nonviolent crime.
Keywords: Parole · Machine learning · Recidivism · Forecasting · Regression discontinuity design · Multinomial logistic regression
The entire project would have been impossible without the efforts of Jim Alibrio, Fred Klunk and their many colleagues working at the Pennsylvania Board of Probation and Parole and the Pennsylvania Department of Corrections. Thanks also go to the National Institute of Justice for financial support and to Patrick Clark who was the project monitor at NIJ. Finally, the paper benefitted from an unusually thorough and constructive set of reviews.