Journal of Experimental Criminology, Volume 13, Issue 2, pp 193–216

An impact assessment of machine learning risk forecasts on parole board decisions and recidivism




Objectives

The Pennsylvania Board of Probation and Parole has begun using machine learning forecasts to help inform parole release decisions. In this paper, we evaluate the impact of the forecasts on those decisions and subsequent recidivism.
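As a rough illustration of the kind of three-category risk forecast at issue (no re-arrest, nonviolent re-arrest, violent re-arrest), the sketch below fits a random forest to synthetic data. The features, outcome coding, and model settings are all hypothetical and are not the Board's instrument.

```python
# Illustrative sketch only: a three-category recidivism forecast (none,
# nonviolent re-arrest, violent re-arrest) fit with a random forest on
# synthetic data. Every feature and coding choice here is hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(18, 70, n),   # age at hearing (hypothetical feature)
    rng.integers(0, 15, n),    # number of prior arrests (hypothetical)
    rng.integers(0, 2, n),     # any prior violent charge (hypothetical)
])
# Synthetic outcome: 0 = no re-arrest, 1 = nonviolent, 2 = violent re-arrest
risk = 0.1 * X[:, 1] + 1.0 * X[:, 2] - 0.03 * X[:, 0] + rng.normal(0, 1, n)
y = np.where(risk > 1.0, np.where(X[:, 2] == 1, 2, 1), 0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
probs = forest.predict_proba(X[:5])  # class probabilities for five inmates
```

A tool of this kind would report the per-class probabilities (or the most probable class) to decision-makers, which is what allows the violent/nonviolent distinction discussed below.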


Methods

A close approximation to a natural randomized experiment is used to evaluate the impact of the forecasts on parole release decisions. A generalized regression discontinuity design is used to evaluate the impact of the forecasts on recidivism.
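The core idea of a regression discontinuity estimate can be conveyed with a minimal sharp-RD sketch on synthetic data. The score, cutoff, and outcome below are invented for illustration; the paper's generalized design is more elaborate than this.

```python
# Minimal sharp regression discontinuity sketch on synthetic data: a
# hypothetical risk score with a release cutoff at 0, and a re-arrest
# outcome with a built-in -0.15 jump at the cutoff.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
score = rng.uniform(-1, 1, n)        # hypothetical risk score, cutoff at 0
treated = (score < 0).astype(float)  # released when score is below cutoff
y = 0.4 + 0.2 * score - 0.15 * treated + rng.normal(0, 0.1, n)

# Local linear fits on each side of the cutoff (high-order polynomials
# are generally discouraged in RD designs)
left = score < 0
b_left = np.polyfit(score[left], y[left], 1)
b_right = np.polyfit(score[~left], y[~left], 1)

# RD effect: gap between the two fitted values at the cutoff
effect = np.polyval(b_left, 0.0) - np.polyval(b_right, 0.0)
```

With the jump built in at -0.15, `effect` recovers roughly that value; the causal claim rests on units just above and just below the cutoff being comparable.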


Results

The forecasts apparently had no effect on the overall parole release rate, but did appear to alter the mix of inmates released. Important distinctions were made between offenders forecast to be re-arrested for nonviolent crime and offenders forecast to be re-arrested for violent crime. The balance of evidence indicates that the forecasts led to reductions in re-arrests for both nonviolent and violent crimes.


Conclusions

Risk assessments based on machine learning forecasts can improve parole release decisions, especially when distinctions are made between re-arrests for violent and nonviolent crime.


Keywords: Parole · Machine learning · Recidivism · Forecasting · Regression discontinuity design · Multinomial logistic regression



Acknowledgments

The entire project would have been impossible without the efforts of Jim Alibrio, Fred Klunk, and their many colleagues at the Pennsylvania Board of Probation and Parole and the Pennsylvania Department of Corrections. Thanks also go to the National Institute of Justice for financial support and to Patrick Clark, who was the project monitor at NIJ. Finally, the paper benefitted from an unusually thorough and constructive set of reviews.



Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Department of Criminology, University of Pennsylvania, Philadelphia, USA
  2. Department of Statistics, University of Pennsylvania, Philadelphia, USA
