Skip to main content
Log in

Forecasts of Violence to Inform Sentencing Decisions

Journal of Quantitative Criminology Aims and scope Submit manuscript



Recent legislation in Pennsylvania mandates that forecasts of "future dangerousness" be provided to judges when sentences are given. Similar requirements already exist in other jurisdictions. Research has shown that machine learning can lead to usefully accurate forecasts of criminal behavior in such setting. But there are settings in which there is insufficient IT infrastructure to support machine learning. The intent of this paper is provide a prototype procedure for making forecasts of future dangerousness that could be used to inform sentencing decisions when machine learning is not practical. We consider how classification trees can be improved so that they may provide an acceptable second choice.


We apply an version of classifications trees available in R, with some technical enhancements to improve tree stability. Our approach is illustrated with real data that could be used to inform sentencing decisions.


Modest sized trees grown from large samples can forecast well and in a stable fashion, especially if the small fraction of indecisive classifications are found and accounted for in a systematic manner. But machine learning is still to be preferred when practical.


Our enhanced version of classifications trees may well provide a viable alternative to machine learning when machine learning is beyond local IT capabilities.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3


  1. There can also be very different approaches to how a classification model is developed and evaluated. For example, proper concern about overfitting is sometimes not coupled with the use of held-out, test data when a classification tree is constructed. Out-of-sample performance is today’s gold standard.

  2. Any predictor can be used as many times as needed, but with different splitting values. For instance, the first split for age might be at 21 years, and a later split might be at 32 years.

  3. The procedure is formally a Bayes classifier if the terminal node proportions are treated as probabilities.

  4. This formalization applies equally well when there are more than two outcome classes. Also, the 1/0 coding is convenient but any other binary labels will work. Because a class is categorical, the coding does not even need to be numerical. One might use, for instance, “F” for fail; and “NF” for not fail.

  5. There is nothing special about the 5 to 1 cost ratio, and for the methodological issues raised in the paper, most any reasonable cost ratio would suffice. The cost ratio just could not be so extreme that the same class is assigned to essentially all cases. In that instance, the role of either false positives or false negatives would be obscured.

  6. Cost ratios have ranged from 20 to 1 to 3 to 1. The relative costs of false negatives are more dear. Anecdotally, we have found that criminal justice officials and representatives from a variety of citizen groups broadly agree that at least for crimes of violence, false negatives are substantially more costly than false positives.

  7. In this context, a random forest is an ensemble of classification trees that differ from one another by user-controlled chance processes. In one such process, when each tree is grown, a random sample of observations is “held out” and serves as test data for that tree. There is no need for the researcher to assemble test data before the statistical analysis begins.

  8. Overall model error is reported at the lower right corner of the table. However, when the loss function is asymmetric, the overall proportion correct (or in error) is misleading. Symmetric costs are being assumed.

  9. One important reason stems from the large imbalance in the outcome distribution favoring the absence of violent crime. The absence of violent crime is far more common and, therefore, easier to predict correctly. Random forests earns its keep by forecasting rare events.

  10. We used the same tuning parameter values so that the trees were comparable. We were trying to isolate the impact of new data for a tree of a given complexity. Had we altered the tuning parameter values, changes in how cases were classified could result from either the new data or differing tuning parameters. The cost ratio was also unchanged.

  11. One reason Kuhnert and Mengersen suggest additional measures of stability is that they apparently do not find sufficiently strong justification for using a Bayes classifier. In our context, where the relative costs of false negatives and false positives are elicited from stakeholders, their concerns seem far less troubling.

  12. Even using the same data, it is likely that a machine learning procedure would do better. Recall, however, that in this paper we are assuming that machine learning is not a practical option.


  • Banks S, Robbins PC, Silver E, Vesselinov R, Steadman HJ, Monahan J, Mulvey EP, Applebaum PS, Grisso RT, Roth LH (2004) A multiple-models approach to violence risk assessment among people with mental disorder. Crim Justice Behav 31:324–340

    Article  Google Scholar 

  • Barnes GC, Ahlman L, Gill C, Sherman LW, Kurtz E, Malvestuto R (2010) Low intensity community supervision for low-risk offenders: a randomized, controlled trial. J Exp Criminol 6:159-189

    Article  Google Scholar 

  • Berk RA (2008a) Forecasting methods in crime and justice. In: Hagan J, Schepple KL, Tyler TR (eds) Annual review of law and social science, vol 4. Annual Reviews, Palo Alto, pp 173–192

  • Berk RA (2008b) Statistical learning from a regression perspective. Springer, New York

    Google Scholar 

  • Berk RA (2009) The role of race in forecasts of violent crime. Race Soc Probl 1(4):231–242

    Article  Google Scholar 

  • Berk RA (2011) Asymmetric loss functions for forecasting in criminal justice settings. J Quant Criminol 27:107–123

    Article  Google Scholar 

  • Berk RA (2012) Criminal justice forecasts of risk: a machine learning approach. Springer, New York

    Book  Google Scholar 

  • Berk RA (2013) Algorithmic criminology. Secur Inform 2(5) (forthcoming)

  • Berk RA, Sorenson SB, He Y (2005) Developing a practical forecasting screener for domestic violence incidents. Eval Rev 29(4):358–382

    Article  Google Scholar 

  • Berk RA, Kriegler B, Baek J-H (2006) Forecasting dangerous inmate misconduct: an application of ensemble statistical procedures. J Quant Criminol 22(2):135–145

    Article  Google Scholar 

  • Berk RA, Sherman L, Barnes G, Kurtz E, Ahlman L (2009) Forecasting murder within a population of probationers and parolees: a high stakes application of statistical learning. J R Stat Soc Ser A 172(part I):191–211

    Article  Google Scholar 

  • Borden HG (1928) Factors predicting parole success. J Am Inst Crim Law Criminol 19:328–336

    Article  Google Scholar 

  • Breiman L (2001a) Random forests. Mach Learn 45:5–32

    Article  Google Scholar 

  • Breiman L (2001b) Statistical modeling: two cultures (with discussion). Stat Sci 16:199–231

    Article  Google Scholar 

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth Press, Monterey

    Google Scholar 

  • Burgess EM (1928) Factors determining success or failure on parole. In: Bruce AA, Harno AJ, Burgess EW, Landesco EW (eds) The working of the indeterminate sentence law and the parole system in Illinois. State Board of Parole, Springfield, pp 205–249

    Google Scholar 

  • Bushway S (2011) Albany Law Rev 74(3)

  • Casey PM, Warren RK, Elek JK (2011) Using offender risk and needs assessment information at sentencing: guidance from a national working group. National Center for State Courts.

  • Chipman HA, George EI, McCulloch RE (2010) BART: Bayesian additive regression trees. Ann Appl Stat 4(1):266–298

    Article  Google Scholar 

  • Farrington DP, Tarling R (2003) Prediction in criminology. SUNY Press, Albany

    Google Scholar 

  • Feeley M, Simon J (1994) Actuarial justice: the emerging new criminal law. In: Nelken D (ed) The futures of criminology. Sage, London, pp 173–201

    Google Scholar 

  • Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378

    Article  Google Scholar 

  • Gottfredson SD, Moriarty LJ (2006) Statistical risk assessment: old problems and new applications. Crime Delinq 52(1):178–200

    Article  Google Scholar 

  • Harcourt BW (2007) Against prediction: profiling, policing, and punishing in an actuarial age. University of Chicago Press, Chicago

    Google Scholar 

  • Hastie R, Dawes RM (2001) Rational choice in an uncertain world. Sage, Thousand Oaks

    Google Scholar 

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York

    Book  Google Scholar 

  • Holmes S (2003) Bootstrapping phylogenetic trees: theory and methods. Stat Sci 18(2):241–255

    Article  Google Scholar 

  • Hyatt JM, Chanenson L, Bergstrom MH (2011) Reform in motion: the promise and profiles of incorporating risk assessments and cost-benefit analysis into Pennsylvania sentencing. Duquesne Law Rev 49(4):707–749

    Google Scholar 

  • Kleiman M, Ostrom BJ, Cheeman FL (2007) Using risk assessment to inform sentencing decisions for nonviolent offenders in Virginia. Crime Delinq 53(1):1–27

    Article  Google Scholar 

  • Kuhnert PM, Mengersen K (2003) Reliability measures for local nodes assessment in classification trees. J Comput Graph Stat 12(2):398–426

    Article  Google Scholar 

  • Messinger SL, Berk RA (1987) Dangerous people: a review of the NAS report on career criminals. Criminology 25(3):767–781

    Article  Google Scholar 

  • Monahan J (1981) Predicting violent behavior: an assessment of clinical techniques. Sage, Newbury Park

    Google Scholar 

  • Pew Center of the States, Public Safety Performance Project (2011) Risk/needs assessment 101: science reveals new tools to manage offenders. The Pew Center of the States.

  • Silver E, Chow-Martin L (2002) A multiple models approach to assessing recidivism risk: implications for judicial decision making. Crim Justice Behav 29:538–569

    Article  Google Scholar 

  • Skeem JL, Monahan J (2011) Current directions in violence risk assessment. Curr Dir Psychol Sci 21(1):38–42

    Article  Google Scholar 

  • Tóth N (2008) Handling classification uncertainty with decision trees in biomedical diagnostic systems. PhD thesis, Department of Measurement and Information Systems, Budapest University of Technology and Economics

  • Turner S, Hess J, Jannetta J (2009) Development of the California Risk Assessment Instrument. Center for Evidence Based Corrections, The University of California, Irvine

    Google Scholar 

Download references

Author information

Authors and Affiliations


Corresponding author

Correspondence to Richard Berk.

Appendix: Outline of Software for Stability Assessments

Appendix: Outline of Software for Stability Assessments

The steps that follow summarize the R code used to discard unreliable forecasts.

  1. 1.

    Construct a classification tree using a loss-matrix with the desired costs.

  2. 2.

    For each case, find the terminal node where it was classified.

  3. 3.

    Create a table that shows for each terminal node, how many observations in that node were labelled “Fail” vs. “No Fail.”

  4. 4.

    To understand how close the vote in each node was, re-weight the “Fails” in each terminal node by their weight as if they were false negatives (e.g., assign a cost of 5.0 to each failure). This now allows one to check the majority vote (using weighted fails) to see which classification won.

  5. 5.

    Let F be the weighted sum of all of the “Fails” (e.g., 400 fails ×  5.0 cost = 2,000) and NF be total number of non-failures, each with a weight of 1.0. Let p = F/(F + NF), the proportion of the weighted total number of votes that are “Fail” in a given terminal node. This can be done symmetrically with (1 − p) as well.

  6. 6.

    Set a desired margin. Check if |p − .5| < c is too close to call. How close is too close will require some trial and error.

  7. 7.

    Store which nodes are too close and which are not. Use the output from the tree with the additional label that some nodes are now too close too call.

  8. 8.

    One can exclude these too-close observations and run a cross-check against the bootstrapped trees to see how the stability improves.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Berk, R., Bleich, J. Forecasts of Violence to Inform Sentencing Decisions. J Quant Criminol 30, 79–96 (2014).

Download citation

  • Published:

  • Issue Date:

  • DOI: