
Abstract

To help illustrate the ideas from previous chapters, this chapter provides detailed examples of criminal justice forecasting. These are real applications that led to procedures adopted by criminal justice agencies. As such, they combine a number of technical matters with the practical realities of criminal justice decision-making. For the reasons already addressed, random forests will be the machine learning method of choice, with one exception near the end of the chapter.


Notes

  1. This pattern is common. In Philadelphia, for example, “Last year, 85% of the city’s homicides were African American, almost all of them male. Four of five killers were African American males, demographically indistinguishable from their victims.” … Quoting Mayor Michael Nutter: “The No. 1 issue for homicide in Philadelphia is generally classified as an ‘argument.’ ” (Heller 2012).

  2. For this analysis, the call to the random forests program in R (randomForest) was written as rf3 <- randomForest(morefail ~ iassault + igun + priors + intage + sex, data = temp1, importance = T, sampsize = c(500, 200)). The output was saved under the name rf3. The variable morefail was the response, and there were five predictors, beginning with iassault and ending with sex. The input data were temp1, predictor importance plots were requested with importance = T, and sampsize = c(500, 200) determined the sampling strategy for each tree. The assignment operator <- is typed in R as a < followed immediately by a -. Random forests does not provide for evaluation data used to tune the algorithm, in part because very little tuning is needed.
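
     A minimal, self-contained version of that call may help readers reproduce the setup. This is a sketch: it assumes the data frame temp1 with the named columns, which comes from the original analysis and is not distributed with the book.

        library(randomForest)

        # Fit a random forest with stratified sampling on the outcome.
        # For each tree, sampsize = c(500, 200) draws 500 cases from the
        # first outcome category and 200 from the second, which builds the
        # intended cost ratio into the algorithm itself.
        rf3 <- randomForest(morefail ~ iassault + igun + priors + intage + sex,
                            data = temp1,
                            importance = TRUE,
                            sampsize = c(500, 200))

        print(rf3)       # out-of-bag confusion table
        varImpPlot(rf3)  # the importance plots requested via importance = TRUE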

  3. The order in c(500, 200) is numerical (low to high) or alphabetical, depending on how the two outcome categories are coded.
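
     One way to check that ordering in practice is to inspect the outcome's factor levels; a one-line sketch (the example level names are hypothetical):

        # sampsize entries map onto the outcome's factor levels in order;
        # by default R sorts levels alphabetically (or low to high for numbers).
        levels(temp1$morefail)  # e.g., "fail" "nofail", so c(500, 200) maps in that order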

  4. Classification accuracy and forecasting accuracy are reported rather than classification error and forecasting error. This was requested by criminal justice stakeholders, although the information conveyed is really no different. It was felt that the results would have a more favorable reception if accuracy rather than error were highlighted.

  5. The approach generalizes to outcomes with more than two categories. The two-thirds criterion is applied to the category with the fewest cases, and the other categories are adjusted to obtain the desired cost ratios. The number of cost ratios to consider will increase; with three outcome categories, there are three cost ratios. Tuning becomes more demanding.
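
     A hypothetical sketch of the bookkeeping for a three-category outcome (the data frame w2 and outcome threeway are from note 11; the adjustment factors below are invented, and in practice are tuned until the confusion table shows the cost ratios stakeholders want):

        library(randomForest)

        tab  <- table(w2$threeway)       # counts for each outcome category
        base <- floor((2/3) * min(tab))  # two-thirds of the rarest category
        # sizes must follow the outcome's factor-level order (see note 3);
        # here the rarest category is assumed to come first.
        sizes <- c(base, 2 * base, 4 * base)
        rf <- randomForest(threeway ~ ., data = w2, sampsize = sizes)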

  6. In effect, the 2% false negative rate should be counted as 15 times greater (2% × 15 = 30%) when relative costs are factored in.

  7. Using a larger number of predictors, random forests correctly identifies failures about 80% of the time.

  8. An “immediate” crime is the crime of the most recent conviction, the one after which the probation or parole decision was made.

  9. This reasoning is much like the reasoning that applies to the difference between a partial regression coefficient and the “explained” variance uniquely attributable to a predictor.

  10. Serious crimes included murder, attempted murder, aggravated assault, robbery, or a sexual crime.

  11. output <- randomForest(threeway ~ iseriouscount + Asexpriors + Jfirstage + seriousyears + Aviolpriors + jailpriors + Afirstviolage + age + Afirstage + jaildaypriors + Aallpriors + Zipfac, data = w2, importance = T, sampsize = c(10000, 10000, 10000)).

  12. Recall that the sample sizes alter the prior distribution of the outcome for each tree, which in turn alters the loss associated with each kind of forecasting error.
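
     Concretely, the per-tree prior implied by sampsize is just the vector normalized to sum to one:

        sizes <- c(10000, 10000, 10000)  # from the call in note 11
        sizes / sum(sizes)               # implied prior: one-third per outcome category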

  13. Having a prior for a sex offense does not necessarily mean that the individual is labeled a sex offender. The sexual offense may be minor, in the distant past, and unrelated to current criminal behavior.

  14. That said, it often seems that when machine learning is applied to criminal justice risk assessments, the stronger predictors tend to be behavioral, not attitudinal. As noted earlier, various features of antisocial behavior in the past can be excellent predictors of antisocial behavior in the future. Psychological profiles or inferences about states of mind typically add little to classification accuracy once the behavioral measures are included.

  15. There are almost no data beyond a value of 50, and what there is was probably recorded in error. Individuals with no prior serious offenses were coded as having a value of 100, a value sufficiently long ago that the crime could never have occurred. This large value has no effect on the response function for values less than 50 because the fitted values are highly local. Researchers unfamiliar with local fitting methods may find this surprising.
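
     The point can be checked with a partial dependence plot from randomForest. A sketch, assuming the fitted forest output from note 11 and taking seriousyears as the predictor at issue (that variable name is an assumption):

        library(randomForest)

        # Because the underlying tree fits are piecewise and local, the cases
        # coded 100 do not pull on the curve below 50 the way a distant point
        # would pull on a fitted line in linear regression.
        partialPlot(output, pred.data = w2, x.var = "seriousyears")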

  16. The plot was constructed using the R procedure ternaryplot in the vcd library. Here’s the code: ternaryplot(votes, dimnames = c(“NonViolent”, “None”, “Violent”), main = “Ternary Plot for Class Votes”, col = “blue”, cex = 0.5, coordinates = T). The meaning of each argument is well explained in the help documentation.
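
     Put together with the forest from note 11, the full sequence is short; the only added step is extracting the class-vote proportions from the fitted object (the labels assume the vote matrix's columns come in this order):

        library(vcd)

        votes <- output$votes  # per-case vote proportions over the three classes
        ternaryplot(votes,
                    dimnames = c("NonViolent", "None", "Violent"),
                    main = "Ternary Plot for Class Votes",
                    col = "blue", cex = 0.5, coordinates = TRUE)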

  17. The impact of different base rates can be overcome if there are no classification errors.

  18. This may seem counterintuitive until one recalls that false negatives and false positives condition on the actual outcome while forecasting accuracy conditions on the forecast. The two kinds of errors measure rather different things. That is one reason why the exposition of algorithmic errors has stressed that false positives and false negatives are about correct classification and forecasting accuracy is about correct prediction into the future.
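
     A toy confusion table makes the two conditionings concrete (all counts invented):

        #                  predicted fail   predicted no fail
        # actual fail                  40                  10
        # actual no fail               50                 900
        fn_rate <- 10 / (40 + 10)  # false negative rate: conditions on the actual outcome (0.20)
        acc     <- 40 / (40 + 50)  # forecasting accuracy for "fail": conditions on the forecast (0.44)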

  19. Differences between very small proportions cannot be large. One might make the case that a better way to summarize racial differences is with ratios. Some of those ratios are quite large (e.g., 5 to 1). Whether differences or ratios are preferred is a choice to be made by stakeholders. They would need to consider not just the size of the disparities but how many individuals would be differentially affected. If very few individuals would be differentially affected, stakeholders might conclude that ratio comparisons are misleading. Note also that had the base rates been equalized at the Black juvenile figure, the results could have been quite different. With current concerns about “mass incarceration,” equalizing at the base rate for White offenders was probably a prudent choice.
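
     A two-line illustration with invented proportions shows why:

        p_white <- 0.01; p_black <- 0.05  # invented false-positive proportions
        p_black - p_white                 # difference: 0.04, necessarily small
        p_black / p_white                 # ratio: 5 to 1, the kind of figure quoted above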

References

  • Berk, R. A. (2019). Accuracy and fairness for juvenile justice risk assessments. Journal of Empirical Legal Studies, forthcoming.

  • Heller, K. (2012). Philadelphia’s murder rate is a deadly, costly epidemic. Philadelphia Inquirer, January 4, 2012.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Berk, R. (2019). Real Applications. In: Machine Learning Risk Assessments in Criminal Justice Settings. Springer, Cham. https://doi.org/10.1007/978-3-030-02272-3_7

  • DOI: https://doi.org/10.1007/978-3-030-02272-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02271-6

  • Online ISBN: 978-3-030-02272-3

