Forecasting Dangerous Inmate Misconduct: An Application of Ensemble Statistical Procedures

  • Original article
Journal of Quantitative Criminology

Abstract

In this paper, we attempt to forecast which prison inmates are likely to engage in very serious misconduct while incarcerated. Such misconduct would usually be a major felony if committed outside of prison: drug trafficking, assault, rape, attempted murder and other crimes. The binary response variable is problematic because it is highly unbalanced. Using data from nearly 10,000 inmates held in facilities operated by the California Department of Corrections, we show that several popular classification procedures do no better than the marginal distribution unless the data are weighted in a fashion that compensates for the lack of balance. Then, random forests performs reasonably well, and better than CART or logistic regression. Although less than 3% of the inmates studied over 24 months were reported for very serious misconduct, we are able to correctly forecast such behavior about half the time.
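The claim that unweighted classifiers do no better than the marginal distribution can be made concrete with a small calculation. The Python sketch below uses hypothetical counts (1,000 cases with a 3% base rate, chosen only for illustration and not taken from the study data) to show that a rule forecasting "no misconduct" for everyone is highly accurate overall yet identifies none of the rare cases. The function name and variables are assumptions introduced for this example.

```python
def accuracy_and_recall(labels, predictions):
    """Return (overall accuracy, share of true misconduct cases flagged)."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, yhat in pairs if y == yhat)
    true_pos = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)
    positives = sum(labels)
    return correct / len(labels), true_pos / positives


# Hypothetical data: 30 misconduct cases (1) among 1,000 inmates (3%).
labels = [1] * 30 + [0] * 970

# The "marginal distribution" forecast: always predict the majority class.
always_no = [0] * len(labels)

accuracy, recall = accuracy_and_recall(labels, always_no)
print(accuracy)  # 0.97
print(recall)    # 0.0
```

An unweighted procedure that minimizes overall error has little incentive to depart from this rule, which is why compensating for the lack of balance matters.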

Notes

  1. For both age variables, the categories represent how CDC records such data. We could not get other age breakdowns.

  2. The upper bound of 50 represents how CDC records sentence length information. We could not distinguish between the several different kinds of sentences all labeled as “50”.

  3. With no false positives, the cost ratio was infinite.

  4. Without the oversampling, one risks getting a number of bootstrap samples with none of the rare cases. The response variable is then a constant.

  5. For our software, there was no way to directly introduce costs into the algorithm. But there were fitted values one could interpret as estimates of the probability of serious misconduct. We set the threshold not at the usual 0.50, but at a value slightly below the observed proportion of cases for which serious misconduct was reported. This value was chosen to approximate the desired 10 to 1 balance of false positives to false negatives (implying that the cost of a false negative was 10 times the cost of a false positive). This is analogous to one of the methods for handling costs in random forests, where the voting threshold is set not at 50% but at the marginal percentage for the response category that needs to be given more weight.

  6. The small negative values represent sampling error and are properly interpreted as effectively zero.

  7. The very small negative values again represent sampling error and are properly interpreted as effectively zero.

  8. Had the partial response plot for no misconduct been shown, it would just have been the mirror image. For binary responses, only one of the two possible partial response plots need be shown. This is not true when there are more than two classification categories. Then, there needs to be one partial response plot for each response category.

  9. The partial response plots are not very interesting for the two age variables and for gang activity because the age variables are measured in just a few ordinal categories and gang activity is a binary variable.

  10. This is not caused by overfitting, which is measured by the decline in forecasting skill when the model is applied to a new random sample from the same population. Here the issue is a changing population.
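
As an illustration of the thresholding idea in Note 5, the following Python sketch scans candidate cutoffs on fitted probabilities and keeps the one whose ratio of false positives to false negatives is closest to 10 to 1. It is a minimal sketch under stated assumptions, not the procedure or software used in the paper: the function names, the toy probabilities and labels, and the search over observed probability values are all hypothetical.

```python
def confusion_counts(probs, labels, threshold):
    """Return (false_positives, false_negatives) when misconduct is
    forecast for every case with fitted probability >= threshold."""
    pairs = list(zip(probs, labels))
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    return fp, fn


def pick_threshold(probs, labels, target_ratio=10.0):
    """Return (threshold, fp, fn) whose FP/FN ratio is closest to
    target_ratio. Cutoffs that leave no false negatives are skipped
    because the ratio is undefined there."""
    best = None
    for t in sorted(set(probs)):
        fp, fn = confusion_counts(probs, labels, t)
        if fn == 0:
            continue
        gap = abs(fp / fn - target_ratio)
        if best is None or gap < best[0]:
            best = (gap, t, fp, fn)
    if best is None:
        raise ValueError("every candidate threshold has zero false negatives")
    return best[1], best[2], best[3]


# Toy data (hypothetical): 3 misconduct cases among 40, base rate 0.075.
probs = [0.20, 0.06, 0.01] + [0.04] * 10 + [0.005] * 27
labels = [1, 1, 1] + [0] * 37

threshold, fp, fn = pick_threshold(probs, labels)
print(threshold, fp, fn)                     # 0.04 10 1
print(confusion_counts(probs, labels, 0.5))  # (0, 3)
```

In this toy example the selected cutoff (0.04) falls below the base rate (0.075), and the usual 0.50 cutoff flags no one, so all three misconduct cases become false negatives.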

Acknowledgments

The research reported in this paper would have been impossible without the talents and efforts of our colleagues at the California Department of Corrections: George Lehman, Maureen Tristan, Gloria Rea, Penny O’Daniel, Micki Mitchell, Mark Cook, Martha Pyog, and Terrence Newsome. Andy Liaw provided a number of useful suggestions on the random forests analysis. Support for work on this paper was provided by the National Science Foundation (SES-0437169), “Ensemble Methods for Data Analysis in the Behavioral, Social and Economic Sciences.” The support is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Richard A. Berk.

About this article

Cite this article

Berk, R., Kriegler, B. & Baek, JH. Forecasting Dangerous Inmate Misconduct: An Application of Ensemble Statistical Procedures. J Quant Criminol 22, 131–145 (2006). https://doi.org/10.1007/s10940-006-9005-z

  • DOI: https://doi.org/10.1007/s10940-006-9005-z
