Journal of Classification, Volume 35, Issue 1, pp 147–171

The Lack of Cross-Validation Can Lead to Inflated Results and Spurious Conclusions: A Re-Analysis of the MacArthur Violence Risk Assessment Study

  • Ehsan Bokhari
  • Lawrence Hubert


Cross-validation is an important evaluation strategy in behavioral predictive modeling; without it, a predictive model is likely to be overly optimistic. Statistical methods have been developed that allow researchers to straightforwardly cross-validate predictive models by using the same data employed to construct the model. In the present study, cross-validation techniques were used to construct several decision-tree models with data from the MacArthur Violence Risk Assessment Study (Monahan et al., 2001). The models were then compared with the original (non-cross-validated) Classification of Violence Risk assessment tool. The results show that the measures of predictive model accuracy (AUC, misclassification error, sensitivity, specificity, positive and negative predictive values) degrade considerably when applied to a testing sample, compared with the training sample used to fit the model initially. In addition, unless false negatives (that is, incorrectly predicting individuals to be nonviolent) are considered more costly than false positives (that is, incorrectly predicting individuals to be violent), the models generally make few predictions of violence. The results suggest that employing cross-validation when constructing models can make an important contribution to increasing the reliability and replicability of psychological research.
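The gap between training-sample and testing-sample accuracy described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic noise (not the MacArthur data), the base rate of roughly 20% is arbitrary, and a 1-nearest-neighbour rule stands in for the decision-tree models of the study so that the example needs only the Python standard library. The helper names (`one_nn_predict`, `accuracy`, `k_fold_accuracy`) are hypothetical.

```python
import random

def one_nn_predict(train, x):
    """Return the label of the training row closest to x (1-nearest-neighbour)."""
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(train, test):
    """Share of test rows whose label the 1-NN rule, fitted on train, gets right."""
    hits = sum(one_nn_predict(train, x) == y for x, y in test)
    return hits / len(test)

def k_fold_accuracy(data, k=10):
    """Average held-out accuracy over k folds (each row is held out exactly once)."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        # Fit on every fold except the one being held out.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train, held_out))
    return sum(scores) / k

rng = random.Random(0)
# Synthetic data (assumption): 5 noise predictors; the label is drawn
# independently of them, with roughly 20% positives.
data = [([rng.random() for _ in range(5)], int(rng.random() < 0.2))
        for _ in range(300)]

resub = accuracy(data, data)      # evaluated on the same rows used to fit
cv = k_fold_accuracy(data, k=10)  # evaluated on held-out rows only

print(f"resubstitution accuracy: {resub:.3f}")
print(f"10-fold CV accuracy:     {cv:.3f}")
```

Because every row is its own nearest neighbour, the resubstitution accuracy is perfect by construction even though the predictors carry no information about the label. The cross-validated figure is lower because each prediction is made for a row the rule has not seen, which is the kind of degradation the abstract reports for the testing sample.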


Keywords: Classification trees; Cross-validation; Replicability; Misclassification costs; Random forests; Violence prediction



Supplementary material

357_2018_9252_MOESM1_ESM.pdf (827 kb)
ESM 1 (PDF 826 kb)
357_2018_9252_MOESM2_ESM.pdf (852 kb)
ESM 2 (PDF 852 kb)


  1. BANKS, S., ROBBINS, P.C., SILVER, E., VESSELINOV, R., STEADMAN, H.J., MONAHAN, J., and ROTH, L.H. (2004), “A Multiple-Models Approach to Violence Risk Assessment Among People With Mental Disorder”, Criminal Justice and Behavior, 31, 324–340.
  2. BERK, R. (2011), “Asymmetric Loss Functions for Forecasting in Criminal Justice Settings”, Journal of Quantitative Criminology, 27, 107–123.
  3. BERK, R. (2012), Criminal Justice Forecasts of Risk: A Machine Learning Approach, New York, NY: Springer.
  4. BREIMAN, L. (1996), “Bagging Predictors”, Machine Learning, 26, 123–140.
  5. BREIMAN, L. (2001), “Random Forests”, Machine Learning, 45, 5–32.
  6. BREIMAN, L., FRIEDMAN, J.H., OLSHEN, R.A., and STONE, C.J. (1984), Classification and Regression Trees, Belmont, CA: Wadsworth and Brooks.
  7. BREIMAN, L., and SPECTOR, P. (1992), “Submodel Selection and Evaluation in Regression. The X-Random Case”, International Statistical Review, 291–319.
  8. DOYLE, M., SHAW, J., CARTER, S., and DOLAN, M. (2010), “Investigating the Validity of the Classification of Violence Risk in a UK Sample”, International Journal of Forensic Mental Health, 9, 316–323.
  9. FERNÁNDEZ-DELGADO, M., CERNADAS, E., BARRO, S., and AMORIM, D. (2014), “Do We Need Hundreds of Classifiers to Solve Real World Classification Problems?”, The Journal of Machine Learning Research, 15, 3133–3181.
  10. GARDNER, W., LIDZ, C.W., MULVEY, E.P., and SHAW, E.C. (1996), “A Comparison of Actuarial Methods for Identifying Repetitively Violent Patients with Mental Illnesses”, Law and Human Behavior, 20, 35–48.
  11. GINI, C. (1912), Variability and Mutability: Contribution to the Study of Distributions and Report Statistics, Bologna, Italy: C. Cuppini.
  12. HARE, R.D. (1980), “A Research Scale for the Assessment of Psychopathy in Criminal Populations”, Personality and Individual Differences, 1, 111–119.
  13. HARRIS, G.T., and RICE, M.E. (2013), “Bayes and Base Rates: What is an Informative Prior for Actuarial Violence Risk Assessment?”, Behavioral Sciences and the Law, 31, 103–124.
  14. HASTIE, T., TIBSHIRANI, R., and FRIEDMAN, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.), New York, NY: Springer.
  15. JAMES, G., WITTEN, D., HASTIE, T., and TIBSHIRANI, R. (2013), An Introduction to Statistical Learning, New York, NY: Springer.
  16. KUHN, M., and JOHNSON, K. (2013), Applied Predictive Modeling, New York, NY: Springer.
  17. MCCUSKER, P.J. (2007), “Issues Regarding the Clinical Use of the Classification of Violence Risk (COVR) Assessment Instrument”, International Journal of Offender Therapy and Comparative Criminology, 51, 676–685.
  18. MCDERMOTT, B.E., DUALAN, I.V., and SCOTT, C.L. (2011), “The Predictive Ability of the Classification of Violence Risk (COVR) in a Forensic Psychiatric Hospital”, Psychiatric Services, 62, 430–433.
  19. MEEHL, P.E., and ROSEN, A. (1955), “Antecedent Probability and the Efficiency of Psychometric Signs, Patterns, or Cutting Scores”, Psychological Bulletin, 52, 194–215.
  20. MONAHAN, J., STEADMAN, H.J., APPELBAUM, P.S., GRISSO, T., MULVEY, E.P., ROTH, L.H., and SILVER, E. (2006), “The Classification of Violence Risk”, Behavioral Sciences and the Law, 24, 721–730.
  21. MONAHAN, J., STEADMAN, H.J., ROBBINS, P.C., APPELBAUM, P.S., BANKS, S., GRISSO, T., and SILVER, E. (2005), “An Actuarial Model of Violence Risk Assessment for Persons with Mental Disorders”, Psychiatric Services, 56, 810–815.
  22. MONAHAN, J., STEADMAN, H.J., ROBBINS, P.C., SILVER, E., APPELBAUM, P.S., GRISSO, T., and ROTH, L.H. (2000), “Developing a Clinically Useful Actuarial Tool for Assessing Violence Risk”, The British Journal of Psychiatry, 176, 312–319.
  23. MONAHAN, J., STEADMAN, H.J., SILVER, E., APPELBAUM, P.S., ROBBINS, P.C., MULVEY, E.P., and BANKS, S. (2001), Rethinking Risk Assessment: The MacArthur Study of Mental Disorder and Violence, New York, NY: Oxford University Press.
  24. MOSSMAN, D. (2006), “Critique of Pure Risk Assessment or, Kant Meets Tarasoff”, University of Cincinnati Law Review, 75, 523–609.
  25. MOSSMAN, D. (2013), “Evaluating Risk Assessments Using Receiver Operating Characteristic Analysis: Rationale, Advantages, Insights, and Limitations”, Behavioral Sciences and the Law, 31, 23–39.
  26. PASHLER, H., and WAGENMAKERS, E.J. (2012), “Editors’ Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence?”, Perspectives on Psychological Science, 7, 528–530.
  27. POLLACK, I., and NORMAN, D.A. (1964), “A Non-Parametric Analysis of Recognition Experiments”, Psychonomic Science, 1, 125–126.
  28. R CORE TEAM (2014), R: A Language and Environment for Statistical Computing (Version 3.1.1), Vienna, Austria,
  29. ROBERTS, S., and PASHLER, H. (2000), “How Persuasive is a Good Fit? A Comment on Theory Testing”, Psychological Review, 107, 358–367.
  30. SNOWDEN, R.J., GRAY, N.S., TAYLOR, J., and FITZGERALD, S. (2009), “Assessing Risk of Future Violence Among Forensic Psychiatric Inpatients with the Classification of Violence Risk (COVR)”, Psychiatric Services, 60, 1522–1526.
  31. SPSS, INC. (1993), SPSS for Windows (Release 6.0), Chicago, IL: SPSS, Inc.
  32. STEADMAN, H.J., SILVER, E., MONAHAN, J., APPELBAUM, P.S., ROBBINS, P.C., MULVEY, E.P., and BANKS, S. (2000), “A Classification Tree Approach to the Development of Actuarial Violence Risk Assessment Tools”, Law and Human Behavior, 24, 83–100.
  33. STURUP, J., KRISTIANSSON, M., and LINDQVIST, P. (2011), “Violent Behaviour by General Psychiatric Patients in Sweden: Validation of Classification of Violence Risk (COVR) Software”, Psychiatry Research, 188, 161–165.
  34. VRIEZE, S.I., and GROVE, W.M. (2008), “Predicting Sex Offender Recidivism. I. Correcting for Item Overselection and Accuracy Overestimation in Scale Development. II. Sampling Error-Induced Attenuation of Predictive Validity over Base Rate Information”, Law and Human Behavior, 32, 266–278.

Copyright information

© Classification Society of North America 2018

Authors and Affiliations

  1. University of Illinois at Urbana-Champaign, Champaign, USA
  2. Los Angeles, USA
