What do the experts know? Calibration, precision, and the wisdom of crowds among forensic handwriting experts

  • Kristy A. Martire
  • Bethany Growns
  • Danielle J. Navarro
Brief Report


Forensic handwriting examiners currently testify to the origin of questioned handwriting for legal purposes. However, forensic scientists are increasingly being encouraged to assign probabilities to their observations in the form of a likelihood ratio. This study is the first to examine whether handwriting experts can estimate the frequency of US handwriting features more accurately than novices. The results indicate that absolute error was lower for experts than for novices, but the effect is modest, and the overall error rate, even for experts, is large enough to raise questions about whether their estimates are sufficiently trustworthy for presentation in court. When errors are separated into those caused by miscalibration and those caused by imprecision, we find systematic differences between individuals. Finally, we consider several ways of aggregating predictions from multiple experts; the results suggest that quite substantial improvements in expert predictions are possible when a suitable aggregation method is used.
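The three quantities mentioned above (absolute error, a split of error into a systematic and an unsystematic part, and aggregation across judges) can be illustrated with a minimal sketch. This is not the Bayesian model used in the study: mean signed error is used here only as a rough stand-in for miscalibration, the spread of the errors as a rough stand-in for imprecision, and a plain average as the simplest possible aggregation rule. All numbers, names, and functions are hypothetical and chosen purely for illustration.

```python
from statistics import mean, pstdev


def mean_absolute_error(estimates, truth):
    """Average absolute gap between estimated and true frequencies."""
    return mean(abs(e - t) for e, t in zip(estimates, truth))


def bias_and_spread(estimates, truth):
    """Split signed errors into a systematic part (mean signed error, a
    simplified proxy for miscalibration) and an unsystematic part
    (population SD of the errors, a simplified proxy for imprecision)."""
    errors = [e - t for e, t in zip(estimates, truth)]
    return mean(errors), pstdev(errors)


def aggregate_by_mean(judges):
    """Simplest 'wisdom of crowds' rule: average the judges per feature."""
    return [mean(column) for column in zip(*judges)]


# Hypothetical data: true frequency (%) of three handwriting features,
# followed by three judges' estimates of those same features.
true_freq = [10.0, 40.0, 70.0]
judges = [
    [20.0, 50.0, 80.0],  # consistently too high: biased but precise
    [0.0, 30.0, 60.0],   # consistently too low: biased but precise
    [15.0, 35.0, 75.0],  # errors change sign: less biased, less precise
]

individual_errors = [mean_absolute_error(j, true_freq) for j in judges]
crowd_estimate = aggregate_by_mean(judges)
crowd_error = mean_absolute_error(crowd_estimate, true_freq)

for j, err in zip(judges, individual_errors):
    bias, spread = bias_and_spread(j, true_freq)
    print(f"MAE={err:.2f}  bias={bias:+.2f}  spread={spread:.2f}")
print(f"crowd MAE={crowd_error:.2f}")
```

In this toy case the averaged estimate has a smaller error than any single judge because the judges' biases point in opposite directions and partly cancel. Averaging would help far less if every judge erred in the same direction, which is one reason the choice of aggregation method matters.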


Keywords: Judgment and decision-making; Bayesian modeling; Expertise; Wisdom of crowds



Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Kristy A. Martire (1)
  • Bethany Growns (1)
  • Danielle J. Navarro (1)
  1. School of Psychology, University of New South Wales, Sydney, Australia
