In testing situations, participants are often asked for supplementary responses in addition to the primary response of interest, such as confidence or reported difficulty. These additional responses can be incorporated into a psychometric model either as a predictor of the main response or as a secondary response. In this paper we explore both approaches for incorporating participants’ reported difficulty into a psychometric model, using an error rate study of fingerprint examiners. Participants were asked to analyze print pairs and make determinations about the source, which can be scored as correct or incorrect decisions. Additionally, participants were asked to report the difficulty of each print pair on a five-point scale. We model (a) the responses of individual examiners without incorporating reported difficulty, using a Rasch model; (b) the responses using reported difficulty as a predictor; and (c) the responses and reported difficulty jointly as a multivariate response. We find that approach (c) results in more balanced classification errors, but that incorporating reported difficulty under either approach does not lead to substantive changes in proficiency or difficulty estimates. These results suggest that, while there are individual differences in reported difficulty, these differences appear to be unrelated to examiners’ proficiency in correctly distinguishing matched from non-matched fingerprints.
- Item response theory
- Forensic science
- Bayesian statistics
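As a rough illustration of approaches (a) and (b) from the abstract, the sketch below computes the probability of a correct decision under a Rasch model, and under one possible extension in which reported difficulty enters as a linear predictor. The function names, the coefficient `beta`, and the linear form of the extension are assumptions made for illustration; the abstract does not specify the exact parameterization used in the paper.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch model: P(correct) = inverse-logit(theta - b),
    where theta is examiner proficiency and b is item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_prob_with_predictor(theta: float, b: float,
                              reported_difficulty: float,
                              beta: float) -> float:
    """Hypothetical version of approach (b): reported difficulty is added
    to the linear predictor with coefficient beta (assumed form)."""
    eta = theta - b + beta * reported_difficulty
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    # An examiner whose proficiency equals the item difficulty
    # has a 50% chance of a correct decision.
    print(rasch_prob(theta=0.0, b=0.0))  # 0.5
    # With a negative beta, a higher reported difficulty lowers P(correct).
    print(rasch_prob_with_predictor(0.0, 0.0, reported_difficulty=4, beta=-0.2))
```

Setting `beta` to zero recovers the plain Rasch model, which makes the two approaches easy to compare on the same scale.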
This work was partially funded by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through Cooperative Agreements 70NANB15H176 and 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Carnegie Mellon University, Duke University, University of California Irvine, University of Virginia, West Virginia University, University of Pennsylvania, Swarthmore College and University of Nebraska, Lincoln.
These latent evaluation categories may vary depending on different laboratory practices. We use the categories that were recorded in the Black Box study (Ulery et al., 2011).
Individualizations are no longer recommended in practice, in favor of ‘identification’ or ‘same source’ conclusions. Since the data used in this paper were collected in 2011 under the ‘Individualization’ terminology, we use that term throughout. See Friction Ridge Subcommittee of the Organization of Scientific Area Committees for Forensic Science (2017, 2019) for further discussion and current recommendations.
AAAS. (2017). Forensic science assessments: A quality and gap analysis - latent fingerprint examination. Technical report (prepared by William Thompson, John Black, Anil Jain, and Joseph Kadane).
Batchelder, W. H., & Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53(1), 71–92.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Bürkner, P. C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.
Bürkner, P. C. (2019). Bayesian item response modeling in R with brms and Stan. Preprint, arXiv:1905.09501.
De Boeck, P., & Partchev, I. (2012). IRTrees: Tree-based item response models of the GLMM family. Journal of Statistical Software, Code Snippets, 48(1), 1–28. https://doi.org/10.18637/jss.v048.c01, https://www.jstatsoft.org/v048/c01
De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
Dror, I. E., & Scurich, N. (2020). (Mis)use of scientific measurements in forensic science. Forensic Science International: Synergy, 2, 333–338.
Eldridge, H., De Donno, M., & Champod, C. (2021). Testing the accuracy and reliability of palmar friction ridge comparisons–a black box study. Forensic Science International, 318, 110457.
Ferrando, P. J., & Lorenzo-Seva, U. (2007). An item response theory model for incorporating response time data in binary personality items. Applied Psychological Measurement, 31(6), 525–543. https://doi.org/10.1177/0146621606295197
Fischer, G. H., & Molenaar, I. W. (2012). Rasch models: Foundations, recent developments, and applications. New York: Springer Science & Business Media.
Friction Ridge Subcommittee of the Organization of Scientific Area Committees for Forensic Science. (2017). Guideline for the articulation of the decision-making process leading to an expert opinion of source identification in friction ridge examinations. Online; accessed September 15, 2021.
Friction Ridge Subcommittee of the Organization of Scientific Area Committees for Forensic Science. (2019). Friction ridge process map (current practice). Online; accessed September 15, 2021.
Hofmann, H., Carriquiry, A., & Vanderplas, S. (2020). Treatment of inconclusives in the AFTE range of conclusions. Law, Probability and Risk, 19(3–4), 317–364.
Holland, P. W., & Wainer, H. (2012). Differential item functioning. Routledge.
Jeon, M., De Boeck, P., & van der Linden, W. (2017). Modeling answer change behavior: An application of a generalized item response tree model. Journal of Educational and Behavioral Statistics, 42(4), 467–490.
Koehler, J. J. (2007). Fingerprint error rates and proficiency tests: What they are and why they matter. Hastings Law Journal, 59, 1077.
Luby, A. (2019). Decision making in forensic identification tasks. In S. Tyner & H. Hofmann (Eds.), Open forensic science in R (Chap. 13). rOpenSci, US.
Luby, A., Mazumder, A., & Junker, B. (2020). Psychometric analysis of forensic examiner behavior. Behaviormetrika, 47, 355–384.
Luby, A., Mazumder, A., & Junker, B. (2021). Psychometrics for forensic fingerprint comparisons. In Quantitative psychology (pp. 385–397). Springer.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.
R Core Team. (2013). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298–321.
Stan Development Team. (2018a). RStan: The R interface to Stan. R package version 2.18.2. http://mc-stan.org/
Stan Development Team. (2018b). Stan modeling language users guide and reference manual. http://mc-stan.org
Thissen, D. (1983). Timed testing: An approach using item response theory. In D. J. Weiss (Ed.), New horizons in testing (pp. 179–203). San Diego: Academic Press.
Ulery, B. T., Hicklin, R. A., Buscaglia, J., & Roberts, M. A. (2011). Accuracy and reliability of forensic latent fingerprint decisions. Proceedings of the National Academy of Sciences, 108(19), 7733–7738.
Ulery, B. T., Hicklin, R. A., Buscaglia, J., & Roberts, M. A. (2012). Repeatability and reproducibility of decisions by latent fingerprint examiners. PLoS ONE, 7(3), e32800.
van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181–204.
van der Linden, W. J., Klein Entink, R. H., & Fox, J. P. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34(5), 327–347.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(Dec), 3571–3594.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Luby, A., Thompson, R.E. (2022). Modeling Covarying Responses in Complex Tasks. In: Wiberg, M., Molenaar, D., González, J., Kim, JS., Hwang, H. (eds) Quantitative Psychology. IMPS 2021. Springer Proceedings in Mathematics & Statistics, vol 393. Springer, Cham. https://doi.org/10.1007/978-3-031-04572-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04571-4
Online ISBN: 978-3-031-04572-1