## Abstract

Dr. Leonhard presents a comprehensive and insightful critique of the existing malingering research literature and its implications for neuropsychological practice. The statistical critique focuses primarily on the crucial issue of diagnostic inference when multiple tests are involved. While Leonhard effectively addresses certain misunderstandings, some misconceptions in the literature are overlooked and a few new confusions are introduced. To provide a balanced commentary, this evaluation considers both Leonhard's critiques and the malingering research literature. A concise introduction to Bayesian diagnostic inference with the results of multiple tests is also provided. Misunderstandings regarding Bayesian inference are clarified, and a valid approach to Bayesian inference is elucidated. The assumptions underlying the simple Bayes model are discussed, and the chained likelihood ratios method is shown to be an inappropriate application of this model for one reason identified by Leonhard and for another reason that has not previously been recognized. Leonhard's conclusions regarding the primary dependence of incremental validity on unconditional correlations and the alleged mathematical incorrectness of the simple Bayes model are refuted. Finally, potential directions for future research and practice in this field are discussed.


## Availability of Data and Materials

Simulation code available in Supplementary Material.

## Notes

The denominator can be expanded as \(P\left({X}\right)=P\left({X}|{Y}^{+}\right)P\left({Y}^{+}\right)+P\left({X}|{Y}^{-}\right)P({Y}^{-})\).
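As an illustration, Bayes' theorem with this expanded denominator can be computed directly; the TPR, TNR, and prior values below are illustrative assumptions, not figures from the article.

```python
# Posterior probability of a condition given one positive test result,
# via Bayes' theorem with the denominator expanded over both diagnoses.
# The TPR, TNR, and prior values are illustrative, not from the article.

def posterior_positive(tpr, tnr, prior):
    fpr = 1.0 - tnr
    # P(X+) = P(X+ | Y+) P(Y+) + P(X+ | Y-) P(Y-)
    marginal = tpr * prior + fpr * (1.0 - prior)
    return tpr * prior / marginal

print(round(posterior_positive(0.8, 0.9, 0.5), 3))  # 0.889
```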

To simplify the discussion, I will primarily discuss concepts in terms of probabilities, while limiting references to likelihood functions, likelihood ratios, odds, and odds ratios except where necessary to discuss papers that referred specifically to these terms.

The denominator can be expanded as \(P\left({X}_{1},{X}_{2},{X}_{3}\right)=P\left({X}_{1},{X}_{2},{X}_{3}|{Y}^{+}\right)P\left({Y}^{+}\right)+P\left({X}_{1},{X}_{2},{X}_{3}|{Y}^{-}\right)P({Y}^{-})\).

For a single test, 2 (= 2 possible diagnoses) × (2 possible test results − 1) = 2 probabilities. For three tests, 2 (= 2 possible diagnoses) × (2³ = 8 possible test results, − 1) = 14 probabilities. The minus one is because these probabilities must sum to 1, so the final probability is implied.

Leonhard (2023a, b) and Larrabee (2022) have proposed conducting joint validity studies with logistic regression. While application of the standard test validation method (criterion or known group validation) to \(K\) tests would involve estimation of \(2({2}^{K}-1)\) validity coefficients, logistic regression could be construed as an approximation to the standard method that reduces the number of validity coefficients to \(K\) by estimating a unique effect for each test.
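A quick check of how the required number of validity coefficients, \(2({2}^{K}-1)\), grows with the number of tests \(K\) (a trivial sketch; the function name is mine):

```python
# Number of validity coefficients required by the standard (criterion or
# known group) validation method for K dichotomous tests: 2 * (2**K - 1).
def n_validity_coefficients(k):
    return 2 * (2 ** k - 1)

print([n_validity_coefficients(k) for k in (1, 2, 3, 4)])  # [2, 6, 14, 30]
```

The exponential growth in coefficients is what makes the \(K\)-parameter logistic regression approximation attractive.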

The denominator can be expanded as \(P\left({X}_{1}^{+},{X}_{2}^{+},{X}_{3}^{-}\right)=P\left({X}_{1}^{+}|{Y}^{+}\right)P\left({X}_{2}^{+}|{Y}^{+}\right)P\left({X}_{3}^{-}|{Y}^{+}\right)P\left({Y}^{+}\right)+P\left({X}_{1}^{+}|{Y}^{-}\right)P\left({X}_{2}^{+}|{Y}^{-}\right)P\left({X}_{3}^{-}|{Y}^{-}\right)P({Y}^{-})\).
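Under the simple Bayes model, the numerator and this expanded denominator are products of single-test conditional probabilities. A sketch in Python for the pattern of two positive results and one negative result (the TPR/TNR values and the 50% prior are illustrative assumptions):

```python
# Simple Bayes posterior for a pattern of dichotomous results, assuming
# conditional independence of the tests given the diagnosis.
# TPR/TNR values and the 50% prior below are illustrative assumptions.

def simple_bayes(results, tprs, tnrs, prior):
    like_case, like_noncase = 1.0, 1.0
    for r, tpr, tnr in zip(results, tprs, tnrs):
        like_case *= tpr if r else (1.0 - tpr)      # P(Xi | Y+)
        like_noncase *= (1.0 - tnr) if r else tnr   # P(Xi | Y-)
    num = like_case * prior
    return num / (num + like_noncase * (1.0 - prior))

# Pattern (positive, positive, negative) with three equally valid tests:
p = simple_bayes([True, True, False], [0.7, 0.7, 0.7], [0.9, 0.9, 0.9], 0.5)
print(round(p, 3))  # 0.942
```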

Mixed Group Validation estimates the validity of test scores against a reference test with imperfect but known validity, in contrast to the standard method which estimates the validity against a reference test which is assumed to have perfect validity.

Ignoring negative results will only provide the correct posterior probability if \((1-\mathrm{TPR})/\mathrm{TNR}=1\), that is, if the test results have no validity and both cases and non-cases are equally likely to test negative. If the test results have no validity, the positive results should also be ignored.

Larrabee (2022) describes summing multiple posterior odds, rather than multiple posterior probabilities; why the odds were summed is not clear. A sum of probabilities can be a probability, but a sum of odds is never an odds. Summing multiple posterior odds and then converting the result to a probability guarantees that the estimated posterior probability is less than 1, so the illogic is less obvious, but the procedure otherwise has no logical basis and likewise overestimates the posterior probability.

If the equally mathematically unjustified approach of summing multiple posterior odds is taken instead of summing multiple posterior probabilities, the posterior probability will be calculated as 75%. However, as we have equivalent evidence for and against the individual being a case, the correct posterior probability must equal the prior probability. Furthermore, the posterior probability that the individual is a case and the posterior probability that the individual is not a case must logically sum to one. Summing multiple posterior odds to calculate both probabilities would instead yield a 75% chance of being a case plus a 57% chance of not being a case, a 132% chance of being either a case or not a case. Clearly, summing multiple posterior odds does not provide correct probabilities.
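The incoherence of summing posterior odds can be checked numerically. The sketch below uses TPR = TNR = 0.75 and a 50% prior, illustrative values of my own rather than the article's example, and contrasts the correct simple Bayes posterior for one positive and one negative result with the odds-summing calculation:

```python
# One positive and one negative result from two equally valid tests.
# TPR = TNR = 0.75 and the 50% prior are illustrative assumptions.
tpr, tnr, prior = 0.75, 0.75, 0.5
fpr, fnr = 1 - tnr, 1 - tpr

# Correct simple Bayes: likelihood of the full pattern (one +, one -).
num = tpr * fnr * prior
den = num + fpr * tnr * (1 - prior)
print(num / den)  # 0.5 -- the evidence cancels, posterior equals the prior

# Summing the two single-test posterior odds instead:
odds_pos = (tpr * prior) / (fpr * (1 - prior))    # odds(case | X1+) = 3
odds_neg = (fnr * prior) / (tnr * (1 - prior))    # odds(case | X2-) = 1/3
p_case = (odds_pos + odds_neg) / (1 + odds_pos + odds_neg)

# Same odds-summing computation for "not a case":
odds_pos_n = (fpr * (1 - prior)) / (tpr * prior)  # odds(non-case | X1+)
odds_neg_n = (tnr * (1 - prior)) / (fnr * prior)  # odds(non-case | X2-)
p_noncase = (odds_pos_n + odds_neg_n) / (1 + odds_pos_n + odds_neg_n)

print(p_case + p_noncase)  # exceeds 1: the two "posteriors" are incoherent
```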

The simulation was written in R 4.2.1 with code in the Supplementary Material. The test results (coin flips) were generated under conditional independence. The posterior probabilities were calculated using the true values of the true positive and true negative rates.
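The original R code is in the Supplementary Material. As a hedged illustration only, a minimal Python re-sketch of the described procedure might look as follows; the coin biases, flip count, and function name here are my assumptions, not the article's settings:

```python
import random

# Minimal re-sketch of the described simulation: flips are generated
# independently given the coin (conditional independence), and the posterior
# uses the true heads probabilities. Parameter values are illustrative.

def simulate_posterior(is_case, n_flips, p_heads_case, p_heads_noncase,
                       prior=0.5, seed=0):
    rng = random.Random(seed)
    p_heads = p_heads_case if is_case else p_heads_noncase
    flips = [rng.random() < p_heads for _ in range(n_flips)]
    like_case = like_non = 1.0
    for heads in flips:
        like_case *= p_heads_case if heads else 1 - p_heads_case
        like_non *= p_heads_noncase if heads else 1 - p_heads_noncase
    return like_case * prior / (like_case * prior + like_non * (1 - prior))

# Posterior that the coin is the biased ("case") coin after 20 flips:
print(simulate_posterior(True, 20, 0.75, 0.5))
```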

Assuming a prior probability of 50% to be neutral may not be appropriate for real neuropsychological applications, where prevalence of conditions may be much lower and the consequences of misdiagnosing a case vs a non-case may be asymmetrical.

All methods were identical when the coin was two-headed because there were no negative results to ignore (a two-headed coin always flips heads) and because there was only ever one pattern of results with the same number of positive and negative results (there is only ever one possible pattern of all heads). The methods will not be identical in general.

Incremental validity of a Test Result 2 above and beyond the validity of Test Result 1 alone may be defined as \(P\left({X}_{1},{X}_{2}^{+}|{Y}^{+}\right)/P({X}_{1}|{Y}^{+})\) and \(P\left({X}_{1},{X}_{2}^{-}|{Y}^{-}\right)/P({X}_{1}|{Y}^{-})\), where values greater than one indicate incremental validity. The relevance of the conditional relationships may be self-evident; an extended discussion is omitted here to keep the focus on the calculation of the posterior probability.

Highly valid tests may be unconditionally independent if the tests are negatively conditionally dependent; however, this seems unlikely in reality.

More formally, the consensus in assessment fields is that test properties are best described in terms of conditional univariate distributions (that is, best described by TPR and TNR for a dichotomous test). Extending this logic to multiple tests, we would therefore want the conditional multivariate distribution, which is a function of the conditional univariate distributions and the conditional correlations.

While Leonhard discusses the apparent proof in the context of the chained likelihood ratios method, the mathematics would also apply to the simple Bayes method.

To be precise, Leonhard expressed the simple Bayes method in terms of odds not probabilities. However, this has no impact on the discussion in the main text as both expressions are mathematically equivalent.

Given the TPR, TNR, and prevalence, the unconditional correlation can be calculated.
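As a sketch of that calculation for two conditionally independent tests, the unconditional phi correlation between their results follows from each test's TPR and TNR and the prevalence. The function name and parameter values below are illustrative assumptions:

```python
import math

# Unconditional (phi) correlation between two test results that are
# conditionally independent given the diagnosis, from each test's TPR and
# TNR and the prevalence. Values and the function name are illustrative.

def unconditional_phi(tpr1, tnr1, tpr2, tnr2, prevalence):
    fpr1, fpr2 = 1 - tnr1, 1 - tnr2
    # Joint and marginal probabilities of positive results.
    p11 = prevalence * tpr1 * tpr2 + (1 - prevalence) * fpr1 * fpr2
    p1 = prevalence * tpr1 + (1 - prevalence) * fpr1
    p2 = prevalence * tpr2 + (1 - prevalence) * fpr2
    return (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

# Two valid, conditionally independent tests are unconditionally correlated:
print(unconditional_phi(0.8, 0.9, 0.8, 0.9, 0.3))
```

Note that the correlation is zero only when the tests carry no validity (TPR equals the false positive rate).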

The equation for the tetrachoric correlation in Table 2 is incorrect. However, it can be confirmed that the correct equation as given by Digby (1983) was used to calculate the tetrachoric correlations in the Table.

Chafetz (2022) observed a posterior probability estimate of 99% via the chained likelihood ratios method (Larrabee, 2008), compared with an empirically calculated posterior probability of 84%. Aside from this overestimation being arguably large (from some doubt of malingering to virtually no doubt), the simulation shows that the overestimation can be extreme when the correct posterior probability is further from 100%. In fact, the chained likelihood ratios method can produce posterior probabilities close to 100% when the correct posterior probability is close to 0%. See also Black et al. (2016).

The overestimation of the posterior probability of a case reflects how the problem was framed in Larrabee (2008). If the posterior probability was calculated using only the negative test results, instead of only the positive test results, the posterior probability that the individual is a case would be underestimated. Similarly, if the posterior probability that the individual is a non-case was calculated, and the posterior probability of a case was calculated as one minus this value, again the posterior probability that the individual was a case would be underestimated. This also indicates that if the posterior probability that the individual is a case and the posterior probability that the individual is not a case were both calculated with the likelihoods method, then added together, the resulting posterior probability that the individual is either a case or not a case would exceed 100%.

The assumption is misstated as independence in previous papers.

The null hypothesis significance test methods estimate \(P(X|{M}^{-})\); the Bayesian methods estimate \(P({M}^{+}|X)\) and \(P({M}^{-}|X)\). Note that the first probability can be meaningfully summed for different patterns of results (i.e., different values of \(X\)) but the second and third probabilities cannot. The underappreciation of the fundamental distinction between these two types of method in previous publications may have contributed to the misunderstanding to sum the posterior probabilities in the chained likelihood ratios method.
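To illustrate why \(P(X|{M}^{-})\) can be meaningfully summed across result patterns: under conditional independence and equal false positive rates (simplifying assumptions of mine), the probability that a non-case fails at least \(k\) of \(K\) tests is a binomial tail sum over patterns.

```python
from math import comb

# Probability that a non-malingering examinee fails at least k of K
# conditionally independent validity tests with a common false positive
# rate: a binomial tail, i.e., a sum of P(X | M-) over result patterns.
# The FPR and test counts below are illustrative assumptions.

def p_at_least_k_failures(k, K, fpr):
    return sum(comb(K, j) * fpr**j * (1 - fpr)**(K - j)
               for j in range(k, K + 1))

# With 5 tests, each with a 10% false positive rate:
print(round(p_at_least_k_failures(1, 5, 0.10), 3))  # 0.41
```

No analogous sum is meaningful for the posterior probabilities \(P({M}^{+}|X)\), which condition on a single observed pattern.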

Interestingly, there are some studies and discussions in the same spirit in accounting for correlations between performance validity test results (e.g., Berthelson et al., 2013; Bilder et al., 2014). Focusing on generalizing the simple Bayes method instead of the null hypothesis significance test method, using estimates of the conditional correlations instead of unconditional correlations, and using more flexible distribution assumptions rather than assuming normality may provide more fruitful research in this area.

## References

Altman, D. G., & Bland, J. M. (1994). Statistics notes: Diagnostic tests 2: Predictive values. *BMJ, 309*(6947), 102. https://doi.org/10.1136/bmj.309.6947.102

Bender, S. D., & Frederick, R. I. (2018). Neuropsychological models of feigned cognitive deficits. In R. Rogers & S. D. Bender (Eds.), *Clinical assessment of malingering and deception* (4th ed., pp. 42–60). The Guilford Press.

Berthelson, L., Mulchan, S. S., Odland, A. P., Miller, L. J., & Mittenberg, W. (2013). False positive diagnosis of malingering due to the use of multiple effort tests. *Brain Injury, 27*(7–8), 909–916. https://doi.org/10.3109/02699052.2013.793400

Bilder, R. M., Sugar, C. A., & Hellemann, G. S. (2014). Cumulative false positive rates given multiple performance validity tests: Commentary on Davis and Millis (2014) and Larrabee (2014). *The Clinical Neuropsychologist, 28*(8), 1212–1223. https://doi.org/10.1080/13854046.2014.969774

Black, J., Necrason, B., & Omasta, N. (2016). Refining the use of likelihood ratios for determining non-credible effort (Abstract). *Archives of Clinical Neuropsychology, 31*(6), 573. https://doi.org/10.1093/arclin/acw042.01

Boone, K. B., & Lu, P. (2003). Noncredible cognitive performance in the context of severe brain injury. *The Clinical Neuropsychologist, 17*(2), 244–254. https://doi.org/10.1076/clin.17.2.244.16497

Chafetz, M. D. (2022). Deception is different: Negative validity test findings do not provide "evidence" for "good effort." *The Clinical Neuropsychologist, 36*(6), 1244–1264. https://doi.org/10.1080/13854046.2020.1840633

Dawes, R. M., & Meehl, P. E. (1966). Mixed group validation: A method for determining the validity of diagnostic signs without using criterion groups. *Psychological Bulletin, 66*(2), 63. https://doi.org/10.1037/h0023584

Digby, P. G. N. (1983). Approximating the tetrachoric correlation coefficient. *Biometrics, 39*(3), 753–757. https://doi.org/10.2307/2531104

Frederick, R. I. (2000). Mixed group validation: A method to address the limitations of criterion group validation in research on malingering detection. *Behavioral Sciences & the Law, 18*(6), 693–718. https://doi.org/10.1002/bsl.432

Frederick, R. I. (2015). *Too much information: Problems using multiple malingering tests* [Invited lecture]. American Psychology-Law Conference, San Diego, CA.

Jewsbury, P. A. (2019). Diagnostic test score validation with a fallible criterion. *Applied Psychological Measurement, 43*(8), 579–596. https://doi.org/10.1177/0146621618817785

Jewsbury, P. A., & Bowden, S. C. (2013). Considerations underlying the use of mixed group validation. *Psychological Assessment, 25*(1), 204. https://doi.org/10.1037/a0030063

Jewsbury, P. A., & Bowden, S. C. (2014). A description of mixed group validation. *Assessment, 21*(2), 170–180. https://doi.org/10.1177/1073191112473176

Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. *The Clinical Neuropsychologist, 22*(4), 666–679. https://doi.org/10.1080/13854040701494987

Larrabee, G. J. (2022). Synthesizing data to reach clinical conclusions regarding validity status. In R. W. Schroeder & P. K. Martin (Eds.), *Validity assessment in clinical neuropsychological practice: Evaluating and managing noncredible performance*. The Guilford Press.

Larrabee, G. J., Rohling, M. L., & Meyers, J. E. (2019). Use of multiple performance and symptom validity measures: Determining the optimal per test cutoff for determination of invalidity, analysis of skew, and inter-test correlations in valid and invalid performance groups. *The Clinical Neuropsychologist, 33*(8), 1354–1372. https://doi.org/10.1080/13854046.2019.1614227

Leonhard, C. (2023a). Review of statistical and methodological issues in the forensic prediction of malingering from validity tests: Part I: Statistical issues. *Neuropsychology Review.*

Leonhard, C. (2023b). Review of statistical and methodological issues in the forensic prediction of malingering from validity tests: Part II: Methodological issues. *Neuropsychology Review.*

Messick, S. (1989). Validity. In R. L. Linn (Ed.), *Educational measurement* (3rd ed.). Macmillan.

Meyers, J. E., Miller, R. M., Thompson, L. M., Scalese, A. M., Allred, B. C., Rupp, Z. W., ... & Junghyun Lee, A. (2014). Using likelihood ratios to detect invalid performance with performance validity measures. *Archives of Clinical Neuropsychology, 29*(3), 224–235. https://doi.org/10.1093/arclin/acu001

Sherman, E. M., Slick, D. J., & Iverson, G. L. (2020). Multidimensional malingering criteria for neuropsychological assessment: A 20-year update of the malingered neuropsychological dysfunction criteria. *Archives of Clinical Neuropsychology, 35*(6), 735–764. https://doi.org/10.1093/arclin/acaa019

Sweet, J. J., Heilbronner, R. L., Morgan, J. E., Larrabee, G. J., Rohling, M. L., Boone, K. B., ... & Conference Participants. (2021). American Academy of Clinical Neuropsychology (AACN) 2021 consensus statement on validity assessment: Update of the 2009 AACN consensus conference statement on neuropsychological assessment of effort, response bias, and malingering. *The Clinical Neuropsychologist, 35*(6), 1053–1106. https://doi.org/10.1080/13854046.2021.1896036

Wainer, H., & Thissen, D. (2001). True score theory: The traditional method. In H. Wainer & D. Thissen (Eds.), *Test scoring*. Lawrence Erlbaum. https://doi.org/10.4324/9781410604729

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. *The American Statistician, 70*(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108

## Author information


### Contributions

Not applicable.


## Ethics declarations

### Ethics Approval

Not applicable.

### Competing Interests

The authors declare no competing interests.

## Additional information

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


## Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

## About this article

### Cite this article

Jewsbury, P.A. Invited Commentary: Bayesian Inference with Multiple Tests.
*Neuropsychol Rev* **33**, 643–652 (2023). https://doi.org/10.1007/s11065-023-09604-4
