Abstract
This study was designed to evaluate the utility of the Atypical Responses (ATR) scale of the Trauma Symptom Inventory – Second Edition (TSI-2) as a symptom validity test (SVT) in a medicolegal sample. Archival data were collected from a consecutive case sequence of 99 patients referred for neuropsychological evaluation following a motor vehicle collision. The ATR’s classification accuracy was computed against criterion measures consisting of composite indices based on SVTs and performance validity tests (PVTs). An ATR cutoff of ≥ 9 emerged as optimal, producing a good combination of sensitivity (.35-.53) and specificity (.92-.95) against the criterion SVTs and correctly classifying 71–79% of the sample. Predictably, classification accuracy was lower against PVTs as criterion measures (.26-.37 sensitivity at .90-.93 specificity, correctly classifying 66–69% of the sample). The originally proposed ATR cutoff (≥ 15) was prohibitively conservative, resulting in a 90–95% false negative rate. In contrast, although the more liberal alternative (≥ 8) fell short of the specificity standard (.89), it was associated with notably higher sensitivity (.43-.68) and the highest overall classification accuracy (correctly classifying 71–82% of the sample). Non-credible symptom report was a stronger confound on the posttraumatic stress scale of the TSI-2 than on that of the Personality Assessment Inventory. The ATR demonstrated clinical utility in identifying non-credible symptom report (and, to a lesser extent, invalid performance) in a medicolegal setting, with ≥ 9 emerging as the optimal cutoff, and showed potential to serve as a quick (potentially stand-alone) screener for the overall credibility of neuropsychological deficits. More research is needed in patients with different clinical characteristics assessed in different settings to establish the generalizability of these findings.
Introduction
Exaggeration or outright fabrication of symptoms occurs in a high proportion of neuropsychological evaluations, especially in forensic settings, where the external incentives to appear impaired are often substantial (Bush et al., 2005). Thus, it is crucial to thoroughly assess the credibility of self-reported symptoms in order to ensure the validity of diagnoses and treatment recommendations based on the results of psychometric testing.
Methods used to assess symptom validity may vary depending on the context of the evaluation and the type of symptoms presented. Additionally, the American Psychological Association (2013) has drafted some guidelines to assist practitioners in their forensic evaluations. Among them, Guideline 9.02 (Use of Multiple Sources of Information) explicitly recommends, “Forensic practitioners ordinarily avoid relying solely on one source of data” (APA, 2013; p. 15).
Performance and Symptom Validity Assessment
In the context of neuropsychological assessments, the credibility of the clinical presentation must be further divided into performance and symptom validity (Larrabee, 2012). The former refers to the extent to which scores on performance-based measures of cognitive abilities reflect the examinee’s true ability level; the latter refers to the extent to which self-reported symptoms accurately capture the examinee’s level of emotional distress. Although the two constructs are related, they ultimately measure conceptually distinct aspects of the examinee’s neuropsychological profile (Bianchini et al., 2014; Gervais et al., 2007, 2011; Merten et al., 2022; Richman et al., 2006; Tarescavage et al., 2013; Tylicki et al., 2021; Young, 2020). Therefore, they are assessed using different types of instruments: performance versus symptom validity tests (PVTs vs SVTs).
Naturally, PVTs and SVTs use different detection strategies, following the measurement paradigms established to assess cognitive ability (i.e., performance-based tasks) and emotional functioning (i.e., self-reported symptom inventories), respectively. PVTs are designed to detect implausibly low scores on measures of cognitive abilities, whereas SVTs are designed to detect implausibly high scores on measures of psychological symptoms (Giromini et al., 2022). The most common detection mechanisms used by PVTs are the method of threshold (an unusually low level of performance – below that commonly observed in credible patients with genuine impairment) and measures of compelling inconsistency (combinations of scores incompatible with known patterns of neurological deficits). In contrast, SVTs are designed to detect a tendency to endorse extremely rare symptoms or indiscriminate endorsement of all symptoms. Both patterns of response are interpreted as the examinees’ tendency (whether deliberate or not) to exaggerate the true level of their emotional distress (Rogers & Bender, 2018).
The different detection mechanisms used by PVTs and SVTs predict a weak correlation between these two types of tests. In fact, SVTs tend to correlate more with other SVTs than with PVTs and vice versa (Giromini et al., 2022), and the method variance predicts that the outcomes of SVTs and PVTs administered to a given examinee will be unrelated. Recent empirical investigations largely supported this prediction (Sabelli et al., 2021; Shura et al., 2022; Van Dyke et al., 2013), and there is consensus that symptom and performance validity should be assessed separately (Sweet et al., 2021). Similarly, researchers calibrating or cross-validating SVTs should establish criterion groups based on the outcomes of other SVTs, not PVTs (Gegner et al., 2022).
Over the last few decades, empirical research on both free-standing (Boone et al., 2002a, b; Green, 2003, 2004; Nelson et al., 2006; Pearson, 2009; Slick et al., 1997; Tombaugh, 1996) and embedded PVTs has proliferated (Martin et al., 2015): old instruments have been continuously recalibrated (Boucher et al., 2023; Deloria et al., 2021; Johnson et al., 2012; Sugarman & Axelrod, 2015; Whiteside et al., 2015) while new measures and cutoffs are being introduced (Abeare et al., 2021a, b; Erdodi et al., 2016; Langeluddecke & Lucas, 2003; Rai et al., 2019; Sawyer et al., 2017; Schroeder & Marshall, 2010). As a result, assessors have a wide range of PVTs to choose from. Although there is a comparable number of well-established free-standing SVTs, both interview-based [e.g., the Structured Interview of Reported Symptoms (SIRS; Rogers et al., 1992; SIRS-2; Rogers et al., 2010); the Miller Forensic Assessment of Symptoms Test (M-FAST; Miller, 2001)] and self-report measures [e.g., the Structured Inventory of Malingered Symptoms (SIMS; Smith & Burger, 1997); the Inventory of Problems – 29 (IOP-29; Viglione & Giromini, 2020; Viglione et al., 2017); the SRSI (Merckelbach et al., 2018)], the most commonly used SVTs are embedded within comprehensive personality inventories [the Minnesota Multiphasic Personality Inventory (MMPI-2; Butcher et al., 2001; MMPI-2-RF; Ben-Porath & Tellegen, 2008) or the Personality Assessment Inventory (PAI; Morey, 1991)].
Validity scales embedded within brief symptom inventories are less common, and the existing ones have a limited post-publication empirical evidence base (Roth et al., 2005). In contrast, the most commonly used neuropsychological tests of cognitive ability now contain embedded PVTs. It may be no coincidence that the trend toward developing embedded SVTs in shorter self-report inventories started with instruments commonly used by neuropsychologists (Abeare et al., 2021b; Cutler et al., 2022; Shwartz et al., 2020; Silva, 2021; Vanderploeg et al., 2014).
The use of SVTs is highly recommended in all psychological evaluations (Sherman et al., 2020; Sweet et al., 2021). However, to date, SVTs remain underutilized, regardless of assessment context (Merten & Merckelbach, 2013; Nelson et al., 2019; Plohmann & Merten, 2013; Sharland & Gfeller, 2007; Tierney et al., 2021). The limited range of available SVTs that are quick and easy to administer and score may be a practical barrier inhibiting their widespread use. Conversely, the research on the integrated use of several different PVTs (Boone, 2009; Erdodi, 2019, 2021, 2023; Larrabee, 2008, 2014; Larrabee et al., 2019) is robust and has morphed into clear guidelines on multivariate models. Additionally, there is still little evidence on how many SVTs should be administered to properly assess the credibility of symptom report, and how many failures are required to deem a response set invalid (Sherman et al., 2020). Given these significant knowledge gaps, there is a clear need for more research on SVTs.
Trauma-Related Symptoms Validity Assessment
SVTs may differ in terms of the type of symptoms being evaluated. Some scales were designed to assess a broad spectrum of symptoms, without any specificity to a given category of psychopathology [e.g., the Negative Impression Management scale of the PAI (NIMPAI); the Infrequency scale of the MMPI-2 (FMMPI-2)]; others focus on specific symptom clusters (e.g., psychiatric, somatic, cognitive). The SVTs embedded within the Trauma Symptom Inventory – Second Edition (TSI-2; Briere, 2011) are examples of the latter. The TSI-2 is a self-report inventory designed to assess symptoms and behaviors following trauma of various kinds (e.g., sexual and/or physical assault, domestic violence, physical confrontation, torture, car accident, multiple victim events, health care incident, witnessing violence, traumatic loss, and early experiences of child neglect or abuse).
In addition, the TSI-2 contains two validity scales: Response Level (RL) and Atypical Responses (ATR). High scores on either of these scales raise concerns about the validity of the profile. RL was designed to monitor a tendency to deny symptoms that others generally recognize. In contrast, ATR was designed to monitor a tendency to (over)endorse trauma-related symptoms that are only rarely reported by others, including those with significant post-traumatic symptomatology. High scores on the ATR may indicate (a) general over-estimation of symptoms, (b) specific over-estimation of PTSD-related items, (c) random response style, and (d) very high levels of genuine distress (Palermo & Brand, 2019).
Consequently, the underlying problem of this scale is that over-endorsing items may be interpreted either as an attempt at gross symptom exaggeration/factitious complaints or the experience of symptoms with greater intensity than others. In clinical and forensic settings, the recommended cutoff for non-credible presentation is ≥ 15 (Briere, 2011). In research settings (i.e., general population assessed in a non-clinical context), the recommended cutoff is ≥ 8 (Gray et al., 2010).
A review of the ATR’s item content reveals a mixture of different detection mechanisms: rare symptom endorsement combined with neurologically (global amnesia) or physiologically (inability to meet basic needs for prolonged periods of time) implausible levels of impairment. At face value, reporting a high frequency of these symptoms seems incompatible with genuine distress, unless the associated psychopathology is correlated with severe cognitive deficits that interfere with the examinees’ ability to objectively evaluate their level of functioning (i.e., clinically significant impairments in reality testing). Arguably, the latter scenario should still be classified as a subtype of non-credible reporting – although perhaps of a different etiology (Merten & Merckelbach, 2013). In other words, there are no salient, face-valid a priori reasons to justify the need for a highly conservative cutoff on the ATR.
The TSI-2 has been standardized and validated on a representative sample of the U.S. population (n = 678). The Professional Manual reports variable internal consistency (α = 0.76–0.94) and test–retest reliability over a one-week interval (r = 0.76–0.93; Briere, 2011). However, post-publication research on the effectiveness of the TSI-2 at distinguishing coached simulators from patients with genuine dissociative disorders found that it underperformed compared to both the Trauma Index of the SIRS-2 (Brand et al., 2014) and the Infrequency-Psychopathology scale (Fp) of the MMPI-2 (Palermo & Brand, 2019). Other studies reported incremental utility of the TSI-2 above and beyond other SVTs or study-specific predictors in distinguishing individuals attempting to feign PTSD from honest responders (Efendov et al., 2008; Elhai et al., 2005).
After examining PAI and TSI-2 scores in coached PTSD simulators and credible patients with PTSD, Gray and colleagues (2010) found that both the PAI and the TSI-2 successfully differentiated between the two groups, but the NIMPAI outperformed the ATR. Taken together, these findings, along with a recent review of the available literature on the effectiveness of the ATR scale (Ales & Erdodi, 2021), suggest the need for further research to evaluate its clinical and forensic utility. In addition, there is no information on the ATR’s differential predictive power using SVTs vs PVTs as criterion measures. Although peri-traumatic dissociation is a common occurrence (Azoulay et al., 2020; Holeva & Tarrier, 2001; Ursano et al., 1999) with verifiable neurobehavioral (Daniels et al., 2012) and genetic correlates (Koenen et al., 2005), its validity has been called into question (Candel & Merckelbach, 2004). Therefore, there is a clear need for objectively verifying claims of memory deficits associated with traumatic events.
Present Study
This study was designed to address this gap in the research literature. The classification accuracy of the ATR was computed against both SVTs and PVTs as criterion measures to empirically evaluate its differential predictive power. More importantly, we collected data from clinical patients with identifiable external incentives to appear impaired – an important factor in the study of motivated exaggeration of symptoms and deficits (Boskovic, 2020; McDermott, 2012; Peace & Richards, 2014). Based on previous reports (Sabelli et al., 2021; Shura et al., 2022), we hypothesized that the ATR would produce a superior classification accuracy against SVTs compared to PVTs. In addition, we predicted that the optimal cutoff would be closer to that proposed by Gray et al. (2010; ≥ 8) as opposed to the one proposed by Briere (2011; ≥ 15).
Methods
Participants
Data were collected from a consecutive case sequence of 99 files retrieved from the clinical archives of a clinical neuropsychologist in the Greater Toronto Area in Ontario, Canada. Patients were assessed in the context of a motor vehicle collision to provide an independent medicolegal evaluation of their neuropsychological and adaptive functioning. Inclusion criteria were 1) a full administration of the TSI-2 and PAI; 2) age between 18 and 69 (adults); and 3) being born in Canada (to control for limited English proficiency as a confounding variable; Ali et al., 2022; Boskovic et al., 2020; Crişan et al., 2023a, b; Dandachi-FitzGerald et al., 2023a, b; Erdodi & Lajiness-O’Neill, 2012; Erdodi et al., 2017b). The majority of the sample was female (63.6%) and right-handed (92.9%). Mean age was 42.5 years (SD = 14.2); mean level of education was 12.7 years (SD = 2.6). Overall intellectual functioning (MFSIQ = 93.5, SD = 12.8) and single-word reading level (M = 92.2, SD = 13.2) were in the average range.
All patients were involved in litigation around the motor vehicle collision that prompted the referral for neuropsychological assessment. The majority of patients (77) sustained an uncomplicated mild TBI [Glasgow Coma Scale (GCS) > 13; loss of consciousness < 30 min; peritraumatic amnesia < 1 h; and negative neuroradiological findings], followed by complicated mild TBI (positive neuroradiological findings; 10), severe (3), and moderate (2) TBI. All patients were assessed in the post-acute stage of recovery (> 3 months post injury for mild TBI and > 12 months post injury for moderate/severe TBI). Coincidentally, the same proportion of the sample (42.9%) reported clinically significant PTSD symptoms on the PAI and TSI-2.
Measures
Trauma Symptom Inventory – Second Edition (TSI-2)
The TSI-2 consists of 136 items and measures a wide range of complex psychopathology across the lifespan (e.g., post-traumatic stress, dissociation, somatization, insecure attachment styles, reduced self-capacity, and wide-ranging dysfunctional behaviors) organized into 12 clinical scales, 12 subscales, four factors, and two validity scales (Table 1). The TSI-2 instructs the examinee to read each item carefully and rate how often the symptom was experienced in the past six months on a scale ranging from 0 to 3. The TSI-2 assesses acute or chronic trauma-related symptomatology. T-scores represent linear transformations of raw scores (M = 50, SD = 10). Higher scores represent higher levels of symptomatology. T-scores between 60 and 64 are considered problematic (i.e., above average symptoms, with potential clinical implications); a T-score ≥ 65 is considered clinically elevated (i.e., high levels of symptoms that constitute a major clinical problem).
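The linear raw-to-T conversion described above can be sketched in a few lines. This is an illustrative sketch only: the function name is ours, and the normative means and standard deviations come from the TSI-2 Professional Manual and are not reproduced here.

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear T-transformation (M = 50, SD = 10) of a raw scale score
    against the normative mean and standard deviation for that scale."""
    return 50 + 10 * (raw - norm_mean) / norm_sd
```

Under this convention, a raw score one normative SD above the mean maps to T = 60, placing it just inside the "problematic" (60–64) band described above.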
Personality Assessment Inventory (PAI)
The PAI offers four validity scales to determine whether the profile emerging from the test accurately represents the individual’s distress, and to assess any potential biases in responding. The NIMPAI is a 9-item scale specifically designed to detect whether the individual attempts to present a more negative picture of their symptoms. It comprises items on bizarre symptoms that are rarely endorsed in both clinical and non-clinical samples. Thus, the NIMPAI may be considered a measure of over-estimation of pathology driven by pessimism and/or intentional over-estimation of distress (Morey, 1991). In the second edition of the PAI Professional Manual, Morey (2007) proposed that T-scores below 74 suggest little response distortion, whereas T-scores between 74 and 84 suggest some exaggeration. Additionally, Hawes and Boccaccini (2009) conducted an extensive meta-analysis examining different PAI cutoffs. They found that a NIMPAI T-score cutoff of ≥ 81 yielded the highest overall classification rate (.79), while preserving relatively strong sensitivity (.73) and specificity (.83), and thus suggested that future PAI validity studies report classification results using optimal cutoffs identified by their meta-analysis. In the current study, the NIMPAI (at a cutoff of T ≥ 81) served as the legacy criterion SVT (i.e., domain-congruent measure) for evaluating the ATR’s classification accuracy.
Beck Depression Inventory – Second Edition (BDI-II)
The BDI-II (Beck et al., 1996) is a 21-item self-report measure of the presence and severity of depressive symptoms over the past two weeks. The BDI-II provides a total score covering two symptom spectra: somatic-affective and cognitive. The former captures somatic-affective manifestations of depression such as loss of interest, loss of energy, changes in sleep and appetite, agitation, and crying; the latter targets cognitive manifestations such as pessimism, guilt, and self-criticism. A recent study demonstrated that a cutoff of ≥ 38 on the BDI-II is specific to non-credible symptom report (Fuermaier et al., 2023a). As such, the BDI-II’s new embedded validity indicator was employed as an alternative SVT. A BDI-II total raw score of ≥ 38 was used to operationalize symptom exaggeration within this study.
SVT-2
The NIMPAI (invalid defined as T ≥ 81) and BDI-II (invalid defined as ≥ 38) were combined into a multivariate criterion measure labeled SVT-2, consistent with methodological recommendations by Sherman et al. (2020). The classification accuracy of the ATR was evaluated across two alternative multivariate cutoffs. On the SVT-2A, invalid responding was defined as failing either of the two components (liberal cutoff). In contrast, on the SVT-2B, invalid responding was defined as failing both of the components (conservative cutoff).
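The two multivariate definitions can be made concrete with a short sketch (the function and variable names are ours; the component cutoffs are those stated above):

```python
def svt2(nim_t, bdi_total):
    """Multivariate symptom validity criterion (illustrative sketch).
    Components: PAI NIM (invalid at T >= 81) and BDI-II (invalid at raw >= 38).
    SVT-2A (liberal): invalid if EITHER component is failed.
    SVT-2B (conservative): invalid only if BOTH components are failed."""
    fails = int(nim_t >= 81) + int(bdi_total >= 38)
    return {"SVT-2A": fails >= 1, "SVT-2B": fails == 2}
```

For example, a patient with NIM T = 85 and a BDI-II total of 20 would be classified invalid on the SVT-2A but valid on the SVT-2B, illustrating how the two cutoffs bracket the criterion's strictness.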
Test of Memory Malingering (TOMM)
The TOMM (Tombaugh, 1996) is one of the most commonly used free-standing PVTs worldwide (Dandachi-FitzGerald et al., 2013; Martin et al., 2015; Sharland & Gfeller, 2007; Slick et al., 2004; Uiterwijk et al., 2021). It is based on the visual forced choice recognition paradigm using pictures of common objects represented by black-and-white single line drawings. Its first trial (TOMM-1) was originally developed as a learning trial but has subsequently been validated as a free-standing PVT in its own right. A liberal cutoff of ≤ 43 demonstrated high specificity to non-credible responding (Ashendorf et al., 2004; Erdodi, 2022; Greve et al., 2006; Jones, 2013; Kulas et al., 2014; Rai & Erdodi, 2021), but a recent meta-analysis endorsed the use of a more conservative cutoff of ≤ 41 (Martin et al., 2020). Therefore, invalid performance on the TOMM-1 within this study was operationalized as a raw score of ≤ 41.
Validity Index Five (VI-5)
Next, a composite measure of performance validity (VI-5) was created by aggregating five embedded PVTs. Each component was dichotomized along published cutoffs (Table 2). The value of the VI-5 is the number of its components failed by a given patient. As such, it ranges from 0 (all five PVTs passed) to 5 (all five PVTs failed). A VI-5 score ≥ 2 was used as the multivariate cutoff for invalid performance (Larrabee, 2014).
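The VI-5 aggregation reduces to a simple count of dichotomized failures; a minimal sketch (names are ours; the published component cutoffs in Table 2 are not reproduced):

```python
def vi5(pvt_failed):
    """VI-5 composite (illustrative sketch).
    pvt_failed: five booleans, one per embedded PVT, each already
    dichotomized (True = failed) along its published cutoff (Table 2).
    Returns the composite score (0-5) and whether it meets the
    multivariate cutoff for invalid performance (>= 2 failures)."""
    assert len(pvt_failed) == 5
    score = sum(pvt_failed)
    return score, score >= 2
```

A single failure thus remains classified as valid, consistent with the multivariate logic of requiring converging evidence before labeling a performance invalid.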
Erdodi Index Seven (EI-7)
Finally, another validity composite was developed using an alternative aggregation method following the template developed by Erdodi (2019). Each embedded PVT was recoded onto a four-point ordinal scale, where 0 is defined by a score that cleared the most liberal cutoff and suggests valid performance; 3 is a score that failed the most conservative cutoff, with 1 and 2 representing in-between levels of failure (Table 3). The value of the EI-7 is obtained by summing the recoded components. As such, it ranges from 0 (all components passed) to 21 (all components failed at the most conservative cutoff). An EI-7 ≤ 1 is considered an incontrovertible Pass, as it reflects at most one marginal failure. EI-7 values in the 2–3 range are considered Borderline, as they reflect at most three marginal failures, which constitutes insufficient overall evidence of globally invalid performance (Pearson, 2009). However, an EI-7 score ≥ 4 represents either at least four marginal failures, which is associated with a < 5% cumulative failure rate (Pearson, 2009), or at least two failures at conservative cutoffs. Either of these combinations provides sufficient psychometric evidence of non-credible responding. Therefore, this level of performance (≥ 4) was considered an overall Fail in this study. The EI model has been extensively validated in different samples and against a variety of criterion measures (Abeare et al., 2021a, 2022b; An et al., 2019; Boucher et al., 2023; Erdodi, 2023; Erdodi et al., 2019a; Holcomb et al., 2022b; Tyson et al., 2023), demonstrating strong classification accuracy and robustness to moderate/severe TBI (Erdodi & Abeare, 2020; Erdodi et al., 2019b). Independent replications confirmed its clinical utility (Robinson et al., 2023; Tyson & Shahein, 2023).
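The ordinal aggregation and the Pass/Borderline/Fail bands can be sketched as follows (an illustrative sketch under the scoring rules stated above; function and label names are ours, and the component-level recoding cutoffs in Table 3 are not reproduced):

```python
def ei7(component_levels):
    """EI-7 composite (illustrative sketch).
    component_levels: seven ordinal scores, each 0-3, where 0 = cleared the
    most liberal cutoff and 3 = failed the most conservative cutoff.
    Returns the composite score (0-21) and its interpretive band:
    <= 1 Pass, 2-3 Borderline, >= 4 Fail."""
    assert len(component_levels) == 7
    assert all(0 <= level <= 3 for level in component_levels)
    score = sum(component_levels)
    if score <= 1:
        band = "Pass"
    elif score <= 3:
        band = "Borderline"
    else:
        band = "Fail"
    return score, band
```

Note how the ordinal recoding lets two conservative-cutoff failures (3 + 1, or 2 + 2) reach the Fail band even when most components are passed, which a dichotomous count like the VI-5 would treat the same as two marginal failures.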
The parallel use of the TOMM-1, VI-5 and EI-7 provides alternative conceptualizations of performance validity [i.e., free-standing versus embedded PVTs; the traditional dichotomous (Pass/Fail) versus ordinal components (levels of failure)]. As such, they represent an engineered method variance for the validation of the ATR. In the absence of a gold standard measure of the credibility of the clinical presentation, using a variety of instruments/aggregation methods affords an opportunity to examine classification accuracy across changing psychometric definitions of invalid response sets.
Procedure
Data were collected and curated by the first author. Patient files were irreversibly de-identified at the source: no personal information was recorded for research purposes. The project was approved by the Research Ethics Board of the university listed as the last author’s institutional affiliation. APA guidelines regulating research involving human participants were followed throughout the process.
Data Analysis
Descriptive statistics [M, SD, base rates of failure (BRFail)] were reported as relevant. Inferential statistics included receiver operating characteristic (ROC) curves [area under the curve (AUC) with corresponding 95% confidence intervals (CIs)] and chi-square tests of independence. Sensitivity, specificity, and overall correct classification (OCC; the sum of true positives and true negatives divided by N) were calculated using standard formulas. Effect size estimates were expressed as φ². Although the interpretation of the magnitude of the association is context-dependent, an effect of .40 is considered to be at the upper limit of values typically observed in psychosocial and biomedical research (Rosnow & Rosenthal, 2003).
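The standard formulas referenced above can be stated compactly (an illustrative sketch; the function name and argument labels are ours):

```python
def classification_accuracy(tp, fp, tn, fn):
    """Classification accuracy statistics from a 2x2 confusion table.
    tp/fn: invalid cases correctly flagged / missed by the cutoff;
    tn/fp: valid cases correctly cleared / incorrectly flagged.
    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP);
    OCC = (TP + TN) / N."""
    n = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "OCC": (tp + tn) / n,
    }
```

For example, with 10 true positives, 10 false negatives, 90 true negatives, and 10 false positives, sensitivity is .50 and specificity is .90 — the trade-off referred to below as the Larrabee limit.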
For most clinical instruments, the benchmark value for sensitivity and specificity is .80 (Gregory, 2013). However, given the delicate nature of symptom and performance validity, specificity is prioritized over sensitivity, with ≥ .90 being the lower limit (i.e., a false positive rate of ≤ .10; Boone, 2013; Chafetz, 2022). Therefore, instead of optimizing cutoffs to achieve a balance between sensitivity and specificity, the latter is prioritized, allowing the former to fall where it may. In PVT research, this typically produces a sensitivity hovering around .50, while specificity is fixed at .90. This seemingly inevitable trade-off between sensitivity and specificity has been labeled the Larrabee limit (Erdodi et al., 2014; Crişan et al., 2021).
Results
The ATR was a significant predictor of the SVT-2A (AUC = .73; 95% CI: .63-.83). A cutoff of ≥ 7 failed to approximate the specificity standard (.78). The next cutoff (≥ 8) produced an acceptable combination of sensitivity (.43) and specificity (.89) at .710 OCC. Raising the cutoff to ≥ 9 achieved high specificity (.95) at a reasonable cost to sensitivity (.35) and no change in OCC. Further increasing the cutoff to ≥ 10 reached the point of diminishing returns (.24 sensitivity at .96 specificity and .677 OCC). Perfect specificity but low sensitivity (.16) was observed at ≥ 11 (Table 4). At ≥ 15, sensitivity was very low (.05).
The ATR was an even stronger predictor of the SVT-2B (AUC = .85; 95% CI: .74-.96). Once again, a cutoff of ≥ 7 failed to approximate the specificity standard (.78), but ≥ 8 produced a good combination of sensitivity (.68) and specificity (.89) at .821 OCC. Raising the cutoff to ≥ 9 achieved improved specificity (.92) at an acceptable cost to sensitivity (.53) and OCC (.786). Further increasing the cutoff to ≥ 10 resulted in high specificity (.95) but a further decline in sensitivity (.37) and OCC (.750). Perfect specificity was achieved at ≥ 13 at low sensitivity (.21). Predictably, sensitivity was even lower at ≥ 15 (.11).
In sharp contrast to the analyses above, the ATR was a non-significant predictor of TOMM-1 (BRFail = 45.4%; AUC = .53, 95% CI: .41-.64). Therefore, classification accuracy was not computed. However, the ATR was a significant predictor of the VI-5 (AUC = .64, 95% CI: .52-.77). Once again, a cutoff of ≥ 7 failed to achieve minimum specificity (.79), as did the next level of cutoff (.83 specificity). The first cutoff to reach .90 specificity was ≥ 9, at .37 sensitivity and .690 OCC. Making the cutoff more conservative (≥ 10) resulted in high specificity (.94) but low sensitivity (.29) and OCC (.678). Further raising the cutoff to ≥ 13 resulted in increased specificity (.96), but a notable loss in sensitivity (.18) and OCC (.644). Sensitivity further declined at ≥ 15 (.04), with slight improvement in specificity (.99).
The ATR was also a significant predictor of the EI-7 (AUC = .69, 95% CI: .56-.82). Once again, ≥ 7 and ≥ 8 failed to achieve minimum specificity (.80-.85). However, the next cutoff (≥ 9) produced high specificity (.93), albeit at low sensitivity (.26) and OCC (.657). Raising the cutoff to ≥ 10 resulted in the predictable trade-off between sensitivity (.19) and specificity (.95). Making the cutoff even more conservative (≥ 13) achieved perfect specificity but low (.15) sensitivity (Table 5). Predictably, sensitivity was even lower at ≥ 15 (.04).
Next, the relationship between self-reported trauma symptoms and the outcome of SVTs and PVTs was examined. Trauma was operationalized as the T-score on the Anxiety Related Distress scale of the PAI (ARDPAI) [categorized as none (< 60), mild (60–69), moderate (70–89) and severe (≥ 90)] and the Posttraumatic Stress factor (PTSTSI-2) on the TSI-2 [categorized as none (< 55), mild (55–64), moderate (65–74) and severe (≥ 75)]. On ARDPAI, a strong linear relationship emerged for NIMPAI (invalid defined as T ≥ 81), BDI-II (invalid defined as ≥ 38) and ATR ≥ 9 (p < .001, φ²: .229-.297; very large effects). The trend extended to SVT-2A (p < .001, φ² = .248, very large effect) and was accentuated on SVT-2B (p < .001, φ² = .406, very large effect). However, none of the contrasts were significant using BRFail on PVTs (p: .335-.930).
On PTSTSI-2, a notably stronger linear relationship emerged for NIMPAI (invalid defined as T ≥ 81), BDI-II (invalid defined as ≥ 38) and ATR ≥ 9 (p < .001, φ²: .377-.501; extremely large effects). The trend extended to SVT-2A (p < .001, φ² = .375, very large effect) and was further accentuated on SVT-2B (p < .001, φ² = .569, extremely large effect). However, the only significant contrast using BRFail on PVTs emerged on the EI-7 (p = .032, φ² = .133, medium effect; Table 6).
Finally, the BRFail on ATR cutoffs that showed the best overall classification accuracy (≥ 8, ≥ 9 and ≥ 10) were compared across patients with low head injury severity (i.e., uncomplicated mild TBI) and patients with high head injury severity (i.e., complicated mild, moderate and severe TBI). There was no difference in BRFail as a function of TBI severity (p: .824-.963; Table 7). Likewise, comparable BRFail was observed on the criterion measures [SVT-2A and SVT-2B (p: .461-.492) as well as the TOMM-1, VI-5, and EI-7 (p: .117-.508)].
Discussion
Overview of the Results
Assessing the credibility/validity of self-reported symptoms is of paramount importance in clinical and forensic settings (Sweet et al., 2021), and the ATR of the TSI-2 is one of the relatively few SVTs available to professionals working in the field of psychological injury and law (Giromini et al., 2022). Unfortunately, empirical research on its efficacy has been relatively sparse and inconclusive (Ales & Erdodi, 2022). Therefore, the current study was designed to empirically evaluate its classification accuracy against a commonly used SVT and a series of PVTs in a consecutive case sequence of 99 patients referred for neuropsychological evaluations in the context of motor vehicle collisions. We predicted that the ATR would produce a superior classification accuracy against SVTs compared to PVTs and that the optimal cutoff would be closer to ≥ 8 (Gray et al., 2010) than ≥ 15 (Briere, 2011).
Both hypotheses were supported by the data. The default cutoff (≥ 15) grossly underestimated the prevalence of non-credible symptom report (2%) within this sample relative to other SVTs (34–40%) or PVTs (40–45%). Given the strong correlation between BRFail, sensitivity and specificity (Dandachi-FitzGerald & Martin, 2022; Rai et al., 2023), it is not surprising that ATR ≥ 15 produced consistently poor classification accuracy (driven by dismal sensitivity) against both versions of the SVT-2 (.05-.11 sensitivity at 1.00 specificity and .624-.773 OCC) and the VI-5/EI-7 (.03-.04 sensitivity at .98–1.00 specificity and .598-.612 OCC). In contrast, at a BRFail of 25.3%, ATR ≥ 8 approximated the specificity standard (.89) against SVT-2; at a BRFail of 19.2%, ATR ≥ 9 produced a good combination of sensitivity (.35-.53) and specificity (.92-.95), at .710-.786 OCC. Similarly, ATR ≥ 9 was specific (.90-.93) to invalid performance on measures of cognitive ability, albeit at low sensitivity (.26-.37). Therefore, ≥ 9 emerged as the optimal cutoff on the ATR that provides a reasonable balance between high (≥ .90) specificity and sensitivity (.26-.53) using both SVTs and PVTs as criterion measures. It should be noted, however, that an ATR ≥ 9 still only detects between a quarter and half of the sample with independent psychometric evidence of non-credible clinical presentation.
Clinical/Forensic Implications
Taken together, these results converge on a number of practical conclusions:
1) The default cutoff of ≥ 15 provides a highly biased estimate of the prevalence of invalid symptom report, detecting only 5–11% of the profiles identified as non-credible by other SVTs. Therefore, its use in clinical and forensic settings cannot be justified, given the unacceptably high (90–95%) false negative rates.
2) Alternative cutoffs offer an opportunity to recalibrate the classification of the ATR and provide a more balanced trade-off between sensitivity and specificity. The more liberal cutoff of ≥ 8, although it technically fell short of the .90 specificity standard (.89 against both versions of the SVT-2), provided the single best OCC, correctly classifying between 71 and 82% of the sample. As such, it can be considered the first level of failure, and has the potential to serve as a screening cutoff (i.e., help rule in non-credible symptom report). The next level of cutoff (≥ 9) had uniformly high specificity (.90-.95) against a range of SVTs and PVTs as criterion measures. Finally, an ATR score ≥ 10 was associated with consolidated specificity (.95-.96), indicating a level of symptom report that is likely invalid.
3) Although the ATR was a weaker predictor of PVTs than of SVTs as criterion measures (consistent with our prediction and with the results of previous research), the specificity of ≥ 9 was invariant to the type and composition of the criterion. In other words, failing this cutoff suggests a globally invalid clinical presentation, consistent with earlier reports that sufficiently extreme response styles override the modality specificity effect (Rai & Erdodi, 2021) and become significant predictors of invalid presentation across different domains of clinical assessment (Holcomb et al., 2022a).
4) Elevations on the ARDPAI and PTSTSI-2 were associated with symptom overreport on other scales. The majority of patients (70–100%) with extreme scores (T ≥ 90 and ≥ 80, respectively) had independent evidence of symptom magnification.
5) Elevations on the ARDPAI and PTSTSI-2 were unrelated to PVT outcomes, suggesting that the credibility of self-reported PTSD symptoms and of cognitive deficits observed on performance-based tests may be orthogonal and should therefore be evaluated independently (Sabelli et al., 2021).
Mathematically, an ATR cutoff of ≥ 9 allows a respondent to endorse all of the items at the first severity level above Never, or half of the items at the severity level above that, and still have the response set deemed valid. Phenomenologically, the ATR’s item content [impairments that are neurologically (global amnesia) or physiologically (inability to meet basic needs for prolonged periods of time; medically unexplained severe dysfunction of the autonomic and/or peripheral nervous system) implausible and incompatible with normal functioning] amounts to a series of pathognomonic signs of non-credible symptom report. In other words, a qualitative review of the symptoms used to determine the validity of the response set suggests that ≥ 9 constitutes a sufficiently conservative demarcation line between credible and non-credible response sets. Assessors can invoke this argument, in addition to the classification accuracy statistics, in defense of their interpretation of a score in the failing range on the ATR.
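The arithmetic behind this point can be made explicit. Assuming eight items each rated 0–3 (per the response anchors described in the Notes), the hypothetical response patterns below both total 8 and therefore pass a ≥ 9 cutoff:

```python
# Illustrative arithmetic only; the response patterns below are hypothetical.
N_ITEMS = 8   # the ATR consists of eight items rated 0 (never) to 3 (often)
CUTOFF = 9    # alternative cutoff evaluated in the study: total >= 9 is flagged

def atr_total(ratings):
    """Sum the eight item ratings after a basic range check."""
    assert len(ratings) == N_ITEMS and all(0 <= r <= 3 for r in ratings)
    return sum(ratings)

all_rarely = [1] * 8                # every item at the first level above Never
half_few_times = [2] * 4 + [0] * 4  # half the items one severity level higher

print(atr_total(all_rarely) >= CUTOFF)      # False: a total of 8 still passes
print(atr_total(half_few_times) >= CUTOFF)  # False: a total of 8 still passes
```

Any pattern totaling 9 or more (e.g., four items rated 2 plus a single additional endorsement) crosses the demarcation line.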
In addition, the results displayed in Table 6 reveal that the PTSTSI-2 is more susceptible to contamination by non-credible symptom report than the ARDPAI. A larger proportion of variance in T-scores on the PTSTSI-2 relative to the ARDPAI was explained by failing both univariate (38–50% versus 23–30%) and multivariate (38–57% versus 25–41%) SVTs. Likewise, while a trivial (and statistically non-significant) amount of variance on the ARDPAI was captured by PVT failures (1–4%), these values were higher (6–13%) on the PTSTSI-2. If replicated by future research, these findings could inform both test selection and interpretation.
Given that 92–100% of the patients who scored T ≥ 75 on the PTSTSI-2 also had strong psychometric evidence of non-credible clinical presentation, these results implicitly validate this cutoff as an emerging alternative embedded SVT within the TSI-2. Our findings suggest that extreme scores (i.e., T ≥ 75) on the PTSTSI-2 are specific to invalid symptom report. Therefore, a score in this range should be interpreted with caution. Namely, alternative explanations (i.e., invalid responding) should be ruled out before considering such a score evidence of genuinely elevated posttraumatic stress. Naturally, future cross-validation research is needed to determine the generalizability of these results to other samples with different clinical characteristics, using different criterion grouping methods.
Results in the Context of Previous Research
Although there are no universally accepted standards for assessing the efficacy of an SVT, the results of recent meta-analytic studies suggest that widely used embedded SVTs, such as the validity scales of the MMPI-2-RF or the validity scales of the PAI, typically yield classification accuracy statistics similar to those observed for the ATR in our study. For example, a frequently cited meta-analysis by Sharf et al. (2017) found that in assessing feigned mental disorders, the Fp-r of the MMPI-2-RF (arguably one of the most effective SVTs currently available; Burchett & Bagby, 2022; Giromini et al., 2022) has an average specificity of .92 and an average sensitivity of .45 at the commonly used cutoff of T ≥ 80 (a seemingly invariant trade-off between sensitivity and specificity dubbed the Larrabee limit; Crişan et al., 2021; Erdodi et al., 2014). From this point of view, the ATR appears to be a promising SVT at the alternative cutoff score of ≥ 9.
On the other hand, it may be premature to conclude that the ATR is about as valid as other commonly used SVTs. Indeed, results suggest that at ≥ 9, the ATR identified between a third and half of the patients who failed the SVT-2, indicating non-credible symptom report. At the same time, ATR ≥ 9 demonstrated a level of sensitivity to elevations in the ARDPAI and PTSTSI-2 that is comparable to that of NIMPAI ≥ 81. This is a remarkable performance given that the ARDPAI and NIMPAI are part of the same instrument. Combined with the fact that the ATR was also a significant predictor of PVT failures, it provides further support for the hypothesis that the ATR is sensitive to diffuse signs of non-credible clinical presentation that transcend domains (symptom versus performance validity) and instruments.
Likewise, the ATR and the NIMPAI explained a similar proportion of the variance in self-reported symptom severity on the ARDPAI (23% and 27%, respectively). However, the ATR explained a notably higher proportion of the variance than the NIMPAI in self-reported symptom severity on the PTSTSI-2 (50% versus 38%, respectively). Therefore, at least in terms of non-credible PTSD symptoms, the ATR demonstrated comparable sensitivity to the NIMPAI (Table 6).
These findings contradict earlier reports by Gray and colleagues (2010) that the NIMPAI outperformed the ATR in differentiating between coached PTSD simulators and credible patients with PTSD. The discrepancy between the two studies reinforces the importance of calibrating SVTs on real-world patients with suspected symptom over-report who operate under significant financial incentives, rather than on experimental malingering simulators. The incentive structure in lab-based studies (i.e., participation is rewarded rather than the ability to mimic credible impairment; Abeare et al., 2021a, b; Erdal, 2012; Rai et al., 2019; van Helvoort et al., 2019) is meaningfully different from that in high-stakes medicolegal settings. The ultimate purpose of SVTs is to accurately detect non-credible symptom report in applied clinical and forensic settings (Fuermaier et al., 2023b).
The validity scales of the MMPI instruments have been the subject of meta-analytic studies summarizing the results of dozens of empirical studies (e.g., Ingram & Ternes, 2016; Rogers et al., 2003; Sharf et al., 2017), as have the validity scales of the PAI (e.g., Hawes & Boccaccini, 2009; Kurtz & McCredie, 2022). The SIMS and the IOP-29, two other widely used SVTs, have also been extensively researched and have been the subject of quantitative literature reviews (e.g., Giromini & Viglione, 2022; Shura et al., 2022) and extensive meta-analytic studies (e.g., Puente-López et al., 2023; van Impelen et al., 2014). In contrast, there are still a limited number of studies to date on the efficacy of the ATR. Until the knowledge base on this relatively rarely studied SVT consolidates, assessors should exercise appropriate caution when interpreting its results.
Consistent with emerging empirical findings (Holcomb et al., 2022a; Sabelli et al., 2021), the ATR showed better classification accuracy against the SVT-2 as a criterion than against the PVTs (VI-5 and EI-7). This finding is not surprising, as symptom validity and performance validity are commonly conceptualized as related but ultimately distinct constructs (Blavier et al., 2023; De Boer et al., 2023; Giromini et al., 2020; Larrabee, 2012; Merten et al., 2020; Sabelli et al., 2021). Indeed, as Giromini et al. (2022) pointed out, “The optimal criterion variables in SVT research are SVTs, or maybe SVTs combined with PVTs, but not PVTs alone” (p. 13). Therefore, additional research using other SVTs to further investigate the efficacy of the ATR in detecting symptom invalidity would be beneficial.
The ATR Scale as a Screening Tool
In this context, the fact that ATR ≥ 9 had similar specificity against both SVTs and PVTs (although lower sensitivity to the latter) is remarkable, suggesting that the ATR taps a common source of non-credible presentation affecting both performance on cognitive tests and the pattern of self-reported symptoms (Bianchini et al., 2005, 2014; Gervais et al., 2007, 2011; Merten et al., 2022; Richman et al., 2006; Tarescavage et al., 2013; Tylicki et al., 2021; Young, 2020). If replicated by future research, this feature may uniquely position the ATR scale to serve as a brief (potentially stand-alone) screener for the credibility of self-reported symptoms and deficits. Although administering the full TSI-2 in a setting in which assessors operate under high volume pressures would not be practical, the eight items that define the score on the ATR can be administered and scored in under one minute. In other words, the score on the ATR can serve as a quick, rough estimate of symptom validity to inform downstream decisions about further in-depth assessment or treatment planning, similar to the screening function of the Mini-Mental State Examination (Folstein et al., 1975) for the presence/absence of cognitive deficits (Erdodi et al., 2020; Mitchell, 2009, 2017; Tsoi et al., 2015).
Equally important, failing the ATR was unrelated to TBI severity, as was the case for all of the criterion measures. This negative finding can serve to pre-empt attempts to discount scores in the failing range on the ATR (or other SVTs and PVTs) by invoking contamination by genuine and severe trauma. Although the emotional salience of a motor vehicle accident cannot be accurately captured by physical parameters (force of the impact, whether airbags deployed, amount of damage to the car, etc.) alone, in the context of motor vehicle collisions, psychological trauma and TBI severity likely have an inverted U-shaped relationship. In other words, the intuitive positive linear relationship between these two factors eventually reverses: once the impact results in a sufficiently severe TBI, the significant peritraumatic amnesia typically associated with such an injury effectively erases accident-related memories. However, this may not hold true for other traumatic experiences involving irreversible, tragic losses (e.g., death of a loved one during the accident) that induce a secondary trauma unrelated to the experiential aspects of the collision.
Limitations
Results should be interpreted in the context of the study’s limitations. First, the sample was relatively small and restricted to a single region of Canada and to a medicolegal context, so additional replications with larger samples of patients from different geographic locations (Lichtenstein et al., 2019), with different clinical characteristics, assessed in a medical context, are needed before the TSI-2 ATR can be fully endorsed as an all-purpose SVT. Second, and somewhat related, because the percentage of non-credible presentations in real-world clinical settings is likely greater than zero but typically lower than the 40% observed in the present sample (Young, 2015; Young et al., 2016), criterion group studies tend to include a large number of credible cases but a smaller number of non-credible cases.
Given the influence of BRFail on classification accuracy (Dandachi-FitzGerald & Martin, 2022; Rai et al., 2023), differences in the prevalence of invalid profiles should be taken into account when interpreting divergent findings from future reports. Although the prevalence of non-credible presentation was remarkably consistent across domains (SVTs and PVTs) and instruments (SVT-2, TOMM-1, VI-5 and EI-7), as well as with previous estimates (Czornik et al., 2022; Larrabee et al., 2009; Merten et al., 2009, 2020; Richman et al., 2006), given the ubiquitous presence of significant external incentives to appear impaired within our sample, it is likely higher than what is typical of clinical settings (Merten et al., 2016; Puente-López et al., 2023; Young, 2015; Young et al., 2016). Therefore, the ATR’s classification accuracy and predictive power may differ in settings with different clinical characteristics and incentive structures.
Finally, the present sample was restricted to patients born in Canada to control for the potential confounding effect of linguistic and cultural diversity (Boskovic et al., 2020; Crişan, et al., 2023a; Dandachi-FitzGerald et al., 2023a; Erdodi & Lajiness-O’Neill, 2012). Future research examining the classification accuracy of the ATR in examinees with limited English proficiency (LEP) would greatly advance the knowledge base of symptom validity assessment. Although some instruments proved remarkably robust to LEP, concerns about this threat to the validity of SVT scores rightfully persist (Crișan, 2023).
Like all research based on criterion groups, our study also provides stronger evidence on specificity than on sensitivity (Chafetz, 2022). Another limitation common to all criterion-group studies is that the internal validity of our research design must be evaluated critically in the absence of a gold standard method for establishing credible versus non-credible responding. That is, although we attempted to increase the internal validity of our study by using psychometrically sound PVTs and SVTs, it is possible that our criterion groups were themselves contaminated by classification errors. Notably, one of the SVTs (BDI-II) is a recently introduced measure of symptom validity (Fuermaier et al., 2023a, b) with no independent replication (although there is previous support for its sensitivity to non-credible symptom report; Wiggins et al., 2012). Nevertheless, despite these and possibly other limitations, our study contributes valuable empirical data from a real-world medicolegal sample, providing unique insights into the utility of the ATR in assessing symptom validity.
Conclusions
Overall, the ATR demonstrated its potential to serve as an effective SVT in medicolegal (and potentially general clinical) settings. However, the cutoff (≥ 15) proposed by Briere (2011) proved prohibitively conservative in the present sample and grossly underestimated the prevalence of non-credible responding as operationalized by the SVT-2 (2.0% versus 34–40%). Given the external incentives to appear impaired due to active engagement in personal injury litigation and a consistently high BRFail on both SVTs and PVTs (40–45%) within this sample, the most plausible interpretation of the very low BRFail on the ATR at ≥ 15 is that such a highly conservative cutoff artificially suppresses the detection of invalid symptom report. Using this cutoff in survivors of motor vehicle collisions essentially amounts to giving examinees a Pass (Erdodi, 2023). In other words, the choice of cutoff can strongly influence the outcome of the evaluation, lowering the likelihood of detecting invalid response sets below what most assessors consider reasonable. Results suggest that the decision to use the default cutoff (≥ 15) on the ATR effectively sets the false negative rate to around 90–95%.
Conversely, even though the cutoff (≥ 8) recommended by Gray and colleagues (2010) falls short of the .90 specificity standard, it may be useful for screening purposes. As a compromise, a cutoff of ≥ 9 seems to provide sufficient specificity to both invalid symptom report and invalid cognitive performance (.90-.95), while maximizing sensitivity (.26-.53). These findings are consistent with trends observed in PVTs: post-publication research often reveals that the originally proposed cutoffs were overly conservative and that more liberal cutoffs would optimize overall classification accuracy (Ashendorf et al., 2021; Erdodi et al., 2018, 2023; Martin et al., 2020; Poynter et al., 2019).
As is the case with all measures of performance and symptom validity, assessors should not rely on the ATR as the sole indicator to establish the credibility of the entire clinical presentation (APA, 2013, p. 15). However, the ATR can serve as an effective screener of symptom validity, providing the first valuable data point in the assessment process. Its potential to provide incremental information when combined with other SVTs is worth investigating further. Given that the ATR contains only eight items that are quick and easy to administer and score, clinicians operating under time constraints may choose to administer it as a stand-alone SVT – provided that a thorough assessment of trauma-related distress is not a central/immediate goal of the evaluation.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Notes
Zero means that the item content has never occurred in the past six months; 1 means that it has occurred rarely; 2 that it has occurred a few times; and 3 that it has occurred often.
References
Abeare, C., Messa, I., Whitfield, C., Zuccato, B., Casey, J., & Erdodi, L. (2019a). Performance validity in collegiate football athletes at baseline neurocognitive testing. Journal of Head Trauma Rehabilitation, 34(4), 20–31. https://doi.org/10.1097/HTR.0000000000000451
Abeare, C., Sabelli, A., Taylor, B., Holcomb, M., Dumitrescu, C., Kirsch, N., & Erdodi, L. (2019b). The importance of demographically adjusted cutoffs: Age and education bias in raw score cutoffs within the Trail Making Test. Psychological Injury and Law, 12(2), 170–182. https://doi.org/10.1007/s12207-019-09353-x
Abeare, C. A., Hurtubise, J., Cutler, L., Sirianni, C., Brantuo, M., Makhzoun, N., & Erdodi, L. (2021a). Introducing a forced choice recognition trial to the Hopkins Verbal Learning Test – Revised. The Clinical Neuropsychologist, 35(8), 1442–1470. https://doi.org/10.1080/13854046.2020.1779348
Abeare, C. A., An, K., Tyson, B., Holcomb, M., Cutler, L., May, N., & Erdodi, L. A. (2022a). The emotion word fluency test as an embedded performance validity indicator-alone and in a multivariate validity composite. Applied Neuropsychology: Child, 11(4), 713–724. https://doi.org/10.1080/21622965.2021.1939027
Abeare, K., Razvi, P., Sirianni, C. D., Giromini, L., Holcomb, M., Cutler, L., Kuzmenka, P., & Erdodi, L. A. (2021b). Introducing alternative validity cutoffs to improve the detection of non-credible symptom report on the BRIEF. Psychological Injury and Law, 14(1), 2–16. https://doi.org/10.1007/s12207-021-09402-4
Abeare, K., Cutler, L., An, K. Y., Razvi, P., Holcomb, M., & Erdodi, L. A. (2022b). BNT-15: Revised performance validity cutoffs and proposed clinical classification ranges. Cognitive and Behavioral Neurology, 35(3), 155–168. https://doi.org/10.1097/WNN.0000000000000304
Ales, F., & Erdodi, L. (2021). Detecting negative response bias within the Trauma Symptom Inventory-2 (TSI-2): A review of the literature. Psychological Injury and Law, 1–8. https://doi.org/10.1007/s12207-021-09427-9
Ali, S., Elliott, L., Biss, R., Abumeeiz, M., Brantuo, M., Kuzmenka, P., Odenigbo, P., & Erdodi, L. (2022). The BNT-15 provides an accurate measure of English proficiency in cognitively intact bilinguals – A study in cross-cultural assessment. Applied Neuropsychology: Adult, 29(3), 351–363. https://doi.org/10.1080/23279095.2020.1760277
American Psychological Association. (2013). Specialty guidelines for forensic psychology. American Psychologist, 68(1), 7–19. https://doi.org/10.1037/a0029889
An, K. Y., Charles, J., Ali, S., Enache, A., Dhuga, J., & Erdodi, L. A. (2019). Re-examining performance validity cutoffs within the Complex Ideational Material and the Boston Naming Test - Short Form using an experimental malingering paradigm. Journal of Clinical and Experimental Neuropsychology, 41(1), 15–25. https://doi.org/10.1080/13803395.2018.1483488
Ashendorf, L., Constantinou, M., & McCaffrey, R. J. (2004). The effect of depression and anxiety on the TOMM in community-dwelling older adults. Archives of Clinical Neuropsychology, 19(1), 125–130. https://doi.org/10.1016/S0887-6177(02)00218-4
Ashendorf, L., Clark, E. L., & Sugarman, M. A. (2017). Performance validity and processing speed in a VA polytrauma sample. The Clinical Neuropsychologist, 31(5), 857–866. https://doi.org/10.1080/13854046.2017.1285961
Ashendorf, L., Clark, E. L., & Humphreys, C. T. (2021). The Rey 15-Item Memory Test in US veterans. Journal of Clinical and Experimental Neuropsychology, 43(3), 324–331.
Azoulay, E., Cariou, A., Bruneel, F., Demoule, A., Kouatchet, A., Reuter, D., ... & Kentish-Barnes, N. (2020). Symptoms of anxiety, depression, and peritraumatic dissociation in critical care clinicians managing patients with COVID-19. A cross-sectional study. American Journal of Respiratory and Critical Care Medicine, 202(10), 1388–1398. https://doi.org/10.1164/rccm.202006-2568OC
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Beck Depression Inventory-II (BDI-II). The Psychological Corporation, Harcourt Brace.
Ben-Porath, Y. S., & Tellegen, A. (2008). Minnesota Multiphasic Personality Inventory-2-Restructured Form: Manual for administration, scoring and interpretation. University of Minnesota Press.
Bianchini, K. J., Greve, K. W., & Glynn, G. (2005). On the diagnosis of malingered pain-related disability: Lessons from cognitive malingering research. The Spine Journal, 5(4), 404–417.
Bianchini, K. J., Aguerrevere, L. E., Guise, B. J., Ord, J. S., Etherton, J. L., Meyers, J. E., ... & Bui, J. (2014). Accuracy of the Modified Somatic Perception Questionnaire and Pain Disability Index in the detection of malingered pain-related disability in chronic pain. The Clinical Neuropsychologist, 28(8), 1376–1394.
Blavier, A., Palma, A., Viglione, D. J., Zennaro, A., & Giromini, L. (2023). A natural experiment design testing the effectiveness of the IOP-29 and IOP-M in assessing the credibility of reported PTSD symptoms in Belgium. Journal of Forensic Psychology Research and Practice. Advance online publication. https://doi.org/10.1080/24732850.2023.2203130
Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examination. The Clinical Neuropsychologist, 23(4), 729–741. https://doi.org/10.1080/13854040802427803
Boone, K. B. (2013). Clinical Practice of Forensic Neuropsychology – An evidence-based approach. New York, NY: Guilford.
Boone, K. B., Lu, P., & Herzberg, D. (2002a). Rey Dot Counting Test: A handbook. Los Angeles: Western Psychological Services.
Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002b). The Rey 15-item recognition trial: A technique to enhance sensitivity of the Rey 15-item memorization test. Journal of Clinical and Experimental Neuropsychology, 24(5), 561–573. https://doi.org/10.1076/jcen.24.5.561.1004
Boskovic, I. (2020). Do motives matter? A comparison between positive and negative incentives in students’ willingness to malinger. Educational Psychology, 40(8), 1022–1032. https://doi.org/10.1080/01443410.2019.1704400
Boskovic, I., Merckelbach, H., Merten, T., Hope, L., & Jelicic, M. (2020). The Self-Report Symptom Inventory as an instrument for detecting symptom over-reporting: An exploratory study with instructed simulators. European Journal of Psychological Assessment, 36(5), 730–739. https://doi.org/10.1027/1015-5759/a000547
Boucher, C., May, N., Shahein, A., Roth, R. M., & Erdodi, L. A. (2023). Examining the effect of repeat administration, alternate versions and performance validity on letter fluency tests in a mixed clinical sample. Advance online publication.
Brand, B. L., Tursich, M., Tzall, D., & Loewenstein, R. J. (2014). Utility of the SIRS-2 in distinguishing genuine from simulated dissociative identity disorder. Psychological Trauma: Theory, Research, Practice, and Policy, 6(4), 308–317. https://doi.org/10.1037/a0036064
Briere, J. (2011). Trauma Symptom Inventory-2nd edition (TSI-2) professional manual. Odessa, FL: Psychological Assessment Resources. https://doi.org/10.1002/9780470479216.corpsy1010
Burchett, D., & Bagby, R. M. (2022). Assessing negative response bias: A review of the noncredible overreporting scales of the MMPI-2-RF and MMPI-3. Psychological Injury and Law, 15, 1–15. https://doi.org/10.1007/s12207-021-09435-9
Bush, S. S., Ruff, R. M., Tröster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., Reynolds, C. R., & Silver, C. H. (2005). Symptom validity assessment: Practice issues and medical necessity NAN Policy and Planning Committee. Archives of Clinical Neuropsychology, 20(4), 419–426. https://doi.org/10.1016/j.acn.2005.02.002
Butcher, J. N., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., Dahlstrom, W. G., & Kaemmer, B. (2001). Minnesota Multiphasic Personality Inventory – 2: Manual for administration, scoring and interpretation (rev). University of Minnesota.
Candel, I., & Merckelbach, H. (2004). Peritraumatic dissociation as a predictor of post-traumatic stress disorder: A critical review. Comprehensive Psychiatry, 45(1), 44–50. https://doi.org/10.1016/j.comppsych.2003.09.012
Chafetz, M. D. (2022). Deception is different: Negative validity test findings do not provide “evidence” for “good effort.” The Clinical Neuropsychologist, 36(6), 1244–1264. https://doi.org/10.1080/13854046.2020.1840633
Crișan, I. (2023). English versus native language administration of the IOP-29-M produces similar results in a sample of Romanian bilinguals: A brief report. Psychology & Neuroscience. https://doi.org/10.1037/pne0000316
Crişan, I., Sava, F. A., Maricuţoiu, L. P., Ciumăgeanu, M. D., Axinia, O., Gîrniceanu, L., & Ciotlăuş, L. (2021). Evaluation of various detection strategies in the assessment of noncredible memory performance: Results of two experimental studies. Assessment, 29(8), 1973–1984. https://doi.org/10.1177/10731911211040105
Crişan, I., Ali, S., Cutler, L., Matei, A., Avram, L., & Erdodi, L. A. (2023a). Geographic variability in limited English proficiency: A cross-cultural study of cognitive profiles. Journal of the International Neuropsychological Society. https://doi.org/10.1017/S1355617723000280
Crişan, I., Matei, A., Avram, D. L., Bunghez, C., & Erdodi, L. A. (2023b). Full of surprises: Performance validity testing in examinees with limited English proficiency. Advance online publication.
Curtis, K. L., Thompson, L. K., Greve, K. W., & Bianchini, K. J. (2008). Verbal fluency indicators of malingering in traumatic brain injury: Classification accuracy in known groups. The Clinical Neuropsychologist, 22, 930–945. https://doi.org/10.1080/13854040701563591
Cutler, L., Sirianni, C. D., Abeare, K., Holcomb, M., & Erdodi, L. A. (2022). One-minute SVT? The V-5 is a stronger predictor of symptom exaggeration than self-reported trauma history. Journal of Forensic Psychology Research and Practice, 22(5), 470–488. https://doi.org/10.1080/24732850.2021.2013361
Czornik, M., Seidl, D., Tavakoli, S., Merten, T., & Lehrner, J. (2022). Motor reaction times as an embedded measure of performance validity: A study with a sample of Austrian early retirement claimants. Psychological Injury and Law, 15(2), 200–212.
Dandachi-FitzGerald, B., Ponds, R. W. H. M., & Merten, T. (2013). Symptom validity and neuropsychological assessment: A survey of practices and beliefs of neuropsychologists in six European countries. Archives of Clinical Neuropsychology, 28(8), 771–873. https://doi.org/10.1093/arclin/act073
Dandachi-FitzGerald, B., & Martin, P. K. (2022). Clinical judgment and clinically applied statistics: Description, benefits, and potential dangers when relying on either one individually in clinical practice. In R. W. Schroeder & P. K. Martin (Eds.), Validity assessment in clinical neuropsychological practice: Evaluating and managing noncredible performance (pp. 107–125). The Guilford Press.
Dandachi-FitzGerald, B., De Page, L., & Merckelbach, H. (2023a). Detecting symptom exaggeration in compensation-seeking individuals, psychotherapy clients, and individuals referred for job assessments: Psychometric features of the French and Dutch versions of the Self-Report Symptom Inventory. Advance online publication.
Dandachi-FitzGerald, B., Pienkohs, S., Merten, T., & Merckelbach, H. (2023b). Detecting symptom overreporting: Equivalence of the Dutch and German Self-Report Symptom Inventory. Psychological Test Adaptation and Development.
Daniels, J. K., Coupland, N. J., Hegadoren, K. M., Rowe, B. H., Densmore, M., Neufeld, R. W., & Lanius, R. A. (2012). Neural and behavioral correlates of peritraumatic dissociation in an acutely traumatized sample. The Journal of Clinical Psychiatry, 73(4), 12573.
De Boer, A. B., Phillips, M. S., Barwegen, K. C., Obolsky, M. A., Rauch, A. A., Pesanti, S. D., ... & Soble, J. R. (2023). Comprehensive analysis of MMPI-2-RF symptom validity scales and performance validity test relationships in a diverse mixed neuropsychiatric setting. Psychological Injury and Law, 16(1), 61–72. https://doi.org/10.1007/s12207-022-09467-9
Deloria, R., Kivisto, A. J., Swier-Vosnos, A., & Elwood, L. (2021). Optimal per test cutoff scores and combinations of failure on multiple embedded performance validity tests in detecting performance invalidity in a mixed clinical sample. Applied Neuropsychology: Adult. Advance online publication.
Denning, J. H. (2012). The efficiency and accuracy of the Test of Memory Malingering Trial 1, errors on the first 10 items of the Test of Memory Malingering, and five embedded measures in predicting invalid test performance. Archives of Clinical Neuropsychology, 27(4), 417–432. https://doi.org/10.1093/arclin/acs044
Efendov, A. A., Sellbom, M., & Bagby, R. M. (2008). The utility and comparative incremental validity of the MMPI-2 and Trauma Symptom Inventory validity scales in the detection of feigned PTSD. Psychological Assessment, 20(4), 317–326. https://doi.org/10.1037/a0013870
Elhai, J. D., Gray, M. J., Naifeh, J. A., Butcher, J. J., Davis, J. D., Falsetti, S. A., & Best, C. L. (2005). Utility of the Trauma Symptom Inventory’s Atypical Response Scale in detecting malingered post-traumatic stress disorder. Assessment, 12(2), 210–219. https://doi.org/10.1177/1073191105275456
Erdal, K. (2012). Neuropsychological testing for sports-related concussion: How athletes can sandbag their baseline testing without detection. Archives of Clinical Neuropsychology, 27(5), 473–479. https://doi.org/10.1093/arclin/acs050
Erdodi, L., & Lajiness-O’Neill, R. (2012). Humor perception in bilinguals: Is language more than a code? International Journal of Humor Research, 25(4), 459–468. https://doi.org/10.1515/humor-2012-0024
Erdodi, L., Calamia, M., Holcomb, M., Robinson, A., Rasmussen, L., & Bianchini, K. (2023). M is For Performance Validity: The IOP-M provides a cost-effective measure of the credibility of memory deficits during neuropsychological evaluations. Journal of Forensic Psychology Research and Practice. https://doi.org/10.1080/24732850.2023.2168581
Erdodi, L. A. (2019). Aggregating validity indicators: The salience of domain specificity and the indeterminate range in multivariate models of performance validity assessment. Applied Neuropsychology: Adult, 26(2), 155–172. https://doi.org/10.1080/23279095.2017.1384925
Erdodi, L. A. (2021). Five shades of gray: Conceptual and methodological issues around multivariate models of performance validity. NeuroRehabilitation, 49(2), 179–213. https://doi.org/10.3233/NRE-218020
Erdodi, L. A. (2022). Multivariate models of performance validity: The Erdodi Index captures the dual nature of non-credible responding (continuous and categorical). Assessment. https://doi.org/10.1177/10731911221101910
Erdodi, L. A. (2023). From “below chance” to “a single error is one too many”: Evaluating various thresholds for invalid performance on two forced choice recognition tests. Behavioral Sciences & the Law. https://doi.org/10.1002/bsl.2609
Erdodi, L. A., & Abeare, C. A. (2020). Stronger together: The Wechsler Adult Intelligence Scale-Fourth Edition as a multivariate performance validity test in patients with traumatic brain injury. Archives of Clinical Neuropsychology, 35(2), 188–204. https://doi.org/10.1093/arclin/acz032
Erdodi, L. A., & Lichtenstein, J. D. (2017). Invalid before impaired: An emerging paradox of embedded validity indicators. The Clinical Neuropsychologist, 31(6–7), 1029–1046. https://doi.org/10.1080/13854046.2017.1323119
Erdodi, L. A., & Lichtenstein, J. D. (2021). Information processing speed tests as PVTs. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp. 218–247). New York, NY: Guilford.
Erdodi, L. A., Kirsch, N. L., Lajiness-O’Neill, R., Vingilis, E., & Medoff, B. (2014). Comparing the Recognition Memory Test and the Word Choice Test in a mixed clinical sample: Are they equivalent? Psychological Injury and Law, 7(3), 255–263. https://doi.org/10.1007/s12207-014-9197-8
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational Material – A measure of receptive language or performance validity? Psychological Injury and Law, 9, 112–120. https://doi.org/10.1007/s12207-016-9254-6
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2017a). WAIS-IV processing speed scores as measures of non-credible responding – The third generation of embedded performance validity indicators. Psychological Assessment, 29(2), 148–157. https://doi.org/10.1037/pas0000319
Erdodi, L. A., Nussbaum, S., Sagar, S., Abeare, C. A., & Schwartz, E. S. (2017b). Limited English proficiency increases failure rates on performance validity tests with high verbal mediation. Psychological Injury and Law, 10(1), 96–103.
Erdodi, L. A., Abeare, C. A., Medoff, B., Seke, K. R., Sagar, S., & Kirsch, N. L. (2018). A single error is one too many: The Forced Choice Recognition trial on the CVLT-II as a measure of performance validity in adults with TBI. Archives of Clinical Neuropsychology, 33(7), 845–860. https://doi.org/10.1093/arclin/acx110
Erdodi, L. A., Green, P., Sirianni, C., & Abeare, C. A. (2019a). The myth of high false positive rates on the Word Memory Test in mild TBI. Psychological Injury and Law, 12(2), 155–169. https://doi.org/10.1007/s12207-019-09356-8
Erdodi, L. A., Taylor, B., Sabelli, A., Malleck, M., Kirsch, N. L., & Abeare, C. A. (2019b). Demographically adjusted validity cutoffs in the Finger Tapping Test are superior to raw score cutoffs. Psychological Injury and Law, 12(2), 113–126. https://doi.org/10.1007/s12207-019-09352-y
Erdodi, L. A., Shahein, A. G., Kent, K. J., & Roth, R. M. (2020). The doubtful benefits of giving the benefit of the doubt: Lenient scoring of the spatial orientation items on the mini-Mental Status Exam increases false negative rates. Applied Neuropsychology: Adult, 27(2), 143–149. https://doi.org/10.1080/23279095.2018.1497990
Erdodi, L. A., Hurtubise, J. H., Brantuo, M., Cutler, L., Kennedy, A., & Hirst, R. (2021). Old vs new: The classic and D-KEFS Trails as embedded performance validity indicators and measures of psychomotor speed/executive function. Archives of Assessment Psychology, 11(1), 137–161.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189–198.
Fuermaier, A. B., Dandachi-Fitzgerald, B., & Lehrner, J. (2023a). Validity assessment of early retirement claimants: Symptom overreporting on the Beck Depression Inventory–II. Applied Neuropsychology: Adult. Advance online publication. https://doi.org/10.1080/23279095.2023.2206031
Fuermaier, A. B., Dandachi-Fitzgerald, B., & Lehrner, J. (2023b). Attention performance as an embedded validity indicator in the cognitive assessment of early retirement claimants. Psychological Injury and Law, 16(1), 36–48.
Gegner, J., Erdodi, L. A., Giromini, L., Viglione, D. J., Bosi, J., & Brusadelli, E. (2022). An Australian study on feigned mTBI using the Inventory of Problems–29 (IOP-29), its Memory Module (IOP-M), and the Rey Fifteen Item Test (FIT). Applied Neuropsychology: Adult, 29(5), 1221–1230. https://doi.org/10.1080/23279095.2020.1864375
Gervais, R. O., Ben-Porath, Y. S., & Green, P. (2007). Development and validation of a Response Bias Scale (RBS) for the MMPI-2. Assessment, 14(2), 196–208.
Gervais, R. O., Wygant, D. B., Sellbom, M., & Ben-Porath, Y. S. (2011). Associations between symptom validity test failure and scores on the MMPI–2–RF validity and substantive scales. Journal of Personality Assessment, 93(5), 508–517.
Giromini, L., & Viglione, D. J. (2022). Assessing negative response bias with the Inventory of Problems-29 (IOP-29): A quantitative literature review. Psychological Injury and Law, 15, 79–93. https://doi.org/10.1007/s12207-021-09437-7
Giromini, L., Viglione, D. J., Zennaro, A., Maffei, A., & Erdodi, L. A. (2020). SVT meets PVT: Development and initial validation of the Inventory of Problems – Memory (IOP-M). Psychological Injury and Law, 13, 261–274. https://doi.org/10.1007/s12207-020-09385-8
Giromini, L., Young, G., & Sellbom, M. (2022). Assessing Negative Response Bias Using Self-Report Measures: New Articles, New Issues. Psychological Injury and Law, 15, 1–21. https://doi.org/10.1007/s12207-022-09444-2
Gray, M. J., Elhai, J. D., & Briere, J. (2010). Evaluation of the Atypical Response scale of the Trauma Symptom Inventory-2 in detecting simulated posttraumatic stress disorder. Journal of Anxiety Disorders, 24(5), 447–451. https://doi.org/10.1016/j.janxdis.2010.02.011
Green, P. (2003). Green’s Word Memory Test. Green’s Publishing.
Green, P. (2004). Green’s Medical Symptom Validity Test. Green’s Publishing.
Gregory, R. J. (2013). Psychological testing. History, principles, and applications. Upper Saddle River, NJ: Pearson
Greve, K. W., Bianchini, K. J., & Doane, B. M. (2006). Classification accuracy of the Test of Memory Malingering in traumatic brain injury: Results of a known-group analysis. Journal of Clinical and Experimental Neuropsychology, 28(7), 1176–1190. https://doi.org/10.1080/13803390500263550
Hawes, S. W., & Boccaccini, M. T. (2009). Detection of overreporting of psychopathology on the Personality Assessment Inventory: A meta-analytic review. Psychological Assessment, 21(1), 112–124. https://doi.org/10.1037/a0015036
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: Psychological Assessment Resources.
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., & Brennan, A. (2005). WAIS Digit-Span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12(4), 429–444. https://doi.org/10.1177/1073191105281099
Holcomb, M., Pyne, S., Cutler, L., Oikle, D. A., & Erdodi, L. A. (2022a). Take their word for it: The Inventory of Problems provides valuable information on both symptom and performance validity. Journal of Personality Assessment. https://doi.org/10.1080/00223891.2022.2114358
Holcomb, M. J., Roth, R. M., Tyson, B. T., & Erdodi, L. A. (2022b). Critical item (CR) analysis expands the classification accuracy of performance validity tests based on the forced choice paradigm – Replicating previously introduced CR cutoffs within the Word Choice Test. Neuropsychology, 36(7), 683–694. https://doi.org/10.1037/neu0000834
Holeva, V., & Tarrier, N. (2001). Personality and peritraumatic dissociation in the prediction of PTSD in victims of road traffic accidents. Journal of Psychosomatic Research, 51(5), 687–692. https://doi.org/10.1016/S0022-3999(01)00256-2
Hurtubise, J., Baher, T., Messa, I., Cutler, L., Shahein, A., Hastings, M., Carignan-Querqui, M., & Erdodi, L. (2020). Verbal fluency and digit span variables as performance validity indicators in experimentally induced malingering and real world patients with TBI. Applied Neuropsychology: Child, 9(4), 337–354. https://doi.org/10.1080/21622965.2020.1719409
Ingram, P. B., & Ternes, M. S. (2016). The detection of content-based invalid responding: A meta-analysis of the MMPI-2-Restructured Form’s (MMPI-2-RF) over-reporting validity scales. The Clinical Neuropsychologist, 30(4), 473–496. https://doi.org/10.1080/13854046.2016.1187769
Johnson, S. C., Silverberg, N. D., Millis, S. R., & Hanks, R. A. (2012). Symptom validity indicators embedded in the Controlled Oral Word Association Test. The Clinical Neuropsychologist, 26(7), 1230–1241. https://doi.org/10.1080/13854046.2012.709886
Jones, A. (2013). Test of Memory Malingering: Cutoff scores for psychometrically defined malingering groups in a military sample. The Clinical Neuropsychologist, 27(6), 1043–1059. https://doi.org/10.1080/13854046.2013.804949
Koenen, K. C., Saxe, G., Purcell, S., Smoller, J. W., Bartholomew, D., Miller, A., Hall, E., Kaplow, J., Bosquet, M., Moulton, S., & Baldwin, C. (2005). Polymorphisms in FKBP5 are associated with peritraumatic dissociation in medically injured children. Molecular Psychiatry, 10(12), 1058–1059. https://doi.org/10.1038/sj.mp.4001727
Kulas, J. F., Axelrod, B. N., & Rinaldi, A. R. (2014). Cross-validation of supplemental Test of Memory Malingering Scores as performance validity measures. Psychological Injury and Law, 7(3), 236–244. https://doi.org/10.1007/s12207-014-9200-4
Kurtz, J. E., & McCredie, M. N. (2022). Exaggeration or fabrication? Assessment of negative response distortion and malingering with the Personality Assessment Inventory. Psychological Injury and Law, 15, 37–47. https://doi.org/10.1007/s12207-021-09433-x
Langeluddecke, P. M., & Lucas, S. K. (2003). Wechsler Adult Intelligence Scale-Third Edition findings in relation to severity of brain injury in litigants. The Clinical Neuropsychologist, 17(2), 273–284. https://doi.org/10.1076/clin.17.2.273.16499
Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22(4), 666–679. https://doi.org/10.1080/13854040701494987
Larrabee, G. J. (2012). Performance validity and symptom validity in neuropsychological assessment. Journal of International Neuropsychological Society, 18(4), 625–630. https://doi.org/10.1017/S1355617712000240
Larrabee, G. J. (2014). False-positive rates associated with the use of multiple performance and symptom validity tests. Archives of Clinical Neuropsychology, 29(4), 364–373.
Larrabee, G. J., Millis, S. R., & Meyers, J. E. (2009). 40 plus or minus 10, a new magical number: Reply to Russell. The Clinical Neuropsychologist, 23(5), 841–849. https://doi.org/10.1080/13854040902796735
Larrabee, G. J., Rohling, M. L., & Meyers, J. E. (2019). Use of multiple performance and symptom validity measures: Determining the optimal per test cutoff for determination of invalidity, analysis of skew, and inter-test correlations in valid and invalid performance groups. The Clinical Neuropsychologist, 33(8), 1354–1372. https://doi.org/10.1080/13854046.2019.1614227
Lichtenstein, J. D., Greenacre, M. K., Cutler, L., Abeare, K., Baker, S. D., Kent, K. J., Ali, S., & Erdodi, L. A. (2019). Geographic variation and instrumentation artifacts: In search of confounds in performance validity assessment in adults with mild TBI. Psychological Injury and Law, 12(2), 127–145. https://doi.org/10.1007/s12207-019-0935
Martin, P. K., Schroeder, R. W., & Odland, A. P. (2015). Neuropsychologists’ validity testing beliefs and practices: A survey of North American Professionals. The Clinical Neuropsychologist, 29(6), 741–746.
Martin, P. K., Schroeder, R. W., Olsen, D. H., Maloy, H., Boettcher, A., Ernst, N., & Okut, H. (2020). A systematic review and meta-analysis of the Test of Memory Malingering in adults: Two decades of deception detection. The Clinical Neuropsychologist, 34(1), 88–119. https://doi.org/10.1080/13854046.2019.1637027
McDermott, B. E. (2012). Psychological testing and the assessment of malingering. Psychiatric Clinics of North America, 35(4), 855–876. https://doi.org/10.1016/j.psc.2012.08.006
Merckelbach, H., Merten, T., Dandachi-FitzGerald, B., & Boskovic, I. (2018). De Self-Report Symptom Inventory (SRSI): Een instrument voor klachtenoverdrijving [The Self-Report Symptom Inventory (SRSI): An instrument to measure symptom overreporting]. De Psycholoog, 53(3), 32–40.
Merten, T., & Merckelbach, H. (2013). Symptom validity in somatoform and dissociative disorders: A critical review. Psychological Injury and Law, 6(2), 122–137. https://doi.org/10.1007/s12207-013-9155-x
Merten, T., Thies, E., Schneider, K., & Stevens, A. (2009). Symptom validity testing in claimants with alleged posttraumatic stress disorder: Comparing the Morel Emotional Numbing Test, the Structured Inventory of Malingered Symptomatology, and the Word Memory Test. Psychological Injury and Law, 2(3–4), 284–293.
Merten, T., Merckelbach, H., Giger, P., & Stevens, A. (2016). The Self-Report Symptom Inventory (SRSI): A New Instrument for the assessment of distorted symptom endorsement. Psychological Injury and Law, 9, 102–111. https://doi.org/10.1007/s12207-016-9257-3
Merten, T., Kaminski, A., & Pfeiffer, W. (2020). Prevalence of overreporting on symptom validity tests in a large sample of psychosomatic rehabilitation inpatients. The Clinical Neuropsychologist, 34(5), 1004–1024. https://doi.org/10.1080/13854046.2019.1694073
Merten, T., Dandachi-FitzGerald, B., Hall, V., Bodner, T., Giromini, L., Lehrner, J., ... & Di Stefano, G. (2022). Symptom and performance validity assessment in European countries: An update. Psychological Injury and Law, 15(2), 116–127.
Miller, H.A. (2001). M-FAST: Miller Forensic Assessment of Symptoms Test professional manual. Odessa, FL: Psychological Assessment Resources
Mitchell, A. J. (2009). A meta-analysis of the accuracy of the mini-mental state examination in the detection of dementia and mild cognitive impairment. Journal of Psychiatric Research, 43(4), 411–431.
Mitchell, A. J. (2017). The Mini-Mental State Examination (MMSE): Update on its diagnostic accuracy and clinical utility for cognitive disorders. In Cognitive screening instruments: A practical approach (pp. 37–48). Cham, Switzerland: Springer.
Morey, L. C. (1991). Personality assessment inventory (PAI). Professional manual. Odessa, FL: Psychological Assessment Resources.
Morey, L. C. (2007). Personality Assessment Inventory (PAI). Professional manual (2nd ed.). Psychological Assessment Resources.
Nelson, J. M., Whipple, B., Lindstrom, W., & Foels, P. A. (2019). How is ADHD assessed and documented? Examination of psychological reports submitted to determine eligibility for postsecondary disability. Journal of Attention Disorders, 23(14), 1780–1791.
Nelson, N. W., Sweet, J. J., & Demakis, G. J. (2006). Meta-analysis of the MMPI-2 Fake Bad Scale: Utility in forensic practice. The Clinical Neuropsychologist, 20(1), 39–58. https://doi.org/10.1080/13854040500459322
Palermo, C. A., & Brand, B. L. (2019). Can the Trauma Symptom Inventory-2 distinguish coached simulators from dissociative disorder patients? Psychological Trauma: Theory, Research, Practice, and Policy, 11(5), 477–485. https://doi.org/10.1037/tra0000382
Peace, K. A., & Richards, V. E. S. (2014). Faking it: Incentives and malingered PTSD. Journal of Criminal Psychology, 4(1), 19–32. https://doi.org/10.1108/JCP-09-2013-0023
Pearson (2009). Advanced Clinical Solutions for the WAIS-IV and WMS-IV – Technical Manual. San Antonio, TX: Author
Persinger, V. C., Whiteside, D. M., Bobova, L., Saigal, S. D., Vannucci, M. J., & Basso, M. R. (2018). Using the California Verbal Learning Test, Second Edition as an embedded performance validity measure among individuals with TBI and individuals with psychiatric disorders. The Clinical Neuropsychologist, 32(6), 1039–1053. https://doi.org/10.1080/13854046.2017.1419507
Plohmann, A. M., & Merten, T. (2013). The Third European Symposium on Symptom Validity Assessment-Facts and controversies. Clinical and Health, 24(3), 197–203.
Poynter, K., Boone, K. B., Ermshar, A., Miora, D., Cottingham, M., Victor, T. L., Ziegler, E., Zeller, M. A., & Wright, M. (2019). Wait, there’s a baby in this bath water! Update on quantitative and qualitative cut-offs for Rey 15-Item Recall and Recognition. Archives of Clinical Neuropsychology, 34(8), 1367–1380. https://doi.org/10.1093/arclin/acy087
Puente-López, E., Pina, D., López-Nicolás, R., Iguacel, I., & Arce, R. (2023). The Inventory of Problems–29 (IOP-29): A systematic review and bivariate diagnostic test accuracy meta-analysis. Psychological Assessment, 35(4), 339. https://doi.org/10.1037/pas0001209
Rai, J., & Erdodi, L. (2021). The impact of criterion measures on the classification accuracy of TOMM-1. Applied Neuropsychology: Adult, 28(2), 185–196. https://doi.org/10.1080/23279095.2019.1613994
Rai, J., Gervais, R., & Erdodi, L. (2023). A large-scale investigation of the classification accuracy of various performance validity tests in a medical-legal setting. Advance online publication.
Rai, J. K., An, K. Y., Charles, J., Ali, S., & Erdodi, L. A. (2019). Introducing a forced choice recognition trial to the Rey Complex Figure Test. Psychology & Neuroscience, 12(4), 451–472. https://doi.org/10.1037/pne0000175
Resch, Z. J., Pham, A. T., Abramson, D. A., White, D. J., DeDios-Stern, S., Ovsiew, G. P., Castillo, L. R., & Soble, J. R. (2022). Examining independent and combined accuracy of embedded performance validity tests in the California Verbal Learning Test-II and Brief Visuospatial Memory-Revised for detecting invalid performance. Applied Neuropsychology: Adult, 29(2), 252–261. https://doi.org/10.1080/23279095.2020.1742718
Richman, J., Green, P., Gervais, R., Flaro, L., Merten, T., Brockhaus, R., & Ranks, D. (2006). Objective tests of symptom exaggeration in independent medical examinations. Journal of Occupational and Environmental Medicine, 303–311.
Robinson, A., Miller, L. R., Herring, T. T., & Calamia, M. (2023). Utility of the D-KEFS color–word interference test as an embedded validity indicator in psychoeducational evaluations. Psychology & Neuroscience, 16(2), 138–146. https://doi.org/10.1037/pne0000301
Rogers, R., & Bender, D. (2018). Clinical assessment of malingering and deception. New York, NY: Guilford.
Rogers, R., Bagby, R. M., & Dickens, S. E. (1992). SIRS: Structured interview of reported symptoms professional manual. Psychological Assessment Resources Inc.
Rogers, R., Sewell, K. W., Martin, M. A., & Vitacco, M. J. (2003). Detection of feigned mental disorders: A meta-analysis of the MMPI-2 and malingering. Assessment, 10(2), 160–177. https://doi.org/10.1177/1073191103010002007
Rogers, R., Sewell, K. W., & Gillard, N. D. (2010). Structured interview of reported symptoms, second edition: Professional test manual (2nd ed.). Psychological Assessment Resources.
Rosnow, R. L., & Rosenthal, R. (2003). Effect sizes for experimenting psychologists. Canadian Journal of Experimental Psychology/revue Canadienne De Psychologie Expérimentale, 57(3), 221.
Roth, R. M., Isquith, P. K., & Gioia, G. A. (2005). BRIEF-A: Behavior Rating Inventory of Executive Function – Adult Version. Lutz, FL: Psychological Assessment Resources.
Sabelli, A. G., Messa, I., Giromini, L., Lichtenstein, J. D., May, N., & Erdodi, L. A. (2021). Symptom versus performance validity in patients with mild TBI: Independent sources of non-credible responding. Psychological Injury and Law, 14(1), 17–36. https://doi.org/10.1007/s12207-021-09400-6
Sawyer, R. J., Testa, S. M., & Dux, M. (2017). Embedded performance validity tests within the Hopkins Verbal Learning Test - Revised and the Brief Visuospatial Memory Test - Revised. The Clinical Neuropsychologist, 31(1), 207–218. https://doi.org/10.1080/13854046.2016.1245787
Schroeder, R. W., & Marshall, P. S. (2010). Validation of the Sentence Repetition Test as a measure of suspect effort. The Clinical Neuropsychologist, 24(2), 326–343. https://doi.org/10.1080/13854040903369441
Sharf, A. J., Rogers, R., Williams, M. M., & Henry, S. A. (2017). The effectiveness of the MMPI-2-RF in detecting feigned mental disorders and cognitive deficits: A meta-analysis. Journal of Psychopathology and Behavioral Assessment, 39(3), 441–455. https://doi.org/10.1007/s10862-017-9590-1
Sharland, M. J., & Gfeller, J. D. (2007). A survey of neuropsychologists’ beliefs and practice with respect to the assessment of effort. Archives of Clinical Neuropsychology, 22(2), 213–223. https://doi.org/10.1016/j.acn.2006.12.004
Sherman, E. M. S., Slick, D. J., & Iverson, G. L. (2020). Multidimensional malingering criteria for neuropsychological assessment: A 20-year update of the malingered neuropsychological dysfunction criteria. Archives of Clinical Neuropsychology, 35(6), 735–764. https://doi.org/10.1093/arclin/acaa019
Shura, R. D., Martindale, S. L., Taber, K. H., Higgins, A. M., & Rowland, J. A. (2020). Digit Span embedded validity indicators in neurologically-intact veterans. The Clinical Neuropsychologist, 34(5), 1025–1037. https://doi.org/10.1080/13854046.2019.1635209
Shura, R. D., Ord, A. S., & Worthen, M. D. (2022). Structured Inventory of Malingered Symptomatology: A psychometric review. Psychological Injury and Law, 15(1), 64–78. https://doi.org/10.1007/s12207-021-09432-y
Shwartz, S. K., Roper, B. L., Arentsen, T. J., Crouse, E. M., & Adler, M. C. (2020). The Behavior Rating Inventory of Executive Function-Adult Version is related to emotional distress, not executive dysfunction, in a veteran sample. Archives of Clinical Neuropsychology, 35(6), 701–716. https://doi.org/10.1093/arclin/acaa024
Silva, M. A. (2021). Review of the Neurobehavioral Symptom Inventory. Rehabilitation Psychology, 66(2), 170. https://doi.org/10.1037/rep0000367
Slick, D., Hopp, G., Strauss, E., & Thompson, G. B. (1997). VSVT: Victoria Symptom Validity Test (Version 1.0). Odessa, FL: Psychological Assessment Resources.
Slick, D. J., Tan, J. E., Strauss, E. H., & Hultsch, D. F. (2004). Detecting malingering: A survey of experts’ practices. Archives of Clinical Neuropsychology, 19(4), 465–473. https://doi.org/10.1016/j.acn.2003.04.001
Smith, G. P., & Burger, G. K. (1997). Detection of malingering: Validation of the Structured Inventory of Malingered Symptomatology (SIMS). Journal of the American Academy of Psychiatry and the Law, 25(2), 180–183.
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of performance validity using verbal fluency tests in a clinical sample. Applied Neuropsychology: Adult, 22(2), 141–146. https://doi.org/10.1080/23279095.2013.873439
Sweet, J. J., Heilbronner, R. L., Morgan, J. E., Larrabee, G. J., Rohling, M. L., Boone, K. B., Kirkwood, M. W., Schroeder, R. W., Suhr, J. A., & Conference Participants. (2021). American Academy of Clinical Neuropsychology (AACN) 2021 consensus statement on validity assessment: Update of the 2009 AACN consensus conference statement on neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 35(6), 1053–1106. https://doi.org/10.1080/13854046.2021.1896036
Tarescavage, A. M., Wygant, D. B., Gervais, R. O., & Ben-Porath, Y. S. (2013). Association between the MMPI-2 Restructured Form (MMPI-2-RF) and malingered neurocognitive dysfunction among non-head injury disability claimants. The Clinical Neuropsychologist, 27(2), 313–335. https://doi.org/10.1080/13854046.2012.744099
Tierney, S. M., Webber, T. A., Collins, R. L., Pacheco, V. H., & Grabyan, J. M. (2021). Validity and utility of the Miller Forensic Assessment of Symptoms Test (M-FAST) on an inpatient epilepsy monitoring unit. Psychological Injury and Law, 14(4), 248–256.
Tombaugh, T. N. (1996). Test of memory malingering (TOMM). New York, NY: Multi Health Systems.
Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 14(4), 597–607. https://doi.org/10.1080/01688639408402671
Tsoi, K. K., Chan, J. Y., Hirai, H. W., Wong, S. Y., & Kwok, T. C. (2015). Cognitive tests to detect dementia: A systematic review and meta-analysis. JAMA Internal Medicine, 175(9), 1450–1458.
Tylicki, J. L., Rai, J. K., Arends, P., Gervais, R. O., & Ben-Porath, Y. S. (2021). A comparison of the MMPI-2-RF and PAI overreporting indicators in a civil forensic sample with emphasis on the Response Bias Scale (RBS) and the Cognitive Bias Scale (CBS). Psychological Assessment, 33(1), 71–83. https://doi.org/10.1037/pas0000968
Tyson, B. T., Pyne, S. R., Crisan, I., Calamia, M., Holcomb, M., Giromini, L., & Erdodi, L. A. (2023). Logical Memory, Visual Reproduction, and Verbal Paired Associates are effective embedded validity indicators in patients with traumatic brain injury. Applied Neuropsychology: Adult. Advance online publication. https://doi.org/10.1080/23279095.2023.2179400
Tyson, B. T., & Shahein, A. (2023). Combining accuracy scores with time cutoffs improves the specificity of the Word Choice Test. Advance online publication.
Uiterwijk, D., Wong, D., Stargatt, R., & Crowe, S. F. (2021). Performance and symptom validity testing in neuropsychological assessments in Australia: A survey of practises and beliefs. Australian Psychologist, 56(5), 355–371. https://doi.org/10.1080/00050067.2021.1948797
Ursano, R. J., Fullerton, C. S., Epstein, R. S., Crowley, B., Vance, K., Kao, T. C., & Baum, A. (1999). Peritraumatic dissociation and posttraumatic stress disorder following motor vehicle accidents. American Journal of Psychiatry, 156(11), 1808–1810. https://doi.org/10.1176/ajp.156.11.1808
Van Dyke, S. A., Millis, S. R., Axelrod, B. N., & Hanks, R. A. (2013). Assessing effort: Differentiating performance and symptom validity. The Clinical Neuropsychologist, 27(8), 1234–1246. https://doi.org/10.1080/13854046.2013.835447
van Helvoort, D., Merckelbach, H., & Merten, T. (2019). The Self-Report Symptom Inventory (SRSI) is sensitive to instructed feigning, but not genuine psychopathology in male forensic inpatients: An initial study. The Clinical Neuropsychologist, 33(6), 1069–1082. https://doi.org/10.1080/13854046.2018.1559359
van Impelen, A., Merckelbach, H., Jelicic, M., & Merten, T. (2014). The Structured Inventory of Malingered Symptomatology (SIMS): A systematic review and meta-analysis. The Clinical Neuropsychologist, 28(8), 1336–1365. https://doi.org/10.1080/13854046.2014.984763
Vanderploeg, R. D., Cooper, D. B., Belanger, H. G., Donnell, A. J., Kennedy, J. E., Hopewell, C. A., & Scott, S. G. (2014). Screening for postdeployment conditions: development and cross-validation of an embedded validity scale in the neurobehavioral symptom inventory. The Journal of Head Trauma Rehabilitation, 29(1), 1–10. https://doi.org/10.1097/HTR.0b013e318281966e
Viglione, D. J., & Giromini, L. (2020). Inventory of Problems–29: Professional Manual. IOP-Test, LLC.
Viglione, D. J., Giromini, L., & Landis, P. (2017). The development of the Inventory of Problems-29: A brief self-administered measure for discriminating bona fide from feigned psychiatric and cognitive complaints. Journal of Personality Assessment, 99, 534–544. https://doi.org/10.1080/00223891.2016.1233882
Whiteside, D. M., Gaasedelen, O. J., Hahn-Ketter, A. E., Luu, H., Miller, M. L., Persinger, V., Rice, L., & Basso, M. R. (2015). Derivation of a cross-domain embedded performance validity measure in traumatic brain injury. The Clinical Neuropsychologist, 29(6), 788–803. https://doi.org/10.1080/13854046
Whitney, K. A., Davis, J. J., Shepard, P. H., Bertram, D. M., & Adams, K. M. (2009). Digit Span age scaled score in middle-aged military veterans: Is it more closely associated with TOMM failure than reliable digit span? Archives of Clinical Neuropsychology, 24(3), 263–272. https://doi.org/10.1093/arclin/acp034
Wiggins, C. W., Wygant, D. B., Hoelzle, J. B., & Gervais, R. O. (2012). The more you say the less it means: Overreporting and attenuated criterion validity in a forensic disability sample. Psychological Injury and Law, 5, 162–173.
Young, G. (2015). Malingering in forensic disability-related assessments: Prevalence 15±15%. Psychological Injury and Law, 8(3), 188–199. https://doi.org/10.1007/s12207-015-9232-4
Young, G. (2020). Thirty complexities and controversies in mild traumatic brain injury and persistent post-concussion syndrome: A roadmap for research and practice. Psychological Injury and Law, 13(4), 427–451. https://doi.org/10.1007/s12207-020-09395-6
Young, J. C., Sawyer, R. J., Roper, B. L., & Baughman, B. C. (2012). Expansion and re-examination of Digit Span effort indices on the WAIS-IV. The Clinical Neuropsychologist, 26(1), 147–159. https://doi.org/10.1080/13854046.2011.647083
Young, J. C., Roper, B. L., & Arentsen, T. J. (2016). Validity testing and neuropsychology practice in the VA healthcare system: Results from recent practitioner survey. The Clinical Neuropsychologist, 30(4), 497–514. https://doi.org/10.1080/13854046.2016.1159730
Funding
Open access funding provided by Università degli Studi di Torino within the CRUI-CARE Agreement.
Relevant ethical guidelines were followed throughout the project. All data collection, storage and processing was done with the approval of relevant institutional authorities regulating research involving human participants, in compliance with the 1964 Helsinki Declaration and its subsequent amendments or comparable ethical standards.
Nussbaum, S.H., Ales, F., Giromini, L. et al. Cross-Validating the Atypical Response Scale of the TSI-2 in a Sample of Motor Vehicle Collision Survivors. Psychol. Inj. and Law 16, 351–370 (2023). https://doi.org/10.1007/s12207-023-09487-z