Response bias refers to a distorted presentation of symptoms, either in a negative direction (i.e., overreporting) or in a positive direction (i.e., underreporting), and it can be the product of conscious intent or a consequence of personality traits (e.g., fantasy proneness; Merckelbach, 2004). Intentional symptom overreporting can be driven by different motives, some internal (e.g., adopting the sick role) and some external (e.g., compensation). When such behavior is driven by internal motives, it signals possible factitious disorder; when external benefits are the motivators, the behavior reflects not pathology but deception, that is, malingering (Rogers & Bender, 2018). Often, however, the type of incentive is unknown, in which case “feigning” is the preferred term.

Measures that screen for negative response bias in self-reported information are often referred to as symptom validity tests (SVTs; for a review, see Giromini et al., 2022). A recently developed stand-alone SVT is the Inventory of Problems-29 (IOP-29; Viglione et al., 2017), which comprises 29 items covering a wide range of problems and captures both invalid performance and invalid symptom endorsement. The items pertain to various psychiatric and cognitive conditions, such as posttraumatic stress disorder (PTSD), depression, schizophrenia, and cognitive impairment. To date, the IOP-29 has been extensively researched across different contexts and types of symptom presentations, with encouraging results (for a review, see Giromini & Viglione, 2022; for a meta-analysis, see Puente-López et al., 2023).

The topic of positive response bias, also known as impression management or defensiveness (see Rogers, 2018), has been investigated mainly in the context of self-reported personality traits during pre-employment evaluations (Griffin & Wilson, 2012; Levashina, 2018; see also Paulhus, 2012). These studies suggest that positive response bias may be more prevalent than negative response bias (Rogers, 2018), although it is less well researched (Faust, 2023). With regard to symptom presentation and underreporting, most available measures are scales embedded in broader clinical instruments, such as the social desirability (L) scale of the MMPI instruments (Baer & Miller, 2002; for a meta-analysis, see Picard et al., 2023). Among stand-alone measures, the only test developed specifically for use in both general and forensic populations is the Supernormality Scale (SS; Cima et al., 2003). Supernormality refers to the systematic denial of common everyday symptoms, regardless of social norms, which distinguishes it from social desirability (Cima et al., 2003, 2008). The revised version of the scale (SS-R) consists of 50 items, 34 of which relate broadly to mood, dissociation, aggression, and obsessive issues, while the remaining items are bogus items designed to obscure the true purpose of the test (Cima et al., 2008). The psychometric properties of the SS-R have been shown to be adequate in both general and forensic populations (Cima et al., 2008). However, few studies have used the SS-R, and further investigation would be beneficial.

An important problem in the study of response bias is that over- and under-reporting of symptoms have largely been treated as behavioral opposites, that is, as dichotomous phenomena (Sherman et al., 2020; Walters et al., 2008). The instructions used in experimental studies in these domains typically reflect this bipolar view. In simulation studies relevant to the forensic domain, instructions to overreport are commonly contrasted with instructions to respond honestly (e.g., Boskovic et al., 2022); similarly, in simulations of pre-employment evaluations, instructions to underreport are contrasted with instructions to respond honestly (e.g., Edens et al., 2001). In real life, however, people sometimes engage in both over- and under-reporting of symptoms (e.g., Whitman et al., 2023). More specifically, people may strategically tailor their reports and presentations to exaggerate some problems (i.e., overreporting) while concealing others (i.e., underreporting). This type of behavior, which we refer to as “mixed feigning,” has been well documented in organizational psychology research on job applicants (Levashina & Campion, 2007; Melchers et al., 2020) but has not been thoroughly investigated in the domain of symptom presentation.

Trauma Reports and Response Bias

PTSD is a cluster of symptoms that develops after a traumatic event (see DSM-5; American Psychiatric Association [APA], 2013; DSM-5-TR; American Psychiatric Association, 2022). The prevalence of PTSD depends mostly on the type of traumatic exposure, with the highest rates often found among victims of sexual assault (up to 80%; Hall & Hall, 2006) and war veterans (up to 58%; Guriel & Fremouw, 2003). In the general population, approximately 15% of individuals exhibit PTSD (Hall & Hall, 2006). However, these prevalence figures should be interpreted with caution, as PTSD diagnoses may be especially susceptible to distorted symptom presentations, given the general public’s high familiarity with traumatic experiences and the wide media coverage of this disorder. This is especially true because diagnoses of PTSD, like most diagnoses listed in DSM-5, rely largely on subjective, self-reported symptoms (Resnick et al., 2018), which are easily and frequently modified or embellished (Burges & McMillan, 2001) in both clinical and forensic contexts.

The psychological assessment of PTSD is a common context for investigating overreporting (Guriel & Fremouw, 2003). The estimated prevalence of deceptive symptom presentation ranges from 30% to 50% of trauma reports (Freeman et al., 2008), and research involving students has confirmed how easily trauma-related symptomatology can be misrepresented (Boskovic et al., 2019a, 2019b). In contrast, positive response bias (i.e., underreporting) in trauma reports has received far less research attention, as reflected in the scarcity of literature on symptom underreporting in PTSD.

Whereas overreporting may inflate the estimated prevalence of trauma and PTSD, underreporting of symptoms is likely to (falsely) lower it. Victims of sexual assault are a case in point. As mentioned above, they are the most vulnerable to developing PTSD (Hall & Hall, 2006; Young, 2016), yet sexual assault is a commonly underreported trauma (RAINN, 2016), especially among students (Boskovic et al., 2023; Wilson & Miller, 2016), which calls into question prevalence estimates of PTSD in this group. The forensic context includes many situations in which screening for underreporting could be important, such as child custody evaluations (see Baer & Miller, 2002) and parole hearings (Ruback & Hopper, 1986). Screening specifically for underreporting of PTSD symptoms may carry even more weight given that PTSD is highly comorbid with substance abuse (Brady et al., 2004), which is one of the most underreported problems (Lapham et al., 2001; Magura, 2010). Detecting such concealment is especially important among professionals who carry weapons, such as police officers or military personnel, who have already been shown to underreport unfavorable personality traits (Jackson & Harrison, 2018; see also Levashina, 2018).

As PTSD includes a variety of symptoms, a person does not necessarily overreport or underreport all of them. One could pick and choose which symptoms to exaggerate and which to underreport, so as to find an optimal self-presentation that appears convincing yet functional. For instance, exaggerating intrusive symptoms while underreporting hyperarousal or irritability might elicit the most supportive reaction from the environment, including the assessor. Further, some PTSD symptoms refer mostly to physical complaints, which, in certain cultures, might be easier to acknowledge or exaggerate than psychological issues because they carry less stigma (e.g., Gilmoor et al., 2019). Because of such selective symptom endorsement, we expect this more subtle type of response bias (i.e., mixed reporting) to be the most difficult to detect with current SVTs. Specifically, most SVTs rest on the notion that feigners’ dominant response style is hyperbolism, that is, a generalized form of symptom exaggeration (Boskovic, 2022). To our knowledge, no study has yet directly compared the detectability of these three types of response bias in trauma-related accounts. This project was undertaken specifically to address this gap.

Current Study

To evaluate how well different forms of response bias can be detected, we employed a simulation design with four conditions. In the first phase of the study, all participants were screened for PTSD symptoms. Then, by random allocation, students were either instructed to respond honestly (i.e., control group) or received a vignette depicting a situation in which (1) overreporting, (2) underreporting, or (3) a mixed strategy (i.e., simultaneous over- and under-reporting) would be beneficial (see Vignettes). Participants were then assessed for PTSD symptoms (PTSD Checklist for DSM-5; PCL-5; Weathers, 2008), overreporting (IOP-29), and underreporting (Supernormality Scale-Revised; SS-R). Based on the research findings thus far, we expected the responses of the mixed strategy group to be well calibrated and hence indistinguishable from those of the honest group, whereas the other two forms of responding (over- and under-reporting) were expected to be detected to a higher degree.

Method

Participants

An a priori G*Power analysis, with alpha set at .05, power (1 − β) at .80, and a medium effect size (f = .25), indicated a required sample size of 180 participants. Accordingly, our initial sample consisted of 189 undergraduates. However, some participants had to be removed from the dataset based on our exclusion criteria: (a) failure to complete all questions (n = 7 excluded), (b) not giving permission to use their data (n = 2 excluded), (c) failure to provide a detailed elaboration of the task at the end of the study (n = 11 excluded), (d) failure to pass the attention checks within each measure (n = 17 excluded; see Footnote 1), and (e) being younger than 18 (n = 1 excluded; see Procedure). As such, the final sample consisted of 151 undergraduates in their early 20s (age M = 20, SD = 2.75), mostly women (81.5%). The most commonly reported nationalities were Dutch (45%), German (10%), and Polish (5%). Participants’ self-reported English proficiency on a 5-point Likert scale was high overall (M = 4.46, SD = .67).
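The a priori power analysis above can be reproduced outside G*Power. The sketch below is a minimal Python version, assuming a fixed-effects one-way ANOVA omnibus test whose noncentrality parameter is λ = f² · N (the convention G*Power uses for this design) and assuming SciPy is installed. The helper names `anova_power` and `required_n` are illustrative, not part of any package, and the search only considers balanced designs, stepping N by the number of groups.

```python
from scipy import stats

def anova_power(n_total: int, k_groups: int, f: float, alpha: float = 0.05) -> float:
    """Power of the omnibus F test in a one-way between-subjects ANOVA."""
    df1 = k_groups - 1
    df2 = n_total - k_groups
    noncentrality = f ** 2 * n_total            # lambda = f^2 * N
    f_crit = stats.f.ppf(1 - alpha, df1, df2)   # critical F under the null hypothesis
    return float(stats.ncf.sf(f_crit, df1, df2, noncentrality))

def required_n(k_groups: int, f: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest balanced total N (a multiple of k_groups) reaching the target power."""
    n_total = 2 * k_groups                      # start at two participants per group
    while anova_power(n_total, k_groups, f, alpha) < power:
        n_total += k_groups                     # add one participant per group
    return n_total

if __name__ == "__main__":
    n = required_n(k_groups=4, f=0.25)
    print(n, round(anova_power(n, 4, 0.25), 3))
```

With four groups, f = .25, α = .05, and a target power of .80, the search is expected to stop at the total of 180 reported above, that is, 45 participants per group.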

As noted above, participants were randomly assigned to one of four groups: control (n = 40), overreporting (n = 37), underreporting (n = 36), and mixed presentation (n = 38). These groups did not differ in terms of age (F(3, 147) = .406, p = .749) or English proficiency (F(3, 147) = .806, p = .492).

Measures and Materials

Brief Symptom Inventory (BSI-18; Derogatis, 2001)

The BSI-18 includes 18 items tapping into symptoms of depression, anxiety, and somatization, and it measures an individual’s overall level of psychological distress during the last 7 days. Responses are given on a 5-point scale ranging from 0 (not at all applicable) to 4 (extremely). The total score therefore ranges from 0 to 72, with higher scores indicating higher levels of distress (Derogatis, 2001). Cronbach’s alpha for the BSI-18 in this study was .93.
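Internal consistency values such as the alpha of .93 reported for the BSI-18 follow the standard Cronbach’s alpha formula: alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below computes it from a respondents-by-items matrix using only the Python standard library; the toy data are invented for illustration and are not study data. The hypothetical `scale_total` helper enforces the BSI-18 item range (0 to 4), which is why totals cannot exceed 18 × 4 = 72.

```python
from statistics import pvariance

def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (one row per respondent)."""
    k = len(responses[0])                          # number of items
    items = list(zip(*responses))                  # transpose: one tuple per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])
    # Population variances are used for both terms, so the n vs. n - 1 factor cancels.
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def scale_total(row: list[int], n_items: int = 18, lo: int = 0, hi: int = 4) -> int:
    """Hypothetical helper: sum one respondent's items after checking count and range."""
    if len(row) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(row)}")
    if any(not lo <= x <= hi for x in row):
        raise ValueError(f"item responses must lie between {lo} and {hi}")
    return sum(row)

if __name__ == "__main__":
    toy = [[1, 2], [2, 1], [3, 3]]   # three respondents, two items (invented)
    print(round(cronbach_alpha(toy), 3))
    print(scale_total([4] * 18))     # maximum possible BSI-18 total
```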

Severity of Posttraumatic Stress Symptoms-Adult (National Stressful Events Survey PTSD Short Scale; NSESSS; Kilpatrick et al., 2013)

The NSESSS contains nine items and evaluates the severity of an individual’s PTSD-related symptoms during the past week, based on the DSM-5 description of the disorder (APA, 2013). Responses are given on a 5-point Likert-type scale ranging from 0 (not at all) to 4 (extremely). Total scores range from 0 to 36, with higher scores indicating greater PTSD severity. The NSESSS has shown high internal consistency and convergent validity in a non-clinical sample (LeBeau et al., 2014). In this study, the NSESSS was employed in the pre-screening phase, and its Cronbach’s alpha was .89.

PTSD Checklist for DSM-5 (PCL-5; Weathers, 2008)

The PCL-5 consists of 20 items that measure the presence and severity of PTSD symptoms according to DSM-5 criteria. Participants do not have to provide any information about a traumatic event; they only indicate whether, and how intensely, each listed symptom was present during the last month. Responses are given on a 5-point Likert scale ranging from 0 (not at all) to 4 (extremely), so the total score ranges from 0 to 80, with higher scores indicating more severe PTSD symptoms. A score higher than 33 suggests probable PTSD in general population samples (Weathers, 2008; see also www.ptsd.va.gov). Cronbach’s alpha for the PCL-5 was .96. To ensure that participants were paying attention while completing this questionnaire, we added two attention-check items (“Please select Quite a bit if you are reading this” and “Please select Not at all if you are reading this”). Seven participants failed these checks, and their data were removed from the dataset. We acknowledge that the PCL-5 is rarely used as a stand-alone assessment measure and is mostly combined with the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5; Weathers et al., 2013). Importantly, research has confirmed the adequate psychometric properties of the PCL-5 and a strong association between the two measures (Boyd et al., 2021; Lee et al., 2022; Roberts et al., 2021).
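The PCL-5 scoring and screening steps described above (summing 20 items scored 0 to 4, dropping respondents who fail the embedded attention checks, and flagging totals above 33) can be sketched as follows. This is a minimal illustration; the check-item keys `check_1` and `check_2` and the record layout are hypothetical and do not reflect how the Qualtrics export was actually structured.

```python
PCL5_N_ITEMS = 20
PCL5_CUTOFF = 33  # totals above 33 suggest probable PTSD, per the text

# Hypothetical attention-check keys and the answers they instruct respondents to give.
ATTENTION_CHECKS = {"check_1": "Quite a bit", "check_2": "Not at all"}

def passes_checks(record: dict) -> bool:
    """True if every attention-check item received the instructed answer."""
    return all(record.get(key) == answer for key, answer in ATTENTION_CHECKS.items())

def pcl5_total(items: list[int]) -> int:
    """Sum of the 20 symptom items, each scored 0 (not at all) to 4 (extremely)."""
    if len(items) != PCL5_N_ITEMS or any(x not in range(5) for x in items):
        raise ValueError("expected 20 integer responses between 0 and 4")
    return sum(items)

def screen(records: list[dict]) -> list[dict]:
    """Drop inattentive respondents, then attach the total and the cutoff flag."""
    kept = []
    for rec in records:
        if not passes_checks(rec):
            continue
        total = pcl5_total(rec["items"])
        kept.append({**rec, "total": total, "probable_ptsd": total > PCL5_CUTOFF})
    return kept

# Invented example records: respondent 2 fails the first attention check.
sample = [
    {"id": 1, "check_1": "Quite a bit", "check_2": "Not at all", "items": [2] * 20},
    {"id": 2, "check_1": "Not at all", "check_2": "Not at all", "items": [1] * 20},
]

if __name__ == "__main__":
    for row in screen(sample):
        print(row["id"], row["total"], row["probable_ptsd"])
```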

Supernormality Scale-Revised (SS-R; Cima et al., 2008)

The SS-R consists of 50 items and is employed to evaluate the tendency to underreport symptoms (i.e., supernormality). The items describe common everyday problems, and participants are expected to endorse the majority of them. Supernormality is detected when respondents extensively endorse the “not applicable” option, systematically denying everyday problems in an attempt to appear “supernormal.” Responses are given on a 4-point Likert scale ranging from 1 (not applicable at all) to 4 (extremely applicable); lower scores indicate a stronger tendency toward supernormality. The total score is calculated as the sum of responses, with two items reverse-coded. The SS-R has shown acceptable sensitivity and specificity, with a proposed cutoff score of 60 (Cima et al., 2008). Two attention-check items were added (“Please select Applicable if you are reading this” and “Please select Not Applicable if you are reading this”); 13 participants did not answer them appropriately and were excluded from the dataset. Cronbach’s alpha for the SS-R was .88.
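The SS-R total described above is a plain sum on a 1 to 4 scale with two reverse-coded items, and lower totals point to supernormality. A minimal sketch follows. The text does not say which two items are reverse-coded, so `reverse_items` is a hypothetical parameter that would have to be filled in from the scale materials; the flag uses the “< 60” rule reported in the Results.

```python
SSR_N_ITEMS = 50
SSR_CUTOFF = 60  # totals below 60 are flagged as supernormal (lower = more denial)

def ssr_total(responses: list[int], reverse_items: set[int]) -> int:
    """Sum 50 responses scored 1-4, reverse-coding the items whose 0-based index
    is in `reverse_items` (a hypothetical placeholder for the two recoded items)."""
    if len(responses) != SSR_N_ITEMS:
        raise ValueError(f"expected {SSR_N_ITEMS} responses, got {len(responses)}")
    total = 0
    for i, x in enumerate(responses):
        if x not in (1, 2, 3, 4):
            raise ValueError(f"item {i}: response {x} outside 1-4")
        total += (5 - x) if i in reverse_items else x  # maps 1<->4 and 2<->3
    return total

def is_supernormal(total: int) -> bool:
    return total < SSR_CUTOFF

if __name__ == "__main__":
    placeholder_reverse = {0, 1}   # NOT the real SS-R keys; illustration only
    denier = [1] * SSR_N_ITEMS     # "not applicable" to every item
    t = ssr_total(denier, placeholder_reverse)
    print(t, is_supernormal(t))
```

Because the reverse-coded items are flipped, a respondent who answers “not applicable” throughout scores slightly above the raw floor of 50 under this sketch.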

Inventory of Problems-29 (IOP-29; Viglione et al., 2017)

The IOP-29 is a relatively new measure designed to differentiate credible symptom presentations from symptom overreporting across a variety of psychological problems, including trauma-related complaints and cognitive/neuropsychological, psychotic, and depression-related symptoms. It includes 26 self-report items and three cognitive subtasks. For most items, the response options are “true,” “false,” and “doesn’t make sense,” the latter being a novelty among SVTs and a unique feature of the IOP-29 (Viglione et al., 2017). In this study, we also included one attention-check item (“To this item respond with T”), on the basis of which three participants were excluded. In addition, because computing the IOP-29 feigning score requires respondents to be 18 or older, one participant was excluded for being underage.

Scoring the 29 IOP-29 items yields the False Disorder Probability Score (FDS), which, in a recent quantitative literature review of 3,777 IOP-29 protocols, showed an average sensitivity of .86 and an average specificity of .92 at the standard cutoff of FDS ≥ .50 (Giromini & Viglione, 2022). However, most studies included in this review used a simulation design, and almost half of these simulation studies included non-clinical rather than clinical controls; these results might therefore overestimate the true accuracy of the IOP-29. Still, Holcomb et al. (2023), studying 150 individuals clinically referred for neuropsychological assessment, found that the IOP-29 outperformed the Negative Impression Management (NIM) scale of the Personality Assessment Inventory (PAI; Morey, 1991; Morey et al., 2007) in predicting performance (in)validity (r(IOP-29) = .34 vs. r(PAI NIM) = .06; z = 2.50, p < .01). Moreover, a recently published criterion-groups study of 174 court-ordered psychological evaluations, using the SIMS and the MMPI-2-RF as criterion variables, supported the effectiveness of the IOP-29, with Cohen’s d values ranging from 1.70 to 2.67 depending on the criterion (Roma et al., 2023). Accordingly, when designing our research project, we considered the IOP-29 an adequate measure to include.
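Sensitivity and specificity figures such as .86 and .92 come from dichotomizing the FDS at .50 and cross-tabulating the result against known group membership. The sketch below shows that computation. It takes FDS values as given, because the IOP-29 scoring algorithm itself is not described here and is not reproduced; the example values are invented.

```python
FDS_CUTOFF = 0.50  # standard IOP-29 cutoff: FDS >= .50 is classified as noncredible

def flagged(fds: float, cutoff: float = FDS_CUTOFF) -> bool:
    return fds >= cutoff

def sensitivity_specificity(feigner_fds, honest_fds, cutoff=FDS_CUTOFF):
    """Sensitivity = share of feigners flagged;
    specificity = share of honest respondents not flagged."""
    true_pos = sum(flagged(x, cutoff) for x in feigner_fds)
    true_neg = sum(not flagged(x, cutoff) for x in honest_fds)
    return true_pos / len(feigner_fds), true_neg / len(honest_fds)

if __name__ == "__main__":
    feigners = [0.91, 0.62, 0.38, 0.50]   # invented FDS values
    honest = [0.08, 0.21, 0.55, 0.30]     # invented FDS values
    sens, spec = sensitivity_specificity(feigners, honest)
    print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Because the cutoff is inclusive, an FDS of exactly .50 counts as a positive (noncredible) classification.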

Vignettes

Four different instructions were randomly administered to participants. One involved responding honestly (i.e., the control group), while the other three were created for the simulation groups: overreporting, underreporting, and mixed strategy (see Appendix). As our participants were psychology students, the instructions were written with experiences they could relate to in mind. The vignette content was informed by prior research on response bias in student populations. Specifically, based on studies of symptom overendorsement among students (Boskovic, 2020), we decided that an exam context would be highly motivating for the overreporting group. For the underreporting group, we followed the work of Levashina (2018), which primarily links this type of responding to job applicants, and hence described a job-seeking situation. To elicit a mixed strategy, we kept the job-seeking context but added background information that would encourage students to overreport trauma-related complaints, at least to some degree.

The overreporting group received a vignette in which the protagonist is a student who is very likely to fail their exams unless given an extension. The extension might be obtained by pretending to experience trauma-related symptoms that are exceptionally distressing to the protagonist. Notably, these instructions informed participants that they could receive an extension if they fabricated or exaggerated trauma-related problems, but they did not explicitly encourage them to do so.

The underreporting group was asked to imagine being a protagonist who has just graduated from university and wants to be employed at an “old school” trauma institution with strict expectations of its employees, such as being professional first and human second. In this case, participants were explicitly invited to “find an optimal way to present [themselves] in order to impress them and score this job”.

The mixed strategy group received instructions similar to those of the underreporting group, except that the institution they were applying to was a modern “new age” trauma center, where personal experience and understanding of trauma are considered more important than professionalism. More specifically, they were told that at this institution, “empathy and personal difficult experiences are highly cherished”. As in the underreporting group, the mixed strategy group was also explicitly encouraged to find an optimal way to present themselves in order to be hired.

In all three vignettes, the final part referred to the assessment the protagonist needed to go through and asked participants to imagine that the tests given in this study were part of that official procedure. Participants were explicitly warned not to overdo their presentations, as they would then be caught lying. They were also told that the most convincing presentation would be rewarded with a €10 voucher.

Procedure

This study was conducted online using Qualtrics. The study was advertised on the university’s research participation platform, where students could sign up and then received the Qualtrics link. After reading the study information and giving informed consent, participants answered demographic questions about their age, gender, education, and English proficiency. They then completed the pre-screening measures (BSI-18 and NSESSS), which enabled us to check for potential differences in mental health between the groups. Next, participants were randomly assigned (as pre-set in the Qualtrics configuration) to the control group, the overreporting group (academic extension), the underreporting group (old school trauma institution), or the mixed strategy group (new age trauma center). The control group was simply instructed to respond honestly, whereas the three remaining groups received vignettes describing the context they needed to imagine being in. Following the instructions, participants completed the PCL-5, the SS-R, and the IOP-29; the presentation order of the SS-R and IOP-29 was randomized. After completing these scales, participants were told to respond honestly to exit questions about their motivation, the clarity of the instructions, and the difficulty of the task. Finally, participants were debriefed and received student credits (0.5 credits for 30 min) for their participation, and randomly chosen participants received an additional monetary reward. The study was approved by the standing Ethics Committee of Erasmus University Rotterdam, the Netherlands.

Data Analyses

Potential differences between the conditions were inspected using analyses of variance (ANOVAs) with Bonferroni post hoc tests. The data were also analyzed using non-parametric alternatives, but as these outcomes did not differ from those of the ANOVAs, we opted to present the parametric results. The data and the outputs of our analyses are available on the Open Science Framework (OSF) platform (anonymous view link: https://osf.io/huw67/?view_only=b73b5cea14b645ee8a2c58fc1ad80f5b).
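For readers who want to check the pipeline against the OSF data, the sketch below shows the quantities behind each omnibus test: a one-way between-subjects ANOVA F ratio with its degrees of freedom and eta squared, plus the Bonferroni adjustment, in which each raw p-value is multiplied by the number of comparisons and capped at 1. It is a standard-library illustration, not the software the authors used, and it does not compute p-values for the F test. With four groups there are six pairwise comparisons, and the cap at 1 is the usual reason adjusted values appear as p = 1.00.

```python
from itertools import combinations
from statistics import fmean

def one_way_anova(groups: list[list[float]]):
    """Return (F, df_between, df_within, eta_squared) for independent groups."""
    scores = [x for g in groups for x in g]
    grand_mean = fmean(scores)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # equals partial eta^2 in a one-way design
    return f_ratio, df_between, df_within, eta_sq

def bonferroni(raw_p: list[float]) -> list[float]:
    """Multiply each p by the number of tests and cap the result at 1."""
    m = len(raw_p)
    return [min(1.0, p * m) for p in raw_p]

if __name__ == "__main__":
    toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # invented scores
    print(one_way_anova(toy))
    pairs = list(combinations(["control", "over", "under", "mixed"], 2))
    print(len(pairs), "pairwise comparisons")
```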

Results

Motivation, Clarity of Instructions, and Difficulty of the Task

The four groups reported moderate motivation (M = 3.34, SD = .80), with no differences between groups, F(3, 147) = .89, p = .449. They did, however, differ significantly in their ratings of the clarity of the instructions, F(3, 147) = 3.17, p = .026, and of the difficulty of the task, F(3, 147) = 2.97, p = .034. More specifically, post hoc Bonferroni tests indicated that the significant differences in clarity and difficulty were between the control and overreporting groups (p = .038, η² = .057 and p = .036, η² = .061, respectively; for descriptives, see Table 1).

Table 1 Mean scores of groups on motivation, clarity of instructions, and difficulty of the task

Pre-screening Measures: Distress and PTSD Symptoms

To control for potential group differences in terms of a priori psychopathology, we examined students’ general distress levels (BSI-18) and PTSD-related symptoms (NSESSS). Overall distress was moderate (M = 15.87, SD = 12.35; range 0–68), and the level of PTSD symptoms was on the lower end (M = 8.18, SD = 7.29; range 0–33). The four groups did not significantly differ on these measures, F(3, 147) = .36, p = .782 and F(3, 147) = .501, p = .682, respectively (for details, see Supplemental Table 1).

Group Differences in Reported PTSD Symptoms (PCL-5)

Looking at the number of endorsed items on the PCL-5, the overall effect of group was significant, F(3, 147) = 34.71, p < .001, ηp² = .41. More specifically, post hoc Bonferroni tests revealed a statistically significant difference between the control and overreporting groups (p < .001, Cohen’s d = 1.42), whereas the difference between the control and underreporting groups was exactly at the p = .050 level, with a Cohen’s d of .75. No statistically significant difference was found between the control and mixed strategy groups (p = 1.00, Cohen’s d = .30). The overreporting group endorsed significantly more items than the underreporting (p < .001, Cohen’s d = 2.27) and mixed strategy groups (p < .001, Cohen’s d = 1.20), and the underreporting group had significantly lower scores than the mixed strategy group (p < .001, Cohen’s d = 1.18; for details, see Table 2). For the groups’ scores on the separate symptom domains, see Supplemental Table 2.
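The Cohen’s d values reported for the pairwise comparisons are standardized mean differences. The text does not state which denominator was used, so the sketch below assumes the common pooled-standard-deviation version for two independent groups; it computes d either from raw scores or from summary statistics such as the means and SDs in Table 2. The example numbers are invented.

```python
from math import sqrt
from statistics import fmean, variance

def cohens_d_from_summary(m1, sd1, n1, m2, sd2, n2) -> float:
    """Standardized mean difference using the pooled (n - 1 weighted) SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)

def cohens_d(a: list[float], b: list[float]) -> float:
    """Same quantity computed from raw scores (sample SDs)."""
    return cohens_d_from_summary(fmean(a), sqrt(variance(a)), len(a),
                                 fmean(b), sqrt(variance(b)), len(b))

if __name__ == "__main__":
    print(round(cohens_d([2, 4, 6], [1, 3, 5]), 2))  # invented scores
```

The sign is positive when the first group has the higher mean; effect sizes in the text are reported as absolute values.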

Table 2 Mean scores of groups and scores beyond the cutoff points on PTSD Checklist (PCL-5), Supernormality Scale (SS-R), and Inventory of Problems-29 (IOP-29)

Using the cutoff score of > 33, 20% of the control group (n = 8) obtained scores indicative of risk for PTSD, compared with 76% of the overreporting condition (n = 28), 2.8% of the underreporting group (n = 1), and 26.3% of the mixed strategy participants (n = 10).

Group Differences in Supernormality (SS-R)

Significant group differences were also evident for the SS-R, F(3, 147) = 13.78, p < .001, ηp² = .22. More specifically, post hoc tests showed that the control group exhibited significantly more supernormality than the overreporting group (p = .004, Cohen’s d = .85) but less than the underreporting group (p = .014, Cohen’s d = .67). The control and mixed strategy groups attained comparable supernormality levels (p = 1.00, Cohen’s d = .12). Further, the overreporters showed significantly lower levels of supernormality than the underreporting (p < .001, Cohen’s d = 1.43) and mixed strategy groups (p = .025, Cohen’s d = .71). Finally, the underreporting and mixed strategy groups also differed significantly from each other (p = .003, Cohen’s d = .77), with the underreporters exhibiting higher levels of supernormality.

Using the SS-R cutoff of < 60, five participants in the control condition (12.5%) showed a supernormal presentation, and one participant in the overreporting group (2.7%) obtained a score of 60. Among underreporting participants, 14 exhibited supernormality (39%), while in the mixed strategy condition two students did so (5.3%; see Table 2).

Group Differences in Overreporting (IOP-29)

The FDS index differed significantly across conditions, F(3, 147) = 30.10, p < .001, ηp² = .38. More specifically, the control group obtained significantly lower scores than the overreporting group (p < .001, Cohen’s d = 1.57) but scores very similar to those of the underreporting (p ≈ 1.00, Cohen’s d = .14) and mixed strategy (p ≈ 1.00, Cohen’s d = .14) groups. As expected, the overreporting group obtained higher FDS scores than the underreporting (p < .001, Cohen’s d = 1.65) and mixed strategy (p < .001, Cohen’s d = 1.28) groups, whereas the underreporting and mixed strategy participants did not differ from each other (p = 1.00, Cohen’s d = .37).
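The partial eta squared values for the three omnibus tests can be recovered from the reported F ratios and degrees of freedom via ηp² = (F × df1) / (F × df1 + df2). The short check below applies that identity to the values reported for the PCL-5, SS-R, and IOP-29 FDS; it is arithmetic on the published statistics, not a reanalysis of the data.

```python
def partial_eta_sq(f_ratio: float, df1: int, df2: int) -> float:
    """Partial eta squared implied by an F test: SS_effect / (SS_effect + SS_error)."""
    return (f_ratio * df1) / (f_ratio * df1 + df2)

# Omnibus F ratios as reported in the Results, all with df = (3, 147).
REPORTED = {"PCL-5": 34.71, "SS-R": 13.78, "IOP-29 FDS": 30.10}

if __name__ == "__main__":
    for measure, f_ratio in REPORTED.items():
        print(f"{measure}: partial eta^2 = {partial_eta_sq(f_ratio, 3, 147):.2f}")
```

The recovered values round to .41, .22, and .38, matching the reported effect sizes.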

Applying the cutoff of FDS ≥ .50, one participant in the control group showed a noncredible symptom presentation (2.5%), 17 overreporters had FDS values ≥ .50 (46%), one participant in the underreporting condition obtained a borderline score (FDS = .54; 2.8%), and three participants in the mixed strategy group had FDS values ≥ .50 (7.9%; see Table 2).
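The percentages reported for each cutoff are simple proportions of the group sizes given in the Participants section (40, 37, 36, and 38). The sketch below recomputes them from the reported counts, rounding to one decimal; it is a consistency check on the published numbers only.

```python
GROUP_N = {"control": 40, "overreporting": 37, "underreporting": 36, "mixed": 38}

# Counts of participants beyond each cutoff, as reported in the Results.
FLAGGED = {
    "PCL-5 > 33": {"control": 8, "overreporting": 28, "underreporting": 1, "mixed": 10},
    "SS-R supernormality": {"control": 5, "overreporting": 1, "underreporting": 14, "mixed": 2},
    "IOP-29 FDS >= .50": {"control": 1, "overreporting": 17, "underreporting": 1, "mixed": 3},
}

def percent(count: int, n: int) -> float:
    """Share of a group, in percent, rounded to one decimal."""
    return round(100 * count / n, 1)

def rate_table() -> dict[str, dict[str, float]]:
    return {measure: {g: percent(k, GROUP_N[g]) for g, k in counts.items()}
            for measure, counts in FLAGGED.items()}

if __name__ == "__main__":
    for measure, rates in rate_table().items():
        print(measure, rates)
```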

Correlation Between Pre-screening Scores and Post-manipulation PTSD Reporting

To inspect how preexisting levels of distress and trauma-related complaints were reflected in students’ post-manipulation responding, we computed Pearson product-moment correlations (r) between the pre-screening scores (BSI-18 and NSESSS) and the post-manipulation score (PCL-5) for each condition separately. For the control group (n = 40), the correlation between the BSI-18 and PCL-5 was high and significant, r = .82, p < .001, and the correlation between the NSESSS and PCL-5 was even stronger, r = .93, p < .001. For the overreporting condition (n = 37), the correlation between the BSI-18 and PCL-5 was non-significant, r = .24, p = .15, whereas that between the NSESSS and PCL-5 was significant, albeit modest, r = .35, p = .04. In the underreporting group (n = 36), the pre-screening BSI-18 and NSESSS scores were moderately and significantly related to the PCL-5, r = .37, p = .03 and r = .44, p = .008, respectively. Lastly, in the mixed strategy condition (n = 38), the correlation pattern resembled that of the control condition, with both the BSI-18 and the NSESSS correlating with the PCL-5 at r = .65, p < .001.

Discussion

In this experimental simulation study, we examined several response strategies and their effects on relevant test scores. In addition to the commonly tested extremes of the response bias spectrum, overreporting and underreporting, we also included an often-overlooked type of response bias: simultaneous over- and under-reporting (i.e., the mixed strategy condition). Our findings can be summarized as follows. First, participants’ scores on the PTSD checklist differed significantly in the expected order, with the underreporting group obtaining the lowest scores, followed by the control and mixed strategy groups, and then the overreporting group, which exhibited the highest levels of PTSD symptoms. Importantly, the pre-screening scores (BSI-18 and NSESSS) showed no group differences in distress and PTSD-like symptoms prior to the instructions, meaning that the group differences on the PCL-5 can be attributed to our manipulation. Interestingly, 20% of the control group (i.e., honest participants) obtained scores indicative of clinical levels of PTSD, which fits well with the prevalence of trauma among students shown in a large multisite study (21%, Frazier et al., 2009; Sharp & Theiler, 2018; see also Stallman, 2010). The pattern in the mixed strategy condition was very similar to that of the control group, with 26% providing PCL-5 scores above the cutoff. This suggests that simultaneously using over- and under-reporting might result in a clinical test profile that looks authentic. Most of the overreporting group crossed the PCL-5 cutoff (76%), and, to our surprise, one participant in the underreporting condition obtained a PCL-5 score above the cutoff. As participants had to pass the attention checks and provide a proper elaboration of their task to be kept in the dataset, it is unlikely that this participant was simply responding randomly. It is possible, for instance, that this participant had a high baseline level of PTSD symptoms.

Second, unlike all other group comparisons for this variable, the supernormality scores of the control and mixed strategy groups were rather similar, again suggesting that instructing participants to be selective about which features to exaggerate and which to hide might lead to a balanced presentation. As expected, supernormality levels were highest for the underreporting group and lowest for the overreporting group, while the control and mixed strategy groups fell in the middle range. The results of the control participants, 12.5% of whom attained a score indicative of underreporting, signal that a non-trivial minority of our students spontaneously engaged in supernormality. This aligns with previous research showing that some students conceal their mental and emotional issues (Martin, 2010), thereby obscuring prevalence estimates of serious symptoms and disorders to which this population might be particularly vulnerable.

Third, with regard to the IOP-29 scores, the overreporting group obtained the highest FDS values, whereas the other groups did not differ significantly from one another. As this measure was specifically designed to detect feigned psychiatric and/or cognitive problems (Viglione et al., 2017; for a review, see Giromini & Viglione, 2022), it is not surprising that the control and underreporting groups did not differ from each other. Regarding the mixed strategy group, it is possible that the instruction to only partially overreport while still appearing functional resulted in psychological portrayals that were too subtle for the IOP-29 to classify as noncredible. Indeed, the reference to "empathy and personal difficult experiences [being] highly cherished" is more likely to elicit the feigning of a history of PTSD than the feigning of ongoing PTSD complaints, which may explain the absence of elevations on the IOP-29. This explanation is also consistent with the fact that the PCL-5 scores of this group were very similar to those of the control group.

With regard to the overreporting group, of whom fewer than half crossed the FDS cutoff, it should be emphasized that the PCL-5 and IOP-29 scores observed in this study were considerably lower than in many other published studies (e.g., Blavier et al., 2023; Carvalho et al., 2021; Szogi & Sullivan, 2018). For example, in Szogi and Sullivan (2018), the average PCL-5 scores of the PTSD feigning groups ranged from 54.66 (SD = 12.66) to 61.04 (SD = 12.09), whereas our overreporting group had a notably lower mean PCL-5 score of 43.43 and a markedly higher SD of 19.24. Similarly, only 46% of our overreporters obtained an IOP-29 FDS ≥ .50, whereas a recent meta-analysis (Puente-López et al., 2023) and a quantitative literature review (Giromini & Viglione, 2022) indicate that the sensitivity of the IOP-29 FDS ≥ .50 to feigned psychopathology is likely to range from 82% to 86%. Although we do not have a definitive explanation for this unexpected finding, it is likely that the specific instructions we used did not convey clearly enough that participants were supposed to feign PTSD. Indeed, as noted above, although our instructions informed participants that they could receive an extension if they faked or exaggerated trauma-related problems, these instructions might have been too subtle, as we did not explicitly encourage participants to engage in this behavior. Future studies with more explicit instructions would thus be beneficial to address this issue.

In any case, the main message of this study is that instructions leading to both over- and under-reporting of trauma-related symptoms are likely to result in a psychometric profile quite similar to that of a group instructed to respond honestly. Pending future replications, it is thus possible that such a mixed strategy may help individuals evade detection on symptom validity tests. Accordingly, three considerations follow. First, from a methodological point of view, experimental studies on the diagnostic accuracy of symptom validity tests should consider including a mixed strategy group. Second, with clinical implications in mind, we need to know more about the settings that elicit this type of mixed strategy, so further investigation is necessary (see also Whitman et al., 2023). Third, diagnosticians and researchers in the symptom validity domain are well advised to assess both over- and under-reporting, even if their primary goal is to detect feigned or exaggerated mental health problems. We further argue that, despite the room for improvement in currently available measures, inspecting the tendency to exaggerate and to conceal psychological and physical complaints significantly enhances the accuracy of any health assessment, and we encourage practitioners to include SVTs in their test batteries.

Limitations

The results presented above need to be considered in the context of this study's limitations. First, although our original sample met the requirements of the power analysis, the final sample was slightly smaller than needed because of multiple exclusion criteria. Therefore, future investigations should include larger samples. This issue is all the more relevant given that the standard deviations on all measures indicated large variability in presentations. It is also important to note that our participants were highly proficient in English, but most of them were not native English speakers. Second, as we included young (mostly female), functional adults, and given the experimental context of our study, which notably limits its ecological validity, the generalizability of our findings, especially to the forensic population, is severely constrained. Third, the study was conducted online and, despite multiple attention checks, we cannot be sure that our participants were not distracted during their participation. Fourth, the Supernormality Scale-Revised might not have been the best option for measuring underreporting of PTSD, as its items are broad and do not specifically refer to this type of complaint. However, there are currently no other stand-alone validity measures for detecting positive response bias in the forensic context. Therefore, we urge our peers to devote more attention to this gap in our field. Fifth, although we used the task elaboration as an inclusion criterion, we cannot ensure that all of our participants understood or complied with the given instructions. Because of the random allocation, it is possible that some participants who already exhibited high levels of PTSD-like symptoms were instructed to exaggerate them, and that some participants without any complaints were instructed to underreport them.
Moreover, our instructions might have been too flexible, allowing participants to decide how best to exhibit response bias and thereby introducing individual degrees of freedom in how they approached over- and/or under-reporting. Further, we did not inspect the presence of response bias prior to the manipulation. Thus, we do not know whether a habitual individual response style might have confounded our results (but see Van Helvoort et al., 2022). Yet, the correlations between the pre-screening scores and post-manipulation test results suggest that the manipulation affected participants' response style. Specifically, the correlations for the manipulation groups were significantly lower than those in the control condition, with the exception of the mixed strategy group, in which the scores on two of the employed measures were more strongly associated. Finally, we did not conduct in-depth exit interviews with participants in the mixed strategy condition. Therefore, it is difficult to determine whether they balanced their biased presentation or simply opted to respond honestly to resolve their confusion about exhibiting both underreporting and overreporting tendencies. Future research on this type of responding could examine the PCL-5 symptom profiles of mixed strategy groups. It may well be the case that certain PTSD symptom domains lend themselves better to either under- or overreporting. Because of the limited sample size and the high degree of freedom in our instructions, we refrained from a thorough analysis of interactions with symptom profiles, but this topic clearly warrants further investigation. Still, as this topic has not been investigated before, we hope our findings encourage others to further investigate different types of response bias that are exhibited simultaneously.
PTSD claims, including those involving comorbidities such as substance use and anger-control problems, might be particularly suitable for testing the mixed response strategy.

Conclusion

To our knowledge, this is the first attempt to systematically compare multiple types of response bias in trauma symptom reports (overreporting, underreporting, and mixed strategy reporting). Our findings regarding symptom presentation showed the anticipated pattern: the overreporting group exhibited the worst symptom presentation, the underreporters the best, and the mixed strategy condition provided the most balanced symptom reports, closely resembling those of the control group. Thus, simultaneous over- and under-reporting of symptoms can lead to presentations resembling those of honest responders, not only on clinical measures but also on symptom validity tests. Arguably, given its clinical implications, this issue deserves further study, first and foremost outside the laboratory.