Psychological Injury and Law, Volume 10, Issue 4, pp 358–367

Standard Symptom Inventories for Asylum Seekers in a Psychiatric Hospital: Limited Utility Due to Poor Symptom Validity

  • Douwe van der Heide
  • Irena Boskovic
  • Harald Merckelbach


Abstract

We examined symptom validity in two samples (Ns = 27 and 35) of asylum seekers who had been admitted to a psychiatric facility. Considerable proportions of patients over-endorsed atypical symptoms (63 and 83%, respectively) and underperformed on a simple forced-choice task requiring the identification of basic emotions (41 and 71%, respectively). Over-endorsement and underperformance were unrelated to Dutch language proficiency but were related to raised scores on standard symptom inventories commonly used to assess psychiatric symptoms of asylum seekers. This pattern of findings casts doubts on attempts to monitor symptom severity and treatment progress in psychiatric asylum seekers without taking symptom validity into account.


Keywords: Refugees · Asylum seekers · Symptom validity tests · Symptom over-reporting · Underperformance

Various groups are known to be at risk for symptom over-reporting. Social security disability claimants (Griffin, Normington, May, & Glassmire, 1996) and defendants who claim to suffer from psychiatric symptoms are well-studied examples (Rogers, Sewell, & Goldstein, 1994). Asylum seekers may also engage in symptom over-reporting. Many Western countries have regulations that grant asylum seekers a permit to stay when there are humanitarian or medical reasons, which may apply when applicants suffer from mental disorders (European Council on Refugees and Exiles, 2012; Meffert, Musalo, McNiel, & Binder, 2010). Even after a refugee status has been granted, a psychiatric diagnosis can be advantageous in procedures pertaining to housing, family reunion, and naturalization (Immigration Naturalization Services, 2016; Storm, 2003). Apart from deliberate symptom exaggeration, over-reporting may result from an inability to articulate symptoms (i.e., alexithymia; see Söndergaard & Theorell, 2004), careless responding, confusion or misunderstanding, catastrophizing, negative impression management, or feigning in general, without it being possible to know the conscious or unconscious reasons involved; these categories are not mutually exclusive (see for an extensive discussion: Young, 2014). Whatever the reasons for an over-reporting style, the phenomenon in and of itself will likely affect how patients respond to standard clinical instruments administered to them for diagnostic purposes.

Over the past two decades or so, several tests have been developed that can effectively screen for distorted symptom presentation (see for reviews, e.g., Smith, 2008; Sollman & Berry, 2011; Young, 2014). Some authors (e.g., Weiss & Rosenfeld, 2017) have referred to these instruments as feigning measures. However, as these tests are often not able to clarify whether distorted symptom presentation is intentional or not, we prefer the more neutral label of symptom validity tests (SVTs). There are two types of SVTs (Egeland, Andersson, Sundseth, & Schanke, 2015): self-report tests consisting of rare symptoms that are intended to measure over-endorsement and cognitive tasks that assess underperformance. Both types have proven their added value; in the absence of SVTs, clinicians are poor at determining to what extent patients’ symptom presentation is distorted (Dandachi-FitzGerald, Merckelbach, & Ponds, 2017).

Unfortunately, the extant literature on the cross-cultural stability of SVTs is limited (see, for a review, Nijdam-Jones & Rosenfeld, 2017). Some SVTs have been translated into other languages and appear to function adequately in ethnic groups for which they were not originally designed (e.g., DuAlba & Scott, 1993; Geraerts et al., 2009; Montes & Guyton, 2014; Vilar-López et al., 2007). However, with regard to the culturally highly diverse group of asylum seekers, there is a paucity of studies (but see Weiss & Rosenfeld, 2017).

The current study addresses this target group, more specifically asylum seekers in a Dutch psychiatric facility. We reasoned that if SVT outcomes were closely associated with the presence of positive incentives (e.g., seeking a permit to stay, needing exemption from the language test required for naturalization) rather than with Dutch language proficiency, this would provide initial support for interpreting SVT failures in terms of distorted symptom presentation. In that case, clinicians would be well-advised to interpret data obtained with standard clinical instruments with caution, keeping at the same time in mind that the presence of positive incentives per se does not necessarily imply that symptom over-reporting reflects feigning.

In mental health facilities that specialize in asylum seekers, one limitation associated with SVTs—and, indeed, all psychodiagnostic tests and structured interviews for that matter—is that optimal translations of instruments for culturally highly diverse groups are often not available. Thus, it is standard practice that professional interpreters provide on-the-spot translations of test items that are read out by clinicians during the diagnostic evaluation. This approach may compromise validity due to subtleties in items and answers that are easily lost during live interpretation (Bot, 2005). Unlike a clinical interview, where follow-up questions can help clarify responses or make sure that the patient understood the items, standardized instruments are intended to be administered in a fixed format without any interruptions by the clinician. On-the-spot translations may result in data that cannot be compared to existing normative data. This raises the question whether deviant scoring on SVTs can be attributed to this partially nonstandardized method of presentation. In an attempt to control at least partially for this significant confounder, we reasoned that one would expect asylum seekers with good Dutch proficiency to pass SVTs more often than those with limited proficiency, because the first group depends less on the translations of the interpreter. That is, compared with their low proficiency counterparts, the high proficiency group would be better able to understand the original text read out aloud by the staff member presenting the test items. It stands to reason that this group is also more acculturated to Dutch society and has more experience in taking Western-style tests (which are common during Dutch language courses).

In their pilot study, Van der Heide and Merckelbach (2016) administered SVTs to asylum seekers who were treated in a psychiatric facility. Specifically, with the help of professional translators, the authors administered items taken from the Structured Inventory of Malingered Symptomatology (SIMS; Smith & Burger, 1997) and a forced-choice task modeled after Morel’s Emotional Numbing Test (MENT; Morel, 1998). Both symptom over-reporting on SIMS items and underperformance on the forced-choice task occurred on a non-trivial scale (i.e., rates between 41 and 87%) and were not related to Dutch language proficiency. However, the various subsamples in this study were small and standard self-report instruments to index psychiatric symptoms were not included. In the current study, we explored whether deviant performance on SVTs goes hand in hand with escalated symptom endorsement on standard diagnostic instruments among refugee patients. Specifically, we were interested in scores on the Dissociative Experiences Scale (DES; Bernstein & Putnam, 1986) and the Harvard Trauma Questionnaire (HTQ; Mollica et al., 1992). Both DES and HTQ are widely used in refugee communities throughout the world (Carlson & Rosser-Hogan, 1993; Favaro, Maiorani, Colombo, & Santonastaso, 1999; Mollica et al., 1992; Shoeb, Weinstein, & Mollica, 2007). In the first sample, we related SVT outcomes to the DES, while in the second sample, SVT outcomes were related to the HTQ. In both samples, we took Dutch proficiency and the presence of incentives into account.



Method

We recruited two inpatient samples from a psychiatric facility with 32 beds for inpatient treatment of asylum seekers. This facility is part of a general psychiatric hospital in the Netherlands and serves as a national referral center for non-forensic refugee mental health. As a rule, patients are referred after usual outpatient treatments have been ineffective. Most of their symptoms can be classified as part of a posttraumatic stress disorder (PTSD), but many patients also present affective, dissociative, or psychotic symptoms secondary to trauma.

Both samples are here referred to as “asylum seekers” but actually included asylum seekers whose procedure was still ongoing as well as former asylum seekers with a refugee status and a residence permit. All inpatients present in the clinic during the time frame of the study were included, except for a few patients who refused informed consent or were unable to give such consent because of severe disorientation. During the time frame of the study, the average length of admission varied between 6 and 9 months. Data in the first sample were obtained during a 2-month period in 2009; data in the second sample were obtained during a 5-month period in 2010.

Sample 1

The first sample included 27 asylum seekers (70% men). Mean age was 34.5 years (range = 20–51 years), and mean level of education was 10.7 years (SD = 4.8). All of them were given the DES items (see below). Eleven patients (41%) came from Africa, eight (30%) from the Middle East, four (15%) from the Far East, three (11%) from the former USSR, and one (4%) from former Yugoslavia.

Ten asylum seekers (37%) had a low level of proficiency in Dutch, the language of the host country (see below). Nine (33%) had an intermediate level and eight (30%) were advanced students. Six respondents (22%) had a negative incentive, nine (33%) had mixed incentives, and 12 (44%) a positive incentive (see below).

Sample 2

The second sample consisted of 35 patients (80% men), who were interviewed with the HTQ (see below); besides asylum seekers, this sample included one regular non-Western migrant with a history of severe psychological trauma. Mean age was 27 years (range = 15–55 years), and mean level of education was 6.4 years (SD = 5.2). Five patients (14%) originated from the former USSR, 22 (63%) from Africa, six (17%) from the Middle East, and two (6%) from the Far East.

Eighteen (51%) patients had a poor proficiency in Dutch; 12 (34%) an intermediate proficiency and only one (3%) was classified as an advanced student. In four cases (11%), the assessment was missing. Five patients (14%) had a negative incentive, another five (14%) mixed incentives, and 25 (71%) a positive incentive.


The Dissociative Experiences Scale (Sample 1)

The DES (Bernstein & Putnam, 1986) is a self-report scale that requires participants to indicate on 100 mm visual analogue scales (anchors: 0 = never; 100 = always) to what extent they experience 28 dissociative experiences in daily life (amnesia, depersonalization, absorption). A typical sample item is: “Some people find that sometimes they are listening to someone talk and they suddenly realize that they did not hear part or all of what was just said. Mark the line to show what percentage of time this happens to you.”

Items of the Dutch DES (Boon & Draijer, 1993) were presented to patients by a sixth year medical intern with the help of a professional interpreter; when patients were unable to indicate a percentage, they were encouraged to use verbal quantifiers (e.g., “never,” “occasionally,” “fairly often,” “very often,” and “always”; Wright & Loftus, 1999). Scores were averaged across items to obtain a total DES score (range = 0–100). Values above 30 (Putnam, Carlson, Ross, & Anderson, 1996) are considered to be indicative of clinically raised dissociation levels.
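As an illustration of the scoring rule, here is a minimal sketch assuming the conventional DES convention that the total (range 0–100) is the mean of the 28 item percentages; the function name is ours:

```python
def des_total(item_scores):
    """Total DES score: mean of the 28 item scores, each marked on a
    0-100 mm visual analogue scale (0 = never, 100 = always).

    Values above 30 are considered indicative of clinically raised
    dissociation levels (Putnam et al., 1996).
    """
    if len(item_scores) != 28:
        raise ValueError("the DES has 28 items")
    if any(not 0 <= s <= 100 for s in item_scores):
        raise ValueError("item scores must lie between 0 and 100")
    return sum(item_scores) / 28
```

Because each item already lies between 0 and 100, the mean stays in that range, matching the stated 0–100 range of the total score.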

The Harvard Trauma Questionnaire (Sample 2)

The HTQ (Mollica et al., 1992) is a self-report inventory of traumatic experiences. In the current study, we used the Dutch version (Kleijn & Mook, 1999). It has four parts: part I is an inventory of past traumatic events (17 items) with four possible responses for each event: “Experienced,” “Witnessed,” “Heard about it,” or “No.” Illustrative items are “lack of food or water,” “torture,” or “rape.” Part II asks respondents for a subjective description of the most traumatic event(s) they experienced. Part III is an inventory of incidents that may have caused traumatic brain injury (drowning, suffocation, blows to the head, and subsequent loss of consciousness). Part IV is an inventory of 30 symptoms. Sixteen of these are derived from the DSM-III-R criteria for PTSD (part IVa); the remaining 14 items were devised to target symptoms associated with refugees’ traumatic life events (part IVb). Illustrative items are “feeling as if you are going crazy” and “feeling that someone you trusted betrayed you.” Items of part IV are rated on a 4-point scale (“Not at all,” “A little,” “Quite a bit,” and “Extremely,” which are scored 1, 2, 3, and 4, respectively). The total score of the HTQ is the sum of the ratings of part IV symptoms divided by 30 (range = 1–4). Items of the HTQ were presented to patients by a fourth year psychology student with help of a professional interpreter who also translated the responses.
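The part IV scoring rule (sum of the 30 ratings divided by 30) can be sketched as follows; `htq_total` is an illustrative name, not part of the HTQ itself:

```python
def htq_total(ratings):
    """Total HTQ score: sum of the 30 part IV item ratings divided by 30.

    Each item is rated 1 ("Not at all") to 4 ("Extremely"), so the total
    necessarily falls in the 1-4 range.
    """
    if len(ratings) != 30:
        raise ValueError("HTQ part IV has 30 items")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 4")
    return sum(ratings) / 30
```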

Implausible Symptoms (Samples 1 and 2)

We used all the items of the SIMS (Smith & Burger, 1997) to index over-reporting. In its original form, the SIMS is a self-report instrument that consists of 75 true-false items that describe atypical and extreme symptoms (e.g., “There is a constant ringing in my ear”; “The voices that I hear, have never stopped since they began”). There are five subscales, each containing 15 items, which address the following conditions: amnesia, neurologic impairment, psychosis, affective disorders, and low intelligence. After recoding some items, endorsed symptoms are summed to obtain a total SIMS score (range = 0–75), with higher scores indicating more symptom endorsement. Van Impelen, Merckelbach, Jelicic, and Merten (2014) summarize psychometric data indicating that the internal consistency of the SIMS is satisfactory (with Cronbach’s alpha ranging from 0.80 to 0.96), its test-retest stability sufficient (rs = 0.72–0.97), and its ability to discriminate between symptom exaggeration and honest responding fairly effective (with sensitivities varying between 75 and 100%). Nijdam-Jones and Rosenfeld (2017) concluded in their recent review of studies on cross-cultural feigning assessment that of all self-report SVTs included in their review, the SIMS had the highest overall classification accuracy. Previous studies recommended a cutoff of 16 for screening for symptom over-reporting (Merckelbach & Smith, 2003). This cutoff is problematic when the SIMS is employed as a measure to detect individual feigners. This, however, was not our goal. We employed the translated SIMS items as a global screening measure for over-reporting, and as the extensive review of Van Impelen et al. (2014) shows, the SIMS is sensitive to differential prevalence in that context.
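As a sketch of this scoring rule: the recoding step depends on which items are reverse-keyed, and since those item numbers are not listed in the text, they are left as a parameter here; the function name is ours:

```python
def sims_total(responses, reverse_keyed=frozenset()):
    """Total SIMS score: number of endorsed atypical symptoms after recoding.

    `responses` maps an item number to the true/false answer given.
    Items in `reverse_keyed` count as endorsed when answered False
    (the recoding step; the actual item set is an assumption here).
    Range 0-75; in screening use, totals above 16 flag possible
    over-reporting (Merckelbach & Smith, 2003).
    """
    score = 0
    for item, answer in responses.items():
        endorsed = (not answer) if item in reverse_keyed else answer
        score += int(endorsed)
    return score
```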

For the purpose of the current study, items 11, 14, and 60 of the Dutch version of the SIMS (Merckelbach, Koeyvoets, Cima, & Nijman, 2001) were adapted. Items 11 and 14 refer to Western-oriented geography, and after consultation with professional interpreters, they were reformulated. For example, item 11 “The capital of Italy is Hungary” was changed to “The capital of Turkey is Azerbaijan” for asylum seekers from Armenia. The original item 60 pertained to the queen of Holland in the Dutch version and was rephrased as follows: “The prophet of Allah is called Mohammed” (for patients with a Muslim background) and “The mother of Jesus is called Mary” (for patients with a Christian background). The SIMS was presented to the asylum seekers by a psychiatrist with the help of a professional interpreter, who also back-translated the yes or no answers.

Forced-Choice Task (Samples 1 and 2)

To screen for underperformance, we employed a forced-choice task involving the identification of basic emotional expressions modeled after the MENT (Morel, 1998). We included this task because it has low verbal mediation (see also Benuto, Leany, & Lee, 2014; Erdodi, Nussbaum, Sagar, Abeare, & Schwartz, 2017). Thus, compared with the DES, HTQ, or SIMS items, the forced-choice task required only minimal effort to translate test items, which makes it an interesting instrument for cross-linguistic contexts.

Our version was adapted from Geraerts, Merckelbach, and Jelicic (2007) and consisted of 20 colored slides of 10 facial expressions posed by a man and a woman. Their expressions reflected happiness, frustration, sadness, anger, fear, calmness, surprise, shyness, confusion, and sleepiness. The slides were presented on a computer screen (30 × 38 cm). Patients were instructed to identify the emotion that best matched the expression of the face. In a first series of 20 trials, patients had to indicate which of two words (e.g., “happy” versus “surprised”) best described the facial expression in the picture. In a second run of 20 trials, patients viewed two slides with different expressions at the same time and only one emotion word. They had to identify the expression that best matched the word. In a final run of 20 trials, patients were shown two slides and two words per slide, which had to be matched in the correct way. Emotional labels used in the test were translated and back-translated into several languages. The forced-choice task was administered by a psychiatrist. However, as some asylum seekers reported being unable to read or write and some translations were not available in the native language of the asylum seeker (e.g., a Russian translation for asylum seekers from former Soviet Republics), a professional interpreter was present during the test to assist with the instructions and the key verbal labels when necessary.

Before the test, patients were told that emotional numbness is a prominent symptom of trauma-related problems and that this may cause people to have difficulties with the recognition of facial expressions. The rationale behind this instruction is that individuals who want to overstate trauma-related symptoms will produce more errors. Errors were summed across the three runs. Morel (1998) recommended a cutoff score of nine errors on the MENT, with scores above this level raising the suspicion of underperformance. In a sample of Croatian war veterans, elevated error levels on our version of the forced-choice task were found to be effective in differentiating between treatment-seeking and compensation-seeking veterans (sensitivity 92%, specificity 96%; Geraerts et al., 2009).
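The error-counting rule can be sketched as follows; the function name and the boolean encoding of trials are ours:

```python
def forced_choice_result(run1, run2, run3, cutoff=9):
    """Sum errors across the three 20-trial runs of the forced-choice task.

    Each run is a sequence of booleans (True = correct response).
    Totals above the cutoff of nine errors raise the suspicion of
    underperformance (Morel, 1998).
    """
    for run in (run1, run2, run3):
        if len(run) != 20:
            raise ValueError("each run consists of 20 trials")
    errors = sum(not correct for run in (run1, run2, run3) for correct in run)
    return errors, errors > cutoff
```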


Procedure

Before test items were administered, patients were informed that some tests would be used to assess the validity of symptom reporting in their target group. We explained that if results indicated poor validity, this would mean that standard symptom inventories do not provide useful information and that diagnosis would have to be based on additional interviews and observations. Only patients who gave informed consent for anonymous use of their data for scientific purposes were included. The study was approved by the Central Committee on Research Involving Human Subjects (CCMO).

The SVTs—SIMS items and the forced-choice task—were presented in counterbalanced order. DES or HTQ items were administered during a separate session, either before or after SVT results had been obtained.

Proficiency in Dutch was independently assessed by certified Dutch language teachers of the hospital. All teachers were specifically trained to instruct students with little or no knowledge of the Dutch language and used the European reference scale (Meijer & Noijons, 2008) that has been developed to differentiate between different levels of proficiency. This way, we were able to assign asylum seekers to three groups: those with poor, intermediate, and good proficiency. The language teachers had no access to information about the performance of the patients on the various tests.

Social workers of the hospital independently evaluated incentive levels of patients on the basis of file information. Relevant information had entered the patients’ files through contacts between social workers and legal representatives or lawyers of the patients. As a rule, patients were aware of the presence of such information, as Dutch law requires informed consent for legal steps such as application for asylum for medical reasons or child custody procedures. Thus, social workers could differentiate between the presence of positive incentives, the absence of incentives, and the presence of negative incentives. They did this by adding one point for each condition that potentially might promote symptom over-reporting, such as (1) an asylum procedure still in progress, (2) a temporary refugee status issued for medical reasons, and (3) any other current procedure requiring a medical report indicating medical necessity, urgency, or exemption (e.g., a request for family reunion while the patient is not able to generate the necessary income demanded by Dutch law, a request for urgent change of housing or special housing arrangements, a request to be exempted from the demand to pass the language test in the naturalization procedure). For each condition discouraging symptom over-reporting, they subtracted a point. Such circumstances would be (1) compulsory admission and (2) any current procedure requiring a medical report indicating improved functioning or decreased need for medical treatment or scrutiny (e.g., a child custody procedure, a request for voluntary repatriation). Patients with one point or more were considered to have a positive incentive, patients with minus one point or less were considered to have a negative incentive, and patients with zero points were classified as neutral (mixed incentives). Social workers evaluating incentive levels had no knowledge about the test outcomes.
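The social workers’ point system can be sketched as follows; treating a net score of zero as “neutral” is our reading of the three-way grouping, since the text defines only the positive and negative categories explicitly:

```python
def classify_incentive(promoting_conditions, discouraging_conditions):
    """Classify a patient's incentive status from counted file conditions.

    One point is added per condition potentially promoting symptom
    over-reporting and one subtracted per condition discouraging it.
    Net score >= +1 -> positive incentive; <= -1 -> negative incentive;
    a net score of zero is taken here as neutral/mixed (an assumption).
    """
    score = promoting_conditions - discouraging_conditions
    if score >= 1:
        return "positive"
    if score <= -1:
        return "negative"
    return "neutral"
```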

Staff members who presented the standard symptom inventories (DES or HTQ) and the SVTs were not aware of the information obtained by the social workers and the Dutch language teachers. However, as staff members, they were involved in the treatment of the patients and had access to their medical files. So, in this respect, blinding was not complete.

Data Analysis

We used descriptive statistics to evaluate scores on standard symptom inventories and SVTs. Depending on whether data were normally distributed or skewed, we employed t tests and one-way analyses of variance (ANOVAs), or Kruskal-Wallis and Mann-Whitney U tests, to compare groups that differed in language proficiency and incentives with regard to their symptom reports, endorsement of implausible symptoms, and errors on the forced-choice task.
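For readers who want to see what the rank-based group comparison amounts to, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic (mid-ranks for ties, no tie correction), the kind of statistic used throughout the Results:

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic.

    Ranks all observations jointly (ties receive mid-ranks) and measures
    how far each group's mean rank deviates from the grand mean rank
    (n + 1) / 2. No tie correction is applied in this sketch.
    """
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    group_ranks = {g: [] for g in range(len(groups))}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for this tie block
        for k in range(i, j):
            group_ranks[pooled[k][1]].append(mid_rank)
        i = j
    return 12 / (n * (n + 1)) * sum(
        len(r) * (sum(r) / len(r) - (n + 1) / 2) ** 2 for r in group_ranks.values()
    )
```

Identical groups yield H = 0, while two fully separated groups of three values yield H ≈ 3.86; in practice, H is referred to a chi-square distribution with (number of groups − 1) degrees of freedom.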


Results

Sample 1

Dissociative Symptoms (DES)

There were no missing data and data were not normally distributed (Shapiro-Wilk p < 0.05). Cronbach’s alpha was 0.97. The mean DES score was 22.6 (95% CI [14.5, 30.7]), with a range of 0–70.4. Ten patients (37%) scored ≥ 30.

Low, intermediate, and high proficiency groups had mean DES scores of 25.1 (SD = 22.7), 17.4 (SD = 22.5), and 25.5 (SD = 20.5), respectively; a Kruskal-Wallis test was non-significant (χ2(2) = 1.2, p = 0.54). A Kruskal-Wallis test performed on DES scores as a function of incentives did attain significance (χ2(2) = 14.9, p < 0.01, η2 = 0.43). Further details are given in Table 1.
Table 1

Mean DES scores, implausible symptoms, and forced-choice task errors of patients with relatively more negative or more positive incentives


                            Negative (n = 6)   Neutral (n = 9)   Positive (n = 12)   χ2(2)
DES                         1.1 (1.2)          16.7 (18.0)       37.8 (18.0)         14.9**
                            [0.7, 1.6]         [9.9, 23.5]       [31.0, 44.6]
Implausible symptoms        11.7 (4.4)         15.2 (10.2)       43.4 (9.2)          18.2**
                            [10.0, 13.4]       [11.4, 19.0]      [39.9, 46.9]
Errors forced-choice task   6.8 (3.3)          4.8 (4.3)         21.2 (12.7)         12.1**
                            [5.6, 8.0]         [3.2, 6.4]        [16.3, 25.9]

Standard deviations are given in parentheses; 95% confidence intervals [CIs] are shown below each mean. Those with a “neutral” incentive status are shown in the middle.

*p < 0.05; **p < 0.01

Implausible Symptoms

There were no missing data and data were not normally distributed (Shapiro-Wilk p < 0.05). Cronbach’s alpha for the full list of implausible items was 0.96. Cronbach’s alphas for subscales ranged from 0.74 (affective disorders) to 0.91 (amnesia). The mean endorsement rate of implausible symptoms was 27.0 (95% CI [20.4, 33.6]), with a range of 5–59. Seventeen patients (63%) scored above the cutoff (16).

Low, intermediate, and high proficiency groups had mean scores of 33.4 (SD = 19.4), 20.4 (SD = 14.0), and 26.5 (SD = 17.6), respectively; a Kruskal-Wallis test was non-significant (χ2(2) = 2.6, p = 0.24). A Kruskal-Wallis test on implausible symptom endorsement as a function of incentives was significant (χ2(2) = 18.2, p < 0.01, η2 = 0.54) (see Table 1).

Forced-Choice Task

There were no missing data in this data set. The data were not normally distributed (Shapiro-Wilk p < 0.05). Cronbach’s alpha for the forced-choice task was 0.95. The mean error score was 12.4 (95% CI [7.9, 16.9]), with a range of 0–42; 11 patients (41%) scored above the cutoff (nine errors), and two patients (7%) failed on more than half of the items of the forced-choice task.

Low, intermediate, and high proficiency subgroups had mean error scores of 17.7 (SD = 14.4), 10.7 (SD = 10.0), and 7.8 (SD = 7.5), respectively. A Kruskal-Wallis test was non-significant (χ2(2) = 3.3, p = 0.19). Again, a Kruskal-Wallis test performed on error scores of the three incentive groups was significant (χ2(2) = 12.1, p < 0.01, η2 = 0.33) (see Table 1).

Correlations Between Measures

We computed Spearman rank correlations between DES scores, implausible symptom endorsement, and errors on the forced-choice task. Forced-choice errors correlated at rho = 0.74 (p < 0.01) with implausible symptom endorsement, suggesting that both tests tap into symptom over-reporting. Also, both forced-choice errors and implausible symptoms correlated significantly with DES symptoms (rho = 0.41, p < 0.05, and rho = 0.79, p < 0.01, respectively).

Sample 2


Trauma Symptoms (HTQ)

There were no missing data for the HTQ. The number of traumatic events that patients rated as “experienced” in part I (“self-reported traumatic events”), the number of events suggestive of traumatic brain injury (part III), and scores on part IVa (PTSD symptoms) were not normally distributed (all Shapiro-Wilk ps < 0.05). The HTQ scores (part IV, all 30 items) were normally distributed (Shapiro-Wilk p = 0.12). Cronbach’s alphas for parts I, III, IVa, and IV were 0.86, 0.78, 0.85, and 0.92, respectively. The mean number of self-reported traumatic events was 10.4 (95% CI [9.0, 11.8]), with a range of 1–16; the mean number of events suggestive of traumatic brain injury was 3.1 (95% CI [2.4, 3.9]), with a range of 1–6; the mean score on part IVa (16 PTSD items) was 2.5 (95% CI [2.3, 2.7]), with a range of 1.2–3.5; and the mean HTQ score (averaged over 30 items) was 2.4 (95% CI [2.2, 2.6]), with a range of 1.1–3.5. For the 16 PTSD items, 22 patients (63%) had a score > 2.5, and for all 30 items of part IV, 21 patients (60%) had a score > 2.5.

To test the effects of Dutch proficiency, levels were recoded as “poor proficiency” (level A0 according to the European reference scale, n = 18) and “sufficient proficiency” (level A1 and above, n = 13). Mean group scores were 9.9 (SD = 4.6) and 10.3 (SD = 4.0), respectively, for self-reported traumatic events; 2.8 (SD = 2.2) and 3.5 (SD = 2.4), respectively, for self-reported traumatic brain injury incidents; and 2.5 (SD = 0.6) and 2.6 (SD = 0.6) for PTSD symptoms of part IVa. Mann-Whitney U tests showed that the two proficiency groups did not differ in self-reported traumatic events, events suggestive of traumatic brain injury, or PTSD symptoms (U = 115.0, p = 0.94; U = 96.5, p = 0.40; and U = 106.5, p = 0.67, respectively). Neither did the two proficiency groups differ in overall HTQ scores (t(29) < 1.0, p = 0.48), means being 2.4 (SD = 0.7) and 2.5 (SD = 0.6) for the poor and sufficient groups, respectively. On the other hand, when the sample was broken down into incentive subgroups, significant differences emerged. This was true for self-reported traumatic events, brain injury incidents, and PTSD symptoms, such that those with relatively more negative incentives reported lower numbers in all three categories (χ2(2) = 8.8, p < 0.01, η2 = 0.18; χ2(2) = 12.2, p < 0.01, η2 = 0.27; and χ2(2) = 13.2, p < 0.01, η2 = 0.38, respectively). A one-way ANOVA indicated that incentive groups also differed with regard to overall HTQ scores (F(2, 32) = 18.7, p < 0.001, η2 = 0.21). Further details are given in Table 2.
Table 2

Mean number of self-reported traumatic events (HTQ part I), brain injury incidents (HTQ part III), PTSD symptoms (HTQ part IVa), full HTQ part IV symptoms, implausible symptoms, and forced-choice task errors of patients with relatively more negative or more positive incentives


                               Negative (n = 5)   Neutral (n = 5)   Positive (n = 25)   χ2(2); F(2, 32)
HTQ part I (traumatic events)  4.0 (4.0)          11.6 (2.4)        11.4 (3.4)          8.8**
                               [2.7, 5.3]         [10.8, 12.4]      [10.3, 12.5]
HTQ part III (brain injury)    0.4 (0.9)          2.0 (2.0)         3.9 (1.9)           12.2**
                               [0.1, 0.7]         [1.3, 2.7]        [3.3, 4.5]
HTQ part IVa (PTSD symptoms)   1.5 (0.3)          2.3 (0.6)         2.7 (0.4)           13.2**
                               [1.4, 1.6]         [2.1, 2.5]        [2.6, 2.8]
HTQ part IV (all symptoms)     1.3 (0.2)          2.3 (0.6)         2.6 (0.4)           F = 18.7**
                               [1.2, 1.4]         [2.1, 2.5]        [2.5, 2.7]
Implausible symptoms           12.0 (6.0)         26.6 (19.7)       43.4 (11.5)         13.5**
                               [10.0, 14.0]       [20.1, 33.1]      [39.6, 47.2]
Errors forced-choice task      8.0 (5.3)          11.8 (9.6)        25.5 (13.0)         10.4**
                               [6.2, 9.8]         [8.6, 15.0]       [21.2, 29.8]

Standard deviations are given in parentheses; 95% confidence intervals [CIs] are shown below each mean. Those with a “neutral” incentive status are shown in the middle.

*p < 0.05; **p < 0.01

Implausible Symptoms

A total of 20 items was left unanswered across all 35 interviews, which is less than 1% of responses. These missing data were treated as non-endorsements. The data were not normally distributed (Shapiro-Wilk p < 0.05). Cronbach’s alpha for the full scale of implausible symptoms was 0.95 and ranged from 0.69 (affective disorders) to 0.90 (amnesia) for the subscales. The mean score was 38.2 (95% CI [33.3, 43.4]), ranging from 4 to 59. Twenty-nine patients (83%) scored above the cutoff. A Mann-Whitney U test showed no significant difference between proficiency levels (U = 109.0, p = 0.75). Mean endorsement rates for poor and sufficient proficiency groups were 36.7 (SD = 19.5) and 40.4 (SD = 13.2), respectively. A Kruskal-Wallis test indicated that implausible symptom levels were significantly higher in the group with positive incentives (χ2(2) = 13.5, p < 0.01, η2 = 0.30). Further details are given in Table 2.

Forced-Choice Task

There were no missing data for the forced-choice task and error scores were not normally distributed (Shapiro-Wilk p < 0.05). Cronbach’s alpha was 0.94. The mean number of errors was 21.1 (95% CI [16.4, 25.7]), ranging from 1 to 50. Twenty-five patients (71%) scored above the cutoff; 11 (31%) failed on more than half of the items. Differentiating between proficiency levels did not yield significant group differences in error rates (U = 99.0, p = 0.47). Mean errors for poor and sufficient proficiency groups were 20.2 (SD = 14.5) and 24.8 (SD = 12.7), respectively. A Kruskal-Wallis test showed that patients with positive incentives made more errors on this task (χ2(2) = 10.4, p < 0.01, η2 = 0.22). Further details are given in Table 2.

Correlations Between Measures (Full Sample)

Endorsement of implausible symptoms correlated at rho = 0.59 with errors on the forced-choice task (p < 0.01), suggesting that both measures tap into a common conceptual domain of distorted symptom presentation. The HTQ symptoms correlated at rho = 0.43 with endorsement rates of implausible symptoms (p < 0.01). The correlation between HTQ and errors on the forced-choice task remained non-significant (rho = 0.31, p = 0.07).


Discussion

Our main findings can be summarized as follows. First, considerable proportions of patients in samples 1 and 2 over-endorsed implausible symptoms (63 and 83%, respectively) and underperformed on a simple forced-choice task (41 and 71%, respectively). These proportions should not, however, be taken as precise estimates of the base rate of feigning among refugees or asylum seekers: they are based on SVTs that were administered in a suboptimal way and in a context in which the cutoffs might not be sufficiently accurate. More informative are the mean endorsement rates for implausible symptoms and the mean error rates on the forced-choice task in the positive incentive groups. These means lie well beyond the cutoffs and are suggestive of symptom over-reporting. Second, replicating earlier findings in similar samples (van der Heide & Merckelbach, 2016), over-endorsement and underperformance were unrelated to language proficiency. While controlling for language proficiency is unlikely to counter the confounding effects of live interpretation of test items entirely, this pattern does contradict the idea that deviant scoring on SVTs in this group is merely an artifact of poor language proficiency and/or the involvement of interpreters.

Third, in both samples, over-endorsement and underperformance were significantly associated with incentives, such that a stronger presence of positive incentives went hand in hand with higher levels of distorted symptom presentation. The intimate connection between problematic symptom validity and the presence of positive incentives is a recurrent theme in neuropsychological studies of litigating or compensation-seeking patients (e.g., Bianchini, Curtis, & Greve, 2006). Fourth, over-endorsement and underperformance were associated with raised scores on standard clinical instruments (i.e., DES and HTQ). Studies in other settings (e.g., psychiatric outpatients: Dandachi-FitzGerald, Ponds, Peters, & Merckelbach, 2011; veterans: Wisdom et al., 2014) have also found that deviant scoring on SVTs correlates with heightened scores on standard diagnostic tests.

Our finding of non-trivial proportions of psychiatric asylum seekers who over-report symptoms and who underperform parallels earlier findings. Studying psychiatric asylum seekers in the same facility, van der Heide and Merckelbach (2016) observed over-endorsement of SIMS symptoms in 87% of their patients and underperformance on the forced-choice task in 58%. In that study, too, problematic symptom validity could not be explained by poor language proficiency but was linked to the presence of positive incentives. It is important to emphasize that such findings do not rule out the possibility that different cultural attitudes towards test taking shape performance on SVTs. In fact, there is every reason to suspect that people from different cultures differ in their test behavior (Ardila, 2005). We did not look into this complex issue (see Nijdam-Jones & Rosenfeld, 2017, for a review).

Another important point is that our study was conducted in a highly specialized clinical setting for therapy-resistant cases. Thus, our findings cannot be generalized to asylum seekers in general, and they are certainly silent about the prevalence of feigning among refugees and asylum seekers. Although positive incentives were associated with failures on two different SVTs, and although patients, as a rule, will have been aware of their positive incentives, this constellation does not suffice to draw any conclusions about possible feigning. Apart from our partially nonstandardized test administration and the fact that the SVTs were used merely as screening instruments, the differential prevalence design (Rogers, 2008) of our study does not permit any definite classification. In fact, recent research suggests that the use of SVTs in trauma-exposed African immigrants may lead to high rates of false-positive classifications of feigning (Weiss & Rosenfeld, 2017).

Our finding of a connection between SVT failure and heightened scores on standard clinical instruments is important. It raises the question of how much trust can be placed in the outcomes of these instruments in the current setting. Several authors have pointed out that SVT failure may render diagnostic information obtained with standard clinical instruments largely non-interpretable (e.g., Fox, 2011). Ignoring poor symptom validity might even be dangerous, particularly when clinicians base their treatment interventions on self-reported symptom information that is so distorted that it obscures the real underlying disorder and, consequently, affects treatment decisions (Bush et al., 2005; Institute of Medicine, 2015).

Our findings do cast doubt on attempts to monitor symptom severity in asylum seekers with self-report instruments such as the DES and the HTQ in the absence of any checks on symptom validity. Note that the overall DES and HTQ scores that we obtained were similar to those reported in other studies. The mean DES score in the present study (i.e., 22.6, SD = 21.5) comes close to the average DES score of 20.0 (SD = 18.1) found in a sample of Dutch psychiatric inpatients (Friedl & Draijer, 2000) and to that of 18.5 (SD = 10.8) observed in refugees from the former Yugoslavia in an Italian refugee camp (Favaro et al., 1999). Carlson and Rosser-Hogan (1993) even reported a mean DES score of 37.1 (SD = 16.1) in Cambodian refugees living in the USA. Similarly, for the HTQ, van Dijk, Kortmann, Kooijman, and Bot (1999) observed in their sample of patients admitted to a psychiatric facility for asylum seekers a mean number of self-reported traumatic events of 8.9 (SD = 5.0) and a mean HTQ score of 2.4 (SD = 0.8); Kleijn, Hovens, Rodenburg, and Rijnders (1998) found in their sample of patients referred to a psychiatric facility for asylum seekers (both clinical and outpatient) a mean number of self-reported traumatic events of 10.3 (SD = 4.3). All these scores are well in line with what we observed: a mean number of self-reported traumatic events of 10.4 (SD = 4.2) and a mean HTQ score of 2.4 (SD = 0.6). Without the red flags that the SVTs provided us with, the outcomes of the present study could simply have been construed as replicating earlier epidemiological studies in refugee populations. Our point is that more precise estimates of the prevalence and severity of psychopathology are possible when one corrects for the distorting influence of symptom over-reporting by taking SVT failures into account (see also Merckelbach, Langeland, De Vries, & Draijer, 2014).

Four important limitations of the current study deserve comment. First, proficiency in the Dutch language was employed not as a proxy for acculturation but as a control for the potential confounding effects of partially nonstandardized test presentation. Live interpretation of the items of standard instruments may invalidate the results obtained because it alters the standardized testing procedure. Our reliance on interpreters was inspired by the Dutch practice of using interpreters for clinical interviewing and testing, the underlying idea being that with their help in translating standard instruments for which no official back-translated version is available (e.g., in Armenian), it is, within limits, possible to arrive at a diagnosis. The implication of our findings for this suboptimal approach to diagnosis is that it is wise to take SVT results into account. Second, social workers inspected patient files for verifiable positive and negative incentives. More subtle motives for over-reporting symptoms (e.g., earlier experiences of care being denied) or for under-reporting were not included in their assessment. The latter might be particularly relevant given the lowered scores on standard symptom inventories (e.g., the DES) in the minority of patients with negative incentives. This might reflect denial of symptoms, a topic that warrants further research.

Third, how exactly incentives shape symptom misrepresentation can only be understood by administering the SVT items employed in the current study to other groups as well, e.g., non-clinical samples of asylum seekers. Fourth, our samples were relatively small and our design was cross-sectional. It would be informative to monitor the symptomatic course over time of psychiatric asylum seekers who fail and who pass SVTs. Such a longitudinal setup could elucidate how adverse living conditions may foster problematic symptom validity.


Conclusion

Over-endorsement of SIMS symptoms and excessive errors on a simple forced-choice task were related to heightened symptom levels on standard clinical instruments that are commonly used in the diagnostic assessment of psychiatric asylum seekers. Furthermore, for all instruments, elevated symptom scores were related to positive incentives rather than to language proficiency. This pattern of findings casts doubt on attempts to monitor symptom severity and treatment progress in psychiatric asylum seekers without taking the validity of symptom reports into account.

While our study highlights the problems in relying on self-report inventories to monitor treatment progress, the data of this study cannot be used to determine the frequency of feigning in this sample. Its findings cannot be generalized to asylum seekers in general and they certainly are no indication of the prevalence of feigning among refugees and asylum seekers. Even though the presence of positive incentives was the most prominent correlate of problematic symptom reports in our samples of asylum seekers, there is little doubt that asylum seekers in general are a highly vulnerable group with high rates of psychopathology, notably trauma-related psychopathology (Reko, Bech, Wohlert, Noerregaard, & Csillag, 2015). Our results do raise the possibility, however, that in a considerable subgroup of psychiatric asylum seekers the various medicolegal procedures may have contributed to poor symptom validity to an extent that compromises diagnostic decision making.



Acknowledgements

The authors would like to thank Hanneke Bot, current head of the clinical facility where this study took place, for her review of a draft version of this article.

Compliance with Ethical Standards

Conflict of Interest

This project received no financial support from outside funding agencies. The authors have no disclosures to make that could be interpreted as a conflict of interest.

Human and Animal Rights and Informed Consent

Relevant ethical guidelines regulating research involving human participants were followed throughout the project. All data collection, storage, and processing were carried out in compliance with the Helsinki Declaration.


  1. Ardila, A. (2005). Cultural values underlying psychometric cognitive testing. Neuropsychology Review, 15, 185–195.
  2. Benuto, L. T., Leany, B. D., & Lee, A. (2014). Assessing effort and malingering with the African American client. In L. T. Benuto & B. D. Leany (Eds.), Guide to psychological assessment with African Americans (pp. 79–85). New York: Springer.
  3. Bernstein, E. M., & Putnam, F. W. (1986). Development, reliability, and validity of a dissociation scale. Journal of Nervous and Mental Disease, 174, 727–735.
  4. Bianchini, K. J., Curtis, K. L., & Greve, K. W. (2006). Compensation and malingering in traumatic brain injury: A dose-response relationship? The Clinical Neuropsychologist, 20, 831–847.
  5. Boon, S., & Draijer, N. (1993). Multiple personality disorder in The Netherlands: A clinical investigation of 71 patients. American Journal of Psychiatry, 150, 489–494.
  6. Bot, H. (2005). Dialogue interpreting in mental health. New York: Rodopi.
  7. Bush, S. S., Ruff, R. M., Tröster, A. I., Barth, J. T., Koffler, S. P., Pliskin, et al. (2005). Symptom validity assessment: Practice issues and medical necessity. Archives of Clinical Neuropsychology, 20, 419–426.
  8. Carlson, E. B., & Rosser-Hogan, R. (1993). Mental health status of Cambodian refugees ten years after leaving their homes. American Journal of Orthopsychiatry, 63, 223–231.
  9. Dandachi-FitzGerald, B., Merckelbach, H., & Ponds, R. W. (2017). Neuropsychologists’ ability to predict distorted symptom presentation. Journal of Clinical and Experimental Neuropsychology, 39, 257–264.
  10. Dandachi-FitzGerald, B., Ponds, R. W., Peters, M. J., & Merckelbach, H. (2011). Cognitive underperformance and symptom over-reporting in a mixed psychiatric sample. The Clinical Neuropsychologist, 25, 812–828.
  11. DuAlba, L., & Scott, R. L. (1993). Somatization and malingering for workers' compensation applicants: A cross-cultural MMPI study. Journal of Clinical Psychology, 49, 913–917.
  12. Egeland, J., Andersson, S., Sundseth, Ø. Ø., & Schanke, A. K. (2015). Types or modes of malingering? A confirmatory factor analysis of performance and symptom validity tests. Applied Neuropsychology: Adult, 22, 215–226.
  13. Erdodi, L. A., Nussbaum, S., Sagar, S., Abeare, C. A., & Schwartz, E. S. (2017). Limited English proficiency increases failure rates on performance validity tests with high verbal mediation. Psychological Injury and Law, 10, 96–103.
  14. European Council on Refugees and Exiles. (2012). Asylum Information Database. Retrieved from
  15. Favaro, A., Maiorani, M., Colombo, G., & Santonastaso, P. (1999). Traumatic experiences, posttraumatic stress disorder, and dissociative symptoms in a group of refugees from former Yugoslavia. Journal of Nervous and Mental Disease, 187, 306–308.
  16. Fox, D. D. (2011). Symptom validity test failure indicates invalidity of neuropsychological tests. The Clinical Neuropsychologist, 25, 488–495.
  17. Friedl, M. C., & Draijer, N. (2000). Dissociative disorders in Dutch psychiatric inpatients. American Journal of Psychiatry, 157, 1012–1013.
  18. Geraerts, E., Kozaric-Kovacic, D., Merckelbach, H., Peraica, T., Jelicic, M., & Candel, I. (2009). Detecting deception of war-related post traumatic stress disorder. Journal of Forensic Psychiatry and Psychology, 20, 278–285.
  19. Geraerts, E., Merckelbach, H. L. C. J., & Jelicic, M. (2007). Het simuleren van posttraumatische stresssymptomen: De Nederlandse versie van de Morel Emotional Numbing Test (MENT) [Feigning symptoms of posttraumatic stress: The Dutch version of the Morel Emotional Numbing Test (MENT)]. Neuropraxis, 11, 8–12.
  20. Griffin, G. A., Normington, J., May, R., & Glassmire, D. (1996). Assessing dissimulation among Social Security disability income claimants. Journal of Consulting and Clinical Psychology, 64, 1425–1430.
  21. Immigration & Naturalization Services. (2016). Protocol Bureau Medische Advisering [Protocol of the Bureau for Medical Advice]. Retrieved from
  22. Institute of Medicine. (2015). Psychological testing in the service of disability determination. Washington DC: The National Academic Press.
  23. Kleijn, W. C., Hovens, J. E. J. M., Rodenburg, J. J., & Rijnders, R. J. P. (1998). Psychiatrische symptomen bij vluchtelingen aangemeld bij het psychiatrisch centrum de Vonk [Psychiatric symptoms of refugees referred to psychiatric center De Vonk]. Nederlands Tijdschrift voor Geneeskunde, 142, 1724–1728.
  24. Kleijn, W. C., & Mook, J. (1999). Nederlands-Engelstalige adaptatie van de Harvard Trauma Questionnaire [Dutch-English adaptation of the Harvard Trauma Questionnaire]. Oegstgeest: Centrum’45.
  25. Meffert, S. M., Musalo, K., McNiel, D. E., & Binder, R. L. (2010). The role of mental health professionals in political asylum processing. Journal of the American Academy of Psychiatry and the Law, 38, 479–489.
  26. Meijer, D., & Noijons, J. (2008). Gemeenschappelijk Europees referentiekader voor moderne vreemde talen: Leren, onderwijzen, beoordelen [Common European framework of reference for modern languages: Learning, teaching, assessment]. The Hague: Nederlandse Taalunie.
  27. Merckelbach, H., Langeland, W., De Vries, G., & Draijer, N. (2014). Symptom overreporting obscures the dose-response relationship between trauma severity and symptoms. Psychiatry Research, 217, 215–219.
  28. Merckelbach, H., & Smith, G. P. (2003). Diagnostic accuracy of the Structured Inventory of Malingered Symptomatology (SIMS) in detecting instructed malingering. Archives of Clinical Neuropsychology, 18, 145–152.
  29. Merckelbach, H. L. G. J., Koeyvoets, N., Cima, M., & Nijman, H. (2001). De Nederlandse versie van de SIMS [The Dutch version of the SIMS]. De Psycholoog, 11, 586–591.
  30. Mollica, R., Caspi-Yavin, Y., Bollini, P., Truong, T., Tor, S., & Lavelle, J. (1992). The Harvard Trauma Questionnaire. Validating a cross-cultural instrument for measuring torture, trauma and posttraumatic stress disorder in Indochinese refugees. Journal of Nervous and Mental Disease, 180, 111–116.
  31. Montes, O., & Guyton, M. R. (2014). Performance of Hispanic inmates on the Spanish Miller Forensic Assessment of Symptoms Test (M-FAST). Law and Human Behavior, 38, 428–438.
  32. Morel, K. R. (1998). Development and preliminary validation of a forced-choice test of response bias for posttraumatic stress disorder. Journal of Personality Assessment, 70, 299–314.
  33. Nijdam-Jones, A., & Rosenfeld, B. (2017). Cross-cultural feigning assessment: A systematic review of feigning instruments used with linguistically, ethnically, and culturally diverse samples. Psychological Assessment. Advance online publication.
  34. Putnam, F. W., Carlson, E. B., Ross, C. A., & Anderson, G. (1996). Patterns of dissociation in clinical and nonclinical samples. Journal of Nervous and Mental Disease, 184, 673–679.
  35. Reko, A., Bech, P., Wohlert, C., Noerregaard, C., & Csillag, C. (2015). Usage of psychiatric emergency services by asylum seekers: Clinical implications based on a descriptive study in Denmark. Nordic Journal of Psychiatry, 69, 587–593.
  36. Rogers, R. (2008). An introduction to response styles. In R. Rogers (Ed.), Clinical assessment of malingering and deception (3rd ed., pp. 3–13). New York: Guilford.
  37. Rogers, R., Sewell, K. W., & Goldstein, A. (1994). Explanatory models of malingering: A prototypical analysis. Law and Human Behavior, 18, 543–552.
  38. Shoeb, M., Weinstein, H., & Mollica, R. (2007). The Harvard trauma questionnaire: Adapting a cross-cultural instrument for measuring torture, trauma and posttraumatic stress disorder in Iraqi refugees. International Journal of Social Psychiatry, 53, 447–463.
  39. Smith, G. P. (2008). Brief screening measures for the detection of feigned psychopathology. In R. Rogers (Ed.), Clinical assessment of malingering and deception (3rd ed., pp. 323–339). New York: Guilford.
  40. Smith, G. P., & Burger, G. K. (1997). Detection of malingering: Validation of the Structured Inventory of Malingered Symptomatology (SIMS). Journal of the American Academy of Psychiatry and the Law, 25, 183–189.
  41. Sollman, M. J., & Berry, D. T. (2011). Detection of inadequate effort on neuropsychological testing: A meta-analytic update and extension. Archives of Clinical Neuropsychology, 26, 774–789.
  42. Söndergaard, H. P., & Theorell, T. (2004). Alexithymia, emotions and PTSD; findings from a longitudinal study of refugees. Nordic Journal of Psychiatry, 58, 185–191.
  43. Storm, I. (2003). Ervaringen van Marokkaanse en Turkse migranten met een functiebeperking of chronische ziekte in de zorg [Experiences of Moroccan and Turkish migrants with impairments or chronic illness in healthcare]. Retrieved from
  44. Van der Heide, D. H., & Merckelbach, H. (2016). Validity of symptom reports of asylum seekers in a psychiatric hospital: A descriptive study. International Journal of Law and Psychiatry, 49, 40–46.
  45. Van Dijk, D. G. L., Kortmann, F., Kooijman, M., & Bot, J. (1999). De Harvard Trauma Questionnaire (HTQ) als transcultureel screeningsinstrument voor de posttraumatische stress-stoornis bij opgenomen vluchtelingen [The Harvard Trauma Questionnaire (HTQ) as transcultural screening instrument for posttraumatic stress disorder among refugee inpatients]. Tijdschrift voor Psychiatrie, 41, 45–49.
  46. Van Impelen, A., Merckelbach, H., Jelicic, M., & Merten, T. (2014). The Structured Inventory of Malingered Symptomatology (SIMS): A systematic review and meta-analysis. The Clinical Neuropsychologist, 28, 1336–1365.
  47. Vilar-López, R., Santiago-Ramajo, S., Gómez-Río, M., Verdejo-García, A., Llamas, J. M., & Pérez-García, M. (2007). Detection of malingering in a Spanish population using three specific malingering tests. Archives of Clinical Neuropsychology, 22, 379–388.
  48. Weiss, R. A., & Rosenfeld, B. (2017). Identifying feigning in trauma-exposed African immigrants. Psychological Assessment, 29, 881–889.
  49. Wisdom, N. M., Pastorek, N. J., Miller, B. I., Booth, J. E., Romesser, J. M., Linck, J. F., & Sim, A. H. (2014). PTSD and cognitive functioning: Importance of including performance validity testing. The Clinical Neuropsychologist, 28, 128–145.
  50. Wright, D. B., & Loftus, E. F. (1999). Measuring dissociation: Comparison of alternative forms of the Dissociative Experiences Scale. American Journal of Psychology, 112, 497–519.
  51. Young, G. (2014). Malingering, feigning, and response bias in psychiatric/psychological injury: Implications for practice and court. New York: Springer.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Phoenix, Pro Persona, Wolfheze, The Netherlands
  2. GGZ Centraal, Ermelo, The Netherlands
  3. Forensic Psychology Section, Maastricht University, Maastricht, The Netherlands
  4. University of Portsmouth, Portsmouth, UK
