The term malingering refers to the conscious fabrication or exaggeration of mental or physical symptoms in order to gain secondary personal benefits or financial compensation, avoid school, work or military service, receive drugs or medication, or obtain mitigation of criminal charges (American Psychiatric Association, 2013). Failure to detect malingering results in enormous social costs and places a heavy burden on the healthcare system (Shapiro & Teasell, 1998). As such, evaluating the credibility of presented symptoms has become a key issue for almost all psychological injury evaluations (Bush et al., 2014; Giromini et al., 2022; Sherman et al., 2020; Sweet et al., 2021; Young, 2014).

Psychotic symptoms are particularly commonly feigned in the context of criminal trials. A study conducted in the Los Angeles County jail, which is considered the largest jail system in the USA, reports that almost a third of inmates engaged in malingering psychotic symptoms in order to be prescribed psychoactive drugs (Pierre et al., 2004). Furthermore, given that being diagnosed with a mental illness often leads to mitigation of punishment, defendants charged with serious crimes may be particularly tempted to pretend to suffer from psychosis (Resnick, 1999). Given that, and because clinical judgment alone is not sufficient to identify the presence of malingering (Dandachi-FitzGerald et al., 2017), in these settings, professionals are expected to include additional assessments specifically developed to test the validity of presented mental health problems (Giromini et al., 2022; Sherman et al., 2020; Sweet et al., 2021). These are typically referred to as Symptom Validity Tests (SVTs) when employing a self-report format and Performance Validity Tests (PVTs) when they present the test-taker with cognitive problems or tasks to solve (Larrabee, 2012).

Symptom Validity Assessment

Symptom validity assessment consists of evaluating the overall credibility of the mental health problems reported by the examinee. In essence, SVTs and PVTs assist professionals in determining whether the examinee has provided an accurate and truthful picture of their symptoms and mental health problems without feigning or exaggerating their health status (Bush et al., 2005). To this end, current guidelines recommend administering multiple SVTs and multiple PVTs, and experts agree that decisions about symptom validity should not be based on a single validity test (Giromini et al., 2022; Sherman et al., 2020; Sweet et al., 2021). In addition, several other sources of information need to be considered too, such as observational materials and interview-related behaviors.

The Structured Interview of Reported Symptoms (SIRS; Rogers et al., 1992; SIRS-2; Rogers et al., 2010) and the Miller Forensic Assessment of Symptoms Test (M-FAST; Miller, 2001) are two well-known examples of interview-based SVTs. In addition, a list of widely used and/or psychometrically sound self-report SVTs has been reviewed recently in a special issue of Psychological Injury and Law (Giromini et al., 2022). These include, among others, the Structured Inventory of Malingered Symptomatology (SIMS; Smith & Burger, 1997), the Inventory of Problems – 29 (IOP-29; Viglione & Giromini, 2020), and the validity scales of the Minnesota Multiphasic Personality Inventory (MMPI-2-RF; Ben-Porath & Tellegen, 2008; MMPI-3; Ben-Porath & Tellegen, 2020a, b) and Personality Assessment Inventory (PAI; Morey, 1991, 2007).

Eye Movements and Feigning

In recent years, technological advancement has prompted researchers to find other measures able to detect non-credible symptom presentations to use alongside self-reports. For example, reaction times were found to be useful in the assessment of invalid responding in both symptom and performance validity tests (Hartman, 2008; Vendemia et al., 2005; Willison & Tombaugh, 2006). More specifically, it has been shown that reaction times tend to be slower during feigning attempts compared to honest responding (Browndyke, 2013; Johnson et al., 2003), suggesting a delay when the respondent has to plan a simulation strategy and then endorse a non-genuine response (Willison & Tombaugh, 2006).

Other research has investigated individuals’ attempts at feigning by means of psychophysiological and neurophysiological techniques such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI). The rationale behind these studies is that the brain activity and neural processes of individuals who cooperate with the assessment process might differ from those of individuals who feign. Thus, some studies suggested that the EEG signals of individuals who feign are characterized by excessive cognitive load (Vagnini et al., 2008). Similarly, Kozel et al. (2005) conducted an fMRI study and showed that specific brain regions (i.e., anterior cingulate, orbitofrontal cortex, and dorsolateral prefrontal cortex) are involved in deception attempt mechanisms. Other studies have examined the role of Event-Related Potentials (ERPs) in malingering assessment and detection. However, although early scientific evidence suggested that the use of ERPs may be especially useful in detecting feigned memory impairment (Ellwanger et al., 1996, 1999; Rosenfeld et al., 1996, 1998, 1999; Tardif et al., 2000, 2002), the results of ERP research have been mixed overall. Finally, another important line of psychophysiological research focused on malingering involves the study of electrodermal activity (EDA) during deception. In particular, Kozel et al. (2005) conducted a pilot study showing that changes in EDA correlated with activation of specific brain regions, i.e., the orbitofrontal cortex and the anterior cingulate.

Among all these other technological advances, oculomotor measures seem particularly promising for the detection of invalid responding (Hannula et al., 2012). In fact, eye tracking technology allows non-invasive measurement of eye position and behavior, providing a useful and deep understanding of cognitive processes in both healthy adults and clinical populations (Duchowski, 2007). A number of studies demonstrated that eye movements are associated with cognitive processing, executive functions, attention deployment, working memory, and response inhibition (Barnes, 2008; Gooding & Basso, 2008; Hutton, 2008; Müri & Nyffeler, 2008; Olk & Kingstone, 2003; Pierrot-Deseilligny et al., 2004; Sharpe, 2008). Additionally, abnormalities in eye movements are typical of some neurological conditions and mental disorders, such as dementia, Parkinson’s disease, Alzheimer’s disease, and schizophrenia (Crawford et al., 2005; Heitger et al., 2009; Maruta et al., 2010; van Stockum et al., 2008). The latter, in particular, has been studied extensively in relation to eye movements. The first study reporting abnormalities in the eye movements of individuals with schizophrenia dates back to 1908 (Diefendorf & Dodge, 1908), and the visual scanning behavior of these patients has been studied ever since. Abnormalities in the oculomotor patterns of individuals suffering from schizophrenia cover a wide range of eye movements, from smooth pursuit to anti-saccadic movements to more exploratory search patterns, i.e., visual search (for a more exhaustive treatment of this topic, see the section “Eye Movements and Schizophrenia”).

Currently, non-invasive eye-tracking systems using video cameras are available. Recent advances in the performance of eye-tracking cameras allow us to measure eye movements with high temporal and spatial resolution. Thus, research on the eye movements of subjects with mental illnesses, including schizophrenia, has been actively conducted. In the following section, we will review the neural basis of eye movement control and characteristics of schizophrenia. We will then discuss the prospects for eye movements as biomarkers for mental illnesses.

The study of eye movements is a valuable source of information in both clinical and research settings. However, eye tracking technology is still underutilized in malingering-related research. One of the few studies using eye movements to better understand the phenomenon of malingering is an unpublished doctoral dissertation (Bashem, 2016). In this work, the author inspected the eye movements of individuals suffering from vs. individuals feigning mild Traumatic Brain Injury (mTBI) symptoms while they took the Test of Memory Malingering (TOMM; Tombaugh, 1996). Results indicated that certain oculomotor patterns could provide incremental validity over the classification accuracy of the TOMM, supporting the hypothesis that eye tracking technology might add a significant contribution to symptom validity assessment.

Similar results were found in a recent study (Kanser et al., 2020) that investigated the incremental validity of eye movements on PVTs in identifying individuals instructed to feign cognitive deficits. Kanser et al. (2020) found that feigners showed multiple eye tracking indexes of greater cognitive effort compared to both healthy controls and individuals with genuine TBI. Results of this study also indicated that eye movements were the best predictor in discriminating group membership. In light of these findings, Kanser et al. (2020) suggested that the investigation of eye movements may be an important addition to performance validity assessment, and that including eye movement measurement in routine cognitive evaluations would provide reliable, bio-behavioral data able to improve sensitivity to feigned deficits.

Another recent study (Tomer et al., 2018) also highlighted the potential of eye movements to detect feigned cognitive impairment by using eye tracking technology in conjunction with the Word Memory Test (WMT; Green et al., 1996). Results showed that eye movements used along with the WMT were able to predict group membership (simulators vs honest controls), with eye movements uniquely contributing to this prediction. Tomer et al. (2018) thus concluded that eye movements represent a promising addition to performance validity assessment and that they are able to shed light on the strategies used by simulators when attempting to exaggerate or fabricate a cognitive deficit or a mental disorder.

Another similar pattern of findings was reported inspecting eye movements in combination with the Binomial Forced-Choice Digit Memory Test (BFDMT; Liu et al., 2001), a tool widely used in China for testing performance validity (Zhong et al., 2021). Specifically, feigners showed longer dwell time and more fixations compared to honest controls, suggesting that various eye tracking parameters may be potential markers to detect simulators.

Eye Movements and Schizophrenia

Taken together, all of the findings described above suggest that oculomotor patterns may be useful for understanding the cognitive processes underlying feigning and over-reporting. To date, the literature has focused mainly on the use of eye movements to detect feigned brain damage, and no study has tested their efficacy in mental disorders in which eye movement abnormalities are also detected, such as schizophrenia. Indeed, individuals with schizophrenia are known to exhibit oculomotor anomalies in both simple subconscious eye movements, such as smooth pursuit, and more complex cognitive tasks such as the anti-saccade task and visual search (Morita et al., 2019). With regard to the former, during smooth pursuit eye movements, individuals are required to follow a moving target (usually horizontally, vertically, or elliptically) with their eyes. Individuals suffering from schizophrenia are not able to smoothly follow the target as their eyes cannot keep up with its speed (Lencer et al., 2015; O’Driscoll & Callahan, 2008). With regard to the latter, saccades are rapid eye movements that humans constantly (3–4 saccades per second, on average) make to bring the area of interest to match the fovea and can occur as an involuntary reflex, or as a voluntary movement to redirect fixation. This second kind of eye movements can be assessed with the anti-saccade task, which is based on the premise that usually when a stimulus appears in our visual field, we are led to perform a saccade directly to the stimulus and to avoid any distracters. In the anti-saccade task, the subject is asked to inhibit this involuntary saccadic reflex and look in the opposite direction (e.g., if the distractor cue appears to the left, the subject should look to the right). Previous literature is consistent in supporting that patients with schizophrenia have lower performance on the anti-saccade task compared to control participants (Benson et al., 2012; Radant et al., 2015).

Finally, Kojima et al. (1990) identified deficits in exploratory movements (i.e., visual search) of patients with schizophrenia. This type of eye movements is strongly associated with cognitive processing (Thomas & Lleras, 2007; Van der Stigchel et al., 2006), which is equally impaired in patients with schizophrenia (Silverstein & Keane, 2011).

First-degree relatives of patients suffering from schizophrenia also underperform in smooth pursuit, anti-saccade, and exploratory eye movement tasks (Kikuchi et al., 2018; Levy et al., 2010; Takahashi et al., 2008), and candidate genes related to these oculomotor abnormalities have been identified (Greenwood et al., 2007, 2011). In fact, specific eye movement patterns have been suggested as neuropsychological biomarkers for schizophrenia (Calkins et al., 2007; Kojima et al., 2001; Light et al., 2012; Suzuki et al., 2009). For all these reasons, examination of eye movements may prove particularly informative in assessing the credibility of schizophrenia-related symptoms.

To our knowledge, only one study (Ales et al., 2021) has so far investigated whether experimental simulators could reproduce eye movement abnormalities typical of patients suffering from schizophrenia. More specifically, eye movement data were recorded during two tasks widely used to evaluate oculomotor deficits in schizophrenia (i.e., smooth pursuit and anti-saccade) in order to test whether eye movements of experimental feigners would differ from those of honest participants. Results were also compared with those reported in two major studies (O’Driscoll & Callahan, 2008; Radant et al., 2015) that had collected eye movement data in a very large sample of schizophrenia patients. Ales et al. (2021) observed that individuals who attempted to feign schizophrenia were only partially able to reproduce eye movement abnormalities typically shown by genuine patients suffering from schizophrenia. It was therefore concluded that the investigation of eye movements may be a valuable addition to the detection of malingered schizophrenia.

The current study aimed to provide additional evidence that eye tracking technology may contribute to symptom validity assessment. More specifically, we investigated whether the eye movements of healthy individuals taking an SVT with the instruction to feign schizophrenia would differ from those of control participants taking the same test but with the instruction to respond honestly. In order to address this research question, we recorded eye movements of 83 healthy volunteers taking the IOP-29. Approximately half were instructed to respond honestly, whereas the other group was instructed to feign schizophrenia. Our hypothesis was that experimental feigners would spend more time fixating on the different response options of the same items, compared to control participants instructed to respond honestly. In other words, we hypothesized that the extra uncertainty and cognitive effort associated with feigning would lead to extra consideration of the answer options (hypothesis 1). Additionally, we also speculated that feigners would focus more than controls on those response options that the IOP-29 identifies as more indicative of feigning, whether or not they actually endorsed those options (hypothesis 2). It should be noted that although these hypotheses have not yet been tested, the same data set has been used before to evaluate some other hypotheses related to eye movements, and the results of these other analyses have been described in another article (Ales et al., 2021).



The demographic composition of the sample is described in more detail in Ales et al. (2021). Briefly, the sample consisted of 83 participants whose native language was English. Sixty-four were women, and the mean age was 23.35 years (SD = 6.84, range = 18–57). The sample was collected in the north of England via an advertisement on the university website and snowball sampling. The advertisement on the website provided a brief description of the experiment and inclusion criteria and informed potential participants that all of them would be paid £5 upon completion of the experiment and that some of them could potentially win an additional £25 (see below). Exclusion criteria for participation in the study were (a) not being native English speaking, (b) presence of mental and/or neurological diseases, (c) history of psychiatric disorders, and (d) presence of pathological conditions related to the visual system. No statistical differences were found between the two groups in terms of age [t(57) = 1.26, p = 0.20] and gender [χ2 = 0.007, p = 0.93].

Materials and Measures

The Inventory of Problems-29 (IOP-29)

The IOP-29 is a self-administered test measuring a range of emotional, cognitive, and social experiences (Viglione et al., 2017). Out of the 29 items, 27 provide three possible response options, i.e., True, False, and Doesn’t make sense. The other two items are open-ended questions that involve calculations or logical reasoning. Ultimately, IOP-29 results are interpreted on the basis of the False Disorder Probability Score (FDS), which provides a probability value of finding a given score within a reference sample of genuine patients vs a reference sample of experimental feigners. The higher the FDS, the lower the credibility of the presented complaints. Viglione and Giromini (2020) set the FDS cutoff at ≥ 0.50.
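As a rough illustration of how a fixed cutoff translates into classification accuracy, the sketch below applies the FDS ≥ 0.50 rule to two small sets of scores and derives sensitivity and specificity. The function names and the toy scores are hypothetical and serve only to show the arithmetic; they are not data from this study, and the FDS itself is taken as given rather than computed.

```python
from typing import Sequence, Tuple

FDS_CUTOFF = 0.50  # cutoff stated in the text: FDS >= 0.50 suggests non-credible responding


def flag_non_credible(fds: float, cutoff: float = FDS_CUTOFF) -> bool:
    """Return True when an FDS value meets or exceeds the cutoff."""
    return fds >= cutoff


def sensitivity_specificity(
    feigner_fds: Sequence[float],
    control_fds: Sequence[float],
    cutoff: float = FDS_CUTOFF,
) -> Tuple[float, float]:
    """Sensitivity = flagged feigners / all feigners;
    specificity = unflagged controls / all controls."""
    true_pos = sum(flag_non_credible(s, cutoff) for s in feigner_fds)
    true_neg = sum(not flag_non_credible(s, cutoff) for s in control_fds)
    return true_pos / len(feigner_fds), true_neg / len(control_fds)


# Made-up scores for illustration only (not study data).
toy_feigners = [0.91, 0.76, 0.50, 0.42]
toy_controls = [0.05, 0.13, 0.31, 0.58]
sens, spec = sensitivity_specificity(toy_feigners, toy_controls)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff in such a scheme trades sensitivity for specificity, which is why sensitivity and specificity are typically inspected at more than one cutoff.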

The algorithm underlying the FDS of IOP-29 is not discussed in detail here for reasons of test security, so as not to compromise its effectiveness. However, for the present article, it is important to note that each IOP-29 item contains one or more feigning-key response options, the endorsement of which suggests a possible exaggeration or negative response bias. In addition, the FDS uses a scaling approach that incorporates a multiple-weighting procedure, which is discussed in detail in the first IOP-29 validation article (Viglione et al., 2017). Thus, an item keyed True might have a weighting score of +2 for True, a weighting score of +1 for Doesn’t Make Sense, and a weighting score of −1 for False. For another item, in contrast, the response option False may have a weighting score of +1 whereas the response options True and Doesn’t make sense might have both a weighting score of 0. In the current study, for those items where more than one feigning-key response option is available, we considered the response option with the highest weight to be the “target” feigning-key response option for the item.
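The selection rule for the “target” feigning-key response option can be sketched as follows. The weights are the two illustrative examples given in the paragraph above, not actual IOP-29 weights (which are not disclosed), and the function name is hypothetical. How ties between equally weighted options would be resolved is not stated in the text; this sketch simply returns the first option with the highest weight.

```python
from typing import Dict


def target_feigning_option(weights: Dict[str, int]) -> str:
    """Return the response option carrying the highest feigning weight.

    Assumption: ties are broken by dictionary order, since the text
    does not describe a tie-breaking rule.
    """
    return max(weights, key=lambda option: weights[option])


# Illustrative weights mirroring the two examples described in the text.
item_keyed_true = {"True": 2, "Doesn't make sense": 1, "False": -1}
item_keyed_false = {"True": 0, "Doesn't make sense": 0, "False": 1}

print(target_feigning_option(item_keyed_true))
print(target_feigning_option(item_keyed_false))
```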

The validity of the IOP-29 has been demonstrated in several studies. In particular, its effectiveness in detecting feigned schizophrenia has been observed in several countries such as North America (Viglione et al., 2017), the UK (Winters et al., 2021), Italy (Di Girolamo et al., 2021; Giromini et al., 2018, 2020b; Pignolo et al., 2021), Slovenia (Šömen et al., 2021), and France (Banovic et al., 2021). These studies demonstrated that the IOP-29 is valid, reliable, and easily adaptable across cultures and languages (see also Boskovic et al., 2022; Ilgunaite et al., 2022). In fact, in a recent quantitative review, Giromini and Viglione (2022) showed that the same cutoffs yielded similar results in different cultures, populations, and contexts. This undoubtedly simplifies the use of the test and minimizes errors due to different cutoff interpretations. Importantly, IOP-29 ecological (Roma et al., 2019) and incremental validity (Giromini et al., 2019, 2020a) has also been demonstrated by empirical research. Indeed, several studies consistently indicated that using the IOP-29 with other SVTs and PVTs improved classification accuracy (for a quantitative literature review, see Giromini & Viglione, 2022).

Parallel Version of the Inventory of Problems-29

We created a parallel version of the IOP-29 to ensure that the control and feigning groups did not differ from each other in their visual scanning approach to the questions and response options of a test, when they are asked to respond honestly. Said differently, we wanted to rule out the possibility that the participants in the control and feigning groups had generally different eye movement approaches regardless of the condition to which they had been assigned.

These 29 items were extracted from the same pool of 181 items from which the standard IOP-29 items were extracted. In fact, to develop the False Disorder Probability Score, Viglione et al. (2018) conducted a series of simulation studies utilizing a longer version of the IOP-29—namely IOP—and comprising a broader (i.e., 181) pool of items. Additional information concerning these items may be found in Viglione et al. (2018). Similar to the standard IOP-29, the parallel version has two items with open-ended response options, whereas all other items have the three aforementioned response options (i.e., True, False, and Doesn’t make sense).

Eye Tracker

An EyeLink 1000 Plus Desktop Mount tracker was used to record participants’ eye behavior, using a chin rest to minimize head movements. Consistent with the guidelines reported in the EyeLink manual, the participant sat 40 cm away from the camera. Eye movements were sampled at 500 Hz, which allows the eyes’ location to be reported every 2 ms with an accuracy within 0.25–0.50° of visual angle. The EyeLink 1000 Plus provided a spatial resolution of 0.01° of visual angle. Before each task (i.e., IOP-29 and its parallel version), all participants were asked to complete a 9-point calibration and validation in order to set the eye tracker for an accurate gaze point calculation tailored to each participant’s eye. Figure 1 shows the experimental setup, the EyeLink 1000 Plus apparatus, and a prototypical participant looking at the screen.

Fig. 1 Experimental setup showing the EyeLink 1000 Plus apparatus and a prototypical participant looking at the screen


A malingering experimental paradigm (sometimes named “analogue” or “simulation” study) was implemented. Prior to participants’ recruitment, the study was approved by the Lancaster University Research Ethics Committee. Participants volunteered to take part in the study and were randomly assigned to the “feigning” condition (i.e., instructed to feign schizophrenia) or the “honest” condition (i.e., asked to complete the entire procedure honestly). Specifically, participants assigned to the feigning group (n = 43) received a vignette describing a scenario in which they might find it convenient to simulate schizophrenia. The most typical symptoms and manifestations of schizophrenia were also reported at the end of the vignette. Experimental feigners were warned not to exaggerate the symptom presentation to avoid being detected as simulators. To this end, prior to starting the experimental procedure, they were told that they would have the chance to win £25 if they could produce test results resembling those of patients genuinely suffering from schizophrenia without being identified as simulators. The potential £25 award thus served as an external incentive. Experimental feigners were also administered the parallel version of the IOP-29 but with the instruction to answer honestly.

Participants assigned to the “honest” condition (i.e., control group, n = 40) received a vignette describing the same scenario, but they were not asked to put themselves in the shoes of someone willing to feign schizophrenia. More specifically, the vignette they were presented with was about someone else feigning schizophrenia, and they were asked to read and memorize it. This was done in order to ensure they actually read and processed it. Then, they were instructed to complete the IOP-29 and its parallel version honestly, following standard instructions. Honest responders were also informed, prior to the beginning of the experiment, that they would have the chance to win £25 if they completed both tasks.

For the entire duration of the experiment (i.e., while filling out both the standard and the parallel IOP-29), participants’ eyes were tracked and their eye movements recorded. The layout of both the IOP-29 and its parallel version closely resembled the layout of the IOP-29 in its online administration format (Fig. 2). Order of administration of the two IOP-29 versions (i.e., standard and parallel) was randomized and counterbalanced. Once the experiment was completed, all participants were paid £5 and were asked to provide their email so that they could be contacted in case they turned out to be winners of the £25 award.

Fig. 2 Prototypical image of how the IOP-29 items were presented on the screen. Note: To protect test security, we did not report an actual item from the IOP-29. This is a representation of how the items were portrayed on the screen. This set-up corresponds to the online administration format of the IOP-29. In order to test our hypotheses, and prior to data collection, we created four Areas of Interest (AOIs) corresponding to the four “boxes” in the image, i.e., AOI #1 = “Question or Statement” box; AOI #2 = “True or Mostly True” box; AOI #3 = “False or Mostly False” box; AOI #4 = “Does not make sense” box. The three response option boxes measured 5 cm × 2 cm; the question/statement box measured 22.5 cm × 3.5 cm

Data Analysis

Preliminary Analyses

First, to rule out the possibility that the two groups simply had a different visual approach to the items and response options on a test, we compared the mean dwell time (measured in ms), number of fixations, and number of runs from one response option to another made by feigners vs controls while taking the parallel form of the IOP-29. Because both groups were instructed to respond honestly to the parallel IOP-29, no between-group differences were expected. Next, to ensure that feigners made an effort to follow instructions and to respond to the items of the standard IOP-29 as if they were suffering from schizophrenia, we inspected the scores of the IOP-29 FDS produced by the two groups. Given that the IOP-29 has demonstrated strong validity in discriminating bona fide from experimentally feigned schizophrenia (Giromini & Viglione, 2022), we anticipated significant between-group differences, with a large or very large effect size.

Main Analyses

To evaluate whether feigners scanned the text and response options of the IOP-29 items differently from honest controls (hypothesis 1), we calculated five key indicators:

  1. The average dwell time (ms) spent on reading the text of each item of the IOP-29 (Mean Dwell (Items’ Text)).

  2. The average dwell time (ms) spent on visually scanning the three response options (i.e., “True,” “False,” and “Doesn’t Make Sense”) of the 27 multiple-choice items of the IOP-29 (Mean Dwell (Response Options)).

  3. The average number of fixations made while reading the text of each item of the IOP-29 (Mean Fixations Count (Items’ Text)).

  4. The average number of fixations made by the participant while visually scanning the three response options of the 27 multiple-choice items of the IOP-29 (Mean Fixations Count (Response Options)).

  5. The average number of times the eyes of the examinee moved from the outside to the inside of any given response option area, across all of the 27 multiple-choice items of the IOP-29 (Mean Run Count (Response Options)). This may be conceived of as an index of uncertainty by the participant, given that it is based on the number of times the examinee moves their eyes from one response option to another within the same IOP-29 item.
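The five indicators listed above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the processing pipeline actually used: the `Fixation` record, the AOI labels, and the function names are hypothetical, each fixation is assumed to be already assigned to one AOI, and a "run" is counted each time gaze enters a response-option AOI from anywhere outside that AOI.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

# Hypothetical AOI labels: one AOI for the item text and one for each
# of the three response options.
TEXT_AOI = "text"
RESPONSE_AOIS = {"true", "false", "no_sense"}


@dataclass
class Fixation:
    aoi: str            # label of the AOI the fixation landed in
    duration_ms: float  # fixation duration in milliseconds


def item_indicators(fixations: List[Fixation]) -> Dict[str, float]:
    """Compute the five per-item quantities from one item's fixation sequence."""
    text = [f for f in fixations if f.aoi == TEXT_AOI]
    options = [f for f in fixations if f.aoi in RESPONSE_AOIS]

    # A run starts whenever gaze enters a response-option AOI from anywhere
    # else (the item text, blank screen, or a different response option).
    runs = 0
    previous_aoi = None
    for f in fixations:
        if f.aoi in RESPONSE_AOIS and f.aoi != previous_aoi:
            runs += 1
        previous_aoi = f.aoi

    return {
        "dwell_text_ms": sum(f.duration_ms for f in text),
        "dwell_options_ms": sum(f.duration_ms for f in options),
        "fixations_text": len(text),
        "fixations_options": len(options),
        "runs_options": runs,
    }


def participant_means(items: List[List[Fixation]]) -> Dict[str, float]:
    """Average each per-item quantity across all items of one participant."""
    per_item = [item_indicators(item) for item in items]
    return {key: mean(d[key] for d in per_item) for key in per_item[0]}


# Made-up fixation sequences for two items, for illustration only.
item_a = [Fixation("text", 300), Fixation("text", 250), Fixation("true", 200),
          Fixation("true", 150), Fixation("false", 180), Fixation("true", 220)]
item_b = [Fixation("text", 400), Fixation("false", 260)]
toy_means = participant_means([item_a, item_b])
print(toy_means)
```

In this toy sequence, `item_a` yields three runs: gaze enters “true”, moves to “false”, and then returns to “true”. Consecutive fixations inside the same response option do not start a new run.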

Next, we inspected whether our experimental feigners focused their visual attention more on the feigning-keyed response options than did the controls (hypothesis 2). For example, for an item stating “I have never smiled in my life,” the feigning-keyed option would be “False,” because feigners are expected to endorse “False” more frequently than honest responders do, since it is really unlikely for a person to authentically say that they never smiled in their life. Thus, our hypothesis 2 states that for an item like this, feigners would focus their visual attention more on the response option “False” than would controls. Additionally, we also tested whether experimental feigners spent more time, compared to honest controls, scanning those feigning-keyed response options, even when they eventually decided not to endorse them. This was done because feigning-keyed response options are obviously more likely to be chosen by feigners than by controls, so we were concerned that feigners might focus more on these response options than controls simply because they endorsed them more, rather than because they thought about them more or for a longer time. To test these hypotheses, we performed a series of t-tests assessing possible between-group differences in the average dwell time (ms), fixations count, and number of runs made from one response option to another on IOP-29 feigning-keyed response options.


Results of Preliminary Analyses

Consistent with the hypothesis that the experimental group did make an effort to fake schizophrenia, and in line with previous research on this matter (Giromini & Viglione, 2022), the IOP-29 FDS scores of our experimental feigners (M = 0.82; SD = 0.22) were significantly higher than those of our honest controls (M = 0.13; SD = 0.12), t(67.3) = 17.83, p < 0.001. Cohen’s d was 3.84, which is in line with previous research comparing honest responders against experimental feigners of schizophrenia (Giromini & Viglione, 2022). The area under the Receiver Operating Characteristic (ROC) curve (AUC) was 0.98 (SE = 0.01; see Figs. 3 and 4 and Table 1).
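To make the relation between the reported group statistics and the test statistics concrete, the sketch below recomputes a pooled-SD Cohen’s d and an unequal-variance (Welch) t-test from the summary values given above (n = 43 feigners, n = 40 controls). It is an assumption that these are the formulas behind the reported figures; the fractional degrees of freedom are consistent with a Welch-type correction. Because the inputs are rounded to two decimals, the output only approximates the reported d = 3.84 and t(67.3) = 17.83.

```python
import math
from typing import Tuple


def cohens_d(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> float:
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


def welch_t(m1: float, sd1: float, n1: int,
            m2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Rounded summary statistics as reported in the text.
d = cohens_d(0.82, 0.22, 43, 0.13, 0.12, 40)
t, df = welch_t(0.82, 0.22, 43, 0.13, 0.12, 40)
print(f"d = {d:.2f}, t({df:.1f}) = {t:.2f}")
```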

Fig. 3 Receiver Operating Characteristic (ROC) curve of the IOP-29 FDS. Note: The ROC curve illustrates the diagnostic accuracy of the IOP-29 by plotting the true-positive rate (sensitivity) against the false-positive rate (1 − specificity). The Area under the Curve (AUC) summarizes this plot as a single measure of the accuracy of the IOP-29

Fig. 4 Representation of IOP-29 FDS scores by group. Note: The figure shows the graphical representation of the distribution of the IOP-29 FDS scores obtained in the two conditions, i.e., controls and experimental feigners. The Y-Axis represents the FDS scores range, whereas the X-Axis represents the frequency of participants that obtained a specific score. The dotted line in the X-axis indicates the IOP-29 FDS value of 0.50

Table 1 Sensitivity and specificity of the IOP-29 based on three commonly inspected cutoffs

Furthermore, as hypothesized, the average dwell time (ms), the fixations count, and the number of runs from one response option to another during visual inspection of the parallel IOP-29 did not differ by group (all ps > 0.05). These findings indicate that when instructed to respond honestly, the two groups did not significantly differ in their approach to visually scanning the items and response options of the parallel IOP-29.

Results of Main Analyses

As shown in Table 2, on average, feigners spent more time than controls looking at the text of the IOP-29 items (Cohen’s d = 0.48), but no statistically significant differences emerged when considering the total time spent on the response options, nor when considering possible runs from one response option to another. However, as hypothesized, experimental feigners focused their visual attention on feigning-keyed response options more than controls did, regardless of whether they eventually decided to endorse those response options (Table 3). Therefore, feigners spent more time observing feigning-keyed response options and returned to those response options more frequently than controls did. Crucially, this finding holds true even when considering the average dwell time (ms), average fixations count, and average number of runs from one response option to another for feigning-keyed response options not endorsed by the respondent. Said differently, our experimental feigners paid more attention to the feigning-keyed response options even if they eventually decided not to endorse them. The size of these differences ranges from d = 0.86 to 1.11.

Table 2 Visual inspection of IOP-29 items and response options by controls and feigners
Table 3 Visual inspection of feigning-keyed response options by controls and feigners


Assessing symptom validity is a crucial step in drawing useful conclusions about an examinee’s health status, making accurate diagnoses, and planning appropriate medical treatments. This is especially true in high-stakes forensic contexts, in which there is a significant risk of encountering false or exaggerated symptom presentations. Thus, to detect possible over-reporting, clinical and forensic psychologists are expected to utilize several SVTs and PVTs in their daily practice.

Our study sought to examine the utility of pairing a bio-behavioral measure with an SVT in order to improve detection of feigned psychiatric conditions. Specifically, we conducted a simulation study to investigate eye movements during the administration of the IOP-29 in a sample of 83 healthy individuals, half of whom were asked to simulate schizophrenia (the other half served as a control group).

To date, a few studies have addressed the use of eye movements in relation to feigned cognitive deficits, but none have examined eye behavior in relation to those psychiatric disorders whose eye-tracking abnormalities are well-established (e.g., schizophrenia). The one study that attempted to address this topic (Ales et al., 2021) focused on PVTs rather than SVTs. Therefore, the current study aimed to investigate the oculomotor behavior of healthy participants who were asked to feign schizophrenia while completing an SVT (i.e., the IOP-29), comparing their eye movements to those of control participants who were asked to complete the same test honestly.

The results of this study indicate that overall, compared to controls, feigners spent more time looking at the text of the IOP-29 items and that they focused longer on and returned more frequently to feigning-keyed response options. Taken together, these results suggest that tracking an examinee’s eye movements while taking an SVT can provide information about the credibility of their responses.

Experimental feigners spent slightly more time than controls looking at the text of the IOP-29 items. This suggests that feigners may have been deliberating over which option they should endorse. There is consensus that fixation duration in a task is associated with the duration of the cognitive processes and the degree of engagement in that same task (e.g., Irwin, 2004). This is consistent with the theory that deception increases cognitive load and the effort required by the participant (Blandón-Gitlin et al., 2014; Sporer, 2016; Vrij et al., 2011), partly through inhibition of the truthful response (Lane & Wegner, 1995). Indeed, the two groups (i.e., controls and feigners) did not differ when they were both asked to respond honestly, suggesting that it is deception that drives the delay.

Additionally, compared to honest responders, experimental feigners spent more time on the feigning-keyed response options, made a higher number of fixations on them, and made a higher number of runs to them from other response options, even when they eventually did not endorse those options. The extra time experimental feigners took may be due to the effort required by high-level decision-making and problem-solving strategies. When requested to respond to the items, patients with schizophrenia or control participants may ask themselves whether that particular item represents them or their experience of their own symptoms, whereas feigners have to (a) consider whether a particular item could reflect the experience of a genuine patient affected by schizophrenia; (b) reason out how to respond in order to appear schizophrenic but, simultaneously, not be detected; and (c) suppress thoughts about their own experience.

It is worth mentioning that this study also has some important limitations. First and most importantly, there was no direct comparison with patients with schizophrenia. Because ours was a simulation design, small incentives were offered to experimental feigners and their performance was compared with that of honest controls. As such, one might question the generalizability of our results to clinical settings, since no comparison was made with patients with genuine schizophrenic symptomatology. Future studies should compare the results of experimental feigners to those of genuine patients in order to test whether individuals attempting to malinger are able to feign schizophrenic-like symptoms without being detected, which would increase the generalizability of the findings.

Second, our sample consisted mostly of women. With regard to eye movements, studies on sex differences in visual scanning have generated mixed results. Recently, Mathew et al. (2020) investigated sex differences in oculomotor tasks and found no significant differences. However, some studies have observed slight differences using specific stimuli or tasks. As for the IOP-29, no significant sex differences have been reported, suggesting no gender influences on IOP-29 results (Carvalho et al., 2021; Giromini et al., 2020a). Nevertheless, one might question the generalizability of our results, and future studies should take the sex of the participants into account.

Third, our study was designed as a malingering experimental paradigm and, although we made an effort to engage participants assigned to the feigning group (e.g., financial incentives, relevant vignette scenario), they were nonetheless instructed to feign schizophrenia, so the external validity of our study might be questioned, given the differences with real-life forensic contexts. In addition, we did not employ a post-manipulation check. Rogers (2008) recommended using post-test questions in simulation studies to verify that the participant understood their task and was compliant with the instructions. Therefore, this certainly represents a limitation of our study. However, the IOP-29 performed almost the same in this study as in other similar studies in which a manipulation check was implemented (for a review, see Giromini & Viglione, 2022), suggesting that our results should not have been compromised. Somewhat related, our internal validity should not have been affected by the experimental design we implemented, given that our participants were not suspected malingerers but rather were openly instructed to feign symptoms of schizophrenia. Moreover, as mentioned above, the use of role simulation and random assignment to the feigning condition should have preserved the internal validity of our study.

Fourth, although the items of the IOP-29 include feigning-keyed response options, the ultimate feigning score of the IOP-29 is generated by considering a multitude of factors, including the consistency between one response and another (Viglione et al., 2017). In addition, test-takers are unlikely to be aware of which of the IOP-29 response options are more likely to suggest bona fide schizophrenia vs. deliberate feigning. Accordingly, future studies in which fewer and more straightforward response options are provided for each item (e.g., the SIMS) would be beneficial.

Finally, a technical limitation worth mentioning is that our experiment was designed so that the participant chose the response option (i.e., True, False, and Doesn’t make sense) by pressing one of three keys on the numeric keypad (i.e., 1, 2, and 3). This was done to prevent the participant from looking away from the screen, which would have compromised data acquisition, as could have happened using the mouse. However, this may have resulted in an automated process in which the participant was not prompted to look at the area of interest corresponding to the response option. On one hand, this would explain the absence of significant differences between experimental simulators and honest participants in the number and duration of fixations on the IOP-29 response options overall; on the other hand, it makes it even more remarkable that feigners paid more attention than controls to the feigning-keyed response options. Indeed, using the keyboard instead of the mouse may have led us to underestimate participants’ eye behavior in terms of duration and number of fixations. Thus, it is perhaps remarkable that, despite this possible underestimation, we were able to objectively discern that our experimental feigners paid more attention to the feigning-keyed response options overall than the control group did.

Despite these limitations, our study provides preliminary evidence that eye movements may improve symptom validity assessment. Indeed, the use of eye-tracking technology in conjunction with the administration of the IOP-29 has the potential to improve our understanding of the cognitive load of experimental feigners during item inspection, as well as the simulation strategies used by individuals instructed to pretend to be mentally ill. Our results contribute to a deeper understanding of the decision-making and cognitive processes underlying deception mechanisms and simulation attempts. Although eye-tracking technology (like other neuropsychological measures) is advancing both in terms of cost-effectiveness and usability, we believe that, to date, such measures are not yet ready to be paired with symptom validity assessment in real-world settings. Nevertheless, they certainly represent a resource to refine available SVTs and PVTs. As such, our study may represent a proof of concept that the use of bio-behavioral measures such as eye movements can be useful in validity assessment contexts, given the increasing demand for valid and reliable instruments that would enhance the quality of clinical and forensic assessments, facilitate the practice, and encourage gold standards in delivering psychological services (APA, 2013). Perhaps in the future these technologies will be more accessible and can be paired with self-reports for malingering detection. Overall, our findings indicate that eye-tracking technology may be a promising adjunct for assessing symptom validity.