Research has shown that approximately 30% of the general public distort their reports about their mental or physical state to reflect their health as worse than it actually is (i.e., feign; Dandachi-Fitzgerald et al., 2020). Yet, such feigning behaviour in the general public does not necessarily signal intentional deception. When we look at specific populations that have an external incentive to intentionally portray their health as worse than it is (i.e., to exhibit a negative response bias), such as forensic samples or compensation applicants, the terms malingering or faking bad are more suitable (van Oorsouw et al., 2016). The exact prevalence of malingering is unknown, as it is based on estimates, but research has indicated that, for reports of physical symptoms, prevalence ranges between 20% and 40%, whereas for psychological complaints, prevalence goes up to 50% (Greve et al., 2009; Mittenberg et al., 2002; Freeman et al., 2008; see also Young, 2015). This discrepancy could be explained by the subjective, and therefore rarely verifiable, nature of psychological complaints. The key feature of malingered symptom claims is markedly inflated scores on symptom validity tests (SVTs), which usually contain bizarre or implausible complaints that genuine patients would not endorse (Bianchini et al., 2001). Hence, by endorsing such items, examinees exhibit a negative response bias.

In a similar manner to elevating their complaints, people can also downplay them (i.e., exhibit a positive response bias or fake good). People who engage in “faking good” intentionally deny or grossly minimize their physical or psychological issues (Rogers, 2018). This behaviour is estimated to be one of the most pervasive sources of bias in evaluations, and an even more frequent issue than faking bad, with an estimated prevalence of up to 50% (see Griffith & Converse, 2012; Mazza et al., 2020), although the exact prevalence remains unknown (Rogers, 2018). Recent research showed that underreporting of psychopathology seems to be related more to the assessment context (high vs. low stakes) than to the evaluees’ personality (Novo et al., 2022). Therefore, it is not surprising that the typical contexts in which faking good behaviour is exhibited are job applications (Armour, 2002; Donovan et al., 2003), child custody hearings (Baer & Miller, 2002), parole hearings (Kucharski et al., 2007), therapy (Blanchard & Farber, 2016), and personality assessments (Griffin & Wilson, 2012). In all of these contexts, admitting psychopathology, or any type of socially unacceptable behaviour (e.g., substance abuse), would be considered unfavourable. There are different types of faking good behaviour, such as defensiveness, socially desirable responding (see Rogers, 2018), and supernormality (Cima et al., 2008), among others. Supernormality refers to the systematic denial of even the most common symptoms, regardless of the social context (Cima et al., 2003, 2008). Because it relies little on social norms, supernormality is a type of symptom underreporting distinct from socially desirable responding. However, supernormality does imply social desirability, although this relationship does not hold in the other direction (Cima et al., 2003).

It is important to note that, although equally relevant, faking good behaviour has received far less research attention than faking bad, even though the study designs for these two research questions do not differ substantially. Response bias, whether negative or positive, is usually tested experimentally by employing a simulation design (Rogers, 2018). This type of study design consists of providing participants with a specific scenario depicting a person for whom it would be beneficial to act in a certain way (e.g., faking bad or faking good), and participants are asked to identify with the protagonist. If compliant, participants in the experimental conditions then exhibit significantly different scores on symptom inventories than healthy participants who were told to respond honestly (i.e., the control group). Interestingly, the literature on faking behaviour showed that participants who were instructed to intentionally fabricate certain behaviours (e.g., overreporting neurocognitive and psychiatric symptoms) continued reporting such complaints even after being asked to stop faking and to act or respond honestly (Kunst et al., 2016; Merckelbach et al., 2011). Hence, the reporting instructions employed in the simulation design can actually prompt prolonged changes in one’s self-perception of health (i.e., the residual effect of feigning). Further research also showed that this residual effect of feigning seems to be resistant to corrective feedback (Merckelbach et al., 2015). The authors argued that the most probable explanation of this effect falls under cognitive dissonance theory (Festinger & Carlsmith, 1959). When one’s beliefs and actions are in conflict, the person might adjust their beliefs to align them with the exhibited behaviour and, hence, decrease the discrepancy between the two. For instance, people usually wish to think of themselves as “good”, yet “good people” do not engage in faking. Hence, one potential way of resolving that discrepancy would be to convince oneself that the faked behaviours/emotions are indeed genuinely experienced (i.e., internalization of symptoms; Merckelbach & Merten, 2012). However, the above-mentioned studies examined only faking bad behaviour; therefore, it is currently unknown whether the residual effect of feigning also occurs when people exhibit a positive response bias (i.e., “fake good”).

The literature on faking good has focused mainly on personality traits (e.g., Mazza et al., 2020); hence, one important aspect is often neglected: well-being or, simply put, happiness. Happiness has been associated with positive social impressions and other beneficial outcomes (e.g., it facilitates the pursuit of goals and contributes to vital social bonds), and it is considered an important societal value (Eid & Diener, 2001; Oishi et al., 2007). Hence, people have strong motives to appear happy to others (Moore et al., 2017). Some previous research supports the notion that faking happiness increases well-being, whereas other studies provided evidence that faking happiness might have counterproductive effects (e.g., Abele, 2003; Ferguson & Sheldon, 2013; Hülsheger & Schewe, 2011). Yet, none of these studies specifically tested the impact over time that instructions prompting people to appear happier could have on their self-perception (i.e., the residual effect of feigning).

Current Study

Research has shown that attorneys very often coach their clients to fake good (see Baer & Miller, 2002). In the forensic context, for instance in custody hearings, presenting oneself as a happy and healthy person might help one be viewed as a better-suited guardian; hence, this type of coaching is unsurprising. Therefore, in this study, we sought to achieve two goals: first, to test the efficacy of simulation instructions (i.e., coaching) in evoking a positive response bias, and second, to check whether such instructions could trigger the residual effect of feigning, which has not yet been tested in the domain of faking good behaviour. We operationalised happiness as subjective well-being (hedonic approach; Diener, 1984) and psychological well-being (eudaimonic approach; Ryff, 1989). Subjective well-being involves a combination of a cognitive component (i.e., life satisfaction) and two affective components (i.e., positive and negative affect; Cummins & Cahill, 2000), whereas psychological well-being refers to positive (and effective) functioning that comes from the development and realization of human potential (self-fulfillment; Waterman, 1993). Student participants were randomly allocated to either the control (genuine) group or the fake happy group. All participants filled out well-being measures during two sessions separated by a 7-day period (the period was extended to eight days; see Participants). Members of the control group received instructions to respond honestly, whereas the fake happy group first received instructions to fake good and then to respond honestly during the second session. Based on prior research on the residual effect of feigning, we anticipated that the fake happy group would exhibit higher well-being scores than the control group on the measures provided in both sessions.

Method

Participants

Initially, 152 bachelor students joined our study. The inclusion criteria were that (1) participants completed all of the questions (nexcluded = 24); (2) participants provided a traceable code so that their data from the two sessions could be combined (nexcluded = 12); and (3) participants completed Session 2 seven days after Session 1 (the period was extended to eight days, as nine participants finished the study just a few hours after the seventh day; nexcluded = 21). This left a total of 95 participants.

The average age of the participants was 20 years (range: 18–26; SD = 1.69), and the majority were female (87.4%; 10.5% male; two participants did not wish to disclose their gender). The most frequent nationalities within our sample were Dutch (42%), German (11%), British (3%), Turkish (3%), and Bulgarian (3%). As the study was conducted in English, we checked participants’ proficiency in English on a 5-point scale (1 indicating poor proficiency, 5 indicating C2 or native level). The results showed that participants had very good proficiency in English (M = 4.32; SD = 0.64; range: 3–5). Participants were randomly divided into the control group (n = 52) and the manipulation (i.e., fake happy) group (n = 43). The groups did not differ in age (t(93) = 1.40, p = .166) or English proficiency (t(93) = 0.13, p = .894).

Measures

Self-Rating of Happiness Scale (SRHS; Abdel-Khalek, 2006). The SRHS was developed to measure individuals’ happiness using a single self-report item (“How satisfied are you with your life as a whole?”) that is scored on an 11-point Likert scale (anchors: 0 = completely dissatisfied; 10 = completely satisfied). This measure was used to assess participants’ baseline happiness.

Subjective Happiness Scale (SHS; Lyubomirsky & Lepper, 1999). The SHS is a 4-item measure of overall happiness scored on a 7-point Likert scale. Each item (e.g., “In general, I consider myself:”) presents two contrasting sentence completions (e.g., from “not a very happy person” to “a very happy person”), and the respondent indicates to what extent they identify with each. A higher score represents a greater level of happiness. In the current study, the Cronbach’s alpha of the scale was 0.84.

Satisfaction with Life Scale (SWLS; Diener et al., 1984; Pavot & Diener, 2008). The SWLS is a brief 5-item self-report measure of one’s global life satisfaction. Responses to each item (e.g., “In most ways my life is close to ideal.”) are recorded on a 7-point Likert scale (anchors: 1 = strongly disagree; 7 = strongly agree), and possible scores range from 5 to 35, with higher scores indicating higher satisfaction. In the current study, the Cronbach’s alpha for the SWLS was 0.87.

Supernormality Scale – Revised (SS-R; Cima et al., 2008). The SS-R is a revised and extended version of the Supernormality Scale that was initially constructed by Cima et al. (2003) as a self-report measure of supernormality (Cima et al., 2008). The SS-R contains 50 items (e.g., “When I walk on the street late at night, I have the feeling that I am being followed”) that are rated on a 4-point Likert scale (anchors: 1 = not at all applicable; 4 = extremely applicable) and subdivided into seven subscales addressing (1) social desirability, (2) mood disorders, (3) obsessive-compulsive symptoms, (4) psychotic symptoms, (5) dissociative symptoms, (6) aggression, and (7) anxiety. Additionally, 16 distractor items are included (e.g., “I like to watch television”). Subscale scores are computed as well as a total score. The lower the score, the more an individual appears to present supernormality; the cut-off is set at 60. In the current study, the Cronbach’s alpha for the total SS-R was 0.89.

Scale of Positive and Negative Experiences (SPANE; Diener et al., 2009). The SPANE is a 12-item self-report measure of positive and negative experiences that reflects various good or bad feelings, such as boredom, interest, or physical pleasure. The respondent is asked to think about what they have been doing over the past four weeks and then to indicate on a 5-point Likert scale (anchors: 1 = very rarely or never; 5 = very often or always) how often they experienced each feeling (e.g., “positive”, “unpleasant”, or “joyful”). A positive (SPANE-P) and a negative (SPANE-N) scale score can be computed (each ranging from 6 to 30), as well as an overall balance score (SPANE-B) obtained by subtracting the negative score from the positive one (range: −24 to +24). The more extreme a balance score, the more an individual tends to identify those feelings in themselves (Diener et al., 2009). In the current study, the Cronbach’s alphas for the SPANE-P and the SPANE-N were 0.92 and 0.85, respectively.
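To make the scoring concrete, a minimal sketch (in Python, for illustration only) of how the SPANE-P, SPANE-N, and SPANE-B scores can be computed is shown below; the item labels are assumed from the published scale, and the function name is hypothetical rather than part of any scoring syntax used in this study.

```python
# Minimal SPANE scoring sketch (illustrative; item labels assumed from the published scale).
POSITIVE_ITEMS = ["positive", "good", "pleasant", "happy", "joyful", "contented"]
NEGATIVE_ITEMS = ["negative", "bad", "unpleasant", "sad", "afraid", "angry"]

def score_spane(responses: dict) -> dict:
    """responses maps each feeling word to a 1-5 frequency rating."""
    spane_p = sum(responses[item] for item in POSITIVE_ITEMS)  # range 6-30
    spane_n = sum(responses[item] for item in NEGATIVE_ITEMS)  # range 6-30
    return {"SPANE-P": spane_p, "SPANE-N": spane_n, "SPANE-B": spane_p - spane_n}  # B: -24 to +24

# Example: rating every positive feeling 4 and every negative feeling 2
example = {**{w: 4 for w in POSITIVE_ITEMS}, **{w: 2 for w in NEGATIVE_ITEMS}}
print(score_spane(example))  # {'SPANE-P': 24, 'SPANE-N': 12, 'SPANE-B': 12}
```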

Flourishing Scale (FS; Diener et al., 2010). The FS is a short self-report measure that taps into a broad range of components of social-psychological functioning essential to well-being. Its eight items (e.g., “I lead a purposeful and meaningful life”) are scored on a 7-point Likert scale (anchors: 1 = strongly disagree; 7 = strongly agree) and address the respondent’s own perception of their relationships, self-esteem, optimism, and purpose in life. The higher the total score (ranging from 8 to 56), the more positive the respondent seems to view themselves across those diverse domains. In the current study, the Cronbach’s alpha of 0.90 indicated excellent internal consistency.

The SRHS and the SS-R were included in the first measurement occasion (Session 1) to control for participants’ baseline happiness levels and to check instruction compliance, respectively, whereas the other measures were conceptually paired into measures of more affective well-being (Session 1: SHS; Session 2: SPANE) and more cognitive/eudaimonic well-being (Session 1: SWLS; Session 2: FS). These pairings are justified by previous work showing significant correlations between the measures (Pair 1: SHS and SPANE-P correlations ranging from 0.61, Proctor et al., 2015, to 0.83, Rice & Shorey-Fennell, 2020, and SHS with SPANE-N, −0.65, Proctor et al., 2015; Pair 2: SWLS and FS correlations ranging from 0.71, Ramsay et al., 2023, to 0.76, Fong & Loi, 2016).

Procedure

Students signed up online via the university-internal [removed for review] website and then received a Qualtrics link to the study. Data were collected in April–May 2021. After receiving the study information and providing informed consent, participants were asked about their age, gender, level of education, nationality, and English proficiency. To measure baseline happiness, participants were administered the one-item SRHS. Thereafter, they were randomly assigned to one of two conditions (i.e., the control and fake happy groups). The randomization was set a priori in Qualtrics and was thus done automatically. However, participants who endorsed the highest score on the happiness baseline measure (i.e., 10) were automatically assigned to the control condition, because it would not be possible to evaluate whether those participants showed any potential residual effects of our manipulation (n = 3). Participants in the control condition were instructed to answer all questionnaires truthfully. The fake happy group was asked to read a vignette about a person who is overall happy and generally has a very wholesome life, and they were told to imagine being the protagonist. We based our vignette on two well-being dimensions: (1) indicators of momentary positive affect/emotions (e.g., enjoys music in the morning) and satisfaction with life (e.g., is very fulfilled and loves her/his life); and (2) indicators of positive relationships (e.g., loves meeting people and is in a loving relationship), a meaningful life (e.g., loves to learn and is very productive), and personal growth (e.g., is very understanding towards other people and her/himself). Subsequently, they were asked to respond to the questionnaires as if they were the happy person from the vignette. The first part entailed three questionnaires, namely the SHS, the SWLS, and the SS-R. The latter scale was included to inspect whether participants were compliant with the instructions and whether they exhibited a positive response bias. Participants then answered three questions about their motivation to participate in this project, the clarity of the questions and instructions, and the difficulty of the survey. Finally, they were asked to create a personalized code to use in Session 2 so that their responses from the two sessions could be merged.
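For illustration, the allocation rule described above (automatic random assignment, with ceiling scorers on the baseline SRHS routed to the control condition) can be sketched as follows; the actual assignment was implemented within Qualtrics, so the code below, including the function name and seed, is only a hypothetical conceptual analogue.

```python
import random

def allocate_condition(baseline_srhs: int, rng: random.Random) -> str:
    """Conceptual analogue of the Qualtrics allocation used in this study."""
    if baseline_srhs == 10:  # ceiling score: no room to fake higher happiness
        return "control"
    return rng.choice(["control", "fake_happy"])

rng = random.Random(42)  # arbitrary seed, for reproducibility of the example only
print([allocate_condition(score, rng) for score in (7, 10, 5)])
```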

Seven days later, the link to the second part was automatically sent to the participants via Qualtrics. The second part contained a question asking for the personal code and two well-being questionnaires, namely the SPANE and the FS. Rather than using the same measures at Time 1 and Time 2, different measures of well-being were chosen to avoid repetition of scales and carry-over effects. Among the scale items, several items requiring a specific answer were added as attention and effort checks. All participants passed the checks. Finally, participants filled in the same evaluation questions as in the first part. All participants received a debriefing form. Students who completed both parts of the study received one research credit as compensation. The study was approved by the Ethics Review Committee of the [removed for review].

Data Analyses

The data were analysed in SPSS, using independent t-tests as well as paired t-tests combining the measures from Session 1 and Session 2 (pair a: SHS – SPANE-B; pair b: SWLS – FS). For the interpretation of effect sizes, we followed Cohen’s (1988) categorization of 0.20 as a small, 0.50 as a medium, and 0.80 as a large effect. The data and the outputs are available at the Open Science Framework (https://osf.io/rfh4w/?view_only=1e41f0e0e24440d382eca49a4a8490ac).
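For readers who want to reproduce the group-comparison logic outside SPSS, the sketch below illustrates an independent-samples t-test together with Cohen's d in Python (numpy/scipy); the score vectors are hypothetical and do not reproduce the study data, and the pooled-SD formula for d is the classical one, which may differ slightly from the variant a given software package reports.

```python
import numpy as np
from scipy import stats

def cohens_d(x: np.ndarray, y: np.ndarray) -> float:
    """Cohen's d using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

# Hypothetical SWLS totals per group (illustration only, not the study data).
control = np.array([24, 26, 22, 27, 25, 23, 28, 24])
fake_happy = np.array([30, 32, 29, 33, 31, 34, 30, 32])

t, p = stats.ttest_ind(fake_happy, control)  # independent-samples t-test
print(f"t = {t:.2f}, p = {p:.3f}, Cohen's d = {cohens_d(fake_happy, control):.2f}")
```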

Results

Motivation, Clarity, Difficulty

Participants were asked to rate their motivation, the clarity, and the difficulty of the tasks in both study sessions on a 5-point scale. In Session 1, the overall ratings were high (Mmotivation = 3.77, SD = 0.59; Mclarity = 4.62, SD = 0.60; Mdifficulty = 4.07, SD = 0.83). In Session 2, the ratings for motivation were lower (M = 2.61, SD = 0.76), whereas the ratings for clarity and difficulty remained high (Mclarity = 4.71, SD = 0.70; Mdifficulty = 4.38, SD = 0.72). The means and standard deviations for each group, as well as the independent t-tests, are presented in Table 1. We checked whether the groups differed in their ratings using independent t-tests. The only significant difference (p < .05) was found in the clarity of the task in Session 1, although both groups provided high scores.

Table 1 Participants’ ratings of motivation, clarity, and difficulty of the task across groups

Testing Happiness: Session 1 and Session 2

We also inspected group differences on all administered measures using independent t-tests. The means and standard deviations for each measure across groups are provided in Table 2. Before introducing the instructions and the other measures, we asked participants to report their overall life satisfaction (0–10), and students in both groups reported similar, above-average levels on this measure. Participants provided similar scores on most other measures as well. Significant group differences (ps < .001) were found only on the SWLS (Cohen’s d = 1.11) and the SS-R (Cohen’s d = 0.82), both administered during Session 1, right after the instructions to either respond honestly or to fake happiness. The fake happy group exhibited higher levels of life satisfaction and higher rates of supernormality. The data were also checked using the non-parametric Mann-Whitney U test, and the results did not differ.

Table 2 Means, standard deviations, and t-tests for all administered measures at Session 1 and Session 2

Testing Happiness over Time

To test whether participants indicated stable levels of happiness within their groups over time, we conducted paired t-tests. We compared standardized scores on the SHS with standardized scores on SPANE-B (balance score = SPANE-P − SPANE-N), and standardized scores on the SWLS with those on the FS. The results are presented in Table 3. Our results indicated that both groups’ life satisfaction scores dropped significantly (p < .001) over time. The data were also checked using the Wilcoxon signed-ranks test, and the results did not differ.
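As a concrete illustration of this within-group comparison, the sketch below standardizes two different scales across a (hypothetical) full sample and then runs a paired t-test and a Wilcoxon signed-rank check within one group; standardizing across the whole sample is our assumption about how the z-scores were formed, and the data and group labels are invented for the example.

```python
import numpy as np
from scipy import stats

def z_score(x: np.ndarray) -> np.ndarray:
    """Standardize a scale across the full sample (mean 0, SD 1)."""
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical SWLS (Session 1) and FS (Session 2) scores for all participants,
# with a Boolean marker for the control group (illustration only).
swls_s1 = np.array([25, 28, 22, 30, 27, 24, 26, 29, 31, 33, 30, 32])
fs_s2 = np.array([38, 41, 36, 44, 39, 37, 40, 42, 40, 43, 39, 41])
is_control = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)

# Standardize each scale across the whole sample, then test the change within one group.
z1, z2 = z_score(swls_s1)[is_control], z_score(fs_s2)[is_control]
t, p = stats.ttest_rel(z1, z2)     # paired t-test on standardized scores
w, p_w = stats.wilcoxon(z1 - z2)   # non-parametric check on the paired differences
print(f"paired t = {t:.2f} (p = {p:.3f}); Wilcoxon W = {w:.1f} (p = {p_w:.3f})")
```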

Table 3 Standardized Z values and paired t-tests for the control and fake happy groups

Discussion

In this study, the goal was to investigate whether the residual effect of feigning (Merckelbach et al., 2011) occurs when people fake happiness, defined here through affective and cognitive indicators of well-being. Our results are as follows. First, looking at Session 1, our data from the supernormality scale (Cima et al., 2008) showed that feigners’ scores were significantly more indicative of biased reporting than the scores of the control group. Hence, feigners denied even everyday, common symptoms significantly more than the control group did (i.e., they exhibited a tendency to present themselves as “supernormal”; Cima et al., 2003, 2008). This confirms the immediate effect of instructions on self-presentation, also in the domain of symptom underreporting. It is important to note that such prompts could often be provided by attorneys or counsellors in, for instance, parole or child custody hearings, when it is beneficial for a person to present themselves in the best light possible. What our results show is that such coaching can have an immediate effect.

Second, and in line with the previous point, the results indicate that, despite obtaining similar baseline scores, the two groups also differed significantly on the life satisfaction measure administered during Session 1. Specifically, feigners exhibited higher life satisfaction than the control group. These findings partially support our expectation that the fake happy group would exhibit higher scores than the control group. Yet, the difference was not found for the happiness measure. Looking at the affective well-being scores, both groups exhibited above-average levels; hence, it is likely that the control group’s scores were too high for feigners to surpass, especially if feigners did not want to “overdo” their task in order to remain credible. Also, participants who reported the maximum baseline score were automatically allocated to the control condition, which could partially explain the lack of a group difference.

Third, however, feigners’ engagement was short-lived, as the effect of feigning was present only at Session 1, right after they received the instructions. During Session 2, our fake happy participants obtained scores similar to those of the control group. Therefore, looking at the lack of differences between the groups at Session 2, we did not succeed in inducing the residual effect of feigning. This lack of difference could be due to the time between the sessions, which might have been too long for participants to remember the instructions. Indeed, the studies by Merckelbach and colleagues (2011), as well as the replication by Kunst et al. (2016), included intervals of no longer than two hours. The original authors suggested that future investigations should introduce longer intervals between testing sessions; however, it appears that the 8-day interval was too long for the residual effect of feigning, if it exists, to remain. Therefore, further research should incorporate several testing moments so as to determine the stability of this effect. For instance, employing an experience sampling method (Verhagen et al., 2016) might be most appropriate, as participants would repeatedly report their state every day (see also Boskovic et al., 2021). Such an investigation would be beneficial as it could indicate the peak of the residual effect, as well as its lowest point, which could be informative for, for instance, the validity of repeated assessments. Although these findings suggest that feigning happiness is not reflected in increased well-being scores over time, it is possible that symptom underreporting, as a key feature of supernormality, would persist. However, as no measure other than the SS-R was available to us, we did not administer such a test at Session 2. Future studies should monitor symptom underreporting over time as well.

Fourth, we also checked for any potential over-time changes within the groups. We compared the standardized scores on measures that tap into similar constructs in order to observe any potential changes in participants’ happiness. It seems that both groups reported significantly lower levels of well-being during Session 2. This was not surprising for the fake happy group, considering the instructions they previously received and the briefness of the residual effect; however, the same direction of score change was found in the control group. Together, these findings might fall under the “initial elevation phenomenon”, which has been found specifically in self-reported data (Anvari et al., 2022). Namely, research participants given self-report questionnaires tend to initially provide significantly higher scores than they do at later measurement occasions. Thus far, this phenomenon has only been found in the context of negative subjective experiences (mood and physical health symptoms, Anvari et al., 2022; pain, Boskovic et al., 2021); hence, our data might be novel in indicating that the initial elevation phenomenon could also apply to self-reported positive experiences.

There are a few limitations of our work that need to be addressed. First, our sample consisted of well-adjusted young (mostly female) adults (i.e., students), which limits the generalizability of our findings. Future studies should try to employ the same intervention among clinical participants, such as patients with depression or other psychological symptomatology. It is also important to mention that our sample might have been too small for the employed design, so we suggest that future investigations aim at including larger samples. Second, we did not check for the presence of any psychopathology in our sample, which could have impacted the stability of the initially significantly different scores between groups. Third, the given vignette might not accurately reflect the participants’ vision of a happy person, hindering the task of “embodying” the simulated case when responding to the measures. Fourth, even though all participants attended their studies in English, they were not native English speakers, and our testing material was provided only in that language. Hence, although unlikely, it is possible that some of our participants did not comprehend all of the administered items. Fifth, as discussed above, the time period between the two sessions in our study was considerably longer than the intervals used in previous work. Future work should possibly include either intermediate measurement points or employ the experience sampling method with multiple testing points. Sixth, based on the study evaluation questions, we could see that the fake happy group provided significantly lower scores at Session 1 regarding the clarity of the instructions. This is reasonable considering that their instructions were much more complex than those of the control group; yet, their scores were above 4.4, still indicating very high levels of clarity. Finally, our participants came from diverse cultural backgrounds, which is important to consider, especially in the context of happiness research. While Western countries place a premium on enhancing happiness, expressing happiness in other cultures might be considered rude or even completely irrelevant (Joshanloo & Weijers, 2014). Therefore, future studies should address these limitations, aiming to include a more uniform sample in terms of cultural background and language.

Conclusion

Overall, in this study, we tested the strength and stability of the residual effect of feigning in the domain of faking good. We showed that instructing people to fake good (i.e., to fake happiness) leads to significantly increased self-reported well-being, as well as to underreporting of everyday symptoms, when measured right after the instructions. However, this trend was short-lived, as no residual effect of faking happiness on the well-being scores was captured eight days after the instructions, which was probably too long an interval for an effect that has so far been investigated only over short periods (e.g., 1 h). Therefore, future investigations, preferably with multiple and shorter testing periods, are encouraged.