Introduction

The foreign language effect (FLE) suggests that the tendency of bilingual speakers to experience less emotional involvement in their second language (L2) can lead to a reduction in cognitive biases (e.g., Keysar et al. 2012). This means that when using their L2, bilinguals may be able to engage in more rational thinking, which in turn may lead to a reduction of typical biases in decision-making or moral judgment.

Evidence for the FLE has been provided for a number of different cognitive biases. For example, it has been found that the FLE may reduce superstitious belief (Hadjichristidis et al. 2019). Bilingual participants in this study were asked to rate how bad or good they would feel about performing an action (such as applying for a job) in different “good luck” and “bad luck” scenarios. Reading the scenarios in their L2 prompted more neutral feelings towards good versus bad luck scenarios. The FLE has also been found to mitigate illusions of causality, i.e. the false belief that two unrelated events are connected, in a contingency learning task (Diaz-Lago and Matute 2018).

Most of the research on the FLE has been conducted in the context of decision-making. For instance, Keysar et al. (2012) investigated the loss-aversion bias, i.e. whether the way a decision-making dilemma is framed affects how participants choose to respond to it (see also Kahneman and Tversky 1979). Their participants were presented with a hypothetical scenario in which 600,000 people were exposed to a deadly disease. The participants were presented with two choices of medicine, one of which was a “sure” option (A) and one of which was a “risky” option (B). In the gain frame condition, participants were told that choosing medicine A would save 200,000 lives, whilst if they chose medicine B, there was a 33.3% chance that 600,000 people would be saved and a 66.6% chance that no one would be saved. In the loss frame condition, they were told that choosing medicine A would cost 400,000 lives, whilst with medicine B, there was a 33.3% chance that no one would die and a 66.6% chance that 600,000 people would die. Hence, the outcomes were identical in both framing conditions; however, participants’ choices were not. They were more likely to choose the “risky” medicine (B) if the outcome was framed in terms of loss rather than gain; in other words, a clear framing effect was found. Crucially, being presented with the dilemma in their L2 mitigated this bias. These findings were replicated by Costa et al. (2014a) on a number of similar framing problems. They suggested that using the L2 reduces loss aversion because it mutes the emotional involvement of participants.
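For clarity, the two frames describe numerically identical expected outcomes (a simple check added here for illustration, based on the figures stated in the scenario):

    Gain frame:  E[A] = 200,000 saved;   E[B] = (1/3) × 600,000 + (2/3) × 0 = 200,000 saved
    Loss frame:  E[A] = 400,000 deaths;  E[B] = (1/3) × 0 + (2/3) × 600,000 = 400,000 expected deaths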

In an investigation of utilitarian judgements, Costa et al. (2014b) studied the classic ‘footbridge dilemma’ and found that bilinguals responding in their L2 were more likely to opt for (hypothetically) pushing one individual off a bridge to save the lives of five others. They argued that, due to reduced emotionality in L2, the emotional cost of harming one individual does not interfere with the rational decision to save more lives. Further research has found that the effect emerges for the ‘footbridge dilemma’, but not for the ‘trolley dilemma’, in which the individual is sacrificed impersonally (by diverting the threat) rather than through direct physical contact (Cipolletti et al. 2015; Geipel et al. 2015).

Emotionality of the decision-making scenario presented seems to be an important mediator. Using one’s second language only seems to mitigate the bias for more emotional and morally compromising hypothetical situations; for example, those involving actively pushing a person to their death. Corey et al. (2017) replicated this effect over several experiments, and found that the FLE was stronger in personal dilemmas, as opposed to impersonal ones. Importantly, it was also found that the effect decreased if emotionality was diminished by manipulating the severity of consequences, e.g. death vs. disability vs. injury. Thus, the FLE appears to be stronger in more emotional contexts, which supports a strong link to reduced emotional resonance in one’s second language.

Little research so far has focused on whether the FLE also affects judgements about other people, in particular attributions. Attribution is defined as the process of assigning cause and meaning to the actions of others and/or phenomena in the world around us (e.g., Alicke et al. 2015). Previous research on attribution suggests that people often fail to provide unbiased judgements. One well-known attribution bias, for example, is the fundamental attribution error: people are prone to attribute their own mistakes to environmental factors, whilst attributing mistakes made by others to dispositional factors (e.g., Ross 1977). More recently, however, some theorists argued that this divide between ‘person’ vs. ‘environment’ is too simplistic, as it fails to address the complex reasons behind responsibility, such as intervening causes, failure to act, or previous failed attempts (Alicke et al. 2015).

The aspect of emotion has also been incorporated into attribution theory. According to the ‘person-as-reconstructor’ theory (Kahneman and Tversky 1982; Kahneman and Miller 1986), psychological reactions to an event are reconstructed after the event. Tragic outcomes produce strong affective reactions, which motivate observers to reconstruct the event and look towards alternative choices. An actor may be blamed for failing to act differently, even when the outcome was not foreseeable to the actor. Similarly, the ‘person as moralist’ theory (Alicke et al. 2011; Mandel 2010) argues for a bidirectional relationship between cause and blame. The theory suggests that assessing an actor’s causal role becomes conflated with the observer’s emotional responses. Factors like negative perceptions of an actor, or negative consequences of an action, can therefore influence blame attribution to some extent.

According to the optimality principle, observers assume that people are rational and strive to make the best possible decision in a complex and competitive environment (Schoemaker 1991). This principle becomes problematic when judging other people (Toda 1991), particularly because observers can hardly account for the many unknown variables that may affect the actions of others. This can lead to a discrepancy between perceived intention and behaviour, and to a failure to realise that ‘good intentions’ do not necessarily lead to ‘good outcomes’ (or vice versa). In other words, observers often fail to recognise the simple fact that people are fallible and make mistakes, and that optimality cannot always be achieved.

A recent study has offered a novel application of this concept, by studying optimality bias in moral judgements (De Freitas and Johnson 2018). The authors argued that suboptimal choices or actions made by others are difficult to understand, because people are always expected to behave optimally, even in situations where they do not have full control. Consequently, actors making suboptimal decisions will elicit more pronounced affective reactions in observers, and thus be subject to more severe moral judgements.

In a series of experiments (De Freitas and Johnson 2018), participants were presented with different vignettes, each describing a scenario where an actor must choose between three different alternatives, e.g., a doctor having to choose between three different treatments for a patient with hearing problems. Unbeknown to the described actor, the three options had different degrees of optimality. The vignettes always explicitly stated that the actor thought that all options were of equal efficacy, while in fact they had statistically different success rates. Regardless of the described actor’s decision, the vignettes always described the same tragic outcome (e.g., the patient suffering from permanent hearing loss after treatment). Participants were randomly allocated to conditions in which the actor made either the best, middle, or worst decision from an objective, omniscient perspective. It was found that actors who made the best choice were assigned significantly less blame than those in either of the two suboptimal conditions. This effect emerged despite the fact that all decisions were made in the same (hypothetical) context of insufficient knowledge, and that each type of decision produced the same negative outcome. The authors replicated this effect across seven experiments with different manipulations, including varying the consequences of the action and the degree of explanation regarding the actor’s intentions. De Freitas and Johnson (2018) concluded that the most important factor in this bias is the tendency to ignore the actor’s mental state, i.e., to expect them to behave optimally even when this is not possible from the actor’s point of view.

To date, there is hardly any research on linguistic background as a potential mediating factor in attribution biases, despite the wide-ranging implications such biases may have for social judgements in general, and the previously discussed foreign language effect (FLE) findings in particular. The present paper is a first attempt at bridging this gap by exploring whether the FLE modulates the optimality bias in blame attribution. Specifically, we investigate whether this bias is mitigated when bilinguals are tested in their L2, using a design closely modelled on De Freitas and Johnson (2018) with slight modifications. The original experiments had three levels of optimality (best, middle, worst), but found no significant difference between the two suboptimal conditions. As we are adding a target language manipulation to our designs (L1 vs. L2), we will include only two levels of optimality.

In the following, we will report two separate experiments. The first experiment compares optimality bias across two speaker groups (native [L1] vs. non-native [L2] speakers of English) using vignette materials in English. The second experiment compares the effect across two target languages (Finnish [L1] vs. English [L2]) within a population of Finnish–English bilinguals.

In line with the original study, we expect that participants should ascribe more blame for a negative outcome to a hypothetical actor who unknowingly chooses the worst course of action (suboptimal condition) than to a hypothetical actor who unknowingly chooses the best course of action (optimal condition). We expect this to happen even though (a) the consequences of the choice are equally negative and (b) the actor is described as having insufficient information in each case. More crucially, under the assumption that this effect is mitigated by the FLE, we also expect an interaction between condition and target language. Specifically, as a result of reduced emotional involvement in L2, we predict that there should be a reliably weaker optimality bias in blame judgements when participants are tested in their second language (L2), compared to when they are tested in their native language (L1).

Method

Pre-registration

Hypotheses (see above), methods, and analyses (indicated in the results section) were pre-registered on the Open Science Framework (https://osf.io/arx3u).

Participants

Three groups of participants were recruited across the two experiments: a native English-speaking monolingual group, a bilingual Finnish–English group, and a bilingual group that consisted of native speakers of various languages with English as their L2. All participants resided in the United Kingdom at the time of taking part in the experiment. In both experiments, bilingual participants were asked to fill out a questionnaire regarding their language background (see “Appendix A”). Bilingual participants were defined as speakers who are fluent in their native language and in English as their second language. Bilingual participants who reported having learned English before the age of six and/or having native English-speaking parents were not included in the final sample. This cut-off point was chosen to exclude ‘early bilinguals’, i.e. participants who had learnt English from early childhood and/or in a home setting. Participant samples and further exclusion criteria per experiment are described in more detail in the following sub-sections.

In Experiment 1, an initial sample of 186 participants was recruited through convenience sampling on social media. Of these, 25 were excluded for having incomplete datasets due to technical problems in online data transfer. Another 17 were excluded for incorrect answers to comprehension questions. Finally, 25 were excluded from the bilingual subgroup for learning English before the age of 6 or having native-English parents. The final sample consisted of 119 participants, aged from 19 to 63 years (M = 26.02, SD = 8.58). Of these, 56 were bilinguals from various L1 language backgrounds, and 63 were native English speakers. Ninety-one of the 119 participants identified themselves as female, 25 as male, and 3 declined to reveal their gender. Table 1 provides a more detailed breakdown of the condition counts and gender distributions in Experiment 1.

Table 1 Participant counts and gender distribution per condition in Experiment 1

In Experiment 2, a sample of Finnish–English bilinguals residing in the UK was recruited, again through social media. Half of the participants completed the study in their native language (Finnish), and half in their L2 (English). Of an initial set of 331 respondents, 59 gave incorrect answers to comprehension questions, and another 27 were excluded for having learnt English before 6 years of age. Finally, data sets from 34 respondents were incomplete and thus removed. The final sample therefore included 211 participants, of whom 103 had been randomly assigned to Finnish (L1) and 108 to English (L2) as the target language for testing. Participants ranged in age from 18 to 71 years (M = 36.05, SD = 11.72). Of the final sample, 187 participants reported to be female, 23 male, and one participant declined to reveal their gender. Table 2 shows a more detailed breakdown of the condition counts and gender distributions in Experiment 2.

Table 2 Participant counts and gender distribution per condition in Experiment 2

Bilingual participants’ reported age of English acquisition (AoA) was comparable across the two studies (Experiment 1: M = 9.34 years; Experiment 2: M = 9.21 years). Bilinguals in Experiment 1 reported having lived in the UK for 5.20 years on average. Bilinguals in Experiment 2 reported a longer average length of stay in the UK (9.7 years). For a full breakdown of AoA and length of stay by experiment and condition, see Tables 3 and 4 below. Participants were asked to rate their English (L2) proficiency in terms of speaking, reading and writing on a scale from 1 “very poor” to 7 “excellent”. After summing the scores across the three sub-scales (speaking, reading, and writing), self-assessed proficiency could range from 3 (lowest) to 21 (highest). The mean self-assessment scores were very high both in Experiment 1 (M = 18.93, SD = 2.59) and in Experiment 2 (M = 18.82, SD = 2.24). There was no reliable difference in self-assessed proficiency between the bilingual groups in the two experiments (p = 0.62 by Mann–Whitney U test). Within Experiment 2, the bilingual subgroup who completed the task in English reported slightly (but not reliably, p = 0.092) higher self-assessed English proficiency (M = 19.07, SD = 2.28) than the subgroup who completed the task in Finnish (M = 18.55, SD = 2.18).

Table 3 Bilinguals’ self-reported length of stay in the UK and age of L2 acquisition (means and SDs in years), broken down by condition for Experiment 1
Table 4 Bilinguals’ self-reported length of stay in the UK and age of L2 acquisition (means and SDs in years), broken down by condition for Experiment 2

Materials

Both studies were carried out online using Experimentum, a platform for online surveys set up by the University of Glasgow School of Psychology and Institute of Neuroscience and Psychology. All materials used in the studies were available in both English and Finnish. Finnish materials for Experiment 2 were translated from English by a native Finnish (English L2) speaker, and cross-translated by two other native Finnish speakers (who currently reside in Finland) to ensure compatibility.

The vignette used in the study was adapted from Experiment 1 in De Freitas and Johnson (2018). The original vignette included three levels of optimality (“best”, “middle”, “worst”), but since the original paper did not find a difference in blame between the two suboptimal conditions, we decided to implement only two choice conditions for the sake of simplicity. The third (“middle”) option was still included in the vignette in order not to stray too far from the original setup, but only the “best” and the “worst” option were used as choices made by the described actor (manipulated conditions). The vignette was therefore as follows:

A doctor working in a hospital has a patient who is having hearing problems. This patient has three, and only three, treatment options. The doctor believes that all treatment options have a 70% chance of giving the patient a full, successful recovery. But in fact, the doctor’s belief is wrong. Actually:

1. If she gives the patient treatment LPN, there is a 70% chance that the patient will have a full recovery.
2. If she gives the patient treatment PTY, there is a 50% chance the patient will have a full recovery.
3. If she gives the patient treatment NRW, there is a 30% chance the patient will have a full recovery.

The doctor chooses treatment (LPN or NRW) [manipulated between conditions], and the patient does not recover at all. The patient now has permanent hearing loss.

There were two versions of the vignette: in the optimal condition the hypothetical doctor was described as having chosen the ‘optimal’ treatment (LPN, 70% efficacy), and in the suboptimal condition as having chosen the ‘suboptimal’ treatment (NRW, 30% efficacy). In both cases, the doctor was described as erroneously assuming equal efficacies of the treatments. The described outcome remained the same across conditions, with the hypothetical patient suffering permanent hearing loss regardless of the treatment that was administered.

A five-item “blame questionnaire” was designed to measure participants’ responses to the narratives. Responses were collected on 9-point Likert scales (cf. De Freitas and Johnson 2018) ranging from 1 (low blame) to 9 (high blame). The items addressed five different aspects of the blame judgements: (1) how much the doctor is to blame; (2) how much responsibility the doctor had; (3) how much the doctor deserved punishment; (4) how seriously wrong the doctor’s decision was; and finally, (5) how confident the participant was in making their judgement. The last item (5) was not considered a direct measure of blame attribution; rather, it served as an additional control metric. Full wordings of the relevant questions can be found in “Appendix B”. In addition, there were three comprehension questions about the content of the vignettes, also taken from De Freitas and Johnson (2018); these can likewise be found in “Appendix B”. Participants were excluded if they gave wrong answers to either of the first two comprehension questions. The third comprehension question was not used as an exclusion criterion, due to the high number of participants answering this question incorrectly, regardless of target language. However, this comprehension question was included in exploratory analyses (see “Results” section).

Design and procedure

In Experiment 1, all participants completed the experiment in English. We compared two groups of participants (L1 vs. L2 speakers of English) in two conditions (optimal vs. suboptimal) using a 2 × 2 between-subjects design. Assignment of participants to experimental conditions (optimal vs suboptimal) was determined at random. In Experiment 2, Finnish–English bilinguals were tested in a 2 × 2 between-subjects design crossing target language (Finnish [L1] vs. English [L2]) with condition (optimal vs. suboptimal). Participants were randomly allocated to one of the four design cells: Finnish-optimal, Finnish-suboptimal, English-optimal, or English-suboptimal. Each participant read only one vignette.

Both studies were conducted online, and each participant was sent a link to complete the experiment. Bilingual participants were first asked to fill out a short questionnaire assessing linguistic background and English (L2) proficiency. Native English speakers skipped this step. Participants were then asked to read the vignette allocated to them, followed by the five-item blame questionnaire (choosing appropriate scale-points via mouse click). After the blame items, participants were asked to answer the three comprehension questions about the vignette. All participants were then fully debriefed via a debriefing page. The procedure took less than 10 min to complete.

Ethics

The experiment was carried out in full compliance with the BPS Code of Ethics and Conduct (2018) and approved by the University of Glasgow College of Science and Engineering Ethics Committee.

Results

Power

Power analyses were conducted prior to recruitment of participants, using the PANGEA application (http://jakewestfall.org/pangea/). The analyses suggested that, assuming a conventional ‘medium’ effect size, 120 participants were needed to achieve 69% power, and 160 to achieve 80% power. This suggests that the final samples for Experiment 1 (N = 119) and Experiment 2 (N = 211) were reasonably sensitive to the effects of interest, although imbalances in the design (due to participant exclusion) could lower the actual power figures relative to the ‘idealised’ calculations reported here.
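As a rough cross-check (not the PANGEA calculation itself), comparable figures can be obtained in base R if one assumes that the ‘medium’ effect corresponds to a standardised difference of about d = 0.45 on a between-subjects main effect, with the total sample split into two equal groups:

    # Approximate power check in base R; d = 0.45 and the equal two-group
    # split are assumptions made for this illustration only.
    power.t.test(n = 60, delta = 0.45, sd = 1, sig.level = 0.05)  # N = 120: power approx. 0.69
    power.t.test(n = 80, delta = 0.45, sd = 1, sig.level = 0.05)  # N = 160: power approx. 0.81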

Blame scores

We combined rating responses to the first four items of the blame questionnaire (covering blame, responsibility, punishment, and wrongness) into a single blame composite score by summing them. Since participants gave scores from 1 to 9 on the Likert scales, blame composite scores could range from 4 (low blame) to 36 (high blame). This composite was treated as a continuous variable in subsequent analyses. Reliability analyses based on the R package psych (Revelle 2018) confirmed excellent internal consistency of the 4-item composite scale, with 95% CIs for Cronbach’s alpha of [0.923, 0.959] in Experiment 1 and [0.930, 0.957] in Experiment 2 (established via bootstrapping over 10,000 resamples per study).
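For illustration, the composite score and the bootstrapped reliability estimate can be computed in R roughly as follows (a sketch rather than the original analysis script; the data frame and item column names are assumed):

    library(psych)

    # Four blame items, each rated 1-9 (data frame 'dat' and column names are hypothetical)
    items <- dat[, c("blame", "responsibility", "punishment", "wrongness")]

    # Composite blame score: sum of the four items, possible range 4-36
    dat$blame_composite <- rowSums(items)

    # Cronbach's alpha with bootstrapped 95% CI (10,000 resamples)
    rel <- psych::alpha(items, n.iter = 10000)
    rel$total$raw_alpha   # point estimate
    rel$boot.ci           # bootstrapped confidence interval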

Experiment 1

Table 5 shows means and SDs of the blame composite scores in each participant group and condition, and the violin plot in Fig. 1 provides corresponding distributional information. Participants in the optimal condition gave lower blame scores than those in the suboptimal condition. Moreover, bilinguals (performing the task in L2) tended to attribute more blame than native speakers (performing the task in L1), regardless of condition.

Table 5 Means and SDs for blame attribution scores across participant group and optimality condition in Experiment 1
Fig. 1 Blame scores by group and optimality condition

A 2 × 2 between-subjects ANOVA was performed to test the effects of Group and Optimality on blame attribution. Overall, participants in the optimal condition attributed less blame than those in the suboptimal condition, resulting in a strong main effect of Optimality [F(1, 115) = 165.773, p < 0.001, η2 = 0.577]. A significant effect of Group was also found [F(1, 115) = 5.934, p = 0.016, η2 = 0.021], confirming that the bilingual group gave reliably higher blame scores than the native group. The expected interaction between the two predictors was not confirmed [F < 1]. The optimality bias in Experiment 1 was therefore not mitigated by the FLE.
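In R, this analysis corresponds to a standard between-subjects ANOVA of the following form (a minimal sketch with assumed variable names; the same model structure applies to Experiment 2 with target language in place of group):

    # 2 x 2 between-subjects ANOVA; 'group' = L1 vs. L2 speakers of English,
    # 'optimality' = optimal vs. suboptimal (data frame and names assumed)
    dat$group      <- factor(dat$group)
    dat$optimality <- factor(dat$optimality)

    fit <- aov(blame_composite ~ group * optimality, data = dat)
    summary(fit)

    # Eta-squared per term: SS_term / SS_total (final entry is the residual share)
    tab    <- summary(fit)[[1]]
    eta_sq <- setNames(tab[["Sum Sq"]] / sum(tab[["Sum Sq"]]), rownames(tab))
    eta_sq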

Experiment 2

Descriptive data for Experiment 2 are provided in Table 6 and Fig. 2 below. Again, participants gave clearly higher blame scores in the suboptimal than in the optimal condition. In contrast to Experiment 1, overall blame scores were comparable across L2 vs. L1 conditions. Also note that optimality condition differences in the means were in the opposite direction to the expected FLE: For English (L2), the suboptimal-optimal contrast amounted to 23.46 − 8.73 = 14.73 blame-score units, and for Finnish (L1) to 21.75 − 10.65 = 11.10 blame-score units.

Table 6 Means and SDs for blame attribution scores across target language and optimality condition in Experiment 2
Fig. 2 Blame scores by target language and optimality condition

A 2 × 2 between-subjects ANOVA confirmed only one significant effect, namely the main effect of optimality [F(1, 207) = 176.748, p < 0.001, η2 = 0.456]: participants in the suboptimal condition gave higher blame scores than those in the optimal condition.

The main effect of target language was not significant [F < 1]. The interaction between optimality and target language was marginal [F(1, 207) = 3.467, p = 0.064, η2 = 0.009] and in the opposite direction to the expected FLE.

Exploratory analyses

We conducted further analyses to investigate additional factors that may have affected the blame judgements. These analyses were not pre-registered, but are reported for completeness and to inspire future work.

Judgement confidence

Participants’ confidence scores were measured by item (5) in the blame questionnaire. Since responses to this question were measured on a single, discrete but rank-ordered 9-point Likert scale, we analysed these data using ordinal logistic regression, as implemented in the R package ordinal (Christensen 2019).
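A sketch of such a model using the ordinal package is given below (variable names are assumed; in Experiment 1 the second predictor would be speaker group rather than target language):

    library(ordinal)

    # Confidence rating (item 5) as an ordered factor with levels 1-9
    dat$confidence_ord <- factor(dat$confidence, levels = 1:9, ordered = TRUE)

    # Cumulative link (ordinal logistic) model with optimality, target language,
    # and their interaction as predictors (data frame and names assumed)
    fit_conf <- clm(confidence_ord ~ optimality * language, data = dat)
    summary(fit_conf)   # coefficients are on the log-odds (logit) scale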

In Experiment 1, average confidence ratings did not seem to differ between the optimal (M = 6.39, SD = 2.00) and suboptimal condition (M = 6.30, SD = 1.40). Bilingual speakers (M = 6.14, SD = 1.85) tended to be slightly less confident overall than native speakers (M = 6.54, SD = 1.64), but the ordinal logistic regression analysis revealed no reliable main or interaction effects (all ps > 0.2).

Ordinal logistic models of the confidence ratings in Experiment 2 showed a reliable optimality main effect (b = − 0.562; p = 0.023): irrespective of target language condition, participants in the optimal condition (M = 7.64, SD = 1.87) were more confident in their judgements than participants in the suboptimal condition (M = 6.48, SD = 1.85). By contrast, the main effect of Target Language, as well as the optimality × target language interaction, did not approach significance in the confidence ratings (ps > 0.4).

Third comprehension question

As explained earlier, participants had to answer the first two comprehension questions correctly to be included in the main analyses. The third comprehension question (“Did the doctor have any way of knowing that this belief about the probabilities was false or was it outside her control?”) actually turned out to be somewhat problematic. In Experiment 1, 70 participants unexpectedly answered this question with “yes”; only 47 said “no” (as expected), and another two participants skipped this question altogether. Therefore, most participants (58%) answered this question in an unexpected manner. In Experiment 2, 82 participants unexpectedly answered “yes”, compared to 128 “no” responses and one participant skipping the question. While more in line with our expectations, the proportion of participants giving the ‘wrong’ answer was still quite large in Experiment 2 (38%).

Binary logistic regression analyses were conducted to explore whether there would be any cross-condition differences in answering the third comprehension question correctly. No clear main effects or interactions were established in either of the two studies (all ps > 0.2). Hence, answering the third comprehension question correctly was unlikely to be predictive of the blame attribution scores of the main analyses.
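These analyses correspond to a binomial model of the following kind (a sketch with assumed variable names; the outcome codes whether the third comprehension question was answered as expected):

    # 1 = answered "no" (the expected answer), 0 = answered "yes"
    # (data frame and column names are hypothetical)
    dat$q3_correct <- as.integer(dat$comp_q3 == "no")

    fit_q3 <- glm(q3_correct ~ optimality * language, data = dat,
                  family = binomial(link = "logit"))
    summary(fit_q3)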

Length of stay and age of acquisition as predictors

As suggested in Tables 3 and 4 above, there were slight imbalances in length of stay in an English-speaking country and in age of acquisition of English across the bilingual samples per condition. We therefore conducted additional multiple regression analyses in order to assess whether these two variables were predictive of the observed blame ratings.

For Experiment 1, only bilingual participant data were considered, as we did not have information about age of acquisition or length of stay in an English-speaking country for the native English speakers. Age of acquisition (AoA), length of stay (LoS), optimality condition (Condition), and all possible two-way interactions between these predictors were included in the model as predictors of the blame composite scores.
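In R, this model can be written compactly using the ^2 formula shorthand, which expands to all main effects plus all two-way interactions (a sketch with assumed variable names):

    # Bilingual participants only (Experiment 1); AoA = age of English acquisition,
    # LoS = length of stay, condition = optimal vs. suboptimal (names assumed)
    bilinguals <- subset(dat, group == "bilingual")

    fit_exp1 <- lm(blame_composite ~ (AoA + LoS + condition)^2, data = bilinguals)
    summary(fit_exp1)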

As seen in Table 7, the regression model confirmed the previously established main effect of Condition even when variation in Length of Stay and AoA was accounted for: the reliably negative estimate for Condition shows that blame judgements were harsher in the suboptimal condition. Interestingly, Age of Acquisition of English also had a significant effect; earlier acquisition of English predicted harsher blame judgements. There was also an interaction between Length of Stay and Age of Acquisition, suggesting that the effect of AoA was mitigated by LoS to some extent.

Table 7 Regression table for Experiment 1, Bilinguals only

For Experiment 2, Age of Acquisition (AoA), Length of Stay (LoS), optimality condition (Condition), and test language (Language) were entered into the model as predictors of the composite blame judgements. We also included all two-way interactions between these predictors.

As Table 8 shows, only the effect of optimality condition was significant (as in the pre-registered main analysis). The interaction between Condition and Language was marginal (p = 0.06) and it should be noted that its direction suggested the opposite pattern to the hypothesised FLE (same as in the pre-registered main analysis).

Table 8 Regression table for Experiment 2: Finnish speakers tested in Finnish or English

Discussion

In line with De Freitas and Johnson (2018), we expected blame scores to be lower in the optimal condition than in the suboptimal condition. Both studies fully supported this hypothesis, showing clear evidence for an optimality bias in blame attribution. We also hypothesised that there would be an interaction between Language/Group and Condition, such that the difference in blame judgements between the two conditions (optimal vs. suboptimal) would be smaller in L2 than in L1. This hypothesis was clearly not supported. In Experiment 1, L2 speakers were found to provide reliably higher blame attribution scores than L1 speakers, regardless of condition. In Experiment 2, no reliable difference between language conditions was found; if anything, there was a marginal interaction suggesting that the optimality bias in blame judgements was actually somewhat stronger in L2 than in L1. In other words, the optimality bias in blame attribution did not appear to be modulated by a Foreign Language Effect (FLE), or at least not in the direction we originally hypothesised.

Interestingly, in the exploratory analyses, we found lower age of L2 acquisition to be predictive of higher blame scores, and this effect was mitigated the longer participants had stayed in an English-speaking country. Although this pattern was found only in Experiment 1 (bilinguals from various L1 backgrounds) and not in Experiment 2 (native Finnish speakers with English as their L2), it may point to the importance of controlling for these variables more carefully in future research on this topic. In Experiment 2, the Finnish participants completing the study in English varied in duration of residence in the UK from a minimum of 3 months to a maximum of 50 years (average 10 years). In comparison, bilinguals in Experiment 1 only ranged in duration of residence from 2 months to 17 years (average 5 years).

The processes of blame attribution

In both experiments, the hypothetical actor faced significantly more blame for the same tragic outcome when they (unknowingly) made a suboptimal rather than an optimal choice. Thus, we replicated the findings from De Freitas and Johnson (2018), and found an optimality bias in blame attribution. Findings such as these are consistent with the person-as-reconstructor theory of blame attribution (Kahneman and Tversky 1982; Kahneman and Miller 1986). According to this theory, tragic outcomes motivate observers to reconstruct events after they happen, considering alternative choices and blaming the agent for failing to act otherwise. The doctor in our vignettes had three choices, which means that they could have acted differently. As a result, we observed higher blame judgements in the suboptimal condition.

This may also be explained by the Path Model of Blame (Guglielmo and Malle 2017), which argues that blame is assigned systematically. Once causality is determined, observers assess whether the action was intentional. If the action was unintentional, observers then assess preventability. Our vignette was based on an unintentional scenario, so according to the theory, degree of preventability should guide blame judgements. In the optimal condition, the outcome was clearly not preventable because the patient suffers hearing loss even when the doctor picks the ‘best’ treatment option. In the suboptimal condition, however, it is likely that participants believed the outcome could have been prevented, had the doctor chosen the ‘better’ treatment. Thus, participants in the suboptimal condition seemingly based their judgments on potential alternative outcomes, while ignoring the doctor’s mental state. Interestingly, exploratory analysis showed that in Experiment 2, participants in the optimal condition reported significantly more confidence in their judgement than those in the suboptimal condition, which could be seen as support for this kind of explanation.

Cushman (2008) argues that moral judgements involve two processes. The first one is triggered by negative consequences, where we search for an agent who is causally responsible. The second process is determined by analysing mental states, where blame is assigned only if the agent believed the action would cause harm. In this model, causality and foreseeability are separate processes, so causation and blame should not become conflated in moral judgements. However, our findings suggest that observers often make this mistake. Participants did not appear to engage in the second process when forming their moral judgements, i.e., they ignored the actor’s viewpoint and beliefs. This contradicts the idea of two separate processes, or alternatively, suggests that the second process was given little consideration by participants: while the hypothetical doctor was causally responsible for her patient’s hearing loss, analysing her mental state should have resulted in equal blame judgements across conditions, which was clearly not what the data showed.

The FLE in blame attribution

De Freitas and Johnson (2018) argue that factors inhibiting participants from considering the actor’s mental state should enhance the optimality bias in blame attribution. Based on this assumption, and considering that emotionality might play a role in inhibiting the adoption of the actor’s viewpoint, our second hypothesis was that the optimality bias in blame attribution should be stronger in L1 than in L2, particularly because previous demonstrations of the Foreign Language Effect (FLE) have pointed to reduced emotionality in L2.

In Experiment 1, we found that using the L2 did not lead participants to think ‘more rationally’ about the actor’s actual beliefs. Rather, L2 speakers generally apportioned more blame than L1 speakers. In Experiment 2, we found a marginal interaction in the opposite direction to our expectations, i.e., the optimality bias in blame judgements was slightly stronger in L2 than in L1. How can these unexpected results be reconciled with previous findings on the FLE?

It is possible that the FLE, by reducing emotionality, promotes consequentialist, utilitarian moral judgements. When using a foreign language, people become less sensitive to intentions and beliefs and more sensitive to outcomes (see also Hayakawa et al. 2016). Previous research on the FLE in moral judgement has indeed been confined to dilemmas involving utilitarian decision-making, i.e. the ‘trolley’ and ‘footbridge’ dilemmas (Cipolletti et al. 2015; Corey et al. 2017; Costa et al. 2014a; Geipel et al. 2015). The present study is novel in applying the FLE to the attribution domain, which involves judging the intentions and actions of another person.

We conjecture that emotional involvement, in the sense of enhanced empathy (discussed below), may actually be a requirement for considering a situation from another person’s perspective. Under this view, diminishing emotion (e.g., via the FLE) might enhance the optimality bias in blame attribution, and thus partially account for the findings in both Experiment 1 (where bilinguals were found to be harsher in their blame judgements than L1 speakers) and Experiment 2 (where the optimality bias was found to be slightly stronger in L2 than in L1).

Masto (2015) argues that empathy is a crucial aspect of forming moral judgements. Merely observing an actor’s behaviour is not enough to assess whether it is morally right; we must also evaluate the motivations and thought processes of others. Previous research suggests that considering an action from the perpetrator’s point of view can indeed reduce the severity of blame judgements. For example, in a mock-trial paradigm, Haegerich and Bottoms (2000) presented participants with a patricide scenario in which a hypothetical child defendant claimed to have committed the crime in self-defence following years of abuse. Participants in the experimental condition were instructed to take the perspective of this child and imagine how they would feel and think in the same situation. This resulted in significantly lower blame judgements compared to a control group that received no such instructions.

Encouraging observers to think from the actor’s perspective would likely also mitigate the optimality bias, by directing focus away from the existence of alternative options and towards the key fact that these options are irrelevant from the actor’s standpoint (because the actor is unaware of their differing value). Increased perspective-taking and empathy towards the ‘doctor’ in our vignettes may have made participants realise that the outcome was not preventable from her point of view.

Some research suggests that bilinguals may actually have advanced executive functions that are advantageous for perspective-taking (e.g., Greenberg et al. 2013). However, this has primarily been demonstrated for early bilinguals, especially those with native-like proficiency in both languages (see Rubio-Fernández 2017). The purported bilingual advantage may actually not exist in late bilinguals with L2 as foreign language. For example, Ryskin et al. (2014) studied visuospatial perspective-taking in a paradigm where participants completed a route-finding task by following instructions from an experimenter who had either the same or the opposite perspective. Late bilinguals struggled significantly more than monolinguals when taking opposite perspectives in their L2. Indeed, both of our experiments focused on late bilinguals, i.e. we deliberately excluded a relatively small number of bilingual participants who might have benefited from (potentially) enhanced executive functioning.

Mante-Estacio and Bernardo (2015) found a bilingual disadvantage in a Theory of Mind task where they asked participants to take the perspective of a character in a vignette. They studied the ‘illusory transparency of intention’—originally demonstrated by Keysar (1994)—whereby readers falsely assume that characters in a story have access to the same information as the reader does. Participants were given vignettes describing a conversation and asked to judge whether the tone of a statement was sarcastic or genuine from the perspective of the character in the vignette. It was found that participants in L2 were more likely to focus on information that was clearly not available to the described character. Thus, these participants had more pronounced ‘illusory transparency of intention’ and found it more difficult to take the character’s perspective in their foreign language.

Muted emotional resonance can also reduce the vividness of mental imagery. This was demonstrated by Hayakawa and Keysar (2018) on several measures. Bilingual participants reported experiencing greater difficulty in imagining objects in their L2. The same trend also appeared in a number of objective tasks: participants were asked to mentally categorise objects based on visual attributes such as shape, and bilinguals completing the task in their second language were less accurate than those completing it in their native language. Importantly, Hayakawa and Keysar (2018) also found that bilingual participants completing the task in their L2 were more likely to agree to pushing a man in front of a train in the ‘footbridge dilemma’, and that these participants rated the scenario as far less visually vivid than those in L1.

As a whole, the present studies tap into a relatively new area of research. Few studies so far have investigated potential links between bilingualism and perspective-taking, and whether using a foreign language makes it more difficult to imagine or consider the thoughts and feelings of others. The present research can offer only tentative conclusions in this regard. In Experiment 1, L2 participants attributed significantly more blame than L1 participants, regardless of condition. In Experiment 2, the marginal interaction between language and condition suggested that L2 participants were somewhat more susceptible to the optimality bias in blame attribution than L1 participants. Together, these results could be accounted for by assuming decreased empathy (or perspective-taking ability) as a result of reduced emotional resonance in L2.

Finally, a potential issue arose from the third comprehension question in our experiments, which was also included in the original De Freitas and Johnson (2018) study: “Did the doctor have any way of knowing that this belief about the probabilities was false or was it outside her control?” This question was answered incorrectly by a large proportion of participants (58% in Experiment 1 and 38% in Experiment 2) and could therefore not be used as an exclusion criterion. Participants were possibly thinking beyond what was stated in the narrative, assuming that the doctor must have been careless in her prior research for having insufficient knowledge about the treatments’ differing efficacies. That said, the exploratory analyses showed no systematic effects of language or condition on the likelihood of answering this question incorrectly. Thus, answering this question incorrectly did not appear to be associated with participants’ blame attributions.

Conclusion

The present experiments provide further evidence for the existence of an optimality bias in moral judgements. As such, they add to the existing literature on blame attribution and related theories. People find the existence of ‘better’ options important when morally judging the choices made by others, even when (a) all choices lead to the same (negative) outcome and (b) decision-makers are described as believing that all choices are equally optimal. More specifically, participants apportion reliably more blame (for the same negative outcome) when a described actor unknowingly made a suboptimal rather than an optimal choice. Against our expectations, we found that this optimality bias in blame attribution may be further enhanced by impaired perspective-taking, or empathy, in one’s second language (L2). This contributes to the literature by suggesting that the Foreign Language Effect does not necessarily put bilinguals at an advantage in all types of moral decision-making scenarios. Indeed, there appear to be cases where reduced emotional resonance in L2 could potentially enhance irrational biases in moral judgement rather than diminish them.