Introduction

Over the last twenty years, research in traffic psychology has flourished anew, fueled mainly by worldwide interest in the cognitive, social, and ethical mechanisms behind driving behavior in view of the upcoming revolution in autonomous transportation (e.g., Baum, 2020; De Sio, 2017; Gkartzonikas & Gkritza, 2019; Haboucha et al., 2017; Hidalgo et al., 2021; Pigeon et al., 2021; Shariff et al., 2017).

Focusing on the ethical framework, several studies have investigated people's beliefs about how autonomous and human drivers should behave when facing complex decisions that involve moral values and rules (e.g., Bonnefon et al., 2016; Goodall, 2014; Meder et al., 2019; Sütfeld et al., 2019; Sütfeld et al., 2017). Defining the “right” behavior is far from easy, given how much ethical principles, social preferences, and traffic cultures differ across individuals and countries (Awad et al., 2018; Graham et al., 2016; Özkan et al., 2006).

Regardless of the level of automation, a widely used experimental approach to investigating reasoning and moral judgment behind the wheel is the sacrificial moral dilemma (e.g., Martí-Vilar et al., 2021; Nasello et al., 2021; Trémolière et al., 2017), defined as a conflict between two undesirable actions whose outcomes involve the loss of human lives. The trolley and footbridge problems are two paradigmatic examples (Foot, 1978; Thomson, 1985). In the trolley problem, a trolley moving along the tracks is on course to run down five track workers. The only way to save them is to pull a lever that diverts the trolley onto another track, where a single worker will die. In the footbridge version, instead, the five workers can be saved only by pushing a large man off an overpass onto the main track: he will die, but his body will stop the trolley. Dilemmas such as these are mainly designed to contrast two schools of moral thought: Kantian deontology and Benthamian utilitarianism. Deontology is a duty-based ethical theory centered on moral norms (i.e., categorical imperatives) that must not be violated (e.g., “the ends never justify the means”; Kant, 1785). Utilitarianism, the paradigm case of consequentialism, holds that an act is right if and only if it minimizes overall harm, denying that moral rightness depends on anything other than consequences (e.g., Bentham, 1781).

In this context, the doctrine of double effect (DDE; Aquinas, 1952) has provided the basis for exploring the role of intention in moral decision-making. According to this principle, deliberately causing harm as a means to a greater good (“instrumental dilemmas”) is not acceptable, whereas causing harm as a foreseen but unintended side effect of saving more people (“incidental dilemmas”) is permissible. In line with the DDE, the footbridge dilemma is structured as an instrumental dilemma, in which sacrificing one person is the unethical means of saving more people, whereas the trolley problem is structured as an incidental dilemma, in which diverting the trolley toward the single worker is ethically permissible as a foreseeable but undesired consequence of achieving a greater goal (i.e., saving five people). The role of intention in moral judgment has been widely investigated (e.g., Borg et al., 2006; Cushman et al., 2006), including through manipulations of time pressure or cognitive interference (see Trémolière et al., 2017), and several studies have shown that participants are much more willing to cause harm as an unintended side effect than as an intended means (e.g., Borg et al., 2006; Cushman et al., 2006; Greene et al., 2001; Lotto et al., 2014; Moore et al., 2008).

A typical adaptation of this thought experiment to on-road scenarios consists of a driver (or an autonomous vehicle) facing a binary decision in an incidental dilemma: to maintain the trajectory and run over several pedestrians, killing all of them, or to swerve and kill either a single unaware pedestrian or the driver/passenger (Bonnefon et al., 2016). During the last decade, this hypothetical AV-type dilemma framework has been tested in different experimental settings: virtual reality (e.g., Kallioinen et al., 2019; Sütfeld et al., 2017, 2019), immersive driving simulations (e.g., Frison et al., 2016; Samuel et al., 2020), and, notably, text-based moral scenarios (e.g., Bonnefon et al., 2016; Sütfeld et al., 2019).

When the available actions are framed as a dichotomous decision, the deontological option corresponds to a passive action (e.g., maintaining the trajectory) and the utilitarian option to a proactive action (e.g., swerving or braking). In general, respondents strongly prefer vehicles programmed to maximize utility by saving the largest number of people (Awad et al., 2018). Among the many variations that have characterized on-road moral scenarios (e.g., manipulating the number or age of potential victims, their socioeconomic status, or road laws; Awad et al., 2018), the self-involvement risk factor seems particularly relevant to driving dilemmas (e.g., Bonnefon et al., 2016; Sütfeld et al., 2019). The seminal work by Bonnefon et al. (2016) showed that drivers/passengers who approve of utilitarian algorithms tend to prefer self-protective vehicles for themselves when they are directly involved as potential victims. However, Bergmann and colleagues (2018) further explored the utilitarian approach when the decision maker's life was at stake. Surprisingly, participants acted more altruistically than expected when their own life was at risk, protecting the pedestrians over themselves in proportion to the number of potential victims. In this experimental application, Bergmann and colleagues drew inspiration from Thomson's (2008) discussion of the absence of a self-sacrificial option in classic moral dilemmas, an option whose addition may set utilitarian aims against the instinct of self-preservation. Furthermore, to illustrate the moral agent's three-option variant (killing five vs. killing one vs. killing oneself), Thomson also used a typical on-road scenario in which the car's brakes fail and a sacrificial outcome cannot be avoided.
Evidence on this point is still controversial, and Thomson herself left the question open to further investigation (e.g., Huebner & Hauser, 2011), underlining the complexity of interpreting an individual's moral judgment when the self-preservation instinct collides with collective utilitarian demands.

In addition, emotional valence and arousal influence driving behavior (Chan & Singhal, 2015), and emotional processing is known to play a role in reaching the “right” moral decision. In particular, Greene's dual-process theory posits a systematic competition between emotional and cognitive processes in moral judgment: greater endorsement of utilitarian options appears to be associated with lower emotional activation, because a slow, controlled cognitive process can override a fast, automatic emotional response when the perceived benefits of a moral decision exceed its costs (Greene et al., 2001, 2004, 2008). Some studies have measured self-reported emotional experience at the time of judgment (Lotto et al., 2014; Pletti et al., 2016; Sarlo et al., 2012), showing low emotional involvement in incidental scenarios.

To date, several theoretical frameworks and empirical data on moral judgments in sacrificial and incidental dilemmas are available in the literature, covering moral acceptability, decision times, and emotional activation (Cushman et al., 2006; Greene et al., 2001, 2008; Lotto et al., 2014; Moore et al., 2008; Navarrete et al., 2012; Ugazio et al., 2012), typically applied to a wide variety of extreme, high-conflict events in heterogeneous settings. However, only a very small number of these scenarios have been properly structured and tested for the context of moral driving behavior, mostly as adaptations of the traditional trolley problem. Moreover, most of these cases concern autonomous transportation, whose social acceptance, perceived safety, and trust are still a work in progress (e.g., Ghazizadeh et al., 2012; Hengstler et al., 2016; Jing et al., 2020), thereby skipping over conventional, non-autonomous driving.

Indeed, although applying the traditional sacrificial dilemma structure to driving (with or without a human driver) seems plausible, the considerable body of experimental research on the ethical perception of driver behavior has not yet provided a solid experimental validation of this approach.

The Present Study

To date, relatively little attention has been paid to (a) the reliability of thought experiments as experimental tools in the context of traffic behavior and (b) how these scenarios should be structured. A variety of rules has been adopted for structuring text-based driving dilemmas, leading to case-by-case adaptations of AV/driving dilemmas. In the present research, we decided to take a step back from the contemporary application to autonomous transportation, with the aim of providing an experimental and structural validation of sacrificial manual driving dilemmas. This step seems crucial to ensure that the trolley problem can be applied within the traffic psychology framework and, consequently, to establish the reliability of the experimental conclusions on traffic and AV ethics obtained so far.

To this aim, we compared, in text-based form, a set of traditional sacrificial trolley problems (Lotto et al., 2014, modified from Sarlo et al., 2012) with a novel set of trolley-like manual driving dilemmas. We hypothesized no differences between the two sets in terms of decision time, moral judgment, emotional activation, and moral acceptability. Such a result would be consistent with a negligible effect of tailoring the dilemmas to driving on moral decision processes, confirming the validity of this experimental tool in on-road scenarios.

Furthermore, given the moral agent's direct involvement in the driving activity, the potential role of the self-involvement risk factor in the decision process was also considered in developing the new set of manual driving dilemmas. In this regard, we expected a higher frequency of self-protective decisions in self-involvement dilemmas, which should be rated as more unpleasant and more arousing than dilemmas without personal risk. These results would also constitute a replication of Lotto and colleagues (2014).

Through the proposed sacrificial driving scenarios, we provide a new set of incidental dilemmas, based on the DDE and rigorously controlled for a number of psychological and linguistic confounding factors. We also report Italian normative values for the following variables: (a) rates of participants' choices in each scenario; (b) decision times; (c) ratings of emotional valence and arousal experienced during the decision process; and (d) judgments of the moral acceptability of the two proposed driving behaviors.

Methods

Participants

A power analysis was conducted with the G*Power software (Faul & Erdfelder, 1992) before any data analysis, assuming a medium effect size (Cohen's d = 0.25), a correlation of 0.50 among repeated measures, a two-tailed hypothesis, an alpha error probability of 0.05, and a power of 0.90. It indicated a required sample of 124 participants (31 per group). We recruited 152 participants (75 women, 1 unspecified). Each participant gave written informed consent prior to participation, which was voluntary and unpaid. Mean age was 25.7 years (SD = 5.48, range = 18–57), and 69.08% of participants were enrolled in university courses (n = 105); of these, 40% were in human sciences degree programs (e.g., psychology or sociology; n = 42) and 21% in technical courses (e.g., engineering or mathematics; n = 22). Most participants (95.4%) held a driver's license (n = 145), on average for 6.56 years (SD = 5.45, range = 1–39). Almost all participants (99.3%, n = 151) drove at most 15,000 km per year, and only 4.60% had been involved in a car accident in the previous 12 months (n = 7). The study was approved by the local ethics committee (ID No.: 3514).
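The sample percentages above rest on two different denominators: the full sample (N = 152) for enrollment, licensing, mileage, and accidents, and the 105 university students for the degree-program shares. A minimal sketch of the arithmetic, using only the counts given in the text (the assignment of denominators is our reading of the paragraph):

```python
# Counts are taken from the Participants paragraph; the denominators
# (152 = whole sample, 105 = university students) are our reading of it.
TOTAL = 152
STUDENTS = 105

COUNTS = {
    "enrolled in university": (105, TOTAL),
    "human sciences (share of students)": (42, STUDENTS),
    "technical courses (share of students)": (22, STUDENTS),
    "driver's license": (145, TOTAL),
    "at most 15,000 km per year": (151, TOTAL),
    "accident in previous 12 months": (7, TOTAL),
}


def percent(n: int, denominator: int) -> float:
    """Return n as a percentage of denominator, rounded to two decimals."""
    return round(100 * n / denominator, 2)


for label, (n, denom) in COUNTS.items():
    print(f"{label}: {n}/{denom} = {percent(n, denom)}%")
```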

Stimuli

The stimulus set consisted of 42 dilemmas: 40 sacrificial dilemmas and 2 nonsacrificial “filler” moral scenarios (e.g., being dishonest). The two fillers, as well as 20 of the sacrificial dilemmas (“traditional dilemmas”), were selected from the validated set of Lotto et al. (2014, adapted from Sarlo et al., 2012), whereas the remaining 20 scenarios were newly developed as driving-type dilemmas. The new driving-type subset was developed as typical incidental dilemmas based on the DDE (Aquinas, 1952). We focused solely on incidental scenarios (i.e., killing one person as a foreseen but unintended consequence of saving many) because plausible behind-the-wheel moral dilemmas are difficult to imagine and construct in the classic instrumental version.

As in the traditional validated set, each driving-type dilemma consists of a hypothetical moral scenario in text form and two possible resolutions: a deontological action and a utilitarian action. In both dilemma types, the self-involvement risk factor was considered: in 10 scenarios per type, the agent was not involved as a potential victim of the accident (“other-involvement dilemmas”), whereas in the remaining 10 scenarios, the utilitarian outcome protected the agent's own life and other people's lives while sacrificing a single individual as a foreseen but unintended side effect (“self-involvement dilemmas”) (Table 1). The driving-type self-involvement scenarios were structured consistently with the validated self-involvement set (Lotto et al., 2014), sharing the self-sacrificial frame of more widely used moral dilemmas (e.g., the “Crying Baby,” “Plane Crash,” and “Sacrifice” scenarios; Greene et al., 2001, 2004, 2008).

Table 1 Sample traditional (non-customized), driving-type, and filler dilemmas (text translated from Italian)

The new driving-type set was controlled for several potential factors that may blur the ethical decision process (Awad et al., 2018; Lotto et al., 2014). These included (a) taking traffic rules and regulations into account, so as to prevent a clear allocation of responsibility; (b) avoiding “leading” language and critical words that could induce response bias (Loftus & Palmer, 1974); (c) avoiding any characterization of passengers and pedestrians in terms of gender, age, or personal relationship with the driver; (d) keeping a constant 1:3 ratio between killed and saved characters; (e) limiting each scenario to a single moral dilemma (i.e., killing or letting die); and (f) carefully controlling each scenario for the number of letters and words (see supplementary material: https://osf.io/cp285/?view_only=37c928fa5dc047d6905e40521a6a48dc).
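Criteria (d) and (f) lend themselves to an automatic check. The sketch below is hypothetical: the function names and the counting rules (whitespace-separated tokens as words, alphabetic characters as letters) are our assumptions, not the authors' procedure, which is documented in the supplementary material.

```python
from __future__ import annotations

from statistics import mean


def text_stats(text: str) -> tuple[int, int]:
    """Return (words, letters) for one scenario text.

    Assumed counting rules: words are whitespace-separated tokens and
    letters are alphabetic characters (spaces, digits, punctuation ignored).
    """
    return len(text.split()), sum(ch.isalpha() for ch in text)


def compare_lengths(set_a: list[str], set_b: list[str]) -> dict[str, float]:
    """Mean word and letter counts of two sets of scenario texts."""
    stats_a = [text_stats(t) for t in set_a]
    stats_b = [text_stats(t) for t in set_b]
    return {
        "words_a": mean(w for w, _ in stats_a),
        "words_b": mean(w for w, _ in stats_b),
        "letters_a": mean(n for _, n in stats_a),
        "letters_b": mean(n for _, n in stats_b),
    }


def keeps_victim_ratio(killed: int, saved: int) -> bool:
    """Criterion (d) as we read it: saved characters are three times the killed ones."""
    return killed > 0 and saved == 3 * killed
```

Applied to the Italian originals, `str.isalpha` also counts accented letters such as “è”, so no special handling is needed for them.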

Experimental Design and Procedure

The experimental task was programmed and administered with the Qualtrics survey software, and the complete experimental procedure had a mean duration of 26 min (SD = 7.26 min).

All participants were asked to complete the survey on a fixed (desktop) device or a laptop, for two reasons: to avoid the limited comparability of data obtained across different devices (e.g., Krebs & Höhne, 2021), and to prevent participants from performing the experiment while on board a road vehicle.

For methodological reasons, participants were divided into four “lists” (n = 38 per list), each composed of 18 randomized dilemmas: eight traditional, eight driving-type, and the two filler scenarios. Four pairs of these 18 dilemmas, one pair per category (traditional self-involvement, traditional other-involvement, driving-type self-involvement, driving-type other-involvement), were “list-specific,” that is, present in only one list, whereas the remaining five pairs, or “anchors” (including the fillers), were common to all lists.

This experimental design was chosen for two reasons. First, because the experimental task requires considerable effort and time for each scenario, administering all the dilemmas to all participants was considered too demanding and time-consuming. A preliminary pilot study with 12 participants confirmed that the 42-dilemma version was perceived as too long: it required more than 1 h per participant. The set administered was therefore reduced to 18 dilemmas per list. In this way, the duration of the experiment was halved, while scenarios remained paired by category in each list. This approach ensured a sufficient number of responses for each list-specific scenario (38) to obtain adequate a priori statistical power.

The anchor dilemmas, selected on the basis of the normative scores of Sarlo et al. (2012) and the pilot scores, were intended to ensure the reliability of each scenario within its own category. This was verified through a preliminary analysis testing for the absence of differences between anchors of the same category and between lists.
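The list structure described above can be made explicit with a short sketch. Dilemma IDs such as `TS-A1` (anchor) or `TS-L3` (list-specific) are hypothetical labels, and the assignment of list-specific pairs to lists is an assumption; the sketch only checks the counts stated in the text (18 dilemmas per list, 42 overall, 38 responses per list-specific dilemma, 152 per anchor).

```python
from __future__ import annotations

import random
from collections import Counter

# Hypothetical labels: T/D = traditional/driving-type, S/O = self/other involvement.
CATEGORIES = ["TS", "TO", "DS", "DO"]
N_LISTS = 4
PARTICIPANTS_PER_LIST = 38


def build_lists() -> list[list[str]]:
    """Return the dilemma IDs making up each of the four lists."""
    lists = []
    for k in range(N_LISTS):
        items = ["F1", "F2"]  # filler pair, common to all lists
        for cat in CATEGORIES:
            items += [f"{cat}-A1", f"{cat}-A2"]  # anchor pair, in every list
            items += [f"{cat}-L{2 * k + 1}", f"{cat}-L{2 * k + 2}"]  # list-specific pair
        lists.append(items)
    return lists


def presentation_order(items: list[str], seed: int) -> list[str]:
    """Randomized presentation order for one participant (seeded for reproducibility)."""
    return random.Random(seed).sample(items, k=len(items))


lists = build_lists()
all_items = {item for lst in lists for item in lst}

# Each list is completed by 38 participants, so an item's response count
# is 38 times the number of lists that contain it.
responses = Counter()
for lst in lists:
    for item in lst:
        responses[item] += PARTICIPANTS_PER_LIST

print(len(lists[0]), len(all_items))           # 18 42
print(responses["TS-L1"], responses["TS-A1"])  # 38 152
```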

The experimental procedure is represented in Fig. 1. Because of the COVID-19 pandemic, participants completed the experiment online. Before starting, they were asked to complete the informed consent form and to read instructions about the experimental session. Each dilemma was then presented as text, in black type (Arial, size 10) on a white background. Participants could read each scenario for as long as needed. On the next slide, Outcome A (deontological action) was shown. After 5 s, Outcome B (utilitarian action) was added to the screen, and after 7 more seconds, the response keys appeared, allowing participants to indicate their preferred outcome (for a detailed description of the procedure, see Palmiotti et al., 2020; Sarlo et al., 2012).
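The timing of a single trial can be summarized in a small sketch. We assume that the “decision slide” whose onset starts the decision-time clock is the moment the response keys appear (5 s plus 7 s after Outcome A is shown); the class and field names are hypothetical and do not correspond to any Qualtrics feature.

```python
from dataclasses import dataclass

OUTCOME_B_DELAY_S = 5.0  # Outcome B is added 5 s after Outcome A appears
KEYS_DELAY_S = 7.0       # response keys appear 7 s after Outcome B


@dataclass
class Trial:
    """Hypothetical record of one dilemma trial (times in seconds)."""

    outcome_a_onset: float  # onset of the slide showing Outcome A
    key_press: float        # time of the participant's button press

    @property
    def outcome_b_onset(self) -> float:
        return self.outcome_a_onset + OUTCOME_B_DELAY_S

    @property
    def keys_onset(self) -> float:
        return self.outcome_b_onset + KEYS_DELAY_S

    @property
    def decision_time(self) -> float:
        """Time from the appearance of the response keys to the button press."""
        if self.key_press < self.keys_onset:
            raise ValueError("button press recorded before the keys appeared")
        return self.key_press - self.keys_onset


trial = Trial(outcome_a_onset=10.0, key_press=25.5)
print(trial.keys_onset, trial.decision_time)  # 22.0 3.5
```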

Fig. 1

Sequence of events in the experimental procedure for each dilemma

Subsequently, respondents rated their emotional state during the moral decision-making process using the Self-Assessment Manikin (SAM; Lang et al., 2008). The 9-point graphic scale was presented on two separate slides to assess valence (unpleasantness/pleasantness) and arousal (calm/activation), with higher scores indicating greater pleasantness and greater emotional arousal. On the last slide, participants rated how morally acceptable each of the two proposed outcomes was on an 8-point scale from 0 to 7 (7 = completely acceptable). Decision times were recorded from the onset of the decision slide until the button press. The experimental procedure was described step by step before the beginning of the session, instead of presenting an additional practice scenario.

Analysis

The statistical analysis was conducted in R (version 4.0.4). In a preliminary step, the similarity of anchor dilemmas across lists was tested by checking for nonsignificant interactions between list (four levels) and, respectively, dilemma type and specific anchor. We focused on outcome preferences (decision type, dichotomous), decision times, valence (1–9), and arousal (1–9). Given the nature of the collected data, linear mixed models (generalized linear mixed models for the dichotomous variable only), with participants as a random factor, were fitted to the data (‘lme4’ package in R; Bates et al., 2015).

The main analysis focused on the complete stimulus set, setting aside list grouping, and evaluated several potential predictors in six forward stepwise model comparisons (M1–M6; further information on the procedure is available in the supplementary material). Decision type, decision time, valence, arousal, and the moral acceptability of each of the two outcomes were set as dependent variables in separate mixed-effects model comparisons, with participants as a random factor. Post hoc pairwise comparisons were computed with the emmeans function of the R package ‘emmeans’ (Lenth, 2020), using the Bonferroni correction as the adjustment method. Decision times were log-transformed (natural logarithm; Lotto et al., 2014), and outliers were removed by applying a 98% acceptance interval. Filler dilemmas were excluded from the preliminary and main analyses and analyzed separately. Descriptive statistics are reported in Tables 3 and 4 and, in greater detail, in the supplementary materials, along with the dataset and the R script.
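Two of the steps mentioned above can be sketched in plain Python. The trimming rule is an assumption: we read the “98% acceptance interval” as keeping log decision times between the 1st and 99th percentiles, computed with linear interpolation (NumPy's default method). The Bonferroni helper applies the standard rule (multiply each p-value by the number of comparisons, capped at 1) to a single family of comparisons; how emmeans defines the family in each model is not modeled here.

```python
import math


def percentile(sorted_vals: list, q: float) -> float:
    """Percentile q (between 0 and 1) of already-sorted values, linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    i = math.floor(pos)
    j = min(i + 1, len(sorted_vals) - 1)
    return sorted_vals[i] + (pos - i) * (sorted_vals[j] - sorted_vals[i])


def log_and_trim(times_s: list, coverage: float = 0.98) -> list:
    """Natural-log transform decision times, then keep the central `coverage` share."""
    logs = [math.log(t) for t in times_s]
    ordered = sorted(logs)
    tail = (1.0 - coverage) / 2.0
    lo, hi = percentile(ordered, tail), percentile(ordered, 1.0 - tail)
    return [x for x in logs if lo <= x <= hi]


def bonferroni(p_values: list) -> list:
    """Bonferroni-adjusted p-values for one family of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For 100 decision times of 1, 2, ..., 100 s, this rule drops only the shortest and the longest value, keeping 98 log times.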

Results

Anchors

As predicted, no interaction was observed between list and dilemma type (χ²(9) = 4.40, p = 0.88) or between list and specific anchor (χ²(12) = 3.46, p = 0.99) for decision times. Similarly, no significant interactions in outcome preferences were found between list and dilemma type (χ²(9) = 11.92, p = 0.21) or between list and specific anchor (χ²(12) = 10.38, p = 0.58). Furthermore, no interaction was observed between list and dilemma type for valence (χ²(9) = 3.58, p = 0.93) or arousal (χ²(9) = 5.33, p = 0.80). Likewise, no significant interaction was observed between list and specific anchor for either valence (χ²(12) = 2.54, p = 0.99) or arousal (χ²(12) = 14.04, p = 0.30). These results are consistent with the hypothesis that the four lists are homogeneous, given the absence of differences in responses across dilemma categories and specific anchors. They therefore allowed the four lists to be pooled in the subsequent statistical analysis of the complete dataset.
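The χ² values in this section and in the following ones test fixed effects or interactions in mixed models. Whether they come from likelihood-ratio comparisons of nested fits or from Wald tests is not stated, but in either case the p-value is the upper tail of a χ² distribution with the given degrees of freedom. The dependency-free sketch below is our own implementation, based on the standard recurrence for the regularized upper incomplete gamma function; the likelihood-ratio helper is illustrative only.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma, a = df / 2 and y = x / 2, starting
    from the closed forms for df = 1 and df = 2.
    """
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df >= 1")
    y = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(y)), 0.5  # df = 1
    else:
        q, a = math.exp(-y), 1.0             # df = 2
    while 2 * a < df:
        if y > 0:
            # y**a * exp(-y) / Gamma(a + 1), computed in log space for stability
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df_diff: int):
    """Return (chi-square statistic, p-value) for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


print(round(chi2_sf(4.40, 9), 2))    # 0.88
print(round(chi2_sf(14.04, 12), 2))  # 0.3
```

The same function offers a quick check of which degrees of freedom a reported p-value implies: χ² = 7.63 gives p ≈ 0.006 with 1 df but p ≈ 0.022 with 2 df.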

Complete Dataset

Table 2 shows the estimates from the computed models. Decision time model M1 included risk involvement (self, other), dilemma type (traditional, driving-type), experimental order (1 to 18), and decision type (deontological, utilitarian) as fixed effects, as well as the interaction between risk involvement and dilemma type. Decision times were significantly longer for self-involvement dilemmas (χ²(1) = 17.05, p < 0.001) and for deontological decisions (χ²(1) = 38.62, p < 0.001). As expected, decisions were slower at the beginning of the experiment and became faster as the experiment progressed (χ²(1) = 427.66, p < 0.001; Fig. 2). In addition, an overall effect of dilemma type, with slower decisions in traditional dilemmas (χ²(1) = 43.10, p < 0.001), and an interaction between dilemma type and risk involvement (χ²(1) = 24.41, p < 0.001) were observed. Post hoc pairwise comparisons showed that driving-type scenarios not involving the participant as a potential victim were resolved faster than both traditional other-involvement scenarios (TO–DO: z = 8.13, p < 0.001) and driving-type self-involvement scenarios (DS–DO: z = 6.41, p < 0.001), suggesting less cognitive conflict in this type of dilemma.

Table 2 Beta estimates and p-values for models M1 to M6
Fig. 2

Smoothed curves with error bars representing means and standard errors of decision times (in seconds), by experimental order and dilemma type

For decision type (M2), a generalized linear mixed model with a binomial distribution was fitted, with participants as a random factor. Following model comparison, the model including risk involvement and dilemma type as fixed effects, together with their interaction, was selected. Utilitarian resolutions were more frequent in driving-type scenarios (89%) than in traditional ones (80%; χ²(1) = 47.34, p < 0.001), whereas no effect of risk involvement was detected. The interaction between involvement and dilemma type was significant (χ²(1) = 6.29, p = 0.012): driving-type dilemmas were resolved more often through the utilitarian option both when the decision maker's life was at risk (TS–DS: z = −3.36, p = 0.004) and when it was not (TO–DO: z = −6.53, p < 0.001; Fig. 3).

Fig. 3

Horizontal bar chart of decision frequencies (%), by involvement and dilemma type (rows) and decision type (columns: deontological, utilitarian)

For valence, forward stepwise model comparisons led to a linear mixed model (M3) including risk involvement, dilemma type, decision type, arousal, gender, degree program (7 levels), profession (student, nonstudent), and age as fixed effects, as well as the interaction between risk involvement and dilemma type. No differences were observed for risk involvement (χ²(1) = 0.62, p = 0.430), dilemma type (χ²(1) = 0.50, p = 0.478), or their interaction (χ²(1) = 1.02, p = 0.312), whereas lower valence scores (i.e., greater unpleasantness) were found when deontological outcomes were selected (χ²(1) = 8.08, p = 0.004) and with higher arousal scores (χ²(1) = 174.44, p < 0.001). Women reported greater unpleasantness than men did (χ²(2) = 7.63, p = 0.022), and older participants reported greater unpleasantness (χ²(1) = 5.68, p = 0.017). Interestingly, students also reported greater unpleasantness during moral decision-making than nonstudents did (χ²(1) = 5.57, p = 0.018).

For arousal, the selected linear mixed model (M4) contained risk involvement, dilemma type, decision type, valence, and gender as fixed effects, as well as the interaction between risk involvement and dilemma type. As expected, arousal was higher in self-involvement dilemmas (χ²(1) = 5.54, p = 0.018), whereas no significant effect was detected for dilemma type (χ²(1) = 2.43, p = 0.118) or for its interaction with risk involvement (χ²(1) = 0.45, p = 0.501). Women reported higher arousal than men did (χ²(2) = 13.48, p = 0.001), and, consistent with the negative relation between valence and arousal (χ²(1) = 156.53, p < 0.001), selecting the deontological option was associated with higher arousal scores.

Models M5 and M6 addressed the moral acceptability ratings of the two proposed outcomes (deontological and utilitarian, respectively), irrespective of the chosen behavior. For the deontological option (M5), an effect of dilemma type was observed, with lower ratings in driving-type dilemmas (χ²(1) = 19.85, p < 0.001), as well as a significant interaction between risk involvement and dilemma type (χ²(1) = 17.87, p < 0.001). In traditional self-involvement dilemmas, the deontological option received higher moral acceptability ratings than in traditional dilemmas without personal involvement (TS–TO: z = 9.46, p < 0.001) and than in driving-type self-involvement scenarios (TS–DS: z = 6.14, p < 0.001). An overall effect of decision type on the moral evaluation of both outcomes was also confirmed (deontological: χ²(1) = 107.64, p < 0.001; utilitarian: χ²(1) = 107.21, p < 0.001; Fig. 4). Gender differences emerged only for the moral rating of the utilitarian behavior, which women rated as less morally acceptable (χ²(1) = 14.97, p < 0.001).

Fig. 4

Error bar plot representing means and standard errors of participants' moral evaluations, by involvement and dilemma type in rows (DS: Driving-type Self; DO: Driving-type Other; TS: Traditional Self; TO: Traditional Other) and decision type in columns (deontological action, left; utilitarian action, right)

Finally, four separate mixed-effects models were fitted to the filler dilemmas, on log-transformed decision times, selected outcome (generalized linear mixed model), valence, and arousal, with participants as a random factor and gender (male, female), profession (student, nonstudent), degree program (7 levels), and decision type (immoral, moral behavior) as fixed effects. No relevant effects were observed in any of the four models. Descriptive statistics are reported in Tables 3 and 4.

Table 3 Mean and standard deviation of the dependent variables, by dilemma type (traditional, driving-type) and risk involvement (self, other involvement)
Table 4 Mean and standard deviation of the dependent variables, by the interaction between dilemma type and risk involvement

Discussion

The present study aimed to investigate the suitability of the trolley-problem research tool when applied to moral driving behavior in on-road scenarios. To this end, a selection of validated incidental dilemmas was compared with newly developed driving-type dilemmas, structured based on the DDE. The behavioral choice, decision times, emotional state, and the moral acceptability of the two proposed behaviors were analyzed, and the corresponding normative scores were reported.

Results showed that both sets of dilemmas led to more frequent utilitarian than deontological choices, comparable unpleasantness and arousal scores, and low moral acceptability regardless of the decision made. This suggests that similar decision processes were involved in resolving the dilemmas, processes that also seem resistant to demographic, educational, and gender differences (Navarrete et al., 2012; Palmiotti et al., 2020). These findings lend credence to a fundamental role of structure over context in the interpretation of the trolley problem (Schein, 2020). A “structure-based” interpretation of moral dilemmas would be supported by the absence of differences in moral judgments and emotional activation between traditional and driving incidental scenarios. Conversely, differences in scores would reflect different interpretations of the moral problems, underlining the importance of each scenario's contextualization for how it is perceived (“context-based” hypothesis). In the present research, behavioral attitudes toward the proposed scenarios did not depend on their customization, although the driving-type framing seemed to strongly enhance the utilitarian moral code. In fact, regardless of potential personal risk, on-road trolley problems were resolved faster than traditional dilemmas. This may suggest that on-road storylines are easier to relate to everyday challenges than traditional moral dilemmas, which typically involve extreme and often unrealistic circumstances. We believe that as moral judgments and behaviors become more lifelike, cognitive demands decrease (Conway & Gawronski, 2013; Schein, 2020; Sütfeld et al., 2019), allowing the individual's moral inclination to emerge more automatically and effectively. Furthermore, this result lends new support to the criticism of low-plausibility dilemmas as a serious confound in moral judgment (e.g., Bauman et al., 2014; Gold et al., 2014).
For example, Körner and colleagues (2019) observed a systematic increase in deontological judgments when moral dilemmas are perceived as implausible, a potential source of distortion in the study of moral cognition. On-road driving seems a simple illustration of this phenomenon, as the utilitarian approach to the more plausible driving dilemmas emerged faster and more sharply.

Within the dual-process framework, previous studies have examined the role of time pressure in moral judgment and found that it reduces the proportion of utilitarian resolutions to sacrificial dilemmas (Jaquet & Cova, 2021; Suter & Hertwig, 2011; Trémolière & Bonnefon, 2014; Trémolière et al., 2017). This evidence may help explain the high percentage of utilitarian choices in the present study, in which no time pressure was imposed on any dilemma. However, Tinghög et al. (2016) did not observe an effect of time manipulation on moral judgment, which runs counter to our interpretation. The topic deserves more in-depth analysis in future studies.

Interestingly, results also show that responses to sacrificial dilemmas became faster over the course of the experiment. The decrease in reaction times across iterated trolley-problem trials has received little attention in moral psychology. In the present research, the frequent selection of the utilitarian moral code, together with the steady reduction of decision times throughout the task, raises questions about the potential risks of the iterated approach. Traditionally, an important methodological factor determining the statistical power of experimental results is the number of trials (e.g., Baker et al., 2020; Forrester, 2015; Lerche et al., 2017). Solving a moral dilemma is considered a conflict-laden task in terms of time and cognitive effort (e.g., Broeders et al., 2011), and operationalizing its experimental load on the participant is not trivial. In this context, presenting more moral scenarios than needed may be detrimental, potentially moving estimates away from the true value and, thus, from an accurate characterization of the individual's moral code. This noise may have several causes that reduce participants' commitment to the task, all plausibly related to the standard within-subject design (e.g., a high number of trials, cognitive fatigue, boredom, emotional activation, the desire to finish the experiment). Further studies using iterated moral dilemmas are needed to address this methodological question.

A structure-based hypothesis concerning the trolley problem is also supported by the results on the emotional state experienced during decision-making, which indicate comparable unpleasantness and activation for the two dilemma types. The inverse relation between valence and arousal ratings (i.e., greater unpleasantness associated with higher activation) for both traditional and driving-type dilemmas is consistent with previous findings on incidental scenarios (Lotto et al., 2014). As in traditional sacrificial dilemmas, emotional activation also seems to play a pivotal role in on-road moral decisions when human lives are at stake (e.g., Cushman & Greene, 2012; Cushman et al., 2012; Haidt, 2007; Szekely & Miu, 2015). In the present study, the emotional state during driving scenarios is qualitatively and quantitatively comparable to the one observed during traditional dilemmas.

As expected, the data also showed that both available outcomes were consistently judged as scarcely moral. Nevertheless, the chosen action was consistently judged as less immoral than the alternative option, regardless of the selected behavior and the dilemma type. This perception can be interpreted as a confirmation bias of the decision-maker, defined as the tendency to interpret evidence selectively to reinforce one's current beliefs or decisions (Nickerson, 1998). This result supports the widely acknowledged dissociation between moral choice and moral judgment (Francis et al., 2016; Tassy et al., 2013), a dissociation also observed in the context of autonomous transportation. Indeed, Bonnefon et al. (2016) described the so-called "social dilemma of AVs": individuals regard utilitarian AVs as morally right for the community, yet find them unfair and unappealing for themselves as passengers. As in our results, a mismatch between preferred behavior and its moral evaluation is apparent.

Surprisingly, the self-risk involvement factor did not have a significant effect on moral judgment or emotional activation. Many studies have reported the fundamental role of life-threatening scenarios in shaping moral decisions (Huebner & Hauser, 2011; Lotto et al., 2014; Moore et al., 2008; Petrinovich et al., 1993; Sütfeld et al., 2019), and have found that utilitarian decisions increase with the number of lives that can potentially be saved (Bergmann et al., 2018; Bonnefon et al., 2016). This issue has not only theoretical but also practical implications: a threat to one's life is certainly a conflictual factor for a driver to cope with, both when driving a vehicle and when simply being carried by an autonomous one. This underscores the importance of investigating the topic further in future studies.

This study has some potential limitations. Adapting the typical incidental dilemma to the context of manual driving changes how individual agency is perceived in the available moral options. Traditionally, a proactive utilitarian decision is compared to a passive deontological one, in which the agent's inaction allows a costly, incidental causal chain to unfold. When the agent has her/his hands on the wheel, her/his endorsement of the passive option can legitimately be perceived as a deliberate decision to run over several pedestrians, with or without self-sacrifice. Although the deontological option of the driving-type dilemmas preserves the traditional dichotomy between intervening and taking no action against an imminent threat, this factor must be taken into account when interpreting the results.

Additionally, to be consistent with the dilemma structure of Lotto and colleagues (2014) and with other previous procedures (Palmiotti et al., 2020; Sarlo et al., 2012), the presentation order of the two moral outcomes was not counterbalanced either between or within participants. This decision was based on the need to postpone the reasoning process, as requested, until the decision slide was presented. The utilitarian option depicts a predictable loss in order to achieve a collective benefit (e.g., saving more people). With this design, we aimed to prevent participants from anticipating their decision solely on the basis of the proactive utilitarian alternative, and thus on their disposition to harm someone for a greater good.

Future research could extend these results by aiming for wider cross-cultural validation (e.g., Awad et al., 2018; Di Stasi et al., 2020), strengthening the comparison between experimental modalities (e.g., Virtual Reality; see Sütfeld et al., 2017, 2019; Vankov & Jankovszky, 2021), controlling moral judgments for individual and situational factors (e.g., baseline affective state, online emotional state, or personality traits; see Klenk, 2021), and further examining the role of social distance in determining moral judgment (Hofer et al., 2020), the agent's causal role in the dramatic consequences of the proposed outcomes (Phillips & Shaw, 2015), or the tendency toward self-protection in pursuing utilitarian behavior (i.e., framing the self-protective option in both the utilitarian and the non-utilitarian outcome).

In conclusion, the present research provides new evidence on the suitability of the incidental dilemma tool for investigating moral driving behavior, suggesting that on-road contextualization may lead to faster resolution of dilemmas with comparable emotional activation. Regardless of their contextualization, such hypothetical scenarios are frequently employed to unveil ethical issues relevant to everyday life. Although these dilemmas may sometimes seem radical and unrealistic, they help us disentangle the cognitive and emotional mechanisms behind moral decisions. Moreover, moral judgment behind the wheel appears closer to our daily challenges than the broader scenarios of traditional dilemmas. We believe that researchers in traffic behavior and moral decision-making may benefit from this new driving-type dilemma set, which offers an accessible starting point for the complex task of developing and testing realistic moral scenarios in the context of autonomous and nonautonomous transportation.