As expected, the main effect of case complexity was again statistically significant (F(2, 1208) = 50.39, p < .001, ηp² = 0.08), and the contrast between the high emotional and high technical complexity conditions was non-significant (p = .50), indicating that participants perceived these cases as equally complex regardless of the source of the complexity. Replicating experiment 1, the contrast analysis also revealed that both types of high complexity cases were perceived to be more complex than the low complexity one (p < .001). The main effect of judge type was non-significant (F(1, 1208) = 0.16, p = .69). The interaction between complexity and judge type, however, was significant this time (F(2, 1208) = 3.48, p = .03, ηp² = 0.006). This interaction indicates that the perceived complexity of a simple divorce case was greater for a human judge than for an algorithmic judge (p = .002, see Fig. 6), whereas the contrasts between the two types of complex cases were statistically non-significant. Given that the materials were identical in the two studies and that the pattern of results in experiment 2 closely mimics that of experiment 1 (where we did not observe such an effect), this interaction on the manipulation check items is unlikely to explain the findings for the main dependent variables.
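For reference, the reported partial eta squared values can be recovered from each F statistic and its degrees of freedom via ηp² = F·df1 / (F·df1 + df2). A quick consistency check in Python (our sketch, not part of the original analysis):

```python
# Consistency check (ours, not from the paper): recover partial eta squared
# from an F statistic with df1 numerator and df2 denominator degrees of freedom.
def partial_eta_sq(f, df1, df2):
    return f * df1 / (f * df1 + df2)

print(round(partial_eta_sq(50.39, 2, 1208), 2))  # complexity main effect -> 0.08
print(round(partial_eta_sq(3.48, 2, 1208), 3))   # interaction effect     -> 0.006
```

Both values match the statistics reported above.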
As pre-registered, we found a main effect of judge type (F(1, 1208) = 89.51, p < .001, ηp² = 0.07): Participants perceived the human judge to be more trustworthy (M = 6.58, SD = 1.60) than the algorithmic judge (M = 5.65, SD = 1.91). Furthermore, the main effect of case complexity was also significant in this study (F(2, 1208) = 6.72, p = .001, ηp² = 0.01): Participants who read about the low complexity case perceived the judge as more trustworthy (M = 6.3, SD = 1.86) than participants who read about the emotionally (M = 5.93, SD = 1.85; p < .001) or technically complex cases (M = 6.12, SD = 1.74; p = .08). Importantly, the interaction between complexity and judge type was again significant (F(2, 1208) = 3.12, p = .04, ηp² = 0.005, see Fig. 7). Similar to the results of experiment 1, participants trusted the algorithm even less when the case included emotional complexities than when it was low in complexity (p = .03) or technically complex (p = .01). Interestingly, participants trusted the human judge more when the case was uncomplicated than when it was high in emotional (p = .003) or technical complexity (p = .004).
As pre-registered, a 2 (judge type) x 3 (case complexity type) ANOVA revealed an even stronger main effect of judge type (F(1, 1208) = 331.40, p < .001, ηp² = 0.22, see Fig. 8). Replicating experiment 1’s pattern, participants were more willing to submit their cases when the judge was human (M = 8.3, SD = 2.34) than when it was an algorithm (M = 5.36, SD = 3.25). The main effect of case complexity was again statistically significant (F(2, 1208) = 3.61, p = .03, ηp² = 0.006): Participants were more willing to submit their cases when the case they read about was low in complexity (M = 7.01, SD = 3.2) than high in emotional complexity (M = 6.69, SD = 3.32; p = .008). This contrast was only directional when comparing the low and high technical complexity cases (M = 6.8, SD = 3.04; p = .12). Finally, the interaction between judge type and case complexity was non-significant (F(2, 1208) = 0.42, p = .66), indicating that the interaction observed for trust did not spill over to intentions.
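The pre-registered analyses throughout are 2 (judge type) x 3 (case complexity type) between-subjects ANOVAs. As a minimal illustration of that design (not the authors’ analysis script), the following pure-Python sketch computes the classic sums-of-squares decomposition on simulated ratings; all cell means and cell sizes are hypothetical:

```python
# Illustrative sketch only: a balanced 2 (judge) x 3 (complexity)
# between-subjects ANOVA computed from first principles on simulated data.
# Cell means and n per cell are hypothetical, chosen to echo the reported
# direction of effects (higher trust in human judges, lower trust under
# emotional complexity); they are not the study data.
import random

random.seed(1)
judges = ["human", "algorithm"]
complexities = ["low", "emotional", "technical"]
n_per_cell = 200  # hypothetical

mean = {("human", "low"): 6.8, ("human", "emotional"): 6.4,
        ("human", "technical"): 6.5, ("algorithm", "low"): 5.8,
        ("algorithm", "emotional"): 5.3, ("algorithm", "technical"): 5.7}

data = {(j, c): [random.gauss(mean[(j, c)], 1.8) for _ in range(n_per_cell)]
        for j in judges for c in complexities}

grand = [x for cell in data.values() for x in cell]
gm = sum(grand) / len(grand)

def cell_mean(j, c):
    return sum(data[(j, c)]) / n_per_cell

def marg_mean(level, axis):
    vals = [x for (j, c), cell in data.items()
            if (j if axis == 0 else c) == level for x in cell]
    return sum(vals) / len(vals)

# Sums of squares (the classic decomposition, valid for a balanced design)
ss_judge = 3 * n_per_cell * sum((marg_mean(j, 0) - gm) ** 2 for j in judges)
ss_comp = 2 * n_per_cell * sum((marg_mean(c, 1) - gm) ** 2 for c in complexities)
ss_cells = n_per_cell * sum((cell_mean(j, c) - gm) ** 2
                            for j in judges for c in complexities)
ss_inter = ss_cells - ss_judge - ss_comp
ss_error = sum((x - cell_mean(j, c)) ** 2
               for (j, c), cell in data.items() for x in cell)

df_judge, df_comp, df_inter = 1, 2, 2
df_error = len(grand) - 6
f_judge = (ss_judge / df_judge) / (ss_error / df_error)
f_comp = (ss_comp / df_comp) / (ss_error / df_error)
f_inter = (ss_inter / df_inter) / (ss_error / df_error)
print(f_judge, f_comp, f_inter, df_error)
```

Because the simulated design is balanced, subtracting the two main-effect sums of squares from the between-cells sum of squares isolates the interaction term; with unequal cell sizes, as in the actual studies, a Type II or Type III decomposition would be needed instead.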
Replicating experiment 1’s results, a 2 (judge type) x 3 (case complexity type) ANOVA revealed a significant main effect of judge type (F(1, 1208) = 129.48, p < .001, ηp² = 0.10, see Fig. 9). Participants again perceived the human judge to be slower (M = 5.84, SD = 1.93) than the algorithmic judge (M = 7.12, SD = 1.95). The main effect of case complexity was also significant (F(2, 1208) = 3.99, p = .02, ηp² = 0.007). In particular, cases that were low in complexity were expected to be processed faster (M = 6.69, SD = 1.95) than cases that were emotionally complex (M = 6.23, SD = 2.05; p = .005). Finally, the interaction effect was significant (F(2, 1208) = 3.77, p = .02, ηp² = 0.006): The human judge was perceived to be faster when the legal case was uncomplicated than when it was emotionally (p = .001) or technically complex (p = .007), with no such difference for the algorithmic judge (p > .27).
Replicating experiment 1’s results, the main effect of judge type was statistically significant (F(1, 1208) = 96.33, p < .001, ηp² = 0.07, see Fig. 10). The human judge was again perceived to be more expensive (M = 5.54, SD = 2.03) than the algorithmic judge (M = 4.30, SD = 2.32). Furthermore, the main effect of case complexity was also significant (F(2, 1208) = 3.74, p = .02, ηp² = 0.006): Participants who read about the uncomplicated case rated the perceived cost lower (M = 4.66, SD = 2.32) than those who read about the emotionally (M = 5.17, SD = 2.21; p = .007) or technically complex cases (M = 4.95, SD = 2.23; p = .097). The interaction between judge type and case complexity was again non-significant (F(2, 1208) = 1.39, p = .25).
Discussion of Experiment 2.
The results of experiment 2 replicate the key main effects of judge type from experiment 1: Respondents reported less trust and lower intentions to submit a legal case to the local court when the judge was an algorithm than when it was a human. These main effects of judge type were again large in magnitude (intentions: d = 1.04; perceived trust: d = 0.53), corroborating respondents’ generally negative views of algorithmic judges. Moreover, we replicated the results for perceived speed and cost: The algorithmic judge was perceived to be faster and cheaper than the human judge. Finally, replicating experiment 1’s findings, we observed an interaction between judge type and case complexity on perceived trust.
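The reported effect sizes follow from the cell statistics given earlier: Cohen’s d can be computed from the two group means and standard deviations using a pooled SD. A short consistency check (ours; equal group sizes are assumed for simplicity):

```python
# Consistency check (ours, not the authors' script): Cohen's d for the
# judge-type effects, computed from the reported means and SDs.
# Pooling assumes (approximately) equal group sizes.
from math import sqrt

def cohens_d(m1, sd1, m2, sd2):
    pooled = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled

d_intentions = cohens_d(8.3, 2.34, 5.36, 3.25)  # ~1.04, as reported
d_trust = cohens_d(6.58, 1.60, 5.65, 1.91)      # ~0.53, as reported
```

Both values reproduce the effect sizes reported in the discussion above.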
12. General Discussion.
Computational and predictive technologies are used more and more every day within social institutions, including the justice system. There are many ongoing discussions about how to integrate AI into judicial decision-making, and justice is one of the most frequently mentioned domains in which algorithms have a high potential to change current practices (Araujo et al. 2018). We argue that understanding how individuals perceive algorithmic judges is essential when discussing the future application of AI in deciding court cases.
The current work studies individuals’ trust in algorithmic and human judges and explores their intentions to submit their cases to a local court. In two empirical studies with a combined sample of over 1,800 adult US residents, we provide strong support for the notion that individuals care about the specific judge (human vs. algorithm) that will adjudicate their case. Specifically, we demonstrate that even though potential court users acknowledge that algorithms might lead to quicker and cheaper processes, perceived trust and willingness to submit a case to court are negatively influenced by the use of an algorithmic judge. Moreover, although human judges are in general trusted much more than algorithmic judges, both technical and emotional complexities reduce trust in human judges, whereas only emotional complexities reduce trust in algorithmic judges.
To test the robustness of our findings, we combined the data from all studies we ran (three in total) into a single data file and meta-analysed the findings (N = 3,039, Mage = 37.8, 53.00% female). For trust, we again found a significant main effect of judge type (F(1, 3021) = 238.94, p < .001, ηp² = 0.07, d = 0.6) and of case complexity type (F(2, 3021) = 7.85, p < .001, ηp² = 0.005). Importantly, in line with the results of experiments 1 and 2, we found a significant interaction between judge and case complexity type on perceived trust (F(2, 3021) = 4.34, p = .01, ηp² = 0.003). Details of this internal meta-analysis can be found in the Supplemental Materials.
Our work provides novel insights into the impact of algorithms on individuals’ attitudes and decision-making. First, we document algorithm aversion in an important domain: judicial decision-making. In many situations people need to go to court to protect their rights. The prospect of facing an algorithmic judge may increase their frustration and influence their predisposition to use courts; access to justice may suffer as a result. Accordingly, despite the positive aspects of algorithms (i.e., speed and cost), policy-makers should expect pushback from citizens against courts’ adoption of algorithms in adjudication.
Our paper also adds to the growing literature on algorithmic decision-making (Helberger et al. 2020; Yeomans et al. 2019): we document its effect in a practical context, namely perceived trust in algorithmic and human judges. Additionally, existing research on algorithm aversion predominantly studies how individuals choose between using algorithms and humans (Dietvorst et al. 2018; Dietvorst et al. 2015). We contribute to this line of research by investigating how individuals perceive algorithms and humans when they are on the receiving side of the decisions such decision-makers would make. Finally, our paper adds to existing work on algorithms by investigating the impact of legal case complexity (emotional vs. technical). In particular, the results of our internal meta-analysis highlight that trust in algorithmic judges drops especially when a legal case involves emotional complexity (vs. technical or low complexity).
13. Limitations and Future Directions.
Our studies have several limitations that deserve attention. First, all our respondents were US residents. We would therefore advise policy-makers not to generalize our results to respondents residing in other countries, as cross-country differences may influence general trust in judges. For instance, in countries with low trust in courts and low esteem of justice institutions, algorithmic judges may be trusted more than in countries in which courts and the justice administration have a better reputation. In addition to trust in the judicial system, court users’ trust in courts is also influenced by other factors, such as legal culture, the case at hand, the presence of a lawyer, or previous experiences. Future research should replicate our studies in other jurisdictions and use court or justice trust indicators when comparing data between jurisdictions. Second, trust in algorithmic decisions might also be influenced by repeated interaction with an algorithmic judge. For instance, experienced court players may develop different attitudes towards algorithmic judges as they gain practice. In addition, we concur with Rule and Friedberg that trust in an algorithm should be considered in the broader context of where, how, and when the algorithm is used to resolve conflicts (Rule and Friedberg 2005); trust is a contextual construction. We recommend more research on the effect of repeated exposure to algorithmic judges.
Third, even though there are many differences between humans and algorithms, the current work aims to study lay people’s general perceptions of algorithms in judicial decision-making. We therefore prioritized high internal validity and minimized differences between conditions by manipulating only the type of judge. Future research should investigate differences between algorithmic and human judges systematically. Research should also address hybrid situations in which AI and humans work together, for instance when an AI system supports the judge in drafting a decision, or when an AI system and a judge write a decision together. The level of AI integration and its relation to human judges may take many shapes and may affect people differently. Additionally, our paper covers several different perceptions, such as trust, speed, and cost. However, we do not investigate how and when these variables impact individuals’ decisions to submit their legal cases to the court. More research is needed to further understand the dynamics between perceptions of algorithms and their impact on individuals’ attitudes and behaviours.
Further research might also delve deeper into the potential differences between legal fields. Depending on the field of law and the type of case, there might be divergence in the legal knowledge and in the approach potential court users take. These differences can be explained by the fact that parties are assisted by legal professionals like attorneys, who exercise considerable power over their clients and control their litigation strategies (Themeli 2018). Moreover, differences in the nature of the parties (e.g., business vs. private individuals) might have an influence on the willingness to submit a case to an algorithmic judge.
Our research is comparable to that of Sela (2018). Both studies indicate less appreciation for automated decision-making. However, they differ in the dispute resolution mechanism under investigation (court for us, ODR for Sela) and in the timing of the interview (ex ante for us, ex post for Sela). Additionally, we investigate the role of different types of case complexity to provide policy-makers with insights about what to expect when they adopt algorithmic judges.
In addition, our research may be comparable to Helberger et al. (2020). Both studies investigate human perceptions of algorithmic (automated, in their terminology) decision-makers but reach different conclusions. This may be due to the following differences between the studies: the Helberger et al. (2020) survey is broad and without reference to any sector, whereas our experiment focuses on court litigation; Helberger et al. inquire about the perception of fairness (as used in legal literature), whereas for us fairness is one of the elements that constitute trust; Helberger et al. base their study on a survey, whereas ours is an experiment that uses complexity moderators in addition to manipulating human vs. algorithm; and Helberger et al. use a Dutch sample, whereas our sample is based in the US. Nevertheless, both studies agree that the mechanism by which humans perceive algorithmic decision-makers is complex and sensitive to circumstances, and that more studies are needed in this direction.
Finally, we are also aware that the underlying values and concepts in this paper are very much legally imprinted. Our use of the categories simple and complex is closely related to what is accepted as such in the legal world (Themeli and Philipsen 2021). A civil case is legally simple when the parties compromise on the outcome and the judge only has to sign at the bottom, after a marginal assessment of compatibility with minimum standards of law. In the psychological and technological frames of concepts and values, the categories simple and complex might refer to something totally different. Consequently, legally simple does not equal easy to automate. To find out how those differences play out, a conversation is needed on the intricate conventions between the disciplines (de Vey Mestdagh 2020). It may then turn out that legally simple cases comprise a much larger variation in complexity than we envisage, and that complex in the legal world does not correspond with complex in the technical world. We observe that a host of human complexities often hides behind a simple court case. We tried to mitigate the effects of our respective imprints, at least in part, by composing a multidisciplinary team for this first investigation. Bringing our results further, to concrete policy guidelines, requires the inclusion of other experts in the conversation.