Preliminaries
What follows will be concerned with situations where an evaluator evaluates a set of people with respect to some goal (e.g. hiring, admission, promotion, proposal evaluation etc.), and how such evaluation can be improved. Although these situations by no means exhaust the situations where prejudice is manifested, they cover a large range of situations that are tremendously important from the perspective of the distribution of wealth and power in society and the overall integration of its members. So even though a focus on evaluation situations involves a considerable restriction (excluding as it does, for instance, manifestations of hostility and violence), it still targets situations of great consequence. It also bears emphasis that evaluation situations are often characterized by prejudice, as is demonstrated particularly clearly by so-called CV–studies (see e.g. Steinpreis et al., 1999; Bertrand & Mullainathan, 2004; Correll et al., 2007; Moss–Racusin et al., 2012; Döbrich et al., 2014; Agerström & Rooth, 2011; Rooth, 2010), which show that identical, or comparable, CVs give rise to differential evaluations when they signal membership in different irrelevant social groups (e.g. male/female or Black/White).
Most evaluation situations involve rankings, i.e. orders which relate all members of a set of candidates but allow two or more members to have the same rank (formally, a total preorder over a set of people). Some evaluation situations just involve ranking a single candidate above the rest, but they are ranking situations in this sense nonetheless. Thinking about evaluation in terms of ranking is particularly useful when we are concerned with the epistemological problems of prejudice, since it affords an easily measurable notion of misrepresentation, in terms of the difference (e.g. in terms of a rank order correlation) between the correct ranking of candidates in terms of their competence and the ranking produced by an evaluator.Footnote 1 Had we instead focused on, for instance, people’s possibly erroneous general beliefs about particular social groups—which is commonplace in the philosophy of prejudice—the notion of a misrepresentation, and the related notion of increased veracity, would have been harder to operationalize, in particular since these beliefs are typically expressed in terms of generic generalizations, a form of generalization with very unclear truth conditions (cf. Leslie & Lerner, 2016). The question of what we should do to improve the veracity of rankings thus seems clearer than other related questions.Footnote 2
With these preliminaries in place, we can define an individual prejudice intervention (‘individual intervention’, for short) broadly as an attempt to change an individual so that they become less prejudiced. It can thus be seen as a function from one set of features of an individual—the features of the individual that influence, or are influenced by, the outcome of the intervention—to another such set—what the individual is like after the intervention. Thus understood, individual interventions encompass everything from increasing intergroup contact (Allport, 1954; Pettigrew & Tropp, 2006), learning counter-stereotypical ‘if–then plans’ (so-called ‘implementation intentions’) (Rees et al., 2019), and just being told to avoid prejudice in the right way (Döbrich et al., 2014), to playing a softball game with teammates mostly from the social group you are prejudiced towards, in order to reduce implicit bias (Lai et al., 2014).
Since our focus is on evaluation situations, we are concerned with such functions where ‘become less prejudiced’ is understood in terms of ‘evaluate more accurately’.
The Presuppositions of Individual Interventions
Consider now the question of what conditions need to hold in order for an individual intervention to be epistemologically beneficial with respect to a prejudiced evaluator E and future ranking situations of sort s (e.g. hiring new baristas at a particular coffee shop).
First, in order for it to be (conceptually) possible for an intervention to be successful, E’s evaluations have to be (conceptually) possible to improve. This presupposes some performance on the part of E that can be evaluated, and this seems to minimally involve that E is able to order the people E is evaluating, i.e. that E’s evaluations can be captured by an ordinal scale (equivalently, that the evaluations have the structure of a ranking). Although this assumption is likely often satisfied in practice, it is not completely trivial, since it rules out transitivity violations and cases of non–comparability: an evaluator who ranks a above b, b above c, and c above a, or who is unable to rank two or more people, does not satisfy it.
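To make this assumption concrete, here is a minimal sketch, in Python, of the check it imposes on an evaluator’s pairwise judgements; the code is not from any cited source, and the function name and the encoding of judgements are hypothetical.

```python
# A minimal sketch of the ordinal-scale assumption: pairwise judgements must be
# complete (every pair comparable) and transitive, i.e. form a total preorder.
from itertools import permutations

def is_ranking(candidates, at_least_as_good):
    """Return True if the judgements form a total preorder over the candidates."""
    # Completeness: every pair must be comparable in at least one direction.
    for a, b in permutations(candidates, 2):
        if not (at_least_as_good(a, b) or at_least_as_good(b, a)):
            return False
    # Transitivity: a >= b and b >= c must imply a >= c.
    for a, b, c in permutations(candidates, 3):
        if at_least_as_good(a, b) and at_least_as_good(b, c) and not at_least_as_good(a, c):
            return False
    return True

# The cyclic evaluator from the text (a above b above c above a) fails the check.
strict_cycle = {("a", "b"), ("b", "c"), ("c", "a")}
print(is_ranking(["a", "b", "c"], lambda x, y: (x, y) in strict_cycle))  # False
```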
With this in place, we can (following Jönsson & Sjödahl, 2017 and Jönsson & Bergman, forthcoming) measure the veracity of a ranking in terms of the rank order correlation between an evaluator’s ranking and the correct ranking (e.g. in terms of Spearman’s rank correlation coefficient, a number between 1, which signifies a perfect positive correlation, and –1, which signifies a perfect negative correlation).Footnote 3 Given this measure of veracity, E’s degree of accuracy (with respect to s–situations) can be identified with the average veracity of the rankings that E is disposed to produce (in s–situations).
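The following sketch, again in Python and not drawn from the cited sources, shows one way the veracity and accuracy measures just described could be computed; rankings are encoded as lists of ranks (1 = best), with tied candidates sharing a rank, and the function names are hypothetical.

```python
# Veracity as Spearman's rank correlation; accuracy as average veracity.
from scipy.stats import spearmanr

def veracity(evaluator_ranking, correct_ranking):
    """Spearman correlation between an evaluator's ranking and the correct one:
    1 signifies perfect agreement, -1 a perfect reversal."""
    rho, _ = spearmanr(evaluator_ranking, correct_ranking)
    return rho

def accuracy(evaluator_rankings, correct_rankings):
    """Average veracity over a set of ranking situations."""
    pairs = list(zip(evaluator_rankings, correct_rankings))
    return sum(veracity(e, c) for e, c in pairs) / len(pairs)

# Four candidates; the evaluator swaps the two best ones.
print(veracity([2, 1, 3, 4], [1, 2, 3, 4]))  # 0.8
```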
We can then understand an intervention’s being successful in terms of its bringing about an improvement in accuracy.
An improvement in accuracy can in turn be understood in two main ways: either (1) asynchronously, in terms of the contrast between the veracity of E’s rankings before and after the intervention (which involves comparing two different sets of rankings), or (2) synchronously, in terms of the contrast between the veracity of E’s rankings after the intervention and the veracity those rankings would have had if E had not undergone the intervention (i.e. the contrast between E’s accuracy with and without the intervention). Both notions are reasonable, but (2) is slightly better than (1) since it avoids possibly confounding differences between past and future rankings (e.g. that later rankings are easier, or feature fewer people that E is prejudiced towards). Moreover, (2) is in the same spirit as the between–group experimental designs that are common in intervention studies (see below).
Second, the aforementioned measure of veracity presupposes that there are objective rankings that E’s rankings can be evaluated in terms of. Although such rankings are seldom readily available in real-life selection situations (due to time pressure and various uncertainties concerning the candidates), it seems reasonable to assume that there are such rankings in many situations. Moreover, it seems reasonable to think that we can come to know these rankings (given enough time, information and ingenuity), at least in testing situationsFootnote 4. Such objective rankings are readily available, for instance, in the aforementioned CV–studies. In these studies, individual interventions are often tested using pairs of CVs that differ only with respect to which irrelevant social group a candidate belongs to. In these situations, the objective ranking of the two CVs—qua their being relevantly identical—is just an assignment of the same rank to both CVs. An evaluator who assigns them different ranks thus deviates from the objective ranking, something which can be captured by the suggested veracity measure.Footnote 5
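As a hypothetical illustration of this (not taken from any of the cited CV–studies), suppose the two relevantly identical CVs appear in a small pool of four applications; using the veracity function sketched above, an evaluator who separates the matched pair deviates measurably from the objective ranking.

```python
# Illustrative rankings for a pool of four applications, two of which are
# relevantly identical (the numbers are invented for the example).
correct  = [1.5, 1.5, 3, 4]   # objective ranking: the matched CVs tie for the top spot
unbiased = [1.5, 1.5, 3, 4]   # an evaluator who treats the matched CVs alike
biased   = [1, 2, 3, 4]       # an evaluator who ranks one matched CV above the other

print(veracity(unbiased, correct))  # 1.0
print(veracity(biased, correct))    # about 0.95, i.e. a deviation from the objective ranking
```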
Third, in order for the intervention to be epistemologically beneficial to E, it needs to be accuracy increasing in s–situations featuring people from the social groups which E is prejudiced towards. We can distinguish between two cases. First, an individual intervention is unconditionally accuracy increasing (in s–situations, and with respect to certain social groups) if it tends to increase the accuracy of the individual that undergoes it independently of their features (over and above their prejudice). Second, an individual intervention is conditionally accuracy increasing on f, where f is a feature (or set of features), if it tends to increase the accuracy of the individual that undergoes it only if they are f.
At the very least, showing that an individual intervention is unconditionally accuracy increasing in s–situations typically involves measuring the accuracy of a large, diverse, randomly selected group of people by having them rank individuals (or representations of individuals) in s–situations, then administering an intervention to some portion of these people (the intervention group), measuring everyone’s accuracy again, and finding a significant improvement for the intervention group but not for the control group with some suitable statistical test, e.g. a test for an interaction effect in a mixed-design analysis of variance (ANOVA). Given a sufficient number of such studies across different conditions, the intervention has been shown to be unconditionally accuracy increasing.
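The following schematic simulation, which is not drawn from any of the cited studies, illustrates this design; the sample size, effect sizes and distributions are all invented. For a two-group, two-timepoint design, testing the group × time interaction is equivalent to comparing change scores (post minus pre) between the groups.

```python
# Simulated pre/post accuracies for an intervention group and a control group.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=0)
n = 60  # hypothetical participants per group

pre_intervention  = rng.normal(0.55, 0.10, n)   # accuracies (average veracities)
pre_control       = rng.normal(0.55, 0.10, n)
post_intervention = pre_intervention + rng.normal(0.10, 0.05, n)  # assumed gain
post_control      = pre_control + rng.normal(0.00, 0.05, n)       # no assumed gain

# In a 2 x 2 mixed design, the group x time interaction test amounts to
# comparing change scores between the two groups.
change_intervention = post_intervention - pre_intervention
change_control      = post_control - pre_control

t, p = ttest_ind(change_intervention, change_control)
print(f"interaction-equivalent test: t = {t:.2f}, p = {p:.4g}")
```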
If it is instead found that the intervention only leads to increased accuracy for people that are f, or if testing has hitherto only involved people that are f, then it has only been shown that the intervention is conditionally accuracy increasing on f.
Fourth, if the intervention is only conditionally accuracy increasing on f, it also needs to be the case that E is f in order for the intervention to epistemologically benefit E. In some cases this will be fairly trivial to ascertain (e.g. if an intervention only increases accuracy for women), but in other cases it will not be (e.g. if the intervention only increases accuracy for people having a certain level of implicit bias), and E will then have to be shown to satisfy f through additional testing.
Fifth, most individual interventions change various psychological states of the individual in order to bring about a particular change in ranking accuracy (e.g. if the intervention involves E learning certain ‘if–then plans’, this changes the evaluator’s beliefs and/or associations). Given that these changes influence the evaluator’s accuracy, it is also likely that they will bring about other changes. This raises the possibility that these other changes might not be epistemological improvements. So in order for an individual intervention to be epistemologically beneficial overall with respect to E, it is important that the gains in accuracy the intervention gives rise to are not offset by unintended detrimental epistemological side–effects. Sometimes an absence of such side–effects will be fairly obvious, but in other circumstances things will be less clear (as will be exemplified shortly).
Sixth, assumptions 1–5 are, strictly speaking, all the assumptions that are needed in order for an individual intervention to be epistemologically beneficial with respect to a prejudiced evaluator E. However, in order for it to be reasonable to believe that an individual intervention is epistemologically beneficial, it needs to be shown that the third assumption (and possibly the fourth and fifth) obtains, and this presupposes that the presuppositions of the statistical tests used to show this are also satisfied.Footnote 6 For instance, if an ANOVA is used, it is presupposed that the involved populations are normally distributed and have homogeneous variance.Footnote 7
In order to sum up the aforementioned assumptions succinctly (and to frame them in a way that facilitates comparison with the corresponding assumptions of post hoc intervention in Sect. 3), let E be an evaluator who evaluates people in s–situations, let g1 … gp be social groups towards which E is substantially prejudiced (i.e. prejudiced enough for this to distort E’s rankings), let r1 … rn be future rankings produced by E (in s–situations) that feature members of g1 … gp, and let m1 … mo be rankings used in empirical tests of a particular intervention. We can then say that it is reasonable to believe that an individual intervention will be epistemologically beneficial with respect to E and future s–situations r1 … rn, if it is reasonable to believe the following six assumptions.
I1. E’s evaluations (r1 … rn and m1 … mo) are carried out using, minimally, an ordinal scale.
I2. There are objective rankings for r1 … rn and m1 … mo.
I3. The intervention is accuracy increasing in s–situations featuring members of g1 … gp, either unconditionally or conditionally on f.
I4. (E is f.)
I5. The intervention is without negative epistemological side–effects.
I6. The statistical assumptions underlying the demonstration of I3 (and possibly I4, and I5) are satisfied.
For the purpose of illustrating a case where I think that these conditions are satisfied, consider the simple individual intervention (AGE) due to Döbrich et al. (2014), aimed at reducing ageism in performance appraisals and hiring decisions. AGE just involves briefly informing evaluators of prevailing age biases and the negative consequences of age discrimination, and telling them to disregard age when carrying out their evaluations. To test AGE, Döbrich et al. (2014, study 2) had participants evaluate job applications that were identical save for the indicated age of the fictitious applicant. The participants had no difficulty evaluating the applications, and to the extent that E is like the participants, I1 is thus reasonable. As argued above, CV–studies induce an objective ranking, and I2 is thus satisfied (at least with respect to the testing situation).
Döbrich et al. demonstrated a substantial age–based prejudice among the participants, and that AGE could successfully remove this prejudice (or at least the influence of this prejudice) to the point of non–significance, using a between–groups analysis of covariance (ANCOVA). This demonstration lends some credence to the idea that AGE is accuracy increasing, conditionally on an evaluator being like the participants in the study (people currently or previously working in HR with experience of hiring decisions), i.e. that I3 is satisfied. Given that our imagined evaluator E is like the participants in the study, I4 is also satisfied. AGE just warned participants about the existence and effects of ageism and prompted them not to base their evaluation on age. Refraining from relying on an irrelevant factor in this way seems unlikely to have negative side–effects, and I5 is thus reasonable as well.Footnote 8 Finally, no reasons are reported by Döbrich et al. to doubt the statistical assumptions underlying the ANCOVA (e.g. homogeneity of variance, similar distributional shape of the underlying populations, and some additional linearity assumptions), so it seems reasonable to assume that I6 holds as well. Since all of I1–I6 seem reasonable, it seems tentatively reasonable to think that AGE will be epistemologically beneficial with respect to an evaluator E—who is like the participants in the study—and ranking situations like those in the study.
Before we move on, let us also consider an example of a kind of intervention where it is—currently—more questionable that I1–I6 are satisfied. For this purpose, we can consider an implicit bias intervention, an individual intervention that attempts—in the first instance—to reduce a measure of implicit bias, such as the IAT (Greenwald et al., 1998). One such intervention (endorsed by Madva, 2020, and found to be among the most effective implicit bias interventions tested by Lai et al., 2014 in a comparison of 17 such interventions) encourages participants to form implementation intentions, simple if–then plans of the form ‘If I see a Black person, I will respond by thinking “good”’.Footnote 9 This kind of intervention has repeatedly been shown to lower implicit bias (as measured by the IAT and other measures, see e.g. Gollwitzer & Schaal, 1998; Gollwitzer, 1999; Stewart & Payne, 2008; Mendoza et al., 2010; Webb et al., 2012; Lai et al., 2014; Wieber et al., 2014; Rees et al., 2019). However, it is questionable whether it is epistemologically beneficial in the sense we are concerned with in the present context.
First, it is doubtful that the intervention really is accuracy increasing, since demonstrations that it decreases implicit bias measures do not establish this, even if these demonstrations are coupled with negative correlations between implicit bias and degree of accuracy. Unless implicit bias causally contributes to impaired accuracy, changing implicit bias will not improve accuracy. If, for instance, implicit bias and prejudiced behavior are both determined by some third factor, then lowering implicit bias will not reduce biased behavior. This in itself should be enough for us to doubt I3, but this doubt is further strengthened by a meta-analysis by Forscher et al. (2019), who reported that even successful reductions in implicit bias do not generally seem to substantially reduce the degree of prejudiced behavior.
Second, implementation intentions—as they have been studied so far—are also questionable from the perspective of I5, since they seem more susceptible to side–effects than something like AGE, given that it is less clear what cognitive effects they have. Strengthening an association between Black faces and positive feelings need not have overall epistemologically beneficial consequences. For instance, Stewart & Payne’s (2008) intervention, which successfully used an implementation intention (‘If I see a Black face, I will think safe’) to reduce the degree to which participants misidentified tools as weapons after seeing a Black face, also had the consequence that participants misidentified guns as harmless tools after seeing a Black face to a greater extent (cf. Stewart & Payne, 2008: 1337, Fig. 1). As long as the intervention is abstracted away from the task at hand, as is the case with many implementation intentions, the presence of unintended side–effects is more difficult to rule out.Footnote 10
Given these doubts about two of the conditions listed above, it seems—given the evidence currently available—questionable whether an implementation intention is epistemologically beneficial with respect to an evaluator. This is, of course, not meant as a forward–looking critique of implementation intentions; future research might reveal them to be excellent ways to reduce prejudice.Footnote 11 It is just an illustration of what doubts about the six conditions might look like, given our current evidential state.