BACKGROUND

Widely documented racial/ethnic disparities are particularly striking in the treatment of cardiovascular disease,1,2 with whites up to twice as likely as blacks to receive thrombolytic therapy for myocardial infarction.3–7 Whether health professionals’ biases contribute to such disparities in care has been a subject of speculation and study.1,8–14 For example, physicians might believe that black patients are less likely to adhere to treatment recommendations than whites, and thus offer treatment less often.12 Some researchers speculate that unconscious bias is more likely to underlie treatment disparities than overt prejudice.12,15–18

The computer-based Implicit Association Test (IAT), first introduced in 1998, is now used widely to measure bias that may not be consciously recognized.19 The IAT measures the time it takes subjects to match representatives of social groups (e.g., age, gender, and race) to particular attributes (e.g., good, bad, cooperative, and stubborn). The IAT operationalizes unconscious bias by hypothesizing that subjects will match a group representative to an attribute more quickly if they connect these factors in their minds, regardless of their awareness of this connection. For instance, the more strongly subjects associate pictures of white persons with good concepts and pictures of black persons with bad concepts, the more quickly they will match them, and vice versa. The computerized IAT measures the aggregate time required for these matching tasks under two conditions (pairings). A difference in average matching speed for opposite pairings (e.g., black+bad/white+good vs black+good/white+bad) determines the IAT score (Fig. 1). Subjects are typically aware that they are making these connections but unable to control them given the rapid response times and structure of the test. To understand the IAT procedure, readers can sample the test at www.implicit.harvard.edu.20

Figure 1

Implicit Association Test (IAT) sample screens and stimuli. This figure displays sample screens and stimuli from the race preference (black-white/good-bad) IAT. Sample screens a, b, c, and d represent examples of pairing tasks that participants rapidly complete. Pictures of black or white individuals and words representing good or bad evaluative attributes are flashed in the center of the screen, and subjects quickly classify these as to whether they belong with category pairs shown in the upper left or the upper right of their screens using the e or i key on their computer keyboard. Numerous pictures and words are flashed onscreen for each of the two possible pairings, with responses usually taking less than a second and the order counterbalanced across participants. The speed to associate black+bad and white+good (screens a and b) relative to the opposite pairing of black+good and white+bad (screens c and d) constitutes the IAT score, interpreted to be a measure of implicit race preference

Although more than 200 studies have employed numerous versions of the IAT,19,20–24 and data from 5 million tests has accumulated from www.implicit.harvard.edu, the test has not been used to systematically observe the behavior of health care professionals. Given questions about the source of observed disparities in health service use, the IAT might provide insight into the contribution of implicit biases among physicians. In this study, we used a race preference IAT to measure implicit biases among emergency medicine and internal medicine residents. We also developed two new tests to measure stereotypes about general cooperativeness and specific cooperation with medical procedures. We tested whether both preferences and stereotypes affected physicians’ clinical decisions for white and black patients. More specifically, using a case vignette with patient race assigned randomly, three IATs, and a questionnaire, we sought to determine whether implicit or explicit race biases predict physicians’ decisions to give thrombolysis for acute myocardial infarction.

METHODS

Participants and Study Procedures

In April and May 2005, we e-mailed a study invitation and three weekly reminders to all 776 internal medicine and emergency medicine residents in four academic medical centers in Boston, Mass, and Atlanta, Ga. The emails included a link to the research web site and a login code. Using an honor system administered by the chief residents, we offered participants a $10 gift certificate and entry into a lottery ($200 and $100 prizes for each site) for completing the 20-minute, anonymous, web-based study. Of the 776 residents, 393 (50.6%) participants completed the randomized vignette questionnaire and explicit bias section of the study. We excluded 25 participants who were not residents in an eligible program (n = 2) or had previously completed part of the study (n = 23). Fifty-seven participants failed to complete the IATs or had unusable IAT results, as described elsewhere.21 Twenty-four participants failed to complete the demographics section. This left 287 participants (37.0% of 776) who completed all aspects of the study. On a posttest question, 67 of these 287 participants reported some awareness of what the study was about through discussions with colleagues who had completed it. Because this awareness may have biased their responses to the case vignette, we omitted these participants from the analyses. All results (unless otherwise specified) are based on the 220 participants (28.4%) who completed the study and were unaware of the nature of the study.
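The participant flow described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable and function names are ours, and the counts are those reported in the preceding paragraph.

```python
# Participant flow, using the counts reported in the text.
invited = 776
completed_vignette = 393

excluded_ineligible = 2      # not residents in an eligible program
excluded_repeat = 23         # had previously completed part of the study
unusable_iat = 57            # did not complete the IATs or had unusable results
missing_demographics = 24    # did not complete the demographics section
aware_of_purpose = 67        # reported some awareness of the study's purpose

completed_all = (completed_vignette - excluded_ineligible - excluded_repeat
                 - unusable_iat - missing_demographics)
analytic_sample = completed_all - aware_of_purpose


def pct(n, total=invited):
    """Percentage of invited residents, rounded to one decimal place."""
    return round(100 * n / total, 1)


print(completed_all, pct(completed_all))      # 287 37.0
print(analytic_sample, pct(analytic_sample))  # 220 28.4
```

All percentages are expressed relative to the 776 invited residents, which is how the 37.0% and 28.4% figures in the text are obtained.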

Study Design

We created a web-based survey instrument that randomly assigned participants to see a picture of a black or white patient while reading a clinical vignette. From hundreds of shareware photographs, we chose 58 whose facial expressions appeared neutral. We created new patient images by morphing together these photographs using Photo Morpher Software (Morpheus Software, LLC, Santa Barbara, Calif, USA). The 21 best quality images were chosen and 19 independent evaluators (physicians, research assistants, and graduate students of various racial/ethnic backgrounds and not involved in the study) reviewed these. We chose four (two black and two white) that were most closely matched on apparent age (approximately 50 years) and attractiveness (7-point scale). The vignette (see Appendix) describes a 50-year-old male presenting to the emergency department with chest pain and an electrocardiogram suggestive of anterior myocardial infarction. It is stated that primary angioplasty is not an option and no absolute contraindications to thrombolysis are evident.

We asked participants to rate the likelihood that the chest pain was because of coronary artery disease (CAD) (5-point scale, very unlikely to very likely), whether they would give the patient thrombolysis (yes/no), and the strength of their recommendation (5-point scale, definitely to definitely not). To assess explicit bias, the software then asked participants several questions about whether they preferred white or black Americans (5-point scale with preference expressed as somewhat or slightly prefer black or white Americans, and 10-point thermometer scale of warm feelings toward each group separately). We also asked about their beliefs about patients’ cooperativeness in general and with regard to medical procedures such as thrombolysis (5-point scale—black patients somewhat less cooperative, slightly less cooperative, equally cooperative; white patients slightly less cooperative or somewhat less cooperative). Finally, the online survey included queries about respondent demographics, effectiveness of thrombolysis, and pre- and posttest opinions on unconscious bias and IATs. The vignettes and survey are available upon request.

Participants also completed three IATs corresponding to the explicit bias questions. The Race Preference IAT measured implicit association of white and black race with good and bad terms. We created the next two IATs specifically for this study. The Race Cooperativeness IAT measured implicit associations between race and general cooperativeness. The Race Medical Cooperativeness IAT measured implicit associations between race and cooperativeness with medical recommendations. All IAT scores are expressed as normally distributed continuous variables. For efficiency we used a 5-block structure for the IATs, with the specific pairing received first (e.g., black-bad/white-good) counterbalanced across participants. We scored IATs according to published guidelines with zero representing no racial bias, positive values representing prowhite bias, and negative scores representing problack bias (range typically −0.6 to 1.2).21 Figure 1 shows the faces representing white or black race and the terms used as stimuli for the concepts of good/bad and cooperativeness/uncooperativeness.
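To make the sign convention concrete, the sketch below computes a simplified IAT score as the difference in mean response latency between the two pairings, divided by the standard deviation of all trials pooled. This is an assumption-laden illustration, not the scoring procedure used in the study: the published guidelines cited above include further steps that are omitted here, and the function name and the latencies are hypothetical.

```python
from statistics import mean, stdev


def iat_score(black_bad_ms, black_good_ms):
    """Simplified IAT score (illustrative only).

    black_bad_ms  -- latencies (ms) on the black+bad / white+good pairing
    black_good_ms -- latencies (ms) on the black+good / white+bad pairing

    Zero means no difference between pairings, positive values mean faster
    responses on black+bad / white+good (prowhite bias), and negative values
    mean the reverse (problack bias).
    Assumes the pooled latencies are not all identical (nonzero SD).
    """
    pooled_sd = stdev(list(black_bad_ms) + list(black_good_ms))
    return (mean(black_good_ms) - mean(black_bad_ms)) / pooled_sd


# Hypothetical latencies for one participant (not study data).
fast_pairing = [610, 650, 700, 640]
slow_pairing = [780, 820, 900, 760]
print(iat_score(fast_pairing, slow_pairing) > 0)  # True: prowhite direction
```

Dividing by the pooled standard deviation puts participants with different overall response speeds on a common scale, which is why the scores can be treated as a continuous variable in later analyses.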

Analysis

We examined differences in demographic characteristics, likelihood of CAD, and decisions to treat with thrombolysis between participants assigned to black versus white patients using chi-square and t tests as appropriate. We compared mean IAT scores for various demographic groups using t tests. To look for relative disparity by race between diagnosis and treatment, we compared participants’ ratings of the likelihood that the chest pain was because of CAD (the diagnosis variable, 1–5 scale as above) with the likelihood of treating the patient with thrombolysis (the treatment variable, yes/no). To do this we put both the diagnosis and treatment variables on the same scale using z-scores. We then subtracted the treatment variable from the diagnosis variable to create a delta variable. A delta score of zero indicated that treatment was commensurate with diagnosis. A negative score indicated that treatment was more likely than diagnosis, and a positive score indicated that diagnosis was more likely than treatment. We used a one-way ANOVA to test whether diagnosis-treatment delta was different for black versus white patients.
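The construction of the delta variable can be sketched as follows. This is a minimal illustration with made-up ratings, not study data; the function names are ours, and the use of the population standard deviation for the z-scores is an assumption, since the text does not specify which form was used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def delta_scores(diagnosis, treatment):
    """Diagnosis z-score minus treatment z-score, one value per participant.

    Zero: treatment commensurate with diagnosis.
    Positive: diagnosis more likely than treatment.
    Negative: treatment more likely than diagnosis.
    """
    return [d - t for d, t in zip(z_scores(diagnosis), z_scores(treatment))]


# Hypothetical participants: CAD likelihood (1-5) and thrombolysis (1 = yes, 0 = no).
diagnosis = [5, 4, 3, 5, 2, 4]
treatment = [1, 1, 0, 0, 0, 1]
deltas = delta_scores(diagnosis, treatment)
print([round(d, 2) for d in deltas])
```

Because both variables are standardized across the whole sample, the deltas average to zero overall; the comparison of interest is whether the mean delta differs between participants assigned black and white patients.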

To test whether bias predicted physicians’ use of thrombolysis for black and white patients, we used moderated multiple linear regression analysis with thrombolysis decision as the dependent variable, bias (implicit or explicit) as the independent variable, and patient race (black or white) as the moderator, adjusting for analysis-relevant covariates (e.g., physician race, sex, socioeconomic background, explicit race bias, implicit race bias, and belief in the effectiveness of thrombolysis). We then added the 67 physicians who were aware of the nature of the study back into the dataset and used moderated multiple linear regression to examine the potentially moderating impact of physician awareness on the relation between bias and thrombolysis decision. We performed all analyses using SPSS statistical software (SPSS Inc., Chicago, Ill, USA). The study received approval from the Institutional Review Boards at Beth Israel Deaconess Medical Center, Partners HealthCare System, and Emory University.
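In a moderated regression of this kind, the quantity of interest is the coefficient on the product of bias and patient race. The sketch below fits such a model by ordinary least squares using only the Python standard library. It is an illustration under stated assumptions: the analyses in the study were run in SPSS with additional covariates, whereas here the covariates are omitted, the coding of race (1 = black patient vignette, 0 = white patient vignette) is assumed, and the data are synthetic and noiseless.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_moderated(bias, race, decision):
    """OLS fit of decision ~ 1 + bias + race + bias * race.

    Returns [intercept, b_bias, b_race, b_interaction].
    """
    rows = [[1.0, x, r, x * r] for x, r in zip(bias, race)]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, decision)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic example: the bias slope is +0.2 for white-patient vignettes
# and +0.2 - 0.4 = -0.2 for black-patient vignettes.
bias = [-0.5, 0.0, 0.5, 1.0] * 2
race = [0] * 4 + [1] * 4
decision = [0.5 + 0.2 * x + 0.1 * r - 0.4 * x * r for x, r in zip(bias, race)]
print([round(c, 3) for c in fit_moderated(bias, race, decision)])
```

A negative interaction coefficient, as in this synthetic example, corresponds to the pattern in which higher bias scores go with fewer thrombolysis recommendations for black patients relative to white patients.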

RESULTS

Table 1 describes demographic characteristics of the participants stratified by whether they were randomly assigned a black or white patient. Participants assigned black vs white patients did not differ significantly, except that first- and second-year residents were more likely to be assigned white patients. Year of residency did not, however, have any significant effect on either the likelihood of recommending thrombolysis (chi-square P = .98) or IAT scores. Table 1 also shows mean IAT scores for all three IATs by participants’ demographic characteristics. Physician race was the only consistent demographic predictor of IAT scores. Black physicians had mean scores on all three IATs near zero, whereas all other groups had scores in the positive, prowhite range. Emergency medicine residents also had somewhat less prowhite IAT scores on the general cooperativeness IAT. There was no difference in the IAT scores of participants randomized to black versus white patient vignettes.

Table 1 Baseline Characteristics and IAT Scores of Physician Participants

Physicians’ Explicit and Implicit Racial Biases

On the measures of explicit bias, participants expressed equal preference for black and white Americans on the 5-point scale of race preference (mean difference = 0.03, P = .36) and on the 10-point thermometer scale measuring warmth toward black and white Americans separately (mean difference = 0.04, P = .61). They reported black and white patients to be equally cooperative on a 5-point scale of cooperativeness with medical procedures (mean difference = 0.01, P = 1.00) and on a 10-point thermometer scale measuring cooperativeness separately for black and white patients (mean difference = 0.08, P = .49).

On the measures of implicit bias, all three IATs showed statistically significant effects (P < .001), with stronger associations of negative attributes (e.g., bad and uncooperative) to blacks than to whites. Figure 2 displays a graph of the magnitude of physicians’ bias on the 4 explicit measures (top half) and 3 implicit measures (bottom half). Because measures of explicit bias (5- and 10-point scales) and implicit bias (reaction time scores ranging from −1.01 to +1.35) were on different scales, the magnitude of physicians’ bias across the seven measures could only be directly compared by converting them all to the same metric—Cohen’s effect size d. Cohen’s d is conceptually defined as the magnitude of an effect independent of sample size (see conversion formula at the bottom of Fig. 2) and is widely used in empirical research and meta-analysis in the behavioral sciences. Cohen’s d values range in size from small (0.20), to medium (0.50), and large (0.80).25 As shown in Figure 2, none of the explicit effects approached the cutoff for a small effect. In contrast, all of the implicit effects were medium or large in magnitude.
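The standardization step can be illustrated with a short sketch. The conversion formula from Fig. 2 is not reproduced in this text, so the code assumes a common one-sample form, d = (mean − null value) / standard deviation, with a null value of zero representing no bias. The function names and the example scores are hypothetical; the thresholds are the conventional ones quoted above.

```python
from statistics import mean, stdev


def cohens_d_one_sample(scores, null_value=0.0):
    """Effect size of a set of scores relative to a null value (assumed form)."""
    return (mean(scores) - null_value) / stdev(scores)


def label_effect(d):
    """Label an effect size using the conventional 0.20 / 0.50 / 0.80 cutoffs."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"


# Hypothetical bias scores on an arbitrary scale (not study data).
scores = [0.1, 0.4, 0.3, 0.6, 0.2, 0.5]
d = cohens_d_one_sample(scores)
print(round(d, 2), label_effect(d))
```

Because d is expressed in standard deviation units, measures recorded on a 5-point scale, a 10-point thermometer, and a reaction-time scale can be placed on the same axis.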

Figure 2

Magnitude of physicians’ explicit (self-reported) and implicit (Implicit Association Test) race bias on a standardized scale—Cohen’s effect size d

Aggregate scores on the three separate IATs were all somewhat correlated (average pairwise correlation r = .32, P = .001). We found some correlation between implicit bias (IAT score) and explicit bias (composite 5-point scale and 10-point feeling thermometer) for general racial preference (r = .28, P = .001) and no correlation for cooperativeness with medical procedures (r = .05, P = .50).

Diagnosis of CAD and Treatment with Thrombolysis

On a scale from 1 (less than 20% likely) to 5 (more than 80% likely), physicians were more likely to diagnose black patients (M = 4.08) than white patients (M = 3.71) with CAD as a cause of their chest pain (P = .02). However, participants were equally likely to give thrombolysis for black (52%) and white (48%) patients (chi-square P = .68). In absolute numbers 29.8% (33/112) of physicians who saw a white patient vignette thought he was very likely to have CAD versus 40.1% (43/108) for black patients. Within this subgroup 58.2% of physicians were very likely to offer white patients thrombolysis versus 42.7% for black patients (P = .12) (results not shown). Using the delta score (z-score relating likelihood of diagnosis and treatment) we were able to adjust for covariates and show a racial disparity in thrombolysis relative to CAD diagnosis. For blacks, delta was 0.11, indicating lower likelihood of thrombolysis relative to the physician’s perception of the likelihood of acute myocardial infarction. For whites, delta was −0.14, indicating higher likelihood of thrombolysis (P = .06).

Implicit (But Not Explicit) Bias Predicts Differences in Physicians’ Thrombolysis Decisions

Physicians’ explicit (self-reported) attitudes toward patients (preference) or stereotypes about cooperativeness by race did not influence their decision to give thrombolysis for black versus white patients. A moderated multiple linear regression analysis showed no evidence of an interaction between self-reported attitude and patient race on thrombolysis recommendation (P = .82) (results not shown). This result remained nonsignificant after controlling for physicians’ implicit bias, race, sex, socioeconomic status (SES), and belief in thrombolysis effectiveness (P = .64).

Physicians’ implicit biases, however, showed strong associations with their decisions to give thrombolysis. Figure 3 illustrates how each of the three IAT results and the combined IAT composite predicted thrombolysis decisions for black and white patients. Subpanel A shows that as the degree of antiblack bias on the race preference IAT increased, recommendations for thrombolysis for black patients decreased. The interaction between implicit antiblack bias and patient race on treatment recommendation was significant (P = .009). After controlling for physicians’ explicit race bias, race, sex, SES, and belief in thrombolysis effectiveness, the interaction between implicit bias and patient race on thrombolysis recommendation remained significant. A composite IAT measure combining all three IATs (race, attitude, and stereotypes) showed the same pattern (subpanel D) and was statistically significant both with and without the covariates included in the model (P = .04). The same general pattern also held for the medical cooperativeness IAT (subpanel C); however, the interaction was not statistically significant (P = .21).

Figure 3

Relationship between physician race preference Implicit Association Test (IAT) score and thrombolysis decisions by patient race. *P < .05, **P = .05–.11. B values are standardized regression coefficients that describe the magnitude of each relationship that the regression lines represent. IAT bias is a continuous variable represented on the polar ends of the x-axis as low antiblack IAT and high antiblack IAT. Treatment recommendation of thrombolysis is represented on the y-axis and is a dichotomous variable for which 0 means “would not give thrombolysis” and 1 means “would give thrombolysis.” Subpanels A–D represent race preference, general cooperativeness, medical cooperativeness, and the composite IAT measures, respectively

Participants Who Were Aware of the Study’s Purpose

Results presented above excluded the 67 participants who reported some awareness of the nature of the study. Additional analyses including these 67 aware physicians demonstrated a two-way interaction between awareness and IAT score on thrombolysis recommendation (P = .001) (Fig. 4). As unaware physicians’ bias on the composite IAT variable increased, their likelihood of recommending thrombolysis to black patients decreased, as described above. In contrast, increase in bias among aware physicians was associated with more thrombolysis for black patients. All P values remained significant after adjusting for covariates and the same general pattern held for all three IATs.

Figure 4

Relation between physicians’ awareness of the study’s purpose and Implicit Association Test (IAT) bias on recommendation for thrombolysis (black patients only). B values are standardized regression coefficients that describe the magnitude of each relationship that the regression lines represent (P = .001). IAT bias is a continuous variable represented on the polar ends of the x-axis as low antiblack IAT and high antiblack IAT. Treatment recommendation of thrombolysis is represented on the y-axis and is a dichotomous variable for which 1 means “no recommendation” was given and 2 means a “recommendation” was given

Before completing the IAT section of the study, 60.5% of physicians agreed or strongly agreed with the statement: “Subconscious biases about patients based on their race may affect the way I make decisions about their care without my realizing it.” When shown the same statement after taking the IATs, 71.6% of physicians agreed or strongly agreed with this statement (difference in mean 5-point score = 0.33, P < .001 by paired t test). Meanwhile 74.8% felt that taking IATs is a worthwhile experience for physicians, and 76.1% felt that learning more about unconscious biases could improve their care of patients.

COMMENT

The IAT has been used to study implicit preferences and stereotypes for over a decade. It is a new method in its application to studying health care provider bias as a potential root cause of racial/ethnic disparities in health care. This is the first study to use a sociocognitive measure of bias among physicians, and to correlate this with treatment decisions according to patient race. It also represents the first time that the IAT, first published in 1998,19 has been modified to measure and demonstrate an implicit stereotype specific to medical care (i.e., that black patients are less willing to undergo medical procedures).

Not surprisingly, most physicians did not admit to any racial biases explicitly. However, on the implicit measures of bias (IATs), most nonblack physicians demonstrated some degree of bias favoring whites over blacks. Participants’ scores on the race preference IAT showed a range of implicit race bias similar to previous experiments on nonphysicians.21,26 The new cooperativeness IATs were normally distributed and somewhat correlated with the well-studied race preference IAT, suggesting that they measure different but related components of race bias.

Findings of implicit bias and its effects on clinical decisions may surprise physicians, who tend to view their work as both altruistic and evidence-based.27 Implicit race biases are prevalent in the United States in general,26 and as such it should not be surprising that they are prevalent among physicians as well. The neural and cognitive processes underlying these biases are assumed to reflect both evolutionary bases and socially acquired orientations. The content of implicit biases (e.g., that black Americans are less cooperative than white Americans) is assumed to derive from sociocultural learning (e.g., explicit instruction and implicit messages) that accumulates over time. Implicit biases are primarily unconscious and do not imply overt racism. This is supported by the strong dissociation in the average level of expressed, explicit preferences and elicited, implicit ones, as well as the low correlation between explicit and implicit preference observed in this study. Critics of implicit measures of social cognition have asserted that such preferences and beliefs may reflect messages about the state of social groups in the larger culture but cannot be said to reflect an individual’s own preferences. If that were the case, doctors’ own decisions should not have been predicted so clearly by their implicit biases. The fact that they were reminds us that implicit biases may affect the behavior even of those individuals who have nothing but the best intentions,24 including those in medical professions.12,13,15 The IAT is but one method for detecting implicit social cognition and, in the present study, it is the first to be put to use in a medical context. As such, the meaning and significance of implicit biases in health care deserve much greater investigation.

We found no difference in the crude rate of thrombolysis between study participants assigned a black patient versus those assigned a white patient. However, this race equality in treatment occurred in the presence of greater diagnosis of CAD in black than white patients. Equal treatment in the face of unequal diagnosis between the two groups constitutes a disparity.

The result of interest did not depend on demonstrating disparities in treatment. Rather, this study was designed to determine whether physicians’ implicit biases (IAT scores) predicted different patterns of thrombolysis recommendation for black and white patients. We found that implicit bias against blacks (as measured by the race preference IAT) was negatively correlated with likelihood of recommending thrombolysis for black patients and positively correlated with likelihood of recommending thrombolysis for white patients. This finding suggests that unconscious race biases among physicians may influence their decisions about important interventions such as thrombolysis for suspected myocardial infarction. Whereas several studies have pointed to unconscious biases as one potential root cause for racial and ethnic disparities in health care,9–14 this is the first evidence directly supporting this link. We were encouraged to find most resident physicians open to the idea that unconscious biases could affect their clinical decisions, and that learning more about these biases could improve their care of patients. After completing the IATs, residents acknowledged greater vulnerability to unconscious bias than they did at the start, suggesting that the experience heightened their awareness. Also, those physicians who were aware that the study had to do with racial bias, and who had higher levels of implicit prowhite bias, were more likely to recommend thrombolysis to black patients than physicians with low bias, the opposite of the study’s main effect. This suggests that implicit bias can be recognized and modulated to counteract its effect on treatment decisions. These findings support the IAT’s value as an educational tool.

There are several limitations inherent in this study. Response rates were relatively low and the sample size smaller than ideal, making it difficult to detect smaller effects that may exist. Resident physicians, particularly those at large academic health centers in Boston and Atlanta, may differ from physicians who typically make thrombolysis decisions, so it remains to be seen if those with greater experience show the same pattern. Nevertheless, our primary findings are based on an experimental manipulation involving randomized assignment of the physician to a black or white patient vignette, which provides confidence in the causal interpretations that are drawn. A second limitation derives from the use of a computerized presentation of a patient, which may, for reasons that may not be obvious, have contributed to an outcome that may not occur in a typical in-person encounter. The predictive validity we report may overestimate, but is equally likely to underestimate, the role of implicit bias in clinical decision making.

Future studies might do well to examine actual patient-physician interactions, introducing such dimensions as communication, rapport, and other nonverbal behaviors that are known to be related to implicit discrimination. It may in fact be the subtleties of interracial interactions that lay the foundation for differential treatment to occur.28 IATs can be developed to provide a broader range of clinically relevant stereotypes, in addition to the tests we used. Studies should continue to obtain detailed measures of participant awareness because this did show impact on treatment decisions in our study.

In conclusion, our findings suggest that physicians, like others, may harbor unconscious preferences and stereotypes that influence clinical decisions. Further study is needed to confirm our findings, and to determine the extent to which unconscious racial biases contribute to health care disparities. Given the potential existence of these biases, new approaches to addressing disparities might include confidential feedback mechanisms to make physicians aware of disparities in their own cohort of patients, securely and privately administered IATs to increase physicians’ awareness of unconscious bias, and targeted education to mitigate its effects on clinical decision making. We cannot and do not suggest that unconscious bias among health professionals is the largest or most important factor leading to disparities in health care. However, the fact that it is, by its very nature, hidden from conscious awareness suggests that it receive explicit attention.