[A] Some advantages and disadvantages of the scenario:
The use of the scenario has the advantage of standardizing the materials presented to subjects and allowing the independent variable to be manipulated between the experimental groups. (In the current study, the proposed moderating variable, black or white face, was used for randomization rather than the primary independent variable, IAT score.)
The use of the scenario has the disadvantage of not allowing the individuation of information3,4 that may be important to diagnosis (e.g., other risk factors that may affect the illness presentation and the distribution of prior probabilities across diseases in the differential diagnosis) or treatment selection (e.g., prior access to care, preferences, beliefs).
[B] When considering the differential diagnosis of a given case of chest pain, does the presence of a black or white face associated with the vignette provide clinically important information?
We believe the answer is “yes.”
If the scenario were used by itself (without faces), one would be asked, in essence, to estimate the likelihood of diseases for the average 50-year-old man with hypertension who is a smoker.
With the addition of white and black faces (which in this study were morphed to be an “average” white or black man), the task becomes a bit more specific. One is asked to estimate the likelihood of diseases either for the average 50-year-old black man with hypertension (HT) who is a smoker or for the average 50-year-old white man with HT who is a smoker.
Rates of coronary heart disease (CHD) prevalence and outcomes in the US vary with race, age, gender, and other factors.
Data from the population-based National Health and Nutrition Examination Survey (NHANES) suggest that CHD rates are lower for black men in the US population5.
Disproportionate coronary heart disease mortality exists for African-American men and women2
Earlier age at onset of CHD
Higher overall CHD mortality
Higher out of hospital death rate
Higher sudden cardiac death rate
Data from a recent study in one state (1984–1993) document decreased CHD mortality rates for white men generally and for black men in the highest SES category but not for other black men6.
[C] The cognitive process for identifying CHD (weighing information for diagnosis and prognosis) can be quite challenging and may need to be somewhat different for black and white patients. The diagnostic testing literature has long acknowledged the phenomenon of a given symptom having different diagnostic outcomes in cogent subsets of patients. An obvious example is the frequency with which coronary artery disease will be found among patients with substernal chest pain when one subgroup is young women and the other is old men. This phenomenon has been labeled “spectrum”7.
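The young-women versus old-men example can be made concrete with a small calculation. The sketch below applies Bayes' theorem to the same finding in two subgroups; the sensitivity, specificity, and prevalence figures are hypothetical values chosen for illustration and are not taken from the cited literature. It also holds sensitivity and specificity fixed across subgroups, which is a simplifying assumption, since those values could themselves differ between subgroups.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease given a positive finding (Bayes' theorem)."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical operating characteristics of "substernal chest pain" as a finding.
SENSITIVITY = 0.80
SPECIFICITY = 0.80

# Hypothetical prevalence of coronary artery disease in each subgroup.
prevalence = {"young women": 0.02, "old men": 0.40}

ppv = {
    group: positive_predictive_value(p, SENSITIVITY, SPECIFICITY)
    for group, p in prevalence.items()
}

for group, value in ppv.items():
    print(f"{group}: probability of disease given the finding = {value:.2f}")
```

Under these assumed numbers the same symptom points to disease in fewer than one in ten of the low-prevalence subgroup but in roughly seven in ten of the high-prevalence subgroup.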
In support of the likelihood of a spectrum effect, a recent paper that summarizes literature about coronary heart disease in African-Americans2 states the following:
The predictive value of most conventional risk factors for CHD is similar for blacks and whites. However,
Risk of death and other sequelae attributable to hypertension are greater for African-Americans.
In African-Americans, hypertension develops at younger ages and is associated with a cardiovascular mortality rate 3 to 5 times higher than in whites.
African-Americans appear to experience greater cardiovascular and renal damage than whites at any level of blood pressure.
More African-Americans than whites smoke, but African-American smokers tend to consume fewer cigarettes per day.
African-Americans are more likely than whites to present with symptoms that strongly mimic coronary disease even in the absence of significant coronary obstruction on angiography.
Although issues such as socioeconomic status, access to cardiovascular care, and patients’ health seeking behaviors all contribute to clinical outcomes, recent advances in our understanding of the pathophysiology of acute coronary events also provide insights into biological similarities and differences in the spectrum of clinical presentations and outcomes for African-Americans and whites.
Extent of underlying coronary atherosclerosis
Type and extent of thrombus formation
Degree of endothelial dysfunction and coronary vasospasm
Does it matter that the subjects were asked to provide only the likelihood of CHD instead of a differential diagnosis?
The answer is clearly “yes.”
Diagnostic uncertainty8 and levels of tolerance for uncertainty9,10 have been linked to variations in physicians’ diagnoses, testing patterns, and treatment choices.
When a diagnosis is uncertain, the decision to recommend a therapy should not rest solely on the likelihood of the disease for which that therapy is appropriate (i.e., the treatment decision is not solely a function of the likelihood of the primary disease). The decision-maker must also consider whether the same therapy might cause harm if the true diagnosis turned out to be something other than the one currently considered most likely.
How low the probability of the alternative disease(s) must be to warrant the therapy under consideration depends, among other things, on how harmful that treatment would be if an alternative diagnosis were in fact the cause of the presenting symptoms. Even if the CHD probability were judged to be very high (>85%), thrombolysis would be best avoided if it carried a high likelihood of fatality in the alternative condition (e.g., aortic dissection or pericarditis).
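The trade-off described here can be sketched as an expected-value comparison. The example below is a minimal illustration that assumes only two diagnostic possibilities (the primary disease and one alternative) and uses hypothetical benefit and harm values on an arbitrary common scale; none of the numbers are clinical estimates.

```python
def expected_net_benefit(p_primary: float, benefit_if_primary: float, harm_if_alternative: float) -> float:
    """Expected net benefit of treating when only two diagnoses are possible."""
    return p_primary * benefit_if_primary - (1.0 - p_primary) * harm_if_alternative


def treatment_threshold(benefit_if_primary: float, harm_if_alternative: float) -> float:
    """Probability of the primary disease above which treating has positive expected net benefit."""
    return harm_if_alternative / (harm_if_alternative + benefit_if_primary)


P_CHD = 0.85  # illustrative probability of the primary diagnosis

# Hypothetical utilities: same benefit, two different levels of harm in the alternative condition.
modest_harm = expected_net_benefit(P_CHD, benefit_if_primary=1.0, harm_if_alternative=2.0)
severe_harm = expected_net_benefit(P_CHD, benefit_if_primary=1.0, harm_if_alternative=10.0)

print(f"modest harm: net benefit {modest_harm:+.2f}, threshold {treatment_threshold(1.0, 2.0):.2f}")
print(f"severe harm: net benefit {severe_harm:+.2f}, threshold {treatment_threshold(1.0, 10.0):.2f}")
```

With the assumed severe harm, the threshold probability rises above 0.85, so treating has negative expected net benefit even though the primary diagnosis is judged very likely.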
[D] Data on the primary independent variable, all the covariates, and the dependent variable were collected at only one point in time (cross-sectional data collection). Within the two arms of the study, post-randomization subgroups were formed based on levels of the subjects’ implicit association test (IAT) scores. The authors were thus comparing two subgroups selected from each study arm (see E, below). The values of the IAT scores for each subject did not change. Despite the cross-sectional nature of the data collection, the authors perform regression analyses, connect the point estimates in each study arm (black or white scenario), and discuss “increases” in “pro-white implicit bias.” Cross-sectional data support associations but not causal claims.
[E] Once the subjects who had been randomized into each arm of the study (by black face/white face allocation) were subsequently partitioned by IAT levels, are the subgroups within each arm still comparable?
We don’t know.
IAT scores have been partitioned into two parts, low and high, and represent two groups of resident subjects. These groupings are no longer the two groups randomized to black and white faces with the vignette but are now clustered by their IAT results. The IAT scores do not “change” between the “high” and “low” groups being compared but are present in differing amounts in the different residents who form the two new groups. Since IAT scores (the putative “causal agent”) were not randomly allocated among the residents, and since the groups being compared have been constructed based on a non-random characteristic, we can no longer expect the two groups to be similar in measured and unmeasured covariates.
For example, let’s say we randomly allocated black and white faces associated with a vignette with the purpose of creating two groups with approximately equal heights. With a sufficiently large sample, we could reasonably expect similar height distributions in each arm of the study (since randomization provides the basis for expecting, but does not guarantee, similar distributions of measured and unmeasured covariates in each arm).
If we then separated the subjects in each arm into shorter and taller subgroups, we can no longer have the same expectation that other covariates, e.g., gender and weight, will be equivalently distributed. The similar distributions of height in the original (randomized) groups may occur due to an admixture of taller women (and thus relatively heavier than shorter women) and shorter men (and thus relatively lighter than taller men) in one group and an admixture of both women and men of medium heights in the other group. This would create a potential problem for interpreting results that may be associated with gender (and perhaps weight) if we neglected to record sex or measure weights in the sample. (In comparing the taller with taller subjects and shorter with shorter subjects across the arms of the trial, we would be comparing taller/heavier women with medium height/weight men and shorter/lighter men with medium height/weight women.)
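The general point of this section can be illustrated with a small simulation. The sketch below assumes hypothetical height distributions for women and men and a fixed random seed. It does not reproduce the specific admixture described above; it only contrasts covariate balance between randomly allocated arms with covariate balance between subgroups formed on a non-randomized characteristic (here, height).

```python
import random


def simulate(n: int = 2000, seed: int = 0) -> list:
    """Create n hypothetical subjects and allocate each to an arm at random."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        is_woman = rng.random() < 0.5
        # Hypothetical heights in cm; men are assumed taller on average.
        height = rng.gauss(163.0 if is_woman else 177.0, 7.0)
        arm = rng.choice(["black face", "white face"])  # random allocation
        subjects.append({"woman": is_woman, "height": height, "arm": arm})
    return subjects


def share_women(group: list) -> float:
    """Proportion of women in a group of subjects."""
    return sum(s["woman"] for s in group) / len(group)


subjects = simulate()

# Groups formed by randomization.
black_arm = [s for s in subjects if s["arm"] == "black face"]
white_arm = [s for s in subjects if s["arm"] == "white face"]

# Groups formed afterwards by partitioning on a non-randomized characteristic.
median_height = sorted(s["height"] for s in subjects)[len(subjects) // 2]
shorter = [s for s in subjects if s["height"] < median_height]
taller = [s for s in subjects if s["height"] >= median_height]

arm_gap = abs(share_women(black_arm) - share_women(white_arm))
height_gap = abs(share_women(shorter) - share_women(taller))

print(f"difference in share of women between randomized arms: {arm_gap:.3f}")
print(f"difference in share of women between height subgroups: {height_gap:.3f}")
```

In this sketch the randomized arms differ little in the share of women, while the shorter and taller subgroups differ substantially, so any outcome associated with sex would be confounded in a comparison of the height subgroups.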
[F] Separating the “news from the views” (personal communication, Alvin Feinstein, 1986) – In the Background section, the IAT methodology is introduced as hypothesizing that subjects will match a group representative to an attribute more quickly if they connect these in their minds, and the measure is quantified as the “difference in average matching speeds for opposite pairings.” These are reasonable characterizations of the method and measurement issues. The Methods section should continue the theme of measurement-related issues. Instead, in that section “differences in average matching speed” are characterized as “implicit race preference” and “pro-white” and “pro-black bias.” These latter terms are repeated in the Results section. In the place and time in which we live, these terms are not synonyms for “differences in average matching speed” and would have been more appropriately placed in the Comment section of the paper.
[G] The Comment section of the paper reiterates the findings, discusses why prior criticisms of IAT may not apply, discusses how the findings seem to fit a broader societal picture of implicit race biases, speculates about how a study done on residents may apply to physicians generally, and devotes two sentences to limitations. No alternative hypotheses for the findings are offered or discussed.
In the closing paragraph, the authors suggest that physicians may harbor “unconscious...stereotypes that influence clinical decisions.” In doing so, it seems the authors have confused prior probabilities with stereotypes. Webster’s New Collegiate Dictionary defines stereotype as “something conforming to a fixed or general pattern, especially a standardized mental picture that is held in common by members of a group that represents an over simplified opinion, affective attitude, or uncritical judgment.” In contradistinction to this definition, a prior probability is not a static picture but a dynamic decision-making feature that is based on the information available at a given point in time. The estimate will change with the introduction of clinically important information, e.g., age, sex, health-related habits, and comorbid diseases, as in the current scenario-based study. Spectrum effects (discussed in C, above) may alter the degree to which a given piece of evidence shifts the estimated probability of disease for the leading diagnosis and its competitors.
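The dynamic character of a prior probability can be sketched as sequential updating in odds form. The starting probability and the likelihood ratios below are hypothetical values chosen for illustration, and multiplying them in sequence assumes the findings are conditionally independent, which is a simplification.

```python
def update(probability: float, likelihood_ratio: float) -> float:
    """Revise a probability with one piece of evidence, using the odds form of Bayes' theorem."""
    odds = probability / (1.0 - probability)
    odds *= likelihood_ratio
    return odds / (1.0 + odds)


# Hypothetical starting estimate and hypothetical likelihood ratios for each finding.
starting_probability = 0.05
evidence = [
    ("50-year-old man", 2.0),
    ("hypertension", 1.5),
    ("smoker", 1.5),
]

trajectory = [starting_probability]
for finding, likelihood_ratio in evidence:
    trajectory.append(update(trajectory[-1], likelihood_ratio))
    print(f"after '{finding}': estimated probability = {trajectory[-1]:.3f}")
```

Each posterior becomes the prior for the next piece of information, which is the sense in which the estimate is not a fixed picture. A spectrum effect would correspond to using a different likelihood ratio for the same finding in a different subgroup.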