Reader beware. As laudable as is the attempt of Colombo et al. (2016) to apply scientific methods to the study of the scientific enterprise itself, a closer look at their methods instils serious doubts about the validity of their conclusions. I will raise two main groups of concerns: one relates to the specific experimental set-up and statistical evaluation of the results used by the authors; the other is more general and may well be worth considering when evaluating other studies in the philosophy of science as well.
Concerns, Part 1: Experimental Set-Up and Statistical Evaluation
I have a number of concerns relating to the reporting and analysis of the experiments in Colombo et al. (2016), as well as to the experimental setup.
First, in discussing study 2, the authors claim that ‘The main effects of “Moral” indicate that the moral content of the hypotheses influenced explanatory judgment. [...] The results support and extend the claim that differences in scientific hypotheses’ perceived moral offensiveness predict differences in explanatory judgments’, when in fact only one of the five main effects is statistically significant and three are far from significant. Nor has it been taken into account that the data are highly non-normal, which makes the use of parametric statistical techniques like the t-test and ANOVA doubtful. In addition, as already noted above, table 3 in the paper shows that when participants are incentivised for accuracy of judgement, two of the five effects point in the opposite direction. So it seems that, if anything, the results of study 2 show some tendency to disconfirm the authors’ conclusions.
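To illustrate the kind of rank-based alternative that would be appropriate for such non-normal, ordinal Likert-type ratings, here is a minimal sketch of the Mann–Whitney U statistic in plain Python. The ratings below are invented for illustration and are not the authors’ data; the point is only that rank-based methods make no normality assumption.

```python
# Sketch: rank-based comparison of two groups of Likert ratings,
# an alternative to the t-test when data are non-normal/ordinal.
# All numbers below are hypothetical, not taken from Colombo et al. (2016).

def average_ranks(values):
    """Assign 1-based average ranks to values, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a (larger U means group_a tends to rank higher)."""
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    rank_sum_a = sum(ranks[: len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2

# Hypothetical 1-5 credibility ratings for offensive vs neutral hypotheses:
offensive = [1, 1, 2, 1, 3, 2, 1]
neutral = [2, 3, 2, 4, 3, 2, 3]
print(mann_whitney_u(offensive, neutral))  # 7.5
```

A permutation test or the usual normal approximation would then convert U into a p-value; the sketch stops at the statistic itself.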
Next, the authors rightly argue for the need to control for the prior credibility of the morally offensive versus neutral statements when comparing ratings of the kind elicited from the participants in their studies. They try to do this by using only statements that received low credibility ratings in the pre-study. Nevertheless, the average credibility rating of the statements used is 1.525 for the offensive and 2.41 for the neutral ones, almost a full point higher! It is therefore very doubtful that this procedure eliminates the possible influence of differential prior credibility on the results, all the more so since the estimated effect sizes in Colombo et al. (2016) are small by the authors’ own admission. Further doubts arise from the fact that the pre-test and study 1 were run on very different populations: students in one case, Mechanical Turk workers in the other. Mean ratings of credibility and/or morality may easily differ between those populations, again undermining the necessary control for prior credibility. Note that this problem does not arise with study 2 (which used students, as did the pre-study), hence my remarks in the previous paragraph carry even more weight.
Another problematic aspect of the setup is that prior to the experiment, participants “received careful debriefing [...] informing them about the potential offensive character of the material’s content.” This is problematic because it may have primed participants to think in moral categories more than they usually would in the context of evaluating scientific evidence, thus biasing the results of the studies towards finding an influence of moral judgements on outcomes.
Another concern related to the setup of these studies is that almost exclusively statements of very low credibility (mean credibility <2.1 on a 1 to 5 scale) were used. That again is worrying: if you present allegedly scientific evidence for something that your participants strongly disbelieve, you put them in an awkward position where they might be unduly influenced by their emotions, and even more so if they are students and/or expect to be judged by scientists on the accuracy of their judgements.
In conclusion, neither of the two studies reported in Colombo et al. (2016) nor their combination provides plausible evidence for the claimed psychological unattainability of value-free scientific reasoning.
Concerns, Part 2: Ecological Validity and Corrigibility
In the previous subsection I pointed out a number of methodological flaws which in themselves already invalidate the conclusions of Colombo et al. (2016). Here I list a number of further objections, which I discuss separately because they seem to apply more generally to claims about the supposed impossibility of value-free science. In what follows I will take science to be value-free (footnote 1) to the extent that it is free from “non-epistemic values systematically intrud[ing] the appraisal of the evidential relation, on which the evaluation and justification of scientific claims depend” (Colombo et al. 2016, p. 755), at least where such intrusions would bias the appraisal relative to an appraisal based entirely on epistemic factors.
The well-known demarcation problem entails that when making general claims about “science”, one has to explicate what counts as science. A statement like “the ideal of a value-free science is not attainable” is meaningless unless one specifies the areas of research it is claimed to apply to. In the context of the discussion above this clearly matters, since there are obvious differences between, as well as within, the sciences as to the possibility of controlling for bias. For example, in terms of method: controlled experiments tend to be easier in (most areas of) physics than in astronomy, yet even within physics they seem out of reach for the foreseeable future in string theory (Ellis and Silk 2014); RCTs are easier in microeconomics than in macroeconomics; blinding therapists is easy in pharmacology and hardly possible in manual medicine, to name just a few obvious and well-known cases. In terms of reliability of research outcomes, there is (again just one example) a curious discrepancy in the reproducibility of research in cognitive compared to social psychology, with social psychology showing lower reproducibility (Doyen et al. 2014; Open Science Collaboration 2015). Hence the question arises: Is the alleged impossibility of evaluating scientific evidence in a value-free way supposed to apply to all fields and subfields of science, including for example all of mathematics? Should we really believe that professional mathematicians’ moral emotions will decisively interfere with their study of, say, locally convex vector spaces (footnote 2)? Colombo et al. (2016) do not say whether they claim that the value-free ideal is unattainable in all sciences; their studies, even if they were sound (they are not, see above), would apply at best to a small number of research areas. Making such constraints on generality explicit is yet another step towards reducing non-epistemic influences in science (Simons et al. 2017).
A second, related problem is that the participants in the studies were students and Mechanical Turk workers, basically what philosophers like to call “the folk”. But then, nobody said that doing science is easy or comes naturally. It takes years of training to put a lid on one’s intuitive physics or intuitive probability, or to be able to think in terms of abstract vector spaces, to take just a few examples. So even if someone succeeded in showing that the folk cannot keep irrelevant considerations out of their evaluation of evidence, why would that tell us anything about the impossibility of doing so? (Don’t many philosophers love to argue that their training makes their intuitions superior to those of the folk, because it enables them to focus on what is really relevant? (Nado 2015)) Saying “it is not implausible to hypothesise that our conclusions about morally-motivated explanatory reasoning may sometimes apply to the reasoning of many professional scientists too” (Colombo et al. 2016) does little to answer this point, not least because “it is not implausible [...] may sometimes apply” is a far cry from their claim in the same paper that “our results show that explanatory judgments about a scientific hypothesis are robustly sensitive to the perceived moral offensiveness [...] which suggests that scientific reasoning is imbued with non-epistemic values” in those cases where it matters, i.e., among scientists.
In this respect one must not forget that scientists tend to be well aware that our perceptions and judgements can be biased, and have developed a number of ways to try to reduce or eliminate these problems (footnote 3), a few of which have been discussed above. Here I want to mention only two more that may be especially relevant for the present discussion. Adversarial collaboration (e.g., Dam et al. 2018) may be particularly helpful in reducing the influence of moral emotions. And Greenhoot et al. (2004) show that asking a question in a more abstract way (what a fictional experimenter should conclude) rather than a personal one (the participants’ own conclusions) reduces the influence of prior beliefs on scientific reasoning, and furthermore that this influence was much smaller even under the personal framing among those participants who understood the issue of objectivity of inquiry (footnote 4). Unsurprisingly, MacCoun (1998) – who was quoted by Colombo et al. (2016) as evidence that “judgment about scientific evidence is often biased in subtle and intricate ways” – indeed presents “a wealth of evidence that biased research interpretation is a common phenomenon”, but then continues noting that
there is danger of excessive cynicism here. The evidence suggests that the biases are often subtle and small in magnitude; few research consumers see whatever they want in the data. [...] far from condemning the research enterprise, the evidence cited here provides grounds for celebrating it; systematic empirical research methods have played a powerful role in identifying biased research interpretation and uncovering its sources.
Thus, proving the impossibility (rather than, for example, the difficulty) of value-free science requires first of all specifying which areas of science one is talking about, and showing that scientists in those areas succumb to non-epistemic influences (e.g., moral outrage). Nevertheless, this would still not suffice: even if some scientists were unable to keep moral judgements out of their evaluations, that would not imply that science is unavoidably value-laden (footnote 5); it could as easily mean that arguments should be judged only by those who can keep them out (footnote 6). Moreover, even if all relevant individuals were psychologically handicapped, that would not imply that one cannot correct for bias. After all, if you can quantify the effect of moral influences on judgements of credibility, you can presumably also correct for it (footnote 7), for example by running a suitable regression. Another way might be (at least in many cases) to judge the quality of an experimental setup before you know the outcome.
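The regression idea can be made concrete with a purely hypothetical sketch (the numbers are invented for illustration, not data from any study): if perceived moral offensiveness has been measured alongside credibility judgements, an ordinary least-squares fit estimates the offensiveness effect, which can then be subtracted out.

```python
# Sketch of bias correction by regression. All values are hypothetical;
# this only illustrates the principle that a quantified influence can be
# estimated and removed, not any actual analysis of Colombo et al. (2016).

def ols_fit(x, y):
    """Ordinary least squares fit y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical offensiveness ratings (1-7) and credibility judgements (1-5):
offensiveness = [1, 2, 3, 4, 5, 6, 7]
credibility = [3.0, 2.8, 2.9, 2.5, 2.4, 2.2, 2.1]

a, b = ols_fit(offensiveness, credibility)
# Remove the component of each judgement predicted by offensiveness,
# leaving judgements adjusted for the estimated moral influence:
corrected = [c - b * o for c, o in zip(credibility, offensiveness)]
print(b, corrected)
```

In practice one would of course need a defensible causal model and measurement of the moral variable, but the arithmetic of the correction itself is elementary.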
The above discussion concentrates on biases in the epistemic appraisal of scientific hypotheses that arise from the psychology of individual humans. As a reviewer pointed out, the intrusion of non-epistemic values can also happen in other ways; for example, scientists who achieve leading positions might become “more susceptible to a-epistemic influences on one’s work” because of “more regular interaction with the funding bureaucracy or more active involvement in university politics.” Nevertheless, even if this were true, it is far from obvious that this makes value-free science impossible: if one could show that some scientists are unavoidably biased by non-epistemic values, that would only indicate that value-free science requires excluding them from the epistemic appraisal of hypotheses, or correcting for their biases (certainly influential scientists made mistakes in the past that were corrected later; why would this be impossible for biases arising from non-epistemic values?). History might also suggest caution in claiming that institutional influences make value-free science impossible: religious values seem to have had considerable influence on areas like astronomy or medicine for some time, but were eventually corrected for. If currently existing institutions (compare, e.g., Smaldino and McElreath 2016) obstruct value-free science, that would not show that the latter is impossible; to prove this, one would have to show that there can be no institutional structures under which science can be value-free. Quite simply (but easily overlooked, compare footnote 3 above), arguing for the impossibility of value-free science cannot be based solely on showing that scientists commit errors (whether rooted in human psychology or, for example, in institutional structures); it has to rule out any way of correcting for these errors.
Tu Quoque, Reversed
I want to add two brief remarks on how arguments of Colombo et al. (2016) actually work against themselves. As these authors assert when discussing the small effect sizes that they observe, “It is not implausible that, in real-life explanatory contexts where reasoning concerns complex subject matters and bodies of evidence, the effect we found subtly interacts with other small biases leading to grossly mistaken explanatory judgments.” Maybe so, but the study of values, biases, and objectivity in science is a complex subject matter too, so if their assertion is sound, then presumably small biases in philosophical theorizing or psychological experimentation can lead to grossly mistaken judgements about the possibility of value-free and/or objective science as well. This provides one more reason to be wary of claims of the impossibility of value-free science. After all, philosophers should not put demands on scientists that they themselves fail to fulfil (Park 2018). Add to that the remarkable irony of Colombo et al. claiming that “as a matter of psychological fact [sic!], the ideal of a value-free science is not attainable”. So they do acknowledge the existence of scientific facts after all!