1 Errare Humanum Est

To err is human: the realization of this unfortunate fact goes back at least to the ancient Romans. But we do not merely err; in many ways we err systematically, in ways that impinge on our ability to learn accurately about the world. Optical illusions, for example, are well known and have not only served as examples since ancient times and throughout the history of philosophical theorizing (Burnyeat 1983), but also provide input into psychological and neuroscientific research (Macknik et al. 2010; Ekroll et al. 2017). In addition, research has found strong evidence for the fallibility of human memory (Shaw 2016) and for the illusory and often self-serving character of moral beliefs (Mattes 2018). Perhaps even more interestingly, psychological science has exposed dozens of further widespread cognitive biases, including motivated reasoning, overconfidence, saliency bias, survivorship bias, selection bias, hindsight bias, an overeagerness to see spurious patterns in data, and many others (for overviews see, for example, Kahneman et al. (1982), Pohl (2005), or Kahneman (2011)). All this is particularly problematic because we have serious trouble recognizing when we ourselves are biased, a failure often referred to as the "bias blind spot", which itself operates largely outside of awareness (e.g., Hansen et al. 2014); notably, general cognitive sophistication does not seem to help (West et al. 2012).

If you believe that human bodies are more robust than cars, or that drinking detergents is good for your health, your life is likely to be rather short. Consequently, society tries to protect those whose beliefs it takes to be likely to misrepresent reality (for example children, people with dementia, or those who believe that safety belts in cars are more dangerous than helpful) from the probable consequences of their inaccurate views. Clearly, at least approximately accurate beliefs are essential for us both as individuals and as members of society (discussions at the margins, for example Taylor (1989), notwithstanding).

There is an obvious tension between the recognition that such biases exist and the need for our beliefs not to stray too far from reality. Fortunately, there are a number of helpful tools for reducing, and perhaps eliminating, the influence of biases and other distortions on our beliefs. Here are a few examples. To reduce the influence of memory distortions, you can avail yourself of a technology perfectly suited to the task: writing. Given individual observations that are affected by random influences, researchers use statistical techniques to combine several observations. To reduce survivorship bias you check whether you are looking at all relevant cases; to reduce hindsight bias you conduct prospective rather than retrospective investigations; to avoid being impressed by spurious "results" you compare against control conditions; to avoid selection bias you randomly assign study participants (chosen from an appropriate population to ensure what is often called ecological validity) to treatment and control conditions; blinding both researchers and participants guards against motivated perception and experimenter effects; and so on.

Such systematic ways of dealing with human cognitive biases are above all an essential ingredient in the conduct of science, and the extent to which they have been applied is in particular crucial to the evaluation of scientific evidence. For example, in medical research, so-called randomized controlled trials (RCTs) are considered to provide stronger evidence than cohort studies, which in turn are preferable to case studies; at the bottom level of evidence sits expert opinion without supporting evidence. Rated above RCTs are well-conducted systematic reviews and meta-analyses (Moher et al. 2015; Shamseer et al. 2015).

2 Tu Quoque, Scientia?

But then, science is done by humans, hence human error-proneness may well affect the conduct and/or results of science itself. If it does, the question is: to what extent? The impressive progress of science, and its application in technology, strongly suggests that science is reasonably successful in painting an accurate picture of reality. This sometimes leads to claims such as that of science being "a candle in the dark" (Sagan 1996).

Nevertheless, a number of philosophers have raised objections to such claims. The present paper is not the place to discuss purely philosophical objections, except to note that criticising science on the basis of philosophical theorizing seems rather unconvincing given "the historical record of philosophical argumentation, which is a track record that is marked by an abundance of alternative theories and serious problems for those theories" (Mizrahi 2016a), given that, where comparisons are possible, such philosophical theorizing seems clearly inferior to the sciences (Wootton 2015), and given that at least some of the philosophical arguments deployed in this context apply in turn to those philosophical arguments themselves (Park 2017). In addition, even where arguments are based on historical studies, these seem to rely on retrospective non-random samples, hence on what researchers would rightly consider to be only low-level evidence (compare Mizrahi 2016b).

Instead, I want to concentrate on a recent attempt in this journal (Colombo et al. 2016) to show, using scientific tools (specifically, studies of the type often performed in social psychology), that the biases of researchers rule out the “psychological attainability of objective, value-free scientific reasoning”. Colombo et al. (2016) describe value-free science as follows:

According to the ideal of the value-free character of science, scientists should strive to minimize the influence of non-epistemic values on their assessment of the evidence for a hypothesis [… the] idea is that scientific reasoning is objective to the extent that the appraisal of scientific hypotheses, which contributes to producing scientific knowledge, is not influenced by non-epistemic values, but only by the available evidence.

Then they go on to claim that sometimes “[b]ackground knowledge also allows for [...] non-epistemic moral values [… to] provide a non-trivial lower or upper bound on the probability that the hypothesis is true”. To argue for this, they use an example of Sober (2007), where background knowledge relates the supposedly non-moral proposition “Drug X is safe.” to the supposedly moral proposition “Good consequences accrue to the patients.” I for my part find this example decidedly unconvincing since I don’t understand why “no bad consequences accrue to the patients” (i.e., the drug is safe) is supposed to be a non-moral statement, while “good consequences accrue to the patients” is supposed to be a moral one.

However that may be, Colombo et al. (2016) are of course right in stating in the following paragraph that, if the sciences are to be considered value-free, "moral information should not affect the assessment of the evidence available for a hypothesis over and above the hypothesis's prior credibility". They go on to note that "reasoning and valuing are obviously psychological processes" (italics in the original), and list a number of papers in the literature as supporting the claims that prior beliefs "predict" explanatory judgements of research, that such judgement "is often biased in subtle and intricate ways", and that motivational states "can influence many of our beliefs about the world." Nevertheless, they point out that

these studies provide weak support for the claim that non-epistemic values systematically affect the appraisal of the relation between a scientific hypothesis, data, and background knowledge, because they did not control for hypotheses’ prior credibility and did not assess the extent to which accuracy incentives can mitigate the effect of directional goals.

Against this background, Colombo et al. (2016) conducted a pre-study and two studies as briefly summarized next:

In the pre-study, participants were presented with a number of statements and asked to rate how credible and how morally offensive they found each of them, both on a scale from 1 = "not at all" to 5 = "very much". Based on the mean ratings, seven statements were selected for use in the main studies (described below): four as morally offensive (mean offensiveness rating > 3.46; mean credibility ratings between 1.15 and 1.8) and three as morally neutral (mean offensiveness rating < 1.86; mean credibility ratings: 1.33, 2.05, 3.85). For two of the chosen offensive statements, the rank correlation between the individual ratings of offensiveness and credibility was medium-sized (r = 0.382 and r = 0.366) and statistically significant, with p-values of 0.020 and 0.015, respectively; the other correlations were reported only as "n.s.".
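
To make concrete what such a rank correlation amounts to, here is a minimal sketch (in Python, using scipy); the number of participants and the individual ratings are invented placeholders rather than the authors' data, and the snippet merely illustrates how a correlation of this kind and its p-value would be obtained.

```python
# Minimal sketch: rank correlation between per-participant offensiveness and
# credibility ratings for one statement. All data below are hypothetical
# placeholders, not those collected by Colombo et al. (2016).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_participants = 40                                   # assumed sample size
offensiveness = rng.integers(1, 6, n_participants)    # 1-5 Likert ratings
# invent credibility ratings loosely related to offensiveness, for illustration
credibility = np.clip(offensiveness + rng.integers(-2, 3, n_participants), 1, 5)

rho, p_value = spearmanr(offensiveness, credibility)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")
```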

In study 1, participants read "short reports about alleged scientific studies", each of which allegedly provided evidence for one of the seven statements selected in the pre-study. Participants then rated (again on a five-point Likert scale, plus an "I don't know" option) each report and the research described therein as to whether it was plausible, convincing, well-conducted, worthy of funding, and whether it provided strong evidence for the alleged conclusion. Results showed that for each of these five outcome measures, ratings were higher for reports arguing for morally neutral statements than for those arguing for morally offensive ones, with t-tests suggesting statistical significance (all p-values < 0.03). Nevertheless, as the authors themselves acknowledge, all estimated effects were small (Cohen's d between 0.22 and 0.38).
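
For readers who want to see how a t-test and a Cohen's d of roughly this size fit together, here is a minimal sketch with simulated placeholder data; the group means, standard deviations, and sample sizes are my own assumptions, chosen only so that the effect is small yet statistically significant, mirroring the pattern just described.

```python
# Minimal sketch: independent-samples t-test plus Cohen's d for ratings of
# "neutral" vs. "offensive" reports. The simulated data are illustrative
# assumptions, not the data of Colombo et al. (2016).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
neutral = rng.normal(loc=3.3, scale=1.0, size=300)    # hypothetical ratings
offensive = rng.normal(loc=3.0, scale=1.0, size=300)  # slightly lower mean

t_stat, p_value = ttest_ind(neutral, offensive)

# Cohen's d computed with the pooled standard deviation
pooled_sd = np.sqrt((neutral.var(ddof=1) + offensive.var(ddof=1)) / 2)
cohens_d = (neutral.mean() - offensive.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```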

In study 2, the authors replicated this study in another sample, adding a monetary incentive for accuracy for a subgroup of participants. The reports used were (with one exception) the same as in the first study. For each outcome measure, the results were subjected to a 2 × 2 ANOVA with the factors moral (neutral, offensive) and incentive (yes, no). Only in one case (plausibility) was there a statistically significant but small main effect of the moral condition; for three of the five outcomes the p-values were in fact > 0.40. Remarkably, in the accuracy group the reports supporting offensive statements received higher ratings for being well-conducted and for providing good evidence than those supporting morally neutral statements.
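
For concreteness, the following is a minimal sketch of a 2 × 2 ANOVA of the kind just described; the factor labels follow the description above, while the cell sizes and simulated ratings are purely illustrative assumptions.

```python
# Minimal sketch: 2 x 2 ANOVA with factors "moral" (neutral, offensive) and
# "incentive" (yes, no). The ratings below are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(2)
n_per_cell = 50                                        # assumed cell size
rows = []
for moral in ("neutral", "offensive"):
    for incentive in ("yes", "no"):
        for rating in rng.integers(1, 6, n_per_cell):  # hypothetical 1-5 ratings
            rows.append({"moral": moral, "incentive": incentive, "rating": rating})
df = pd.DataFrame(rows)

model = ols("rating ~ C(moral) * C(incentive)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))                 # main effects and interaction
```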

In their concluding discussion, Colombo et al. (2016) claim that their “results support three conclusions. First, explanatory judgment is consistently driven by a motivation to avoid inferring a morally undesirable conclusion. Second, the appraisal of the evidential relationship between a particular hypothesis, background beliefs, and a specific body of evidence is affected by moral motivation, and more generally by non-epistemic values. Third, as a matter of psychological fact, the ideal of a value-free science is not attainable.”

3 Caveat Lector

Reader beware. As laudable as the attempt of Colombo et al. (2016) is to apply scientific methods to the study of the scientific enterprise itself, a closer look at their methods instils serious doubts about the validity of their conclusions. I will raise two main groups of concerns: one relates to the specific experimental setup and statistical evaluation used by the authors; the other is more general and may well be worth considering when evaluating other studies in the philosophy of science.

3.1 Concerns, Part 1: Experimental Set-Up and Statistical Evaluation

I have a number of concerns relating to the reporting and analysis of the experiments in Colombo et al. (2016), as well as to the experimental setup.

First, in discussing study 2, the authors claim that 'The main effects of "Moral" indicate that the moral content of the hypotheses influenced explanatory judgment. [...] The results support and extend the claim that differences in scientific hypotheses' perceived moral offensiveness predict differences in explanatory judgments', when in fact only one of the five main effects is statistically significant and three are far from significance. Moreover, it has not been taken into account that the data are highly non-normal, so the use of parametric statistical techniques such as the t-test and ANOVA is doubtful. In addition, as already noted above, one can see from Table 3 in the paper that when participants are incentivised for accuracy of judgement, two of the five effects go in the other direction. So it seems that, if anything, the results of study 2 show some tendency to disconfirm the authors' conclusions.
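
To illustrate the point about non-normal Likert-type data, here is a minimal sketch that runs a parametric t-test alongside a rank-based Mann-Whitney U test on the same simulated ordinal ratings; the skewed distributions are my own illustrative assumption, intended only to show how the two analyses would be compared.

```python
# Minimal sketch: comparing a parametric and a rank-based test on skewed,
# ordinal 1-5 ratings. The distributions are illustrative assumptions.
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(3)
# heavily skewed ratings, as is typical for statements of low credibility
neutral = rng.choice([1, 2, 3, 4, 5], size=200, p=[0.40, 0.30, 0.15, 0.10, 0.05])
offensive = rng.choice([1, 2, 3, 4, 5], size=200, p=[0.50, 0.28, 0.12, 0.07, 0.03])

print("t-test:       p =", ttest_ind(neutral, offensive).pvalue)
print("Mann-Whitney: p =", mannwhitneyu(neutral, offensive).pvalue)
```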

Next, the authors rightly argue for the need to control for the prior credibility of the morally offensive versus the neutral statements when comparing ratings of the kind elicited from the participants in their studies. They try to do this by using only statements that received low credibility ratings in the pre-study. Nevertheless, the average credibility rating of the statements used is 1.525 for the offensive and 2.41 for the neutral ones, almost a full point higher! This raises serious doubts as to whether the possible influence of differential prior credibility on the results has really been eliminated, all the more so since the estimated effect sizes in Colombo et al. (2016) are, by the authors' own admission, small. Further doubts arise from the fact that the pre-study and study 1 were conducted in very different populations: students in one case, Mechanical Turk workers in the other. It is easily conceivable that mean ratings of credibility and/or moral offensiveness differ between those populations, again undermining the necessary control for prior credibility that the authors attempted. Note that this problem does not arise for study 2 (which used students, as did the pre-study), hence my remarks in the previous paragraph carry even more weight.

Another problematic aspect of the setup is that prior to the experiment, participants "received careful debriefing [...] informing them about the potential offensive character of the material's content." This is problematic because it may have primed participants to think in moral categories more than they usually would when evaluating scientific evidence, thus biasing the studies towards finding an influence of moral judgements on the outcomes.

A further concern related to the setup of these studies is that almost exclusively statements of very low credibility (mean credibility < 2.1 on a 1 to 5 scale) were used. That again is worrying: if you present allegedly scientific evidence for something that your participants strongly disbelieve, you put them in an awkward position in which they may be unduly influenced by their emotions, all the more so if they are students and/or expect to be judged by scientists on the accuracy of their judgements.

In conclusion, neither of the two studies reported in Colombo et al. (2016) nor their combination provides plausible evidence for the claimed psychological unattainability of value-free scientific reasoning.

3.2 Concerns, Part 2: Ecological Validity and Corrigibility

In the previous subsection I pointed out a number of methodological flaws which in themselves already invalidate the conclusions of Colombo et al. (2016). Here I list a number of further objections, which I discuss separately because they seem to apply more generally to claims about the supposed impossibility of value-free science. In what follows I will take science to be value-free (footnote 1) to the extent that it is free from "non-epistemic values systematically intrud[ing] the appraisal of the evidential relation, on which the evaluation and justification of scientific claims depend" (Colombo et al. 2016, p. 755), at least where such intrusions would bias the appraisal relative to one based entirely on epistemic factors.

The well-known demarcation problem entails that when making general claims about "science", one has to explicate what counts as science. A statement like "the ideal of a value-free science is not attainable" is meaningless unless one specifies which areas of research it is claimed to apply to. In the context of the discussion above this clearly matters, since there are obvious differences between, as well as within, the sciences as to the possibility of controlling for bias. For example, in terms of method, controlled experiments tend to be easier in (most areas of) physics than in astronomy, yet even within physics they seem out of reach for the foreseeable future in string theory (Ellis and Silk 2014); RCTs are easier in microeconomics than in macroeconomics; blinding therapists is easy in pharmacology and hardly possible in manual medicine, to name just a few obvious and well-known cases. In terms of the reliability of research outcomes, there is (again just one example) a curious discrepancy in the reproducibility of research in cognitive compared to social psychology, with social psychology showing lower reproducibility (Doyen et al. 2014; Open Science Collaboration 2015). Hence the question arises: Is the alleged impossibility of evaluating scientific evidence in a value-free way supposed to apply to all fields and subfields of science, including for example all of mathematics? Should we really believe that professional mathematicians' moral emotions will decisively interfere with their study of, say, locally convex vector spaces (footnote 2)? Colombo et al. (2016) do not say whether they claim that the value-free ideal is unattainable in all sciences; their studies, even if they were sound (they are not, see above), would at best apply to a small number of research areas. Making such constraints on generality explicit is yet another step towards reducing non-epistemic influences in science (Simons et al. 2017).

A second, related problem is that the participants in the studies were students and Mechanical Turk workers, basically what philosophers like to call "the folk". But then, nobody said that doing science is easy or comes naturally. It takes years of training to put a lid on one's intuitive physics or intuitive probability, or to be able to think in terms of abstract vector spaces, to take just a few examples. So even if someone succeeded in showing that the folk cannot keep irrelevant considerations out of their evaluation of evidence, why would that tell us anything about the impossibility of doing so? (Don't many philosophers love to argue that their training makes their intuitions superior to those of the folk because it enables them to focus on what is really relevant? (Nado 2015)) Saying "it is not implausible to hypothesise that our conclusions about morally-motivated explanatory reasoning may sometimes apply to the reasoning of many professional scientists too" (Colombo et al. 2016) does little to answer this point, not least because "it is not implausible [...] may sometimes apply" is a far cry from their claim in the same paper that "our results show that explanatory judgments about a scientific hypothesis are robustly sensitive to the perceived moral offensiveness [...] which suggests that scientific reasoning is imbued with non-epistemic values" in the cases where it matters, i.e., among scientists.

In this respect one must not forget that scientists tend to be well aware of the fact that our perceptions and judgements can be biased, and have developed a number of ways to try to reduce or eliminate these problems (footnote 3), a few of which have been discussed above. Here I want to mention only two more that may be especially relevant for the present discussion. Adversarial collaboration (e.g., Dam et al. 2018) may be particularly helpful in reducing the influence of moral emotions. And Greenhoot et al. (2004) show that asking a question in a more abstract way (what a fictional experimenter should conclude) rather than a personal one (the participants' own conclusions) reduces the influence of prior beliefs on scientific reasoning, and furthermore that this influence was much smaller even under the personal framing among those participants who understood the issue of objectivity of inquiry (footnote 4). Unsurprisingly, MacCoun (1998) – who was quoted by Colombo et al. (2016) as evidence that "judgment about scientific evidence is often biased in subtle and intricate ways" – indeed presents "a wealth of evidence that biased research interpretation is a common phenomenon", but then continues by noting that

there is danger of excessive cynicism here. The evidence suggests that the biases are often subtle and small in magnitude; few research consumers see whatever they want in the data. [...] far from condemning the research enterprise, the evidence cited here provides grounds for celebrating it; systematic empirical research methods have played a powerful role in identifying biased research interpretation and uncovering its sources.

Thus, proving the impossibility (rather than, for example, the difficulty) of value-free science requires first of all specifying which areas of science one is talking about, and showing that scientists in those areas succumb to non-epistemic influences (e.g., moral outrage). Nevertheless, even this would not suffice: even if some scientists were unable to keep moral judgements out of their evaluations, that would not imply that science is unavoidably value-laden (footnote 5); it could as easily mean that arguments should be judged only by those who can keep them out (footnote 6). Moreover, even if all relevant individuals were psychologically handicapped in this way, that would not imply that one cannot correct for the bias. After all, if you can quantify the effect of moral influences on judgements of credibility, you can presumably also correct for it (footnote 7), for example by running a suitable regression. Another approach, available at least in many cases, is to judge the quality of an experimental setup before the outcome is known.
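
As a rough illustration of this "correct for it by regression" idea, the sketch below regresses a simulated evidence rating on a simulated moral-offensiveness rating and then removes the estimated offensiveness component; all variable names and data are illustrative assumptions, not a reconstruction of any actual analysis.

```python
# Minimal sketch: regressing out an estimated moral-offensiveness effect from
# evidence ratings. All data and names are illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(4)
n = 200
offensiveness = rng.integers(1, 6, n)                  # 1-5 moral ratings
true_quality = rng.normal(3.0, 0.8, n)                 # unobserved "real" quality
evidence_rating = true_quality - 0.3 * offensiveness + rng.normal(0, 0.5, n)

df = pd.DataFrame({"rating": evidence_rating, "offensiveness": offensiveness})
fit = ols("rating ~ offensiveness", data=df).fit()

# adjusted rating: subtract the estimated contribution of moral offensiveness
df["adjusted"] = df["rating"] - fit.params["offensiveness"] * df["offensiveness"]
print(fit.params)
```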

The above discussion concentrates on biases in the epistemic appraisal of scientific hypotheses that arise from the psychology of individual humans. As a reviewer pointed out, the intrusion of non-epistemic values can also happen in other ways; for example, when scientists achieve leading positions they might become "more susceptible to a-epistemic influences on one's work," because of "more regular interaction with the funding bureaucracy or more active involvement in university politics." Nevertheless, even if this were true, it is far from obvious that it makes value-free science impossible: if one could show that some scientists are unavoidably biased by non-epistemic values, that would only indicate that value-free science requires excluding them from the epistemic appraisal of hypotheses, or correcting for their biases (influential scientists have certainly made mistakes in the past that were corrected later; why would this be impossible for biases arising from non-epistemic values?). History might also suggest caution in claiming that institutional influences make value-free science impossible: religious values seem to have had considerable influence on areas like astronomy and medicine for some time, but were eventually corrected for. If currently existing institutions (compare, e.g., Smaldino and McElreath 2016) obstruct value-free science, that would not show that the latter is impossible; to prove this one would have to show that there can be no institutional structures under which science can be value-free. Quite simply (but easily overlooked, compare footnote 3 above), arguing for the impossibility of value-free science cannot be based solely on showing that scientists commit errors (whether rooted in human psychology or, for example, in institutional structures); it has to rule out that there is any way of correcting for these errors.

3.3 Tu Quoque, Reversed

I want to add two brief remarks on how the arguments of Colombo et al. (2016) actually work against themselves. As these authors assert when discussing the small effect sizes they observe, "It is not implausible that, in real-life explanatory contexts where reasoning concerns complex subject matters and bodies of evidence, the effect we found subtly interacts with other small biases leading to grossly mistaken explanatory judgments." Maybe so, but the study of values, biases, and objectivity in science is a complex subject matter too, so if their assertion is sound, then presumably small biases in philosophical theorizing or psychological experimentation can lead to grossly mistaken judgements about the possibility of value-free and/or objective science as well. This provides one more reason to be wary of claims of the impossibility of value-free science. After all, philosophers should not put demands on scientists that they themselves fail to fulfil (Park 2018). Add to that the remarkable irony of Colombo et al. claiming that "as a matter of psychological fact [sic!], the ideal of a value-free science is not attainable". So they do acknowledge the existence of scientific facts after all!

4 Conclusion

To conclude, the present paper has discussed the claim that value-free science is impossible. It did not discuss how difficult it may be to achieve (which likely depends very much on the specific field of inquiry, see above), nor whether it is desirable. After applauding the observation of Colombo et al. (2016) that this is, at least to a considerable extent, a psychological question and should therefore be studied using the methods of psychological science, I examined the studies performed by these authors and unfortunately found them seriously wanting in various respects.

Beyond the merits or demerits of that particular piece of philosophy, the discussion led to a conclusion likely relevant to the entire debate about the alleged impossibility of value-free science: showing the impossibility of value-free science would entail at least a) defining what the term "science" is intended to cover, b) providing high-level evidence that few if any scientists in the relevant area(s) are immune to non-epistemic influences (otherwise one could presumably (footnote 8) achieve value-free science by having scientific hypotheses evaluated only by those who are immune), c) showing that these influences meaningfully bias the results of science, d) showing that there is no way to correct for these influences, and e) explaining why – unlike epistemic appraisal in science – the epistemic appraisal of this very argument can be trusted.