Whilst localism is a credible and promising approach to the replication crisis debate, it also raises new questions. One issue that stands out is that of demarcation: if replicability is not a universal ideal, where do we draw the line? How can we know to which fields the norm applies and to which it does not?
The discussion in section 3 has shown that there might be well-defined poles where things are relatively straightforward – research in computer science at one end and participant observation in fields like anthropology at the other. But once we enter the realm of semi-standardised or non-standardised experiments, we encounter a large grey zone where it becomes much harder to decide whether replicability should apply. This uncertainty matters because it directly affects debates about policy measures and how they should be designed.
In this section, I will take a closer look at how different authors address this issue. I will show that the nature of the object of interest becomes crucial in this context, and I will highlight some key problems this particular demarcation criterion faces.
The question of standardisation
When it comes to the suitability of the replicability ideal, we have seen that a high degree of standardization and control over variables is crucial. Leonelli describes one class of experiments (apart from work in computer science) that can achieve such levels of control, namely what she calls ‘standardised experiments’. Examples of such experiments can be found in the clinical sciences or in physics. In randomized controlled trials (RCTs), for instance, researchers apply rigorous controls to ensure that results are as reliable as possible. Leonelli argues that in such standardised experiments the idea of direct replication can be applied and is in fact expected by scientists in actual practice.
However, such high standardization and control can only be achieved in a select few experimental contexts. In the majority of cases, especially in fields such as biology, researchers are dealing with what Leonelli calls ‘semi-standardised’ or ‘non-standardised’ experiments. The latter include, for instance, exploratory experiments where researchers investigate new entities or phenomena on which they have little or no background information. In these cases, standardization is not possible because the researchers might not know what to expect or what to control for. Leonelli argues that in such non-standardised settings the idea of direct replication (or conceptual replication for that matter) is ‘not helpful’ (Leonelli 2018, p. 10).
In the case of semi-standardised experiments the control that can be exerted is also limited (even though standardisation is possible and actually implemented). The problem here is not the availability of information or materials but the nature of the objects of interest. In particular, when working with living entities, such as model organisms, researchers are unable to control every aspect of the intervention because the objects of interest are highly context-sensitive (Leonelli 2018, p. 9). Animals, for instance, are responsive to changes in lighting, nutrition, or even the sex of the people handling them (Chesler et al. 2002; Lewejohann et al. 2006; Sorge et al. 2014). Such context-sensitivity imposes constraints on the level of control that can be achieved. Similar issues also affect experiments in psychology, where the human research subjects can be influenced by the research setting and by their interactions with other people.
Leonelli does not make a normative statement when considering these grey zone cases. She simply makes the empirical claim that many researchers who work with semi- or non-standardized setups don’t aim for direct replicability. As I will show in the following section, other authors who have commented on this issue argue that the living/non-living distinction can and should be used as a demarcation line in this debate.
Living entities and the problem of replicability
The idea that the nature of the entity of interest is crucial for any debate about replication and replicability is not new. Schmidt (2009) begins his discussion of the topic by highlighting that the idea of direct replicability builds on the fundamental assumption of the uniformity of nature. This assumption, he claims, is problematic because many fields deal with what he calls ‘irreversible units’ (ibid, p. 92). These are entities that are complex and not time-invariant (Schmidt does not specify what ‘complex’ means in this context). A key feature of these entities is that they have a memory of some sort; they ‘accumulate history’, as Schmidt puts it. In the case of human test subjects this historicity means that they might remember – consciously or subconsciously – previous experiences and that this memory can affect their behaviour. Testing the same entity at a later point in time might therefore not produce the same results, simply because the entity has changed in key respects. This historicity creates a problem for the idea of direct replicability, as it undermines uniformity. Importantly, it goes beyond the context-sensitivity that Leonelli emphasises in her discussion of animal model research (which concerns present influences on the test subject). Schmidt focuses his discussion on the social sciences and only uses human test subjects as an example. It is therefore not clear what other entities he would include in his category of ‘irreversible units’.
Crandall and Sherman (2016) approach the issue in a similar way. They claim that the idea of direct replication is a ‘sensible proposition’ in fields such as physics or biology, where the processes that matter for the outcome of an experiment are transhistorical and transcultural; changes in politics or language don’t change the mass of an electron or the fold of a protein (Crandall and Sherman 2016). But in a field like social psychology, a shift in language or socio-economic circumstances can profoundly affect the behaviour of the entities studied (a blog post by Michael Ramscar provides an in-depth analysis of how this might work (Ramscar 2015)). The entities studied in this field change over time, and their internal makeup depends not only on the context they are in but also on the contexts they have experienced in the past. Crandall and Sherman not only highlight the importance of memory and experience but also the fact that cultural factors, which shape a person’s behaviour, can change over time. Their account is also more specific than Schmidt’s, as they seem to propose a sharp line between the natural sciences, such as physics or biology, and research in the social sciences.
Looking specifically at the humanities, de Rijcke and Penders (2018) follow a similar approach to that of Crandall and Sherman when they talk of ‘interactive’ and ‘indifferent’ kinds. Humans are examples of the first kind, DNA molecules of the second. Replicability, they argue, can only be used as a standard for the quality of data when doing research on indifferent kinds. In the humanities, where interactive beings are studied, this standard should not apply.
Whilst the above authors mainly focus on the role of historicity and plasticity in psychology and the humanities, others have, like Leonelli, focused more specifically on the situation in biology. Mihai Nadin, for instance, singles out biology because he draws a sharp line between the realm of living entities and that of ‘dead matter’ (Nadin 2018). He argues that there are fundamental differences in how change and causality work in these two realms, linking the ideas of historicity and plasticity exclusively to living entities. Of particular importance to his account is the idea that the space of possibilities of living systems is continuously changing, an idea he takes from Giuseppe Longo’s work (see, e.g., Longo 2017). He argues that:
“[T]he expectation of experiment reproducibility – legitimate in the decidable domains of the non-living (physics, chemistry) – is a goal set in contradiction to the nature of the change processes examined [in biology]” (Nadin 2018, p. 467).
Contrary to Crandall and Sherman (2016), he thus includes biology in the set of disciplines that pose significant problems for reproducibility.
The special status of the entities biology studies is also emphasised by Maël Montévil, who analyses the concept of ‘measurement’ in biology in the context of the replication crisis (Montévil 2019). He points out that the behaviour of the systems analysed in physics is guided by an invariant underlying structure that can be captured in mathematical terms. This invariance (and the existence of invariance-preserving transformations) allows physicists to treat their objects of interest as generic objects placed in generic conditions. This also means that replicability can be expected when particular features of physical systems are measured.
In biology, the situation is different. Here the organization of an entity depends on its past and current contexts, meaning that history matters for the (living) object of analysis (Montévil calls such entities ‘diachronic objects’ (ibid, p. 3)). Change happens in physics too, of course, but there it is grounded in an unchanging mathematical structure, which is generic and therefore not context-sensitive (ibid, p. 5).
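A simple textbook case can illustrate what such an invariant, context-insensitive structure looks like (the example is mine, not Montévil’s). The motion of a mass $m$ on a spring with stiffness $k$ is governed by

$$ m\,\ddot{x}(t) = -k\,x(t). $$

The equation contains no explicit reference to time or place: if $x(t)$ is a solution, so is $x(t+\tau)$ for any shift $\tau$. The period of oscillation, $T = 2\pi\sqrt{m/k}$, is therefore fixed by this invariant structure alone, and measuring it on any token of the same kind of system, at any time, is expected to yield the same value. It is this kind of time-translation invariance that has no counterpart for the ‘diachronic objects’ of biology, whose behaviour depends on what has happened to them before.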
The history-dependent nature of organisms also explains certain research practices in biology, for instance why researchers often exchange cell lines or other living model systems with each other (see also Bissell 2013 on this point). Researchers have to make sure that they work with materials that have had the same experiences and are therefore more likely to display similar behaviours. The recent genealogy of the specimen is a feature of biological systems that has to be controlled as tightly as possible by the researcher in order to increase reliability (Montévil 2019, p. 10).
In summary, we see that a range of authors emphasise, in different ways, the importance of historicity and plasticity for debates about replicability. They highlight the fact that some types of entities are fundamentally time-dependent and that this interferes with the idea of uniformity that underlies the replicability ideal. Some authors, such as Nadin (2018) and Montévil (2019), link these features explicitly to living systems, whereas others talk more generally of ‘irreversible’ or ‘interactive’ entities.
Much of this debate implies that these distinctions should define a relatively clear boundary between disciplines in which research can be treated as replicable and disciplines to which the replicability norm does not apply. The nature of the objects of interest affects the level of standardization and control that is possible in a field. This, in turn, affects the level of replicability that can be expected. However, as I will show in the next section, the practice of animal model researchers suggests that this line is not as clear as it might seem at first.
Rescuing replicability by abandoning standardization
The way in which animal model researchers deal with the problem of plasticity and historicity shows that they don’t abandon the ideal of replicability when standardization and control become problematic. Like the authors discussed above, these scientists stress the importance of the history of the organism and of its plasticity. As Voelkl and Würbel put it:
“[T]he response of an organism to an experimental treatment (e.g., a drug or a stressor) often depends not only on the properties of the treatment but also on the state of the organism, which is as much the product of past and present environmental influences as of the genetic architecture.” They go on to conclude that “we should expect results to differ whenever an in vivo experiment is replicated” (Voelkl and Würbel 2016).
Even though this might sound as if these researchers were ready to abandon the ideal of replicability, the opposite is the case: rather than turning their backs on replicability, they abandon the idea of standardization. Instead of seeing control in the form of uniform parameters as the solution (see, e.g., Festing 2004), some animal model researchers now see standardization as part of the problem (Würbel 2000; Richter 2017). This has led them to coin the term ‘standardization fallacy’, defined as “the erroneous belief that reproducibility can be improved through ever more rigorous standardization” (Voelkl and Würbel 2016).
This shift in thinking has intriguing methodological consequences: to increase the reliability of their findings, these researchers now introduce systematic heterogenization into their experimental setups, for instance by using animals of different genotypes or sexes, or by varying housing conditions. The idea is that the results of the experiment become less sensitive to variations in these parameters, as the variation is already factored in.
Several studies using this approach have shown that it can lead to a significant increase in the reproducibility of specific results. Using different mouse strains in the same cage, for instance, led to a reduction of the variation in experimental outcomes (Walker et al. 2016). Varying the environmental factors the animals are exposed to also had a positive effect on replicability (Richter et al. 2009, 2010, 2011). Furthermore, simulations suggested that multi-laboratory experiments, which automatically sample different housing and handling conditions, could increase reproducibility from 50% to 80% (Würbel 2017).
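The logic behind such heterogenized designs can be illustrated with a toy simulation (a minimal sketch of my own, not a reconstruction of the simulations reported in Würbel 2017; it assumes a simple random-effects model in which each laboratory adds its own treatment-by-environment interaction to a population-average effect):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy parameters (illustrative only, not taken from the cited studies)
TRUE_EFFECT = 1.0        # population-average treatment effect
LAB_EFFECT_SD = 0.5      # spread of lab-specific treatment-by-environment interactions
NOISE_SD = 1.0           # within-lab, between-animal noise
N_ANIMALS = 40           # animals per experiment
N_REPLICATIONS = 10_000  # simulated replicate experiments

def run_experiment(n_labs: int) -> float:
    """Estimate the treatment effect from one experiment spread over n_labs labs.

    n_labs = 1 corresponds to a fully standardized single-lab design; larger
    values correspond to heterogenized (multi-laboratory) designs.
    """
    # Each replicate experiment samples a fresh set of labs/conditions.
    lab_effects = rng.normal(0.0, LAB_EFFECT_SD, size=n_labs)
    # Spread the animals evenly across the labs.
    labs = np.repeat(np.arange(n_labs), N_ANIMALS // n_labs)
    observed = TRUE_EFFECT + lab_effects[labs] + rng.normal(0.0, NOISE_SD, size=len(labs))
    return observed.mean()

for n_labs in (1, 2, 4, 8):
    estimates = np.array([run_experiment(n_labs) for _ in range(N_REPLICATIONS)])
    print(f"{n_labs} lab(s): spread (SD) of effect estimates across replications = {estimates.std():.3f}")
```

In this toy model the between-replication spread of the estimated effect shrinks as the design samples more labs, because the lab-specific interaction terms partially average out within each experiment; a single-lab design, however tightly standardized, leaves that source of variation untouched. The point is not that the numbers match those reported in the literature, but that heterogenization trades within-experiment uniformity for between-experiment stability.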
Whether this approach is applicable to other cases of semi-standardised research remains to be seen. What it shows, though, is that we are unlikely to find a general approach to the issue of control and replicability: researchers will abandon the ideal in some cases and, in others, re-invent their experimental approaches in order to hold on to it. When it comes to the grey zone of semi- and non-standardised experiments, a case-by-case analysis that pays close attention to actual research practice will therefore be more important than a single, general demarcation criterion. There is no general ‘ought’ that can be imposed here, and the living/non-living distinction can only serve as a rough guide for demarcation.
In the last section, I will turn my attention to a second issue this demarcation criterion faces, namely that of scope. I will argue that plasticity and historicity, which the above accounts attribute exclusively to living systems, apply to other systems as well, in particular to macromolecular complexes such as DNA or proteins. This extension matters because it suggests that the problem of localism is relevant to more fields than initially thought, extending beyond the realm of living things. The rise of postgenomic approaches to biological research, and in particular the field of environmental epigenetics, has played a major role in this context.
Extending the lines
The shifts brought about by the postgenomic revolution over the last 10–15 years have had several effects on biological theory and practice. The one that matters most for our discussion here is the shift in our understanding of historicity: some of the dynamics that were previously ascribed exclusively to living systems are now seen as features of other elements of biological systems as well.
This shift has been mainly due to methodological developments. New technologies, such as microarrays or high-throughput sequencing, have allowed researchers to gain new insights into the dynamics of the organism and into the importance of phenomena such as symbiosis (Guttinger and Dupré 2016). This has led researchers to a new understanding of the importance of context and history for the makeup of what were previously seen as ‘mere’ molecular systems.
Macromolecular complexes such as genomes or proteins, in particular, are no longer seen as passive chunks of matter. The genome, for instance, is now seen by some as a ‘reactive’ entity that is co-produced and maintained by a range of different processes (Gilbert 2003; Stotz 2006; Keller 2014). The genome has a memory of sorts of past exposures (through epigenetic modifications of nucleotides or histone proteins), and its structure and behaviour are therefore defined not only by its sequence and its present context but also by past events; as some authors have argued, the genome has a lifespan of its own (Lappé and Landecker 2015). Because of these empirical and theoretical developments, fields such as molecular biology, genomics, or even biochemistry now have to be considered areas of science where the ideal of replicability might hit its limits.
Interestingly, most of the accounts that I discussed in section 4.2 don’t leave room for such an extension, as they insist on using the living/non-living distinction as a demarcation criterion. Montévil (2019), for instance, explicitly excludes biochemistry from the problems biological measurement faces. Nadin (2018) also seems to exclude the physico-chemical realm of peptides and other molecules from the problems living systems pose. Similarly, de Rijcke and Penders (2018) count DNA molecules as part of the class of ‘indifferent’ entities. In essence, these accounts exclude anything molecular from the realm of plasticity and historicity.
What the recent developments in the postgenomic life sciences highlight is that these lines might be drawn too narrowly and that important questions about historicity, plasticity, standardization, and control also have to be asked in the molecular life sciences. Overall, this means that the new localism in the replication crisis debate is even more significant than the authors discussed above claim, as it raises questions about methodology and the production of reliable output for a broader range of disciplines. At the same time, the discussion in section 4.3 has shown that historicity and plasticity always have to be assessed in the context of actual practice. In themselves they are not a reason for scientists to shun the norm of replicability.