1 Introduction

Scientists constantly rely on surrogates to obtain evidence supporting or disconfirming their hypotheses. A surrogate is generally an object that can be used to obtain information or make inferences about another object, frequently called the target. Surrogates are very diverse, including thought experiments, mathematical models, computational methods such as machine learning or cellular automata, and more complex material models such as organoids or even model organisms (Ankeny & Leonelli, 2011; Bolker, 2009; Green et al., 2022; Griesemer, 1990; Mäki, 2009; Oliveira, 2022; Suárez, 2004). In general, when an object is used as a surrogate or as a tool for surrogative reasoning, it is used to make inferences about its target under the general assumption that whatever can be learnt from the surrogate, i.e., whatever happens to the surrogate, can be attributed to the target, i.e., it will happen to the target (Cassini, 2018; Contessa, 2007; Swoyer, 1991). When this occurs, the object used for making the inferences is called a surrogate, and the activity of using the object to make inferences about the phenomenon is called surrogative reasoning. For instance, a climate model is a surrogate for the climate because it is assumed that whatever happens to the model will happen to the climate. We will assume this basic characterization of surrogates and surrogative reasoning for the rest of this paper.

Two key elements that all surrogates share are that they function as tools that can provide evidence for or against some scientific hypotheses and that they are used because they are more convenient for testing the hypotheses than doing more direct research on the phenomenon or entity that the hypothesis is about (Contessa, 2007). This convenience may derive from different sources: sometimes, the surrogate is cheaper to use than obtaining direct evidence about the objects that the hypothesis concerns (as in the case we discuss in this paper); on other occasions, it is due to ethical limitations (e.g., the use of model organisms to test secondary effects of drugs); and in still others, it is due to intrinsic limitations of the phenomenon, whose complexity makes it impossible to study by other means (Moreno & Suárez, 2020). It is generally accepted in contemporary philosophy of science that the fact that a model or technology can be used for surrogative reasoning about a system is grounded on the fact that users of the model employ it for that purpose. This view is expressed, for example, in Contessa (2007), who, following Suárez’s (2004) inferentialist account of scientific models, believes that the user’s intention to use a model as a basis or vehicle to make inferences about the phenomenon or target system is what grounds surrogative reasoning. That is, a surrogate will be grounded as a tool for answering a specific scientific question if the scientists interpret it as useful to answer such a question, even if it is not sound (see Sect. 5, for a review of these positions).Footnote 1

This paper concentrates on the use of surrogates in microbiome science to show that this thesis is false, at least insofar as it concerns the use of technology-driven surrogates for hypothesis testing. We argue that although the aim of the scientists in using a specific tool as a surrogate influences whether it acquires such a role, not every object that scientists decide to use as a surrogate becomes one. In other words, there must be a certain soundness in the surrogative reasoning for it to be considered as such, at least insofar as it concerns technology-driven surrogates.Footnote 2 By drawing on Lloyd’s (2015) concept of “the logic of research questions”, we argue that for a surrogate to work as such, it needs to track the properties imposed by the internal rationale of the research questions being asked. Or, to put it differently, failure to correctly appreciate the research questions that the technology being used has been designed to answer amounts to a failure in the inferences being made by using the technology as a surrogate. We refer to this type of failure as epistemic misalignment, which is any situation in which the information provided by the apparent surrogate fails to confirm the hypotheses despite apparently positive results, general acceptance of the evidence by the community, and the scientist’s intention to use it as a confirmatory piece of evidence for the hypothesis.Footnote 3 We argue that in cases of epistemic misalignment, as we define it here, there is no surrogative reasoning at all.Footnote 4 To support our claim, we study the generalized use of 16S rRNA gene sequencing as the basic technology that acts as a surrogate to establish certain claims and support certain hypotheses about the microbiome.

In Sect. 2, we justify the choice of microbiome science and introduce the theoretical framework we will build upon. In Sect. 3, we analyse the history of 16S rRNA sequencing to illuminate the type of research questions that it was designed to answer (i.e., what it is a surrogate for). In Sect. 4, we introduce the type of research questions posed in today’s microbiome research and show that many of them concern an evolutionary tempo which does not match adequately with the tempo that 16S rRNA was introduced to measure. In Sect. 5, we argue that this case study shows that surrogative reasoning is strongly tool/technology-driven, to the extent that failure to appreciate the real implications of the sequencing technology being used in microbiome science has resulted in a lack of alignment between the research questions being asked and the technologies used to reply to these questions. We use this evidence to claim that in technology-driven research, the validity of a given technology to serve as a surrogate heavily depends on the type of properties that the technology captures and how these properties match the research question being asked. In Sect. 6, we conclude.

2 Repertoires, the logic of research questions, and the problem of surrogates in microbiome science

Microbiome science is characterized by the combination of tools, methods, hypotheses, research infrastructures, etc. from many different sources across multi-disciplinary teams that, in most cases, go well beyond a single lab, and even a single country (Huss, 2014). Its growth during the last two decades was facilitated by the development and generalized use of high-throughput technologies, including 16S rRNA analysis, for the study of cell biology (Douglas, 2018a; Vrancken et al., 2019). This had profound consequences for the understanding of the phenomenon of life (Reuter et al., 2015). On the one hand, it has been shown that microbial life is much more abundant and diversified than was previously thought, clearly widening the scope of contemporary biological research and triggering an unprecedented interest in microbiology (O’Malley, 2014; Suárez, 2016). On the other, it has been demonstrated that numerous living entities, traditionally classified as ‘animals’ and ‘plants’ (macro-organisms or hosts, hereafter), host a vast array of microorganisms in their bodies, collectively called their microbiome or their microbiota (Lederberg & McCray, 2001; Marchesi & Ravel, 2015). Some of these microorganisms are highly specialized and play very specific functional roles in host biology, influencing the host diet, immune system, resistance to adverse environmental pressures, etc. (Douglas, 2018a; Gilbert et al., 2012; Lynch & Hsiao, 2019; McFall-Ngai et al., 2013).

The profound impact of these discoveries has given rise to a series of research projects aimed specifically at understanding multiple features of microbiome biology. Some of these projects include the Human Microbiome Project (Turnbaugh et al., 2007), the Soil Microbiome Project (Köberl et al., 2020), the American Gut Project (http://americangut.org/about/), or the Earth Microbiome Project (Gilbert et al., 2014; Thompson et al., 2017) (see Huss, 2014, for a review). All these projects have been grounded on the assumption that a better understanding of the biology of the microbiome(s) would provide a deeper understanding of complex biological systems, ranging from different features of human health to human and non-human evolution, with potential effects on different types of industrial settings (e.g., pest control, disease treatment, etc.). The projects worked by integrating and aligning knowledge derived from a wide variety of research sources (biomodelling, sample-collection, sequencing, statistical analysis, etc.), creating shared standards for sample-collection and data-sharing that the community could uniformly follow and re-use in their future projects, and reconceptualizing or reinterpreting the data in combination with new evidence to answer new questions (O’Malley, 2014; O’Malley & Soyer, 2012).

Most contemporary research on microbiome science is carried out by relying exclusively on 16S rRNA as the default sequencing technology, in most cases regardless of the specific questions that the research aims to answer. This is not the place to review all this evidence, nor is it this paper’s intention to contribute to the growing literature suggesting that other types of technologies (proteomics, metagenomics, etc.) should be preferred over 16S rRNA analyses (see, e.g., Cirstea et al., 2018; Douglas & Langille, 2021; Greslehner, 2020; Poretsky et al., 2014; Wang et al., 2021). But we want to at least illustrate the pervasiveness of the technology by highlighting a few studies relying on 16S rRNA to answer very diverse research questions. These are summarized in Table 1.

Table 1 Short sample of studies using 16S rRNA to reply to different research questions in today’s microbiome research

In this paper, we follow Ankeny and Leonelli (2016) in considering that microbiome science could be fruitfully analysed as a (research) repertoire.Footnote 5 This notion was originally introduced as a handy post-Kuhnian concept to facilitate grasping the complexities involved in the type of research that is carried out in fields that require multi-disciplinary interactions. Examples of repertoires include model organism research and microbiome research, both analysed in the seminal work by Ankeny and Leonelli (2016), exposome research (Canali, 2020), small RNA research (Veigl, 2021), or climate modelling (Lloyd et al., 2022), among other instances of contemporary science. One of the characteristics of repertoires is that they are created and maintained through the coordinated efforts of different individuals working in different subgroups who have the ability “to wield and align specific skills and behaviours with appropriate methods, epistemic components, material, resources, participants, and infrastructures” (Ankeny & Leonelli, 2016, p. 19). Repertoires affect those research groups that join them insofar as they influence their identity, boundaries, practices, and outputs. If microbiome science is a repertoire, then a complex alignment between all these components must be appreciated. This includes, among others, an alignment between the research methods and technologies employed in microbiome research, including what these technologies are surrogates for and the type of questions that microbiome researchers primarily address in their projects.

We examine the relationship between the sequencing technologies as they are employed in microbiome science as surrogates that support/falsify the answers to the main questions asked within the field and the type of research questions that structure the repertoire. This suggests analysing a part of the microbiome repertoire according to the following structureFootnote 6:


Research question_________

Answer/Explananda (in the form of a hypothesis)__________

Source of the evidence for the answer_________

Research method (technology) used as a surrogate_________


An alignment occurs when the technology used for obtaining the evidence that ultimately supports the hypothesis really matches the research question. A misalignment occurs whenever any of these elements is empirically or conceptually disconnected from the others. For example, questions about the interaction between humans and their microbiome sometimes rely on evidence obtained from mice as model organisms. An alignment occurs when the research question can genuinely be answered by relying on mice as model organisms, for instance, when mice and humans share the specific developmental or physiological pathway that the research question is about. However, if this does not occur, the result is a misalignment between the source of the evidence and the answer (see Douglas, 2018b, for an interesting review article criticizing this type of misalignment). Another type of misalignment, which we investigate in this work, occurs when, even if the source of the evidence is reliable for answering the research question, the research method that is used (in the form of the sequencing technology) does not capture a reliable answer, because there is a mismatch between the properties that the research question requires in order to be satisfactorily answered and the properties captured by the research method. When this occurs, we will say that there is an epistemic misalignment, which, we will argue, poses fundamental challenges for considering the sequencing technology as a surrogate (Sect. 5).
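The four-part structure and the notion of epistemic misalignment can be sketched as a small data structure. This is only an illustrative reading, not a formalization proposed in the paper: the class name `RepertoireSlice`, its field names, and the placeholder property labels are all hypothetical, and treating alignment as a subset test between required and captured properties is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepertoireSlice:
    """One question/answer/evidence/method unit (hypothetical representation)."""

    research_question: str
    hypothesis: str
    evidence_source: str
    research_method: str
    # Properties the research question needs tracked in order to be answered.
    required_properties: frozenset
    # Properties the research method (technology) actually captures.
    captured_properties: frozenset

    def missing_properties(self) -> frozenset:
        """Required properties that the method does not capture."""
        return self.required_properties - self.captured_properties

    def is_epistemically_aligned(self) -> bool:
        """Assumption: aligned when every required property is captured."""
        return not self.missing_properties()


# Placeholder labels only; they stand in for whatever a real analysis would identify.
aligned = RepertoireSlice(
    research_question="question 1",
    hypothesis="hypothesis 1",
    evidence_source="source 1",
    research_method="method 1",
    required_properties=frozenset({"property_A"}),
    captured_properties=frozenset({"property_A", "property_B"}),
)
misaligned = RepertoireSlice(
    research_question="question 2",
    hypothesis="hypothesis 2",
    evidence_source="source 2",
    research_method="method 1",
    required_properties=frozenset({"property_A", "property_C"}),
    captured_properties=frozenset({"property_A", "property_B"}),
)

print(aligned.is_epistemically_aligned())       # True
print(misaligned.is_epistemically_aligned())    # False
print(sorted(misaligned.missing_properties()))  # ['property_C']
```

On this reading, the same method can be aligned with one research question and misaligned with another, which is why alignment has to be assessed per question rather than per technology.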

Concerning what research questions are and the role they play within the repertoire, we follow Lloyd’s (2015) concept of logic of research questions. In Lloyd’s framework, it is argued that the logic of research questions “we ask, constrains what classes of answers we can give” (Lloyd, 2015, p. 346). While her account is primarily conceived in model-theoretical terms, as framing how the way in which scientists formulate the answers they aim to account for limits the possible hypotheses or new ideas that they can investigate, we believe it can easily be expanded to include also the technologies (or research methods, more generally) that scientists use to obtain the evidence that supports the correctness of their explananda (Lloyd, personal communication). Scientists rely on different research methods as surrogates to make inferences about a class of phenomena that usually extends well beyond the limits of what the research method primarily establishes. This is legitimate insofar as the research method allows the type of surrogate reasoning needed for the specific questions that scientists are investigating. But whether this is so must be proven on a case-by-case basis, which is a complex task that strongly depends on the type of surrogate reasoning that the method was created to support.

Given the relevance of the choice of research methods for correctly answering the research question posed, we believe that Lloyd’s definition of the logic of research questions could be complemented by arguing that the logic of research questions constrains the classes of answers …as well as the classes of methods (understood as technological, experimental, or field devices) that allow the surrogate reasoning that adequately captures the answers that are logically acceptable for a scientific research field. Identifying the logic of research questions that drive a research field, as well as the appropriate methods to gather the required evidence to answer the question, is essential in any research field and probably even more essential in a repertoire, given its multi-disciplinary nature. Failure to appreciate these components “can lead us to miss what’s really going on, therefore to scientific failure” (Lloyd, 2015, p. 346). Importantly, failure to appreciate what can really be inferred from the use of a research tool may lead to conflating an epistemic thing with a technical object (Rheinberger, 1997), to the point that it can substantially affect the progress of the field (Rheinberger & Mueller-Wille, 2009).

This paper builds on the theoretical frameworks of repertoires, which suggest that microbiome science should be built upon the alignment of epistemic elements, and the logic of research questions, which serves as a theoretical tool to understand how certain methods align with the inner logic of some research questions, to analyse the extended use of 16S rRNA sequencing as a surrogate in contemporary microbiome science. While the primary message of the paper is that, insofar as surrogates are technology-driven, the user’s intentions to use them as surrogates are not enough to ascribe them such a role, the paper also presents an extra layer of complexity in the analysis of scientific repertoires. In particular, we show how the alignment of elements in repertoires needs to be constantly revised due to the combination of teams with very diverse sources of expertise and diverse background assumptions about what specific technologies really measure. This is especially relevant in the context of biology, where different research questions examine different temporal properties, and technologies are sometimes built to examine very concrete tempos. Drawing on this, we show that the aim of “alignment” that characterizes repertoires has only been partially met in microbiome science, and microbiome scientists need to keep reflecting on how to reach better alignments.

3 The origins of 16S rRNA sequencing as a surrogate for non-eukaryotic phylogeny

The introduction of the 16S rRNA/rDNA technology to sequence microorganisms and establish non-eukaryotic phylogeny constitutes one of the most fundamental steps for our contemporary understanding of non-eukaryotic life.Footnote 7 It could be asserted without the risk of equivocation that all we currently know about non-eukaryotic phylogenetic evolution is substantially a result of the field of possibilities that 16S rRNA opened up. Knowing the history of this technology, as well as the reasons that justified its introduction and generalization in non-eukaryotic research, is essential for understanding what 16S rRNA can serve as a surrogate for, as well as what it cannot serve as a surrogate for. In other words, we show what 16S rRNA provides information about, and what type of research questions can be answered by relying on this type of information.

It is generally acknowledged that the generalization in the use of 16S rRNA as a genetic marker to establish phylogeny derives from the pioneering work of Woese in the 70s and 80s.Footnote 8 Before the introduction of molecular technologies to establish phylogenetic relationships, phylogeny was based on phenotypic characteristics and classificatory schemas for non-eukaryotic forms of life derived largely and mistakenly from cytology (Woese, 1987, p. 224; Woese et al., 1990). In fact, the concept of prokaryote (pro, before; karyon, nucleus) was introduced based on microscopic observations and referred to the lack of a properly defined nucleus (Chatton, 1938; see Sapp, 2005). Yet, as Woese would note, the use of phenotypic markers (be they morphological, biochemical, or of any other type) to establish phylogeny runs a serious risk of failing to capture what they really aim to grasp: historical relationships of ancestry and descent between different lineages. Woese expressed this problem in an incredibly elegant manner.

It is the classical microbiologist’s insistence on morphology as the primary criterion (…) that more than anything engendered the confused and confusing state of bacterial taxonomy; almost none of the taxa (…) defined primarily in this way pass phylogenetic muster (Woese, 1987, p. 232).

The 70s were convoluted but revolutionary years for taxonomy, as is well known to any philosopher of biology (Hull, 1965). Three schools of thought were then competing to become the predominant school: phenetism, cladism, and evolutionary systematics. It was in this context that Woese convincingly showed the limits of phenetism as a phylogenetically valid taxonomic method in non-eukaryotic phylogeny. Phylogeny is about finding a temporal, evolutionary order that shows how certain lineages appeared through divergence from other lineages. Relying on phenotypic features to establish phylogeny is misleading, and especially so in non-eukaryotic life, because similar environmental pressures led to the appearance of similar phenotypes, despite the lack of phylogenetic relationship between the lineages where these phenotypes appear (Woese, 1987, p. 226). This is what all biologists recognise today as convergent evolution, and the limitations that this form of evolution poses to phenetism escape no one’s attention. Hence, what is required for phylogeny is a valid evolutionary “clock”, or chronometer, one that clearly and universally captures the evolutionary tempo (i.e., the ordered way in which lineages diverged) rather than its mode (i.e., the selected phenotypic changes), to use the famous distinction by Simpson (1944).

In this context, Woese wisely vindicates rRNA genes as the optimal basis to capture the evolutionary tempo. The choice of rRNA is conditional upon its universality among living creatures; it is so universal that one of the hypotheses about the origins of life suggests that RNA must have preceded DNA (see Neveu et al., 2013). But universality is a feature that could be shared by other molecules, so taken in isolation it would make rRNA seem a “random choice”; in addition to it, the choice of rRNA was supported by several excellent reasons. Woese summarizes these reasons under the claim that these genes “define the organizational fabric of the cell, (…) give the cell its basic character.” (Woese, 2004, p. 181). rRNA genes thus comprise a family of genes that cannot be easily replaced by other genes (i.e., they are not substantially involved in processes of horizontal gene transfer, and are not expected to be)Footnote 9, and thus track the real history of the lineage.

A problem, though, is how to make rRNA epistemologically useful or tractable. In other words, even though rRNA would have the perfect features, in theory, to be the type of chronometer that Woese was looking for to establish a solid ground for non-eukaryotic taxonomy, it may turn out to be experimentally or statistically intractable. This would pose a serious challenge, for, as is well known, the choice of scientific objects that serve as a model or tool to structure a field always results from a compromise between different, sometimes opposing, research goals (Potochnik, 2017). Woese was very conscious of this, and this tension between goals played a fundamental role in his choice of the 16S rRNA subunit as the optimal one. To quote:

To be a useful chronometer, a molecule has to meet certain specifications as to (i) clocklike behaviour (changes in its sequence have to occur as randomly as possible), (ii) range (rates of change have to be commensurate with the spectrum of evolutionary distances being measured), and (iii) size (the molecule has to be large enough to provide an adequate amount of information and to be a ‘smooth-running’ chronometer) (Woese, 1987, p. 227).

Concerning (i), rRNA is known to be non-coding, but structurally necessary for the process of protein synthesis. As a result, changes in rRNA sequences do not simply reflect selection pressures; a molecule whose changes did reflect them would be a poor candidate for a chronometer because of the already mentioned convergent evolution. Furthermore, most mutations in rRNA are not easily tolerated, owing to its structural rather than directly functional role (Claridge, 2004, p. 841). This last point can be illustrated with an example: suppose an organism undergoes a mutation blocking the synthesis of fructose. This mutation would be tolerated, provided the organism can rely on a different source of nutrients. Such an event can recur several times in the history of life and in independently evolving lineages, which suggests that relying on genes of this kind as phylogenetic markers can be misleading. Yet the same is not true of structural components of cellular function, i.e., those that are required for making any protein. rRNA is one of these components.

Point (ii) is used to argue against the validity of cytochrome c as a chronometer for non-eukaryotic phylogeny. This is relevant because cytochrome c is widely used in cladistic studies in eukaryotes. But Woese thinks that this genetic marker is of little use for understanding non-eukaryotic evolution because the organisms in question have completely different and greater phylogenetic ranges than eukaryotes. Thus, what is optimal for eukaryotes should not immediately be considered optimal for non-eukaryotic life.

Finally, point (iii) is essential, for it allows discarding 5S rRNA as an appropriate tool and leaves only two options open: 16S or 23S rRNA. The idea is the following: even if we had a specific gene that had characteristics (i) and (ii) and thus was the perfect option for becoming a phylogenetic marker, it would be useless if its sequence were so short that the data derived from it could not support statistical comparisons between regions. 5S rRNA (120 nucleotides), but not 16S rRNA (1542 nucleotides) or 23S rRNA (2906 nucleotides), is precisely one of these excessively short genes, which are of little use for phylogenetic comparisons. Claridge expressed this concern about the ideal gene for making statistical comparisons as follows:

The [16S rRNA gene sequence] is large enough, with sufficient interspecific polymorphisms of 16S rRNA gene, to provide distinguishing and statistically valid measurements (Claridge, 2004, p. 842).
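The size criterion can be given a rough numerical reading. The sketch below is not taken from Woese or Claridge; it uses the gene lengths quoted in the text and a textbook binomial approximation, under the simplifying assumptions that sites differ independently and that every position is compared. The 5% divergence value is purely hypothetical. It shows how the resolution and the sampling error of an observed proportion of differing sites depend on sequence length.

```python
import math

# Gene lengths in nucleotides, as quoted in the text.
GENE_LENGTHS = {"5S rRNA": 120, "16S rRNA": 1542, "23S rRNA": 2906}


def smallest_nonzero_distance(n_sites: int) -> float:
    """Smallest nonzero proportion of differing sites: one difference in n_sites."""
    return 1.0 / n_sites


def p_distance_standard_error(p: float, n_sites: int) -> float:
    """Binomial standard error of an observed proportion p over n_sites sites.

    Assumption: sites differ independently, which real sequences only approximate.
    """
    return math.sqrt(p * (1.0 - p) / n_sites)


if __name__ == "__main__":
    assumed_p = 0.05  # hypothetical divergence between two lineages
    for gene, n_sites in GENE_LENGTHS.items():
        resolution = smallest_nonzero_distance(n_sites)
        error = p_distance_standard_error(assumed_p, n_sites)
        print(f"{gene}: resolution {resolution:.5f}, standard error {error:.4f}")
```

Under these assumptions the standard error shrinks with the square root of the sequence length, so a 120-nucleotide molecule yields much noisier distance estimates than a molecule of 1542 or 2906 nucleotides, which is one way to read the requirement that the molecule be "large enough".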

The successes of 16S rRNA in non-eukaryotic phylogeny are well known, the discovery that there are three domains of life being possibly the most salient one (Woese & Fox, 1977; Woese et al., 1990; for a good summary, see Han, 2006; Janda & Abbott, 2007; Wang et al., 2015).Footnote 10 Yet what is relevant for this work is to understand what Woese was really doing with his insistence on using 16S rRNA, and not any other tool, to study non-eukaryotic phylogeny. The framework of the logic of research questions introduced in Sect. 2 is helpful here. Woese, as well as many other scientists interested in the study of non-eukaryotic evolution, recognised the necessity of introducing a specific tool that would allow them to track their guiding research question satisfactorily. As their interest concerned phylogenetics, their guiding research question can be summarized as a specific subset of the question about how the non-eukaryotic world evolved. Concretely, the research question being asked is something like:

How did non-eukaryotic lineages diverge and what are the relations of ancestry between non-eukaryotic lineages?

Note that, formulated this way, the research question constrains the type of answers that can be given. The research question does not concern the evolution of certain non-eukaryotic lineages or how certain non-eukaryotic traits have changed over time. These questions, formulated under the more global research question concerning bacterial evolution, can be answered (at least partially) by relying on morphological data. But Woese and other microbiologists were not interested in that question: they wanted to know how non-eukaryotic lineages descended from one another, i.e., their evolutionary tempo, not their mode. They wanted to establish a phylogenetic ordering in the non-eukaryotic domain, one that faithfully reflected the different histories of phylogenetic divergence rather than the historical contingencies of certain lineages.

The question Woese and others asked thus constrained the range of possible biological answers they should look for. Additionally, as the logic of the research questions one asks simultaneously constrains the possible range of methods that serve to provide satisfactory evidence to answer those questions, it turns out that the specific gene that Woese and those working on non-eukaryotic phylogeny would choose was the region of the non-eukaryotic DNA that was most appropriate for their task. In fact, Woese’s justification of his choice of rRNA and, specifically, of the 16S subunit, clearly reflects his intention of finding an optimal tool to investigate this research question. The choice was a process in which some candidates were discarded because they lacked the adequate genetic properties (cytochrome c), whereas others were discarded because they were epistemically unsuited for the type of comparisons that are required to establish the adequate standards of evidence (5S rRNA).

Therefore, as evolutionary tempo can only be measured indirectly, by relying on the traces it leaves, Woese and other microbiologists in the 70s had to find an adequate surrogate to answer their guiding research question. As the logic of research questions imposes some restrictions on the type of surrogates that can satisfactorily provide evidence for the answers to their questions, they opted for 16S rRNA, after having discarded a range of options. 16S rRNA turns out to be an optimal methodological tool to measure bacterial phylogeny and to measure this specific evolutionary tempo. This case provides a clear example in which a research tool or technology used for surrogate reasoning is selected, among a set of possibilities, because it is an optimal tool to reply to a specific research question and to satisfy the inner logic that the research question demands.

This section illustrates why knowing the history of a technology illuminates the type of research questions that can be addressed by relying on it, as well as those that cannot be answered by relying on that technology. The question we ask is whether the family of questions that contemporary research on the microbiota asks (Sect. 4) is tractable by relying on the same type of methodology that Woese elevated to a gold standard for non-eukaryotic taxonomy. To put it another way, is the 16S rRNA gene fine-tuned enough to provide any significant answer that guides the logic of research questions in contemporary microbiota research? Based on the discussion in this section, we speculate that if Woese were asked to find a perfect candidate gene in non-eukaryotic life to not match with the research goals of contemporary microbiota research, he would probably be inclined to choose 16S rRNA. He would do so for very good reasons. We will argue why we believe this to be so in the next section.

4 Research questions in today’s microbiome science

4.1 Evolutionary research

Microbiome science has had a substantial impact in the last two decades on evolutionary biology. Several research groups studying the evolution of very diverse host lineages have shown consistent evidence that the microbiome has played a key role in host evolution. A seminal case study, which actually inspired the hypothesis that hosts evolve together with their microbiome, concerns corals, whose evolution would have been partially caused by the microbiome (Zilber-Rosenberg & Rosenberg, 2008). Additional examples include the evolution of hybrid lethality in Nasonia wasps (Brucker & Bordenstein, 2013), the evolution of hematophagy in vampire bats (Suárez & Triviño, 2020; Zepeda Mendoza et al., 2018), or the evolution of the digestive capacities in ruminants (Gilbert, 2020), among others (Suárez, 2018).

One of the fundamental questions underlying the research on the microbial basis of host evolution concerns how a rapidly changing microbiome can affect, positively or negatively, the evolutionary responses of hosts to different types of environmental challenges. As is well known, host evolution occurs relatively slowly due to hosts’ long reproductive periods. It thus requires several generations before their genomes can adapt to changing environmental conditions, which may be a challenging task to achieve when these conditions shift faster than the capacity of hosts to generate enough genetic variation. The situation is especially acute when the host is adapting to extreme environments, i.e., environments which require a high degree of specialization at different levels of the host’s physiology. Several researchers have perceived that the microbiome offers a perfect opportunity for generating rapid phenotypic variation that allows coping with environmental challenges until this variation can be encoded in the host genome (Henry et al., 2021; Suárez, 2020, 2021). In a recent study, Rudman et al. (2019) showed that the microbiome shapes rapid adaptation in Drosophila melanogaster, which strongly suggests that the microbiome has the potential to play a fundamental evolutionary role which would explain local variation in host lineages, potentially including humans (Suzuki & Ley, 2020). In fact, the idea that the microbiome influences host evolution over different timescales and with important evolutionary effects has been recently explored succinctly (Kolodny & Schulenburg, 2020). Note that this possibility had been previously pointed out in several evolutionary models (Bourrat, 2019; Lloyd & Wade, 2019; Osmanovic et al., 2018).

A more intriguing philosophical question that derives from the study of the impact of the microbiome on biological evolution concerns the exact implication(s) of the hypotheses being raised: Is microbiome evolutionary research showing that host species and the totality of the microbial species that compose the host microbiome constantly coevolve/cospeciate and co-adapt to each other? Is it rather suggesting that some very specific microbial species of the microbiome are coevolving with the host species? Or alternatively, is microbiome evolutionary research exploring whether, and if so how, the microbiome is an evolutionary factor affecting the evolution of the host species? These different research questions need to be adequately framed and distinguished from each other, as they impose conflicting logics and thus require different types of evidence to be answered, or to have those answers confirmed or falsified. For instance, a rigid interpretation of host–microbiome evolution as host–microbiome cospeciation likely requires an almost perfect heritability of the microbiome (Douglas & Werren, 2016; Moran & Sloan, 2015; Stencel & Wloch-Salamon, 2018), as well as evidence that the microbes of the microbiome cannot live outside the host environment. By contrast, a more nuanced interpretation, which accommodates the possibility that host–microbiome evolution/adaptation may be one-sided (on the host lineage), requires evidence that the host lineage has evolved some basic mechanisms to guarantee control over those parts of its microbiome to which it outsources traits that are key for its phenotypic expression and survival (Lloyd & Wade, 2019; Suárez, 2020; Suárez & Triviño, 2020).

Determining which question is asked in microbiome research depends on the specific perspective that the researchers or research groups adopt. Suárez and Stencel (2020) distinguish between whole-dependent and part-dependent perspectives, and among the latter, they differentiate between host-dependent and microbe-dependent research questions. This makes room for at least three different types of research questions concerning the role of the microbiome in evolution. Each of the questions isolated by Suárez and Stencel explores a different dimension of the evolutionary tempo, and hence each imposes its own logic. Table 2 summarizes these questions and the logic of the research question each of them imposes.

Note that only one of the questions can be immediately investigated by relying on 16S rRNA sequencing, as it is the only question that aims to track the phylogeny of non-eukaryotic lineages directly. However, for both the whole-dependent and the host-dependent research questions, the role of 16S rRNA as a surrogate is limited. Let us start with the case of host-dependent research questions. These questions impose a logic in which what needs to be tracked is whether a specific host phylum has diverged partially due to its association with the microbiome. While using 16S rRNA may initially seem plausible, and in fact it has been the main tool used to carry out phylosymbiotic studies (Brooks et al., 2017; Lim & Bordenstein, 2020), its use is limited because it masks variations between strains.

Variations between strains are produced when changes in one or more genes affecting species function take place without necessarily a corresponding change in the 16S rRNA. Changes in the microbiome strains are known to affect the host phenotype, and over evolutionary time, they may also affect host phylogeny. In fact, it is known that even changes in microbial species, as these are identified by relying on 16S rRNA, may be irrelevant for host evolution if the newly acquired species still plays the same functional role, i.e., still bears the genes that code for the same function that the other species coded for (Doolittle & Booth, 2017; Lemanceau et al., 2017; Suárez, 2020; Taxis et al., 2015; Veigl et al., 2022). For this reason, a functional approach to microbiome analysis is preferred over one that centres on phylogeny, because the evolutionary tempo being measured in the microbiome is at odds with the evolutionary tempo of the host phylogeny. Recognizing that 16S rRNA is a gene specifically chosen for its role as a perfect guide to non-eukaryotic phylogeny allows knowing the specific evolutionary tempo that it allows measuring, as well as the tempo that it does not allow measuring. Early phylosymbiotic studies ignored the existence of these contrasting tempos, as criticized by Moran and Sloan (2015) and acknowledged by those encouraging the study of phylosymbiosis (Rosenberg & Zilber-Rosenberg, 2018). Therefore, 16S rRNA sequencing, despite how widely it is used as a surrogate for evidence concerning host-dependent research questions, is a limited tool for answering the type of host-dependent questions we have isolated, due to the properties of the specific gene being used. As we already argued, 16S rRNA is strictly a surrogate for answering research questions about bacterial phylogeny.
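The masking effect described above can be illustrated with a minimal sketch. Everything in it is a hypothetical toy: the strain names, the short strings standing in for 16S rRNA sequences, and the functional labels are assumptions made for illustration and are not drawn from any dataset or from the studies cited. The sketch only shows that a comparison based on the marker can report "no difference" where a functional comparison reports one, and vice versa.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    """Toy record of a microbial strain (all values are hypothetical)."""
    name: str
    marker_16s: str        # toy stand-in for a 16S rRNA sequence
    functions: frozenset   # toy labels for functions encoded elsewhere in the genome


def same_taxon_by_16s(a: Strain, b: Strain) -> bool:
    """Marker-based comparison: identical marker means 'same taxon'."""
    return a.marker_16s == b.marker_16s


def functional_difference(a: Strain, b: Strain) -> frozenset:
    """Functions carried by exactly one of the two strains."""
    return a.functions ^ b.functions


# Two strains that share the marker but differ in one function.
strain_a = Strain("strain_A", "ACGTACGTAC",
                  frozenset({"butyrate_synthesis", "bile_acid_metabolism"}))
strain_b = Strain("strain_B", "ACGTACGTAC",
                  frozenset({"butyrate_synthesis", "toxin_production"}))

# A different 'species' by the marker that carries the same functions as strain_A.
species_c = Strain("species_C", "ACGTTCGTAC",
                   frozenset({"butyrate_synthesis", "bile_acid_metabolism"}))

if __name__ == "__main__":
    print(same_taxon_by_16s(strain_a, strain_b))             # marker sees no change
    print(sorted(functional_difference(strain_a, strain_b)))  # functions differ
    print(same_taxon_by_16s(strain_a, species_c))            # marker sees a change
    print(sorted(functional_difference(strain_a, species_c)))  # functions identical
```

In this toy case the marker comparison and the functional comparison come apart in both directions, which is the sense in which the two track different tempos.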

Whole-dependent questions are, however, more problematic, since the type of evidence they require may sometimes be provided by 16S rRNA analysis. This is especially so for the cases of major transitions in individuality, where what is asked is whether a host and its microbiome, or a subset of it, co-speciated to the point of becoming a single evolutionary individual (in the sense of a manifestor of adaptation, see Lloyd, 2017). 16S rRNA may be revelatory in these cases, as it allows singling out specific microbial phyla that are exclusively bound to their hosts. Yet this is only partially useful, for even with all the changes that a microbial symbiont may experience because of obligate transgenerational symbiosis, the 16S rRNA is so stable that it may not show any specific pattern of response. Therefore, while useful for some cases, 16S rRNA is also limited for answering whole-dependent questions, a point that several authors have already emphasised (Suárez, 2020).

Table 2 Research questions in evolutionary microbiome science and the role of 16S rRNA as a surrogate to provide satisfactory evidence to answer these questions

4.2 Biomedical research

The relevance of microbiome research for biomedical purposes has grown tremendously over the past decade. Indeed, a growing body of evidence, spanning from in vitro and in vivo studies to so-called omics research, has shown that the activity of the symbiotic microorganisms associated with our species is an essential component for understanding the state of health of the human organism and the development of certain pathologies (Fan & Pedersen, 2021; Lynch & Pedersen, 2016). The microbiome populates not only the intestine but many other areas and surfaces of the human organism (Fan & Pedersen, 2021; Ursell et al., 2012), and it is now widely accepted that the microbiome plays various roles for humans: it trains the immune system, modulates the immune response and neuroendocrine functioning, and influences a great number of metabolic processes and global organismal functions, such as digestion (Belkaid & Hand, 2014; Neuman et al., 2015). Accordingly, the microbiome has been associated with several diseases, including metabolic disorders (obesity, type 2 diabetes, cardio-metabolic disorders, malnutrition, etc.), autoimmune diseases, and various forms of tumours. On top of that, its role in cognitive functions as well as in neurodegenerative diseases and neurological and behavioral disorders, such as autism spectrum disorder, is becoming clearer (Boem et al., 2021; Johnson, 2018; Sherwin et al., 2019).

All these lines of research trigger different research questions which are explored in biomedical research on the microbiome. The most striking question concerns the real effect of the microbiome on the host health state. Is it the only cause of the pathology? Is it rather an influencing cause, among many others, of the disease or the health state? Is it instead a primary causal factor in determining the health state? These three general questions converge under the umbrella of how the microbiome influences host health, which is the basic guiding research question in biomedical microbiome science. Importantly, this set of questions has guided microbiome researchers in biomedicine to develop three main, closely interrelated lines of research, each of which triggers specific research questions that impose their own logic and require a specific way of aligning the research methods with the problem to be solved. Table 3 isolates these questions.

Table 3 Research questions in biomedical microbiome science and the role of 16S rRNA as a surrogate to provide satisfactory evidence to answer these questions

The first of these questions concerns the characterization of the microbiome composition of a healthy donor, which originally meant determining its taxonomic architecture. The guiding assumption here is that knowing the taxonomic composition will immediately provide knowledge of its function and, in due time, will allow intervening on the microbiome (Tap et al., 2009). In other words, to know how the microbiota has an impact on human physiology and pathology, it is first of all essential to uncover who the causal actors at play are. Early analyses of microbiome composition concentrated precisely on elaborating a catalogue of the taxonomic composition of a healthy microbiome, which connects to a well-known tradition in natural history according to which the collection, comparison and classification of samples constitutes an irreducible form of knowledge (Strasser, 2012, 2019). 16S rRNA is probably one of the most useful tools for this task, as it allows capturing a relatively stable identity, measured by the stable evolutionary tempo of the bacteria.

However, a paradox arises because the logic imposed by the first of these questions is not really connected to the logic imposed by the second and third research questions. This is because each question is connected to a different biological tempo. While the question about microbial classification is connected to the evolutionary tempo of the microorganisms, questions about how the microbiome functions in a host and how it is possible to intervene in it connect to the ecological tempo of the host (Lemon et al., 2012; McDonald et al., 2015; Ronai et al., 2020). In fact, the disconnection between these questions was soon discovered in biomedical microbiome science, where it eventually became common ground that taxonomic information was mostly irrelevant to uncovering the function of the microorganisms or the specific ways of intervening on unhealthy individuals. This, in turn, motivated the introduction of network approaches from ecology, in which the global microbiome community exhibits certain features because of the relative abundances, densities and interactions of the microorganisms (Deulofeu et al., 2021; Heintz-Buschart & Wilmes, 2018; Knight et al., 2018; Rosen & Palm, 2017; Xu & Knight, 2015). Additionally, it was soon noticed that the nodes in these networks must represent functions and not taxonomic genes, including 16S rRNA, as the global properties of the network architecture ultimately depend on how functions interact (Lemon et al., 2012). Functions, however, are not captured by 16S rRNA analysis, for the reasons explained in Sect. 3. Therefore, 16S rRNA analysis, even though it may provide useful information about the microbiome, is pointless when the guiding research question concerns intervention on host health through the microbiome or analysis of microbiome function. In other words, 16S rRNA is not an adequate surrogate for answering these questions.

Overall, it seems that 16S rRNA is extremely limited when it comes to answering the type of questions posed in today’s microbiome research insofar as the logic these questions impose requires the investigation of biological properties whose evolutionary tempo is different from the evolutionary tempo captured by 16S rRNA. Importantly, this is so despite the generalized use of 16S rRNA sequencing in microbiome science and despite the intentions of microbiome scientists to adopt 16S rRNA analysis to back up their evolutionary and biomedical claims.

From a philosophical perspective, the perceived problems can be analysed by pointing out the difficulties in answering any research question by reusing a tool which was designed to answer a different research question. While this is common in scientific practice, it is problematic, for the type of evidence that the tool provides might be at odds with the type of evidence that is necessary to answer the other research questions, even when the tool was optimal to answer the research question it was designed to answer. To put it differently, research tools cannot be easily extrapolated between fields or even within the same field if the type of research questions that the tool is extrapolated to answer are not clearly specified. In connection with these general problems about extrapolation, in the next section, we will derive the philosophical lessons for thinking about surrogative reasoning.

5 The perils of epistemic misalignment: surrogative reasoning in microbiome science

The concept of surrogative reasoning was originally introduced by Swoyer (1991), who presents it in connection with the concept of structural representation to denote the type of reasoning oriented to draw inferences from a vehicle of the representation (what represents) to the target of the representation (what is represented). He illustrates the concept with the use of numbers and mathematical theory to represent and make inferences about empirical physical objects. He says that the numbers can function as surrogates or proxies for the empirical world because the patterns of relations between the numbers mirror the pattern of relations between the objects. To quote:

And because the arrangements of things in the representation are like shadows cast by the things they portray, we can encode information about the original situation as information about the representation. Much of this information is preserved in inferences about the constituents of the representation, so it can be transformed back into information about the original situation. And this justifies surrogative reasoning since if we begin with true premises about the object of representation, our detour through the representation itself will eventually wind its way back to a true conclusion about the original object (Swoyer, 1991, p. 452).

Swoyer is conscious that not every relation of representation allows surrogative reasoning. This depends on specific features of the mapping between the target and the vehicle and, concretely, which relations between the individual constituents of the phenomena are preserved by the vehicles (p. 473). He, however, fails to provide a faithful characterization of the exact conditions that the specific vehicle-target relationships must satisfy for surrogative reasoning to be sound for the target of the representation. In other words, he does not explain what the mapping consists in. And, given some of the cases he analyses, and his view that surrogative reasoning may involve multiple levels of surrogates (p. 505, fn. 31), it seems unclear whether the validity depends on the user or on specific shared properties—although he would likely be more inclined to support the second alternative, given what he says in the quoted paragraph.

Contessa (2007) moved the debate forward by relating the concept of surrogative reasoning to Suárez’s (2004) inferential conception of scientific representation, which allows depicting surrogative reasoning as a feature of scientific representations. In Contessa’s view, Suárez’s conception entails that a scientific model represents its target only if someone uses the model to represent a system (which Contessa calls “denotation”), and the model allows the user to perform specific inferences about the system (which Contessa calls “surrogative reasoning”) (Contessa, 2007, p. 49). Drawing on this, Contessa intends to spell out the conditions that allow the user of a model to perform surrogative reasoning: a model (vehicle) represents a system and allows surrogative reasoning about the system (target) provided that the user interprets the model in terms of the system. To quote:

… the user’s ability to perform surrogative inferences from the model to the system can be explained by the fact that the user interprets the model [vehicle] in terms of the system [target]. Interpretation is what grounds both scientific representation and surrogative reasoning. (Contessa, 2007, p. 51, emphasis added).

For Contessa, a model needs to be an epistemic representation of its target if it is to allow surrogative reasoning. And he adds that “a vehicle is an epistemic representation of certain target for a certain user if and only if the user is able to perform valid (though not necessarily sound), surrogative inferences from the vehicle to the target.” (2007, pp. 52–53). Note that, formulated this way, a vehicle is never a surrogate absolutely, but rather it is a surrogate for a specific user of the vehicle. This, in turn, occurs whenever the users of the vehicle: (1) take it to stand for the target as a whole, (2) take some of its components to stand for some components of the target, (3) take some of the properties of and relations among the objects in the model to stand for the properties of and relations among the corresponding objects in the target (2007, p. 59). If this occurs, then the user interprets the model (vehicle) in terms of the system (target), and surrogative reasoning is valid. An interesting consequence of Contessa’s view is that it allows divorcing the conditions for a valid scientific representation, including the conditions for the valid use of a vehicle in surrogative reasoning, from the faithfulness of this representation or the success of the representation/reasoning. Otherwise, the concept of representation would not allow for misrepresentation, which Contessa—and most scholars thinking about scientific modelling (Cassini & Redmond, 2021; Frigg & Nguyen, 2020; Giere, 1988; van Fraassen, 1980, 2002)—take as necessary for any good account of modelling.

Microbiome science, however, shows that while the user’s intentions and interpretation of the model as a vehicle that carries information which allows making inferences about the empirical system (target) are obviously necessary for taking the vehicle as a surrogate, they are insufficient to ground surrogative reasoning, at least when it comes to technology-driven surrogates. To put it differently, even if scientists have the intention of using a surrogate with the aim of making inferences about a phenomenon, and they do so because they believe (1), (2), or (3) occur, they may not be doing surrogative reasoning after all. Before presenting the argument, let us clarify what we understand by technology-driven surrogates. By technology-driven surrogates we mean any specific material tool that has been designed and selected based on previous scientific knowledge and with the intention, often explicit, of capturing a specific relevant aspect of an empirical object that cannot be captured without the surrogate. For instance, a model organism is a technology-driven surrogate, and so is sequencing technology. Technology-driven surrogates contrast with the use of mathematical computation or numbers, which can also be surrogates, but are usually neither created based on previous scientific knowledge nor with the intention of capturing a specific relevant aspect of an empirical object that cannot be captured without them. They also contrast with any mathematical model of a phenomenon insofar as technology-driven surrogates substantially depend on the materiality of the object that has been chosen (Ankeny & Leonelli, 2011; Griesemer, 1990; Leonelli, 2007; Weisberg, 2013), and on the properties and relations that the surrogate carries as a result. For example, if one picks a mouse as a surrogate to study the secondary effects of a specific medical treatment in humans, then the physiological pathways of the mouse will partially condition the results. If there is at least one pathway not shared by the mouse and the human, and the treatment specifically affects this pathway, then it will follow that either the mouse or the human will develop some secondary effect that will not affect the other.

A priori, the technology-driven nature of the surrogate would not need to affect surrogative reasoning. First, one may always interpret the surrogate as a limited source of evidence that only serves as a valid source for making inferences about a very concrete subset of states. This is, in the end, common scientific practice and a general feature of scientific modelling. Second, even if the conclusion drawn from the surrogate is not true of the target, it does not necessarily follow that the scientists are not using it to do surrogative reasoning. They are simply using it incorrectly, or have not chosen the right type of surrogate, so their surrogative reasoning is not sound. But the conclusion that follows from the study of microbiome science goes deeper, as it affects the very properties of the vehicle and the target that need to map onto each other for surrogative reasoning to be grounded and for the specific technological tool used for surrogative reasoning to be really a surrogate. In particular, we can imagine cases where the reasoning carried out with 16S rRNA is both valid and true of the target, yet it is so only by accident. In these cases, we contend that there is no surrogative reasoning at all.

Bolker (2009) defends a similar view by introducing two conditions that technology-driven surrogates in biomedicine must satisfy. First, they must match their targets in relevant ways or features; second, they must respond to manipulations in the same way as their targets would. Bolker, however, does not provide a convincing characterization of why these features ground the role of surrogates as surrogates, which is a gap we will fill here.

Additionally, Díez (2021, p. 122) introduces a similar consideration, but with regard to representation and modelling rather than surrogative reasoning. He claims that “we cannot represent (not even wrongly) if the standing for relation does not appropriately connect the respects in M [the model] and the individuative elements of the target towards which (…) S [the user] intends M to be addressed”. Our account differs from his in two respects: first, we specify what these relations must be; second, we do not intend our account to be about representation or even about modelling, but about surrogates, specifically when the latter are technologies.

To illustrate our view about how technology-driven surrogates work, take the following example. Imagine one performs 16S rRNA sequencing of the microbiome of two human groups, divided because one carries a specific trait (say, a specific disease) and the other does not. By comparing the datasets from the 16S rRNA analysis, it is possible to see that there are significant differences in microbiome composition between the two groups. One might then infer that the microbiome is causally responsible for the disease. In this case, one can structure the research procedure following the schema we sketched in Sect. 2.


Research question: What causes a specific disease in a human population?

Answer/Explananda (in the form of a hypothesis): Differences in the microbiome.

Source of the evidence for the answer: Datasets generated by two human groups.

Research method (technology) used as a surrogate: 16S rRNA sequencing.


If there were surrogative reasoning in this case, it would work as follows: the targets are sick and healthy humans, the surrogate would be the datasets generated by using 16S rRNA analysis (a technology-driven surrogate), and the inferences that scientists make from the surrogate are then extrapolated to the target. Assume, for the sake of the argument, that the conclusion is true: differences in the microbiome explain/cause the disease. After all, we agree that the truth or falsity of the conclusion should not determine whether this is surrogative reasoning or not. Therefore, in this case, we would have not only valid but also sound surrogative reasoning according to Contessa: scientists aim to use the dataset generated by 16S rRNA sequencing to learn something about the target, they do so on the basis of their belief that this can be done, they make valid inferences, and their conclusion is true. We argue, however, that this verdict is incorrect because there is an epistemic misalignment between the research method being used and any acceptable explananda for the research question.

The research question asks about diseases affecting a human population. In these cases, the tempo of the question concerns the functions encoded in the microbiome, insofar as these are the ones that directly interact with the host immune system and affect its health status (Sect. 4). In other words, the properties scientists must investigate to find a satisfactory reply to their research question concern a specific evolutionary tempo that matches the ontogeny of the host. The research question thus puts constraints not just on the answers but also on the classes of technologies that allow the surrogative reasoning that adequately captures the answers that are logically acceptable for a scientific research field. This is because the technologies themselves capture specific properties, frequently embedded in their very design, namely those relevant to the research questions they were originally created to answer. The technology being used here, 16S rRNA sequencing, was designed to capture the phylogenetic tempo of bacteria (Sect. 3). It was chosen, among an extensive group of alternatives, due to its properties for acting as a chronometer, given that the changes in its sequence were not subject to natural selection. To put it differently, it was chosen because it was not functionally expressed, and therefore it can measure relations of ancestry. The properties that 16S rRNA captures match the phylogeny of the bacteria and not the ontogeny of the host. Its role as a surrogate is thus constrained by the research question it was designed to investigate, and it is not a surrogate for any research question a scientist decides to use it for. It is only a surrogate for those research questions whose tempo is exactly the tempo that 16S rRNA measures.

The use of 16S rRNA to answer the research question we showed thus constitutes a case of epistemic misalignment, i.e., a situation in which the information provided by the apparent surrogate fails to confirm the hypotheses despite apparently positive results, general acceptance of the evidence by the community, and the scientist’s intention to use it as a confirmatory piece of evidence for the hypothesis. We contend that cases of epistemic misalignment are not cases of surrogative reasoning, even though the scientists may treat them as such, because the mismatch between the research question and the research method (in the form of a given technology) used to gather evidence is so large that the technology is not a surrogate for such a question. Note that we are not demanding that the technology must allow building true explananda in order to consider it a surrogate and the reasoning carried out with it surrogative reasoning. We are only demanding that there be a non-accidental connection between the properties captured by the technology and the properties required to answer the research question. Taking the evolutionary tempo as the relevant property, as it is in the case study we have analysed: if the research question requires measuring a specific tempo t, and the technology captures a tempo t′ which is not strikingly different from t, then the technology is a potential surrogate for the research question. This may happen even if, because the research technology captures other properties that are not in the target, the inferences that follow from it acting as a surrogate are false of the target system. However, if the technology captures an evolutionary tempo t″ strikingly different from t, then it cannot act as a surrogate for the research question, even if the conclusions following from the technology are true of the target. In these cases, scientists would say that they are true by accident, and a characteristic of well-built scientific reasoning is that it cannot be true by accident. Scientific reasoning, even if leading to false conclusions, needs to be justified. Therefore, for a technology to be a surrogate for a specific research question, it needs to be justified that the technology serves to answer that question. It does so, we contend, when the properties captured by the technology match the properties of the phenomenon.
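The criterion just stated can be written compactly. The following is only a sketch under assumptions that are not part of the argument above: it supposes that tempos can be compared through some dissimilarity measure $d$ and that "strikingly different" can be rendered as exceeding a threshold $\varepsilon$; both $d$ and $\varepsilon$ are hypothetical devices introduced for illustration. Here $t_Q$ is the tempo that research question $Q$ requires measuring and $t_T$ is the tempo that technology $T$ captures.

```latex
\[
\mathrm{PotentialSurrogate}(T, Q) \iff d\bigl(t_T,\, t_Q\bigr) \le \varepsilon
\]
\[
d\bigl(t_T,\, t_Q\bigr) > \varepsilon
\;\Longrightarrow\;
\neg\,\mathrm{PotentialSurrogate}(T, Q),
\quad \text{whether or not the conclusion drawn from } T \text{ is true of the target.}
\]
```

On this rendering, the truth value of the conclusion does not appear on the right-hand side of the first condition, which reflects the claim that a technology can be a potential surrogate while yielding false conclusions, and can fail to be one while yielding true ones.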

Finally, even though the argument has been built by relying on 16S rRNA sequencing and microbiome science, its conclusions extend beyond the scope of the case study to cover all cases of technology-driven surrogative reasoning. In these cases, because the reasoning departs from a given technology, and the technology constitutes a materiality designed to specifically capture certain properties and relationships, its potential to act as a surrogate will be constrained by its material constitution. This constitution will, in turn, have been chosen for how useful it is to respond to specific research questions, and it is usually necessary to investigate the underlying justification for the uses of the technology in order to understand what it can be a surrogate for. Failing to do so will trigger constant epistemic misalignments, seriously compromising the justification of scientific research. Therefore, it follows that in technology-driven surrogative reasoning, the potential of a specific tool to act as a vehicle or surrogate to draw inferences about its target does not depend only on the user’s intentions, but also on the specific properties of the technology, and on the extent to which these very properties match the properties that new research questions demand.

6 Conclusion

In this paper, we have argued that the fact that something plays the role of a surrogate for making inferences about a specific scientific hypothesis does not depend exclusively on the user’s intentions, at least in technology-driven research. On the contrary, the properties optimally captured by the technology impose serious epistemic constraints, potentially triggering a disconnection between the research question being asked and the type of evidence that the technology provides. We have referred to the situations when this happens as cases of epistemic misalignment, which is any situation in which the information provided by the apparent surrogate fails to confirm the hypotheses despite apparently positive results, general acceptance of the evidence by the community, and the scientist’s intention to use it as a confirmatory piece of evidence for the hypothesis. We have argued that, in cases of epistemic misalignment, the technology being used does not act as a surrogate at all, and no surrogative reasoning is carried out.

To illustrate our thesis, we have analysed the use of 16S rRNA as a sequencing technology used for surrogative reasoning in microbiome research by embedding it under the theoretical frameworks of research repertoires (Ankeny & Leonelli, 2016) and the logic of research questions (Lloyd, 2015). The repertoire structure of microbiome science imposes the necessity of an alignment between different epistemic and non-epistemic elements, including an alignment between the methods used for surrogative reasoning and the type of questions being asked in microbiome science. The logic of research questions provides a basis for understanding the conditions in which a real alignment between the logic imposed by a research question and the methods used to gather the evidence to track it can emerge. We have shown that the logic imposed by most of the research questions asked in microbiome research is such that 16S rRNA is not a very useful tool to collect the evidence required to answer them, despite the omnipresence of the technology in most microbiome studies and its wide acceptance by the community. Drawing on this, we have argued that the role of 16S rRNA as a surrogate in today’s microbiome science is limited by the internal constraints that the technology imposes, as determined by the type of biological properties of 16S rRNA that are studied in that type of sequencing, concretely specific patterns of the evolutionary tempo. In that vein, 16S rRNA is not a surrogate for microbiome research, regardless of the scientists’ intentions of using it as such.

A secondary lesson deriving from this work is the necessity of knowing the history of a technology in technology-driven surrogative reasoning. In most cases, the properties that a technology captures will be constrained by the research questions that it was developed to analyse. In this vein, the potential for re-using the technology to answer different types of research questions is limited by the research questions that it was originally devised to answer. Failure to appreciate this may lead to scientific failure, to the point that a tool that is used as an apparent surrogate is not a surrogate after all. In technology-driven surrogates, knowing the properties that the technology captures is thus essential to determine when the technology is a surrogate and when it is not, regardless of scientists’ intentions to use it in surrogative reasoning.