1 Introduction

We all rely on science in our daily lives. While some scientific results are well-established, others are, at a given point in time, debated, and there is not yet a consensus on their validity.

Through education, we must all learn about science: some major well-established results, some more contested results, and also, very importantly, we must learn about the methodology and social nature of science (Allchin, 2013). Trivially, not everyone needs to learn the same things. Those who are being trained to do science or work academically in some way, as doctors or academic employees at government institutions, for instance, should learn enough about the results, nature, and methodology of science to be able to contribute to the creation of scientific work and assess the validity of work in their field done by others.

Those who are not being trained to do science should still get an understanding of central results, as well as the nature and methodology of science. This is partly because this understanding is central to being an informed citizen in a functional democracy. In particular, we all need to assess which sources claiming to present reliable scientific knowledge should be trusted and which should not (Osborne et al., 2022).

Models, the nature of which is discussed in Section 2.1, are central to scientific investigation. Within science studies, the importance of models has been recognized for decades (Frigg & Hartmann, 2018). Whether scientists are predicting the future, solving a concrete present problem, or developing and testing a new theory, they are building and using models to do so.

Developing an understanding of models and modelling is therefore also central to learning to do science and becoming a literate citizen (Matthews, 2007). This is now widely recognized by central actors in higher education. For instance, the Next Generation Science Standards (NGSS Lead States, 2013) guiding science teaching in the USA have systems and system models as one of the four central concept pairs that should guide all K-12 science education. The National Science Teaching Association (NSTA) backs this up by emphasizing that by the time students finish high school, they should have a good understanding of models and their role in explaining natural phenomena (NSTA, 2020). We find a similar emphasis on models and modelling in science curricula across Europe (e.g., in England (Department of Education, 2014) and Denmark (Ministry of Education, 2017)). Furthermore, the OECD (2019) emphasizes numeracy among the core skills to develop in education, which, by its definition, includes skills in and understanding of mathematical modelling (p. 49).

What students should learn about models and modelling also differs depending on whether they are being trained to do science or as “competent outsiders” (Osborne et al., 2022). Osborne et al. (2022) argue that even the competent outsider needs to have some understanding of the nature of models as simplified representations of certain aspects of the world and the uncertainty that comes with that.

In addition, those who are being trained to do academic work—including university undergraduate and graduate students—need a deeper understanding. The academic worker will often have to at least assess, if not develop, the arguments within a scientific source from their field. This means also being able to identify the central parts of the argument and assess whether they are convincing. This in turn requires a deeper understanding of the role of models in scientific testing and argumentation.

The functions and importance of models in science can, and should, be taught in different ways. One valuable approach is introducing students to general models of how science works and using them to aid reflection on the students’ own practice, their encounters with science in their daily lives, or detailed case studies (Allchin, 2013; Allchin et al., 2014). This is especially true if the models of science are based on careful historical and philosophical analysis of scientific practice. A number of good models exist. Caldwell et al., for instance, developed the portal Understanding Science (https://undsci.berkeley.edu/index.php (visited Jan. 2023)), which includes a general and nuanced model of the development of a scientific result. Like other more simplified models, however, this model largely abstracts away from the use of models in scientific testing. This is not helpful if our aim is to develop an understanding of models and modelling in science. The model presented by Giere et al. in their book Understanding Scientific Reasoning (Giere et al., 2006, Ch. 2), on the other hand, addresses models in science explicitly. The model, described in detail in Section 2, is based on detailed historical case studies and careful philosophical analysis (Giere, 1988, 2004). We refer to it as the Giere model in this paper. The Giere model describes how a scientific result can be established by testing a hypothesis that describes a similarity between a model (or set of models) and an aspect of “the real world”. The test involves collecting data and making predictions in order to argue for the truth of the hypothesis.

This paper takes the Giere model as a valuable starting point for teaching that aims to train graduate and undergraduate science students to reflect on the nature of science (NOS) and on the reliability of results of science in the making, and it aims to develop the model’s potential even further. More specifically, this paper introduces some small but important additions to the Giere model in order to do the following:

  1. Increase the applicability of the Giere model within its original domain: science aiming to construct new knowledge about nature (Section 3).

  2. Adapt the Giere model to represent the key elements in a result of scientific design, thus providing a model of another important type of scientific result (Section 4).

These aims will mainly be pursued through philosophical analysis, but (historical) cases and practical experiences will also be drawn upon. Before any modifications can be suggested, however, the original Giere model must be introduced in some detail.

2 The Giere Model

The Giere model (shown in Fig. 1) is a model of how knowledge about the workings of the world can be gained through models. It is therefore a kind of metamodel. Like all models, the Giere model represents its target in a simplified way, focusing on specific aspects while ignoring others (details in Section 2.1). One thing that the Giere model does not aim to capture in detail is the process of constructing models and generating and testing scientific hypotheses. Thus, if we seek a manual for doing science (a general scientific method), the Giere model is not what we are looking for (but the model by Caldwell et al. mentioned in the Introduction might be). Rather, the Giere model is valuable if we are looking for a tool for understanding and evaluating the validity of scientific results. Accordingly, Giere et al. introduce their model as “a framework for understanding and evaluating a wide range of scientific cases” (Giere et al., 2006, p. 11). This quote also implies that Giere et al. view their model as widely but not universally applicable. The model is based on the assumption that scientists “insofar as they participate in science as an institution, […] are engaged in exploring how the world works” (ibid., p. 19, emphasis in the original). This is arguably an approximation (cf. Section 3), which partly explains why the model does not have universal applicability. This basic assumption suggests that there is a world that all scientists can study, referred to as “the real world”, and that when they have reached a result, they claim to have learned something about how this real world works. The aim of the Giere model is then to help the user understand how scientists can claim that they have learned something about how the world works and whether their claim is justified.

Fig. 1 The original Giere model. Adapted from Fig. 2.9 in Giere et al. (2006)

In line with most accounts of scientific methodology, the Giere model is based on the idea that scientists learn about the world by formulating and testing hypotheses, which, very generally, are sentences describing what the world is expected to be like. Many standard accounts of scientific method will start with an attempt to characterise hypotheses that are sufficiently testable and interesting to be deemed “scientific” and go on to discuss how these hypotheses can be tested through comparisons with data. While not overlooking the importance of these issues, Giere et al. take a different approach. They start from scientific practice and consider the elements that are commonly used to generate and test hypotheses and when such tests can be considered successful. Using historical cases, they argue that, at least in the natural sciences, scientists commonly generate and justify hypotheses via models (Footnote 1). To understand how a scientific result can be justified, we thus need to understand the nature and use of models in science.

2.1 Models as Representations

The aspects of the world studied by scientists are often difficult to study directly. While atoms and molecules are too small, other things, like the universe as a whole, are too big, and others again are too far away in space–time, e.g. distant stars or the Cambrian explosion. Therefore, scientists often construct manageable representations of the aspects of the world that they study. In brief, a representation is anything used by an agent to represent something else (Giere, 2004). A representation can be a physical object (e.g. a scale model) or an abstract object (e.g. an equation or an imagined scenario). Representations are often chosen or deliberately constructed to be similar in certain respects and to a certain degree of precision to the thing that they represent. Furthermore, representations in science are often constructed to be more manageable than what they represent, e.g. by being of a more convenient size, simpler, or more accessible to our theoretical and physical tools. For instance, if we are interested in understanding how predator and prey populations depend on each other in a given area, it is convenient to represent these populations with mathematical variables, as this means we are able to apply the vast and varied toolbox of mathematics to the problem. This combination of similarity and manageability means that some of the difficulties related to studying a specific aspect of the real world can be overcome by studying it through a representation.
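To illustrate what representing populations with mathematical variables can look like in practice, the following is a minimal sketch of one standard predator–prey model, the Lotka–Volterra equations. The choice of these particular equations, the function name, and all parameter values are assumptions made for illustration only; the text above does not commit to a specific formulation.

```python
def simulate_predator_prey(prey0, predators0, steps, dt=0.01,
                           alpha=1.0, beta=0.1, delta=0.05, gamma=1.0):
    """Integrate the Lotka-Volterra equations with simple Euler steps.

    d(prey)/dt      = alpha * prey - beta * prey * predators
    d(predators)/dt = delta * prey * predators - gamma * predators

    All parameter values are hypothetical and chosen only for illustration.
    Returns a list of (prey, predators) tuples, one per time step,
    including the initial state.
    """
    prey, predators = float(prey0), float(predators0)
    trajectory = [(prey, predators)]
    for _ in range(steps):
        # Rates of change evaluated at the current state.
        d_prey = alpha * prey - beta * prey * predators
        d_predators = delta * prey * predators - gamma * predators
        prey += d_prey * dt
        predators += d_predators * dt
        trajectory.append((prey, predators))
    return trajectory


if __name__ == "__main__":
    final_prey, final_predators = simulate_predator_prey(30, 5, steps=1000)[-1]
    print(f"prey = {final_prey:.2f}, predators = {final_predators:.2f}")
```

Once the populations are variables, questions about the system (for example, whether a stable coexistence exists) become questions about the equations. The model is manageable precisely because it leaves out most of what is true of real animals.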

Models are examples of what scientists use as representations. The precise distinction between models and other types of representations is not of particular relevance here, but there is a wide range of objects (physical and abstract) that scientists call models. Important examples include physical objects, like the model of DNA that Watson and Crick constructed from metal plates and wires (discussed in detail in Giere et al., 2006), as well as equations, graphs, images, and more.

Models can be divided into different types (discussed in detail in Giere et al., 2006, Sec. 2.3). Examples include scale models, which are physical objects whose modelled dimensions stand in a fixed ratio to those of the modelled object. Therefore, although there may be many similarities between a house and a scale model of it (number of windows, colour, etc.), it is the fact that their dimensions stand in a constant ratio, say 1:10, that makes it a scale model. Theoretical models are complex abstract objects. Important scientific examples include Bohr’s model of the hydrogen atom, the predator–prey model mentioned earlier, and the lock-and-key model of enzyme activity. Theoretical models can be described using words, e.g. in textbooks or research articles (different sources may describe the same model using different words), or by pointing to physical objects with similar features (Giere et al., 2006). It is widely recognised that models, including theoretical models, are extremely important in scientific practice (Frigg & Hartmann, 2018), although the specific ontology of theoretical models is debated (French, 2010; Frigg & Hartmann, 2018). Giere developed an account of the ontology of theoretical models as abstract objects (2004), and the introduction to the Giere model is based on this (Giere et al., 2006, p. 24). However, the Giere model does not hinge on this specific account. The fundamental point is that models are representations, and as such, they are both similar to and different from what they represent, and both the similarities and differences make the models valuable to scientists.

One fundamental way in which a model may differ from what it represents is that the real-world object or process that it is intended to represent does not actually exist. Famous examples include the various intricate models of the luminiferous aether or caloric flow constructed by physicists in the nineteenth century, as well as more recent biological models of the functions of mesosomes in living cells. Such models may still be useful as a representation of other aspects of the real world, but not for the reasons that the creators of the model had intended. Models may also be deliberately constructed to represent objects or processes that do not currently exist in order to investigate counterfactual or future scenarios such as super-volcanic eruptions, or the behaviour of hitherto unseen phenomena resulting from climate change or other drastic changes to the existing environment.

Given that models are objects, it seems no more meaningful to speak of them as being true than it does to speak of other concrete or abstract objects as true, e.g. rabbits and numbers (Footnote 2). However, we can say things about models that can be true or false. In particular, we can formulate hypotheses describing ways in which we expect a model to be similar to the real world and the precision with which we expect this similarity to hold. A core claim by Giere et al. is that this is exactly what scientists often do when they propose hypotheses.

2.2 Testing Hypotheses Using Models

Scientists spend a lot of time and effort developing models and new hypotheses, and it is not uncommon to find research papers that present a new model or a new hypothesis as the main “result”. Such purely explorative research (Steinle, 2002) is very important for the advancement of science as a whole, but it is arguably only a step on the way towards establishing an understanding of how the world works. A cornerstone of science is that hypotheses are not just proposed; they are also tested. It is the outcome of this process that is in focus in the Giere model. It assumes that the model and hypothesis are in place and asks how scientists can claim that the hypothesis is justified, thus helping us assess whether this is in fact the case in specific cases.

To test a hypothesis describing a similarity between a model and an aspect of the real world, it is necessary to compare the model and the relevant aspect of the real world. It is likely, however, that the model was actually constructed because the relevant aspect of the real world is not readily accessible (cf. Section 2.1). We therefore need to find an indirect way of testing the hypothesis.

With sufficient ingenuity, scientists can construct experimental or observational setups that allow them to interact with the relevant aspect of the real world in a way that yields reliable and relevant information that we call data. These data must be compared to the hypothesis in order to evaluate its truth value. However, it can be challenging to make this comparison as the hypothesis does not necessarily say anything about the data.

The hypothesis specifies a similarity between a model and an aspect of the real world. It could, for instance, be a hypothesis about the formation of the Cosmic Microwave Background (CMB) derived from a complicated theoretical model in cosmology. Such a hypothesis cannot be compared directly to data obtained billions of years later in a tunnel under the surface of a planet that did not—and could not—exist at the time of the formation of the CMB. The solution is to compare the data with a description of what we—given the model, our specific knowledge of the experimental setup, and general knowledge of how the world works—would expect the data to look like assuming that the hypothesis is true. In other words, the solution is to derive a prediction from a combination of the hypothesis, the model and our background knowledge. This is sometimes called a hypothetico-deductive approach.

One of the main advantages of the Giere model is that it highlights the complexity in comparing hypotheses and data. Whereas other metamodels simplify it down to a direct comparison between hypothesis and data (cf. Introduction), the Giere model makes explicit the complicated steps that go into this process. Recall that an important aim of the model is to serve as a tool for evaluating the reliability of a specific result. To this end, the model should guide the user towards potential weaknesses in the argumentation for a specific result. For many people, it is natural to consider possible errors in the generation or collection of data, and although this is important, other parts of the process also require scrutiny. Deriving a prediction from a hypothesis often requires complicated reasoning and calculation. Going back to the CMB example, deriving a prediction of what the data should look like in order to match the hypothesis requires a detailed understanding not only of the cosmological model but also of the specific experimental setup, where particles are scattered to form characteristic patterns that require complicated calculations to predict. This process is also open to errors (Footnote 3) that can compromise the reliability of the final result. Evaluating such potential errors and ambiguities is an important aspect of evaluating a scientific result.

Once a sufficiently specific prediction has been derived and the data have been collected, they can be compared as part of the process of testing the hypothesis. Such a comparison can have at least three possible outcomes: the data and prediction either do or do not match, or the outcome is unclear (Giere et al., 2006, p. 37). The latter outcome is common. Even when evaluating a published scientific result, a common conclusion is that, contrary to the authors’ claims, the outcome of the test was not clear and further testing is needed before a clear conclusion about the hypothesis can be drawn.
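As a rough illustration of the three possible outcomes, the sketch below classifies a comparison between a single predicted value and a single measured value with a stated uncertainty. The function name and the two thresholds are hypothetical conventions introduced here for illustration; they are not part of the Giere model, and real comparisons are rarely reducible to one number.

```python
def compare_prediction_to_data(prediction, measured, uncertainty,
                               match_within=1.0, mismatch_beyond=3.0):
    """Classify a test outcome as 'match', 'mismatch', or 'unclear'.

    The deviation is expressed in units of the measurement uncertainty.
    The thresholds are illustrative assumptions, not established rules.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    deviation = abs(prediction - measured) / uncertainty
    if deviation <= match_within:
        return "match"
    if deviation >= mismatch_beyond:
        return "mismatch"
    # Anything between the two thresholds does not settle the question.
    return "unclear"
```

The band between the two thresholds corresponds to the third outcome: the test was carried out, but it does not yet support a clear conclusion about the hypothesis.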

If there is a clear mismatch between prediction and data, it is tempting to conclude that the hypothesis is false. However, this conclusion does not follow. The possibility of a “false negative” result must be considered. False negatives can arise in two different ways. Firstly, the mismatch could be due to errors in the data. If the data do not provide sufficient and reliable information about the relevant aspect of the real world, then we cannot use them to conclude anything about a relationship between a model and the real world. Secondly, the mismatch could be due to an error introduced when deriving the prediction. In the example with the CMB, it is not difficult to imagine that the complexity of the experimental setup resulted in an important factor being overlooked, thus resulting in an error in the predicted scattering pattern. If we are to argue convincingly that a given hypothesis is false based on a mismatch between data and prediction, we must argue, at least, that the mismatch was due neither to an error in the data nor to an error in the derivation of the prediction.
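The structure of this argument can be written out schematically. The notation is introduced here only for illustration and does not appear in Giere et al.: $H$ is the hypothesis, $P$ the prediction, $O$ the observed mismatch, $A_{\text{deriv}}$ the assumption that the prediction was derived correctly, and $A_{\text{data}}$ the assumption that the data are reliable.

```latex
\begin{align*}
  (H \land A_{\text{deriv}}) &\rightarrow P
    && \text{the prediction follows only if the derivation is sound} \\
  (O \land A_{\text{data}}) &\rightarrow \lnot P
    && \text{the mismatch counts against } P \text{ only if the data are reliable} \\
  \therefore\; (O \land A_{\text{data}} \land A_{\text{deriv}}) &\rightarrow \lnot H
    && \text{rejecting } H \text{ requires both auxiliary assumptions}
\end{align*}
```

Written this way, it is clear that a mismatch alone licenses only the weaker conclusion that at least one of $H$, $A_{\text{deriv}}$, and $A_{\text{data}}$ is false.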

A match between the data and prediction can also be due to errors, i.e. a false positive result, but what if we can argue convincingly that this is not the case? Can we then conclude that we have good reason to believe that the hypothesis specifying a similarity between a specific model and an aspect of the real world is true? The Giere model demonstrates why this is not the case, again by virtue of highlighting the complexity in comparing a hypothesis with data.

Since we do not compare hypotheses directly to data, but indirectly through a prediction, there is a risk that a match of equal quality could be found between the data and a prediction derived from a different model. In this case, it is not possible to distinguish between the two models based on the data at hand. There are many historical examples of scientific controversies in which scientists, over significant periods of time, were able to explain all relevant data equally well using very different models and were thus unable to decide which model to favour (examples in Giere et al., 2006; Collins & Pinch, 2003; Kuhn, 2012).

The possibility of viable alternative models should not be considered only in the evaluation of a given hypothesis. It is also a frustrating possibility for students hoping for certainty, and a vivid reminder that scientific knowledge is always tentative. In teaching, this presents an opportunity to challenge the more philosophically minded students to consider the nature of theoretical virtues like simplicity, fruitfulness, and generality, which play an important role when deciding whether or not a model is worth taking seriously (Kuhn, 2012; Schindler, 2018).

In summary, the Giere model is a rich yet simple model of the core elements of a scientific result and it highlights some central aspects of common scientific argumentation that are often overlooked in general accounts of science—including the complexity of comparing hypotheses and data.

3 Refining the Giere Model

The author has used the Giere model in multiple courses on NOS for graduate and undergraduate students from a broad range of educational programmes. These programmes include core science subjects like physics and chemistry, as well as fields like biotechnology, biomedicine, and even agricultural economics (Footnote 4).

After introducing the Giere model, an obvious exercise suggested by Giere and colleagues is to challenge the students to apply it to results of science in the making. Undergraduate students practicing critical reading and evaluation of research papers may use the model to structure their analysis and find weak spots in the arguments. Graduate students may use the model on their own research project in order to gain a better overview and improve their argumentation. Having challenged well over 300 students from various programmes to such an exercise, the author has noticed (but not systematically investigated) an emerging pattern in the questions they ask and the difficulties they face when applying the model. Two questions in particular are of interest to the present discussion as they point to potential improvements in the applicability of the Giere model.

The first question is: what goes in the “real world” box? (see Fig. 1). The problem here is not that students find the idea of a “real world” suspicious. Although some philosophers may quibble with it, science students generally have little trouble with the idea that there is a world “out there” that exists independently of the intentions and beliefs of the individual researcher (Footnote 5). We can seek, though not necessarily gain, true knowledge of this reality, and reality can constrain our models via the data we collect. Although science students generally accept this, they sometimes have trouble identifying the specific aspect of “the real world” that is being studied in a specific case. Part of the problem here seems to be that the distinction between data and the phenomenon from which the data are generated is not as sharp as sometimes described (see Section 3.1). Another issue is that many scientific results do not represent new knowledge about a pre-existing real world. As Giere et al. are well aware (cf. Section 2), the Giere model only applies to some scientific results. We have already seen (in Section 2.2) that it does not capture results of purely explorative research. In addition, the results of design processes are not captured by the model. This does not necessarily pose a problem for the Giere model as it was not intended to capture such results, but it does mean that additional models are needed to fully cover the diversity of scientific results. It is beyond the scope of this paper to develop a model of explorative research (see Orozco et al. (2022), for an outline of such a research programme), but Section 4 presents a model of the results of scientific design that has been adapted from the Giere model.

Another question that students often have when applying the Giere model is: what exactly are the data? Data are generally defined by Giere and colleagues as “all the special information that may be directly relevant to deciding whether the model in question does fit” (Giere et al., 2006, p. 27). Two further constraints are placed on this information for it to qualify as reliable data: it must be “reliably detected” (ibid.) and “obtained through a process of physical interaction with the part of the real world under investigation” (ibid.), which we can call measurements. The latter constraint is reflected in Fig. 1, where the data box is connected directly to the real-world box via the arrow representing experimentation/observation. In addition to this explicit definition, data are also characterised by the function they serve in the Giere model, namely as something to which the prediction is compared when testing a hypothesis. Understandably, this dual characterisation of data as part information coming (directly?) from measurements and part something to which a prediction is compared tends to confuse students. The confusion arises because in practice, predictions are rarely compared directly to the outcomes of measurements. In both philosophical and scientific terminology, the outcomes of measurements are referred to as raw data. Raw data need to be analysed before they can be used in a reliable test of a hypothesis. Data analysis is thus a central aspect of any empirical science, and the reliability of a scientific result depends heavily on the quality of the data analysis. Giere et al. are of course aware of this but have chosen not to include this aspect in their model, perhaps to preserve its simplicity, which is often a desirable trait. However, it seems that in this case the model’s simplicity causes confusion. Furthermore, the data analysis is one of the central components of an argument for a scientific result. Neglecting it therefore seems detrimental to the applicability of a model intended to help the user evaluate the credibility of a scientific result. Together, these observations suggest that the original Giere model is too simple, at least if its applicability can be improved by adding a few details.

3.1 Raw Data and Data Models

To distinguish clearly between raw and analysed data, it is valuable to introduce the distinction between data and phenomena to the Giere model. The distinction, which is now widely used in philosophy of science, was systematically described by Bogen and Woodward (1988) (see also Woodward, 2011). Bogen and Woodward point out that scientists are primarily (but not exclusively) interested in understanding aspects of the real world that, at least in principle, can reoccur under different circumstances, which they call phenomena. For instance, Watson and Crick were interested in the structure of DNA as a general phenomenon, not the particular structure of a particular DNA molecule at a specific place and time. To be more precise and without any significant loss of applicability, we can therefore say that the aspects of “the real world” that scientists study are actually phenomena. The models they construct of these are thus models of phenomena. Performing measurements on phenomena through experiments or observations yields raw data, e.g. numbers read from a detector, splashes of colour on a screen, or pictures of cells under a microscope (Footnote 6). Raw data need to be analysed before scientists can compare them to a prediction. There are at least two reasons for this. The first is that the individual measurements must be assessed for reliability and precision. If the raw data include data points that are erroneous for a known reason, e.g. a human error or a fault in the equipment, it is important to address this. Likewise, any noise in the data, e.g. background radiation in measurements of nuclear decay, must also be addressed. The second is that, in addition to “cleaning up” the data, analysis is often required to identify trends in the data. Scientists are rarely interested in the outcome of a single measurement. Rather, they are interested either in changes between measurements, e.g. before and after an intervention, or in the more general relationship between a dependent and an independent variable. Such dependencies are not evident from individual measurements but can be identified through analyses of larger datasets. The result of a data analysis is often one or more representations of the most relevant aspects of the raw data, e.g. in the form of a graph or an image. In practice, it is often these models of data that are used when comparing the data and prediction. Introducing these modifications to the Giere model yields the refined version shown in Fig. 2 (compare to Fig. 1).

Although slightly more complicated than the original, this modified version of the Giere model is less ambiguous when it comes to identifying the contents of each box, and it highlights that developing a data model involves both data collection and data analysis—both of which can be highly complex processes with ample room for error. This is a significant aspect to highlight in a model designed to aid the user in assessing the reliability of a scientific result.
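The two steps of data analysis described in Section 3.1, cleaning up raw data and identifying a trend, can be sketched in a few lines. This is a deliberately simple illustration: the function name, the assumption of a known constant background, and the choice of a straight-line fit as the data model are all assumptions made here, not a description of any particular analysis.

```python
from statistics import mean


def build_data_model(raw_points, background, flagged=()):
    """Turn raw (x, y) readings into a simple data model: a fitted line.

    raw_points : list of (x, y) raw measurements
    background : constant background level subtracted from every y
                 (assumed known here for simplicity)
    flagged    : indices of points known to be erroneous, e.g. because
                 of an equipment fault
    """
    # Step 1: "clean up" the raw data by dropping known-bad points
    # and subtracting the background.
    cleaned = [(x, y - background)
               for i, (x, y) in enumerate(raw_points) if i not in flagged]
    if len(cleaned) < 2:
        raise ValueError("need at least two reliable points to fit a trend")

    # Step 2: identify the trend with an ordinary least-squares line.
    xs, ys = zip(*cleaned)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in cleaned) / sxx
    intercept = y_bar - slope * x_bar
    return {"slope": slope, "intercept": intercept, "n_points": len(cleaned)}
```

Each choice in this sketch (which points to discard, what background to subtract, which functional form to fit) is a place where an error could enter, which is why the refined model treats data analysis as a step that deserves scrutiny in its own right.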

3.2 Collecting Data from Models

In Section 2, the complexity of comparing hypotheses to data was illustrated using a (fictional) example of a hypothesis about the formation of the CMB in the early universe being compared to data obtained from CERN. Upon closer consideration, it is difficult to fit this example into any of the versions of the Giere model presented so far. The problem lies in “the real world” box in Fig. 1 and the “phenomenon” box in Fig. 2. In both cases, the model implies that we obtain data from the phenomenon/aspect of the real world to which the hypothesis relates. This is certainly true in many cases, but in other cases, the picture seems much more complex. Is the phenomenon created at CERN really exactly the same as the one instantiated billions of years ago, the traces of which we now measure as the CMB? Perhaps in this case, it is more like the common life science practice of using model organisms to test hypotheses.

Fig. 2 The refined version of the Giere model. This version highlights that predictions are generally not compared directly to raw data, but rather to a data model generated through data analysis. When assessing the validity of a scientific result, it is important to consider the potential for error in both the collection of raw data and in the subsequent data analysis.

Rather than collecting data directly from the phenomenon of interest, it is very common in some areas of science to collect data from model systems for practical and ethical reasons. Such systems include model organisms like fruit flies or E. coli, but also in vitro systems and scale models. In such cases, we need to add an additional layer to the original Giere model in order to make it fit. The resulting schema is shown in Fig. 3.

The validity of data collected from model systems hinges on there being a sufficient and relevant similarity between the phenomenon created in the model system and the phenomenon of interest. In practice, it is not a strict requirement that the phenomena are identical. Complex phenomena like diseases will often lead to different phenotypic symptoms in humans and the animals used to study them, but there may still be significant value in studying e.g. the underlying mechanisms leading to a disease using animal models.

In some cases, it can be controversial whether a given model system is in fact relevantly and sufficiently similar to the target system to justify its use in testing a given hypothesis. Recent examples include toxicity studies of chemicals like coumarin (found e.g. in cinnamon) and the artificial sweetener aspartame. In both cases, the current Acceptable Daily Intake (ADI) levels have been queried based on studies indicating human toxicity with an intake below the ADI. However, the European Food Safety Authority (EFSA) questioned the relevance of these studies because they were performed on model organisms (rats and rabbits; for details on coumarin see EFSA AFC Panel, 2008; for aspartame see EFSA ANS Panel, 2013). The arguments for lowering the ADI were based on the assumption that the animals in question metabolised the specific chemicals in a way that was sufficiently similar to humans as to make the results transferable. However, this assumption is questionable, as there is no indication of toxicity at similar intakes in the available human data. Moreover, other studies show that there are significant differences in the way humans and other animals metabolise coumarin and aspartame, and the animal studies therefore do not in themselves justify the hypothesis that these chemicals are toxic to humans, even if they do justify the hypothesis that they are toxic to the model organisms.

Again, it is important that a model designed to help the user evaluate the validity of a scientific result highlights the key points in the argumentation for that result. In cases where hypotheses are tested using data collected from model systems, the assumed similarity between the phenomenon of interest and the phenomenon generated in the model system is one of the important issues that must be considered in the evaluation. The version of the Giere model shown in Fig. 3 highlights this additional element, which is left out of the more general versions.

Fig. 3

A special case of the general refined Giere model (Fig. 2) representing the key elements in a research result where a hypothesis is tested using data obtained from model systems. In such cases, relevant similarities between the phenomenon of interest and the phenomenon generated in the model system are assumed. However, further testing may reveal that these assumed similarities are not actually present

In the toxicology studies mentioned above, it was possible to empirically test the assumption that the phenomenon of interest—human metabolism of a specific chemical—and the phenomenon generated in the model system—rat metabolism of the same chemical—are sufficiently similar, because we can actually take relevant measurements of human metabolism, although this is more challenging than performing similar measurements on animals. In other cases, it is almost impossible to make such direct empirical comparisons between a phenomenon of interest and phenomena created in a model system. Testing the hypothesis about the formation of the CMB using a model system at CERN illustrates this. In this case, it may be extremely difficult to compare the conditions in the early universe directly to those created at CERN in order to assess whether they are in fact relevantly and sufficiently similar, simply because it can be extremely challenging to obtain relevant data on the conditions in the early universe. In such cases, the validity of the assumption that the phenomenon created at CERN is relevant for testing a hypothesis about the formation of the CMB must rely more heavily on theoretical arguments referring to models of the conditions in the early universe—perhaps including the model associated with the hypothesis. This can of course introduce uncertainty and a risk of vicious circularity, where the truth of the hypothesis has to be assumed in order to test it. However, it is not necessarily problematic, as the assumption may be justified when referring to other aspects of the model that have previously been independently tested to a satisfactory degree.

4 What About Design?

So far, the aim of this paper has been to improve the Giere model under its original aim and scope—understanding and assessing scientific findings about how the world works. However, as noted in Section 2, not all research in the natural sciences aims to understand how the world works. Many scientists are engaged in creating e.g. new and better treatments for diseases, genetically modified organisms with special properties, new and better instruments, or new chemical compounds with interesting properties. In other words, many scientists are engaged in designing new things and testing them. To gain a broader understanding of scientific results, it is therefore valuable to have models of the key elements of a result of scientific design analogous to those used to understand and evaluate the findings about how the world works. Although the Giere model was not designed to be applicable to the results of design processes, I will argue that it can be adapted to fit this type of result as well through the refinements introduced in Section 3.

What we seek is a model that can help the user understand and evaluate certain knowledge claims, but in contrast to the knowledge claims captured by the Giere model related to how the world works, we seek a model that captures knowledge claims about designs of products that do not exist yet. This makes it difficult to apply the original model as there is nothing that we can meaningfully put into the “real world”/phenomenon box, suggesting that in such cases, there are important differences in the ways in which scientists can argue for the validity of a result of scientific design and a result from standard research on existing phenomena. On the other hand, we also saw in Section 2 that models used to understand how the world works may represent non-existent objects and/or phenomena that have not yet occurred, so we would also expect there to be significant similarities in arguments supporting knowledge claims about how the world works and knowledge claims about a new design.

To explore this issue further, we will first look at a specific case of a design that has been tested, adapted and eventually shown to work in the ways intended.

4.1 Testing a Design: The Transistor

By the end of World War II, engineers had designs for early computers and new advanced telephone systems, but only limited success in translating them into functioning technology. A major obstacle was the vacuum tubes used as amplifiers in these devices. The use of vacuum tubes meant that early computers were very large and consumed a lot of power. In addition, vacuum tubes are not always very reliable, and technology that uses many vacuum tubes is likely to be even less reliable. In 1948, however, John Bardeen and Walter Brattain of Bell Labs published a letter to the editors of Physical Review entitled The transistor, a semi-conductor triode (Bardeen & Brattain, 1948). The authors claimed that the device described in the paper “may be employed as an amplifier, oscillator, and for other purposes for which vacuum tubes are ordinarily used” (ibid., p. 230). Furthermore, it quickly became apparent that transistors could be made smaller, less power-consuming and more reliable than a vacuum tube. As we now know, the transistor was revolutionary in the development of new electronic devices.

Although a similar device was constructed the same year in France by Herbert Mataré and Heinrich Welker (Williams, 2017), it was Bell Labs’ transistors that became widely known as the world’s first functioning transistors, and Bardeen and Brattain shared the 1956 Nobel Prize in physics with their colleague William Shockley “for their researches on semiconductors and their discovery of the transistor effect” (Nobel Committee for Physics, 1956).

A detailed history of the invention of the first transistors is presented in Riordan and Hoddeson (1997). Here, we only consider the process of showing that the transistor described in Bardeen and Brattain’s letter actually works in the way they thought it would.

Shockley had the idea of a semiconductor amplifier as early as 1939, but the systematic work of realising it only began after the war. The first attempts to make a prototype all failed. In an attempt to understand why, Bardeen developed a theoretical model describing electrons getting caught in the “surface states” on a piece of semiconductor. This model turned out to be very promising.

By mid-November 1947, the group had made such progress in understanding surface states that they were able to neutralise them. Armed with this new understanding, Bardeen and Brattain again attempted to make a prototype. The aim was to construct a device that could substantially amplify an AC signal with a frequency between 10 and 20,000 Hz. After testing several different prototypes that all fell short of this aim in some way, Bardeen and Brattain finally succeeded in December 1947, when they tested their now famous transistor for the first time and demonstrated that it could produce significant amplification at 1000 Hz. By Christmas, it had been modified so that it also worked as an oscillator.

At this point, the group had a device that could work as an amplifier and oscillator, but it was relatively large, impractical, and unstable and “had a long way to go before it could even begin to replace vacuum tubes in electronic circuits” (Riordan & Hoddeson, 1997, p. 141). For that to happen, the prototype would have to be further developed into a small reliable device that could be mass produced. It would take an additional two years before Western Electric was able to start mass producing stable transistors (Williams, 2017).

Meanwhile, Bardeen and Brattain were eager to tell the world about their new device, but Bell Labs were even more eager to patent it. While the patent application was being prepared, work continued on improving the transistor. It turned out that a part of the first working prototype which Bardeen and Brattain had thought was essential was not essential at all. This suggested that the group still only had a very limited understanding of why their device worked the way it did.

The simplified design meant that developers had more freedom to adapt the original design for miniaturisation and mass production. A new version that was so small it could fit in a small metal cylinder was developed. These smaller transistors were stable and reliable enough for Bell Labs engineers to build a radio and a telephone repeater circuit completely devoid of vacuum tubes. However, these devices were still far from being reliable enough to use in standard electronic items. A member of the research group later recalled that “no two devices behaved the same”, and their performance was “apt to change if someone slammed the door” (Riordan & Hoddeson, 1997, p. 169).

In June 1948, the patent application covering the transistor was filed. The transistor could now be presented to the world. This happened on 15th July 1948, when Physical Review published two letters to the editor by Brattain and Bardeen: the one mentioned above, describing the design of the transistor (Bardeen & Brattain, 1948), and another presenting further theoretical results found in the design process (Brattain & Bardeen, 1948).

4.2 A Model of Knowledge Claims About Designs

How are we to understand the knowledge claims made by Bardeen and Brattain in their now classic letter that they had designed a device that (they know) can be “employed as an amplifier, oscillator, and for other purposes for which vacuum tubes are ordinarily used”? First of all, we should note that the claims are not made about any specific physical device, but about a design—a theoretical model of the essential features of a whole class of objects. The letter contains a schematic rendering of this design, yet no specific instantiations of it are depicted or described. The only references to physical transistors in the letter are to “an experimental semi-conductor triode” and other “units”. These prototypes were used to collect the data presented in support of the knowledge claims made about the design, but none of them are presented as anything more than a prototype. From the previous section, we know that there is a good reason for this: none of the transistors that had been constructed by the summer of 1948 were a fully suitable replacement for a vacuum tube in an ordinary electronic device, mainly because they were too unstable.

The letter does not therefore describe an existing functional physical device and a theoretical model of how and why it works. Rather, it presents a theoretical model of a device that is not yet fully developed and argues that it will have a certain combination of properties once fully developed. This case is not unique in this respect. It is common to distinguish between development and design, and many designs are never developed into actual products. Even so, reliable knowledge claims can be made about these designs for use in e.g. patent applications. This presents us with a challenge when we try to use the original Giere model to help us understand how and whether a knowledge claim about a design is justified.

In the original Giere model (Fig. 1), we test a hypothesis about some aspect of the real world by collecting data from experiments on and/or observations of the relevant aspect of the real world. This is not possible in the case of hypotheses about properties of designs that have not yet been developed into actual products, as there is no way to physically interact with the non-existent final product. We should therefore consider how the original Giere model can be modified to apply to knowledge claims about designs. The refinements of the Giere model introduced in Section 3.2 give us an indication of how we might proceed.

In Section 3.2, we discussed the use of model systems in research about how the world works. When a phenomenon of interest is practically inaccessible to our measurement methods, we may instead try to create a sufficiently and relevantly similar phenomenon in a model system and perform our measurements on this phenomenon. In a very similar way, designers often cannot take measurements on the final product they are working on, simply because it does not exist yet. Instead, like Brattain and Bardeen, they may construct a model system, i.e. a prototype, with which to test their design. This prototype need not be identical to the final product in every respect; in fact, prototypes rarely are. Some of the differences between the prototype and the final product are often known and intentional, as in the case of scale models. If executed properly, studies of these prototypes can teach researchers important things about the properties of the final product they are working towards (cf. Fig. 4). The key to understanding how they were justified in making such knowledge claims is to understand the relationship between the prototypes, the theoretical models, and the actual final product.

Fig. 4

The refined Giere model adapted to results of scientific design. Since the final product has not yet been developed, data must be collected from prototypes that are assumed to be relevantly and sufficiently similar to the final product. This assumption may be challenged, but it is difficult to test directly and empirically. The assessment will therefore often have to take into account theoretical considerations or knowledge of differences between the prototype and the final product

In Section 3.2, we saw that the relevant and sufficient similarity between a phenomenon of interest and a phenomenon generated in a model system is often simply assumed. If questioned, the assumed similarity can in some cases be tested empirically, but in other cases, this can be challenging and arguments for the validity of the assumption of similarity must be based more heavily on theoretical arguments. In the case of the transistor and new designs in general, the use of model systems also seems to be based on an assumption that these are relevantly and sufficiently similar to the final product to justify using them to test the design. By definition, this assumption is impossible to test empirically as the final product is yet to be developed. This presents a challenge to researchers, e.g. when they need to assess explanations for a mismatch between data collected from a prototype and a prediction derived from a design. Assuming that experimental error and errors in deriving the prediction can be reasonably ruled out, two possible explanations remain: (1) the hypothesis describing a similarity between the design and the potential final product is false and (2) the assumption that the prototype is relevantly and sufficiently similar to the potential final product is false. The question then is how to distinguish these two options when the latter cannot be empirically tested.
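
The structure of this dilemma can be sketched schematically. The symbols are introduced here purely for illustration and are not part of the Giere model itself: H stands for the hypothesis about the design, S for the assumption that the prototype is relevantly and sufficiently similar to the potential final product, and P for the prediction that is compared to the data model. The sketch assumes, as above, that experimental error and errors in deriving the prediction have been ruled out.

```latex
\[
\begin{array}{l}
(H \land S) \rightarrow P \\
\neg P \\
\hline
\neg H \lor \neg S
\end{array}
\]
```

On this reading, a failed prediction shows only that at least one of H and S is false; the data alone do not indicate which.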

As we saw in Section 4.1, Bardeen, Brattain and their colleagues faced this challenge multiple times, both when their early prototypes failed to work as solid-state amplifiers, and when their later prototypes turned out to be highly unstable. Was this an indication that their prototypes were too crude? Or was it an indication of a flaw in their design? To answer these questions, Bardeen and Brattain relied on theoretical models of how semiconductors work and on their knowledge of the differences between their prototypes and the final products.

When their early prototypes did not amplify, they could not explain this based on the theoretical models available and the specific properties of the prototypes that they knew would not be carried over to the final product. This led them to consider the possibility that their theoretical model of semiconductors was inadequate, which in turn led Bardeen to develop his model of surface states, which turned out to be vital to the development of the transistor (cf. Section 4.1).

In contrast, when the later prototypes were unstable, the group did not suspect that the basic design was to blame. Although they did not know in any detail how the final product would look, they knew that it would be different from their prototypes and that progress had already been made in terms of stability. They therefore saw no good theoretical reason why this progress should not continue on to a satisfactory level of stability.

When Bardeen and Brattain claimed that instances of their general design could be employed for “purposes for which vacuum tubes are ordinarily used”, they did so because they knew—based on fairly standard scientific arguments that fit into the original Giere model—that their prototypes could work as amplifiers and oscillators and that they had a thoroughly tested model explaining why. This meant that they also knew that the properties they believed made the prototypes work would be present in any future products based on their design. In addition, although they did not know the exact extent to which their prototypes would be similar to the final product, they had reason to believe that the process of making the prototypes increasingly stable could be continued through the development process, and that instability in their prototypes was, to a significant extent, due to properties that would not be present in future instances of their design.

In summary, when assessing knowledge claims about designs, we can generally use many of the same tools as we use to evaluate knowledge claims about how the world works. More specifically, assessing knowledge claims about designs can be closely compared to assessing knowledge claims about how the world works, tested using phenomena generated in model systems that cannot be directly empirically compared to the phenomenon of interest (Figs. 3 and 4 are thus very similar, and this visual similarity helps illustrate the similarity between these processes to nonexperts). A central challenge is evaluating the assumption that the model system is relevantly and sufficiently similar to what it represents to justify using it as a model in the given context when deprived of the ability to make a direct empirical comparison.

5 Conclusions

Encouraging students to reflect on the nature and reliability of results of science in the making is important to develop their scientific literacy and professional skills. To this end, it is valuable to have research-based general models that can help scaffold the students’ reflections. The Giere model is an excellent example of a simple yet fruitful model that can help university science students understand and assess both their own research and research results presented to them through various media. This is partly because the Giere model highlights the importance of models in science, which in itself is important for the students to understand. However, the simplicity of the Giere model limits its applicability in two respects: firstly because it is based on the assumption that all science aims to find out how the world works and secondly because it ignores most of the complexity involved in collecting data from phenomena that are practically impossible to measure. This paper has sought to refine and adapt the Giere model to capture some of the complexity neglected in the original model in order to increase its applicability, while at the same time keeping it fairly simple. The resulting models can be used not only to understand and assess individual results of science in the making but also to scaffold reflections and discussions on the diversity of scientific practice. Whether these refinements actually make the model easier to use for university students is an empirical question to be investigated in future studies.