The above gives a flavor of the complexity involved in HEP simulations. Before we can address the applicability of Winsberg’s hierarchical account to this case study, let us once more confront a terminological subtlety. Recall that Karaca distinguishes between models of instrumentation, of target phenomena, and hybrid models. However, the LHC was built to find and measure elementary particles like the Higgs boson or particles from physics beyond the SM. These are only to be expected at very high energies, i.e., as decay products of the hard process. Hence, most of the phenomena modeled in generators and detector simulations are not really the target phenomena of the (overall) simulation at all, i.e., not the phenomena of primary interest. They are of interest only for the purpose of being able to infer the presence and properties of sought-for particles from characteristic patterns in the detector. Call these secondary phenomena, in contrast to the (primary) phenomena truly targeted by HEP simulations.
Applying Winsberg’s hierarchy... with some ifs and buts
To see how Winsberg’s hierarchy can be plausibly applied in HEP, consider now the parton shower. The basis for any CS will here be approximate QCD results, namely branching probabilities and Sudakov factors. Their application as a model of the showering process will involve general physical modeling assumptions, such as the negligibility of higher-order terms in the strong coupling. Moreover, pragmatic choices of the concrete form of the branching probabilities, of the evolution variable, and of its connection to the variable of the Sudakov factors will create a plethora of different simulation models. Still, these all stay within the scope of QCD results, and so the resulting simulation models may be thought of as theory driven.
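For orientation, the key theoretical ingredient can be stated compactly. In a generic, scheme-dependent form, the Sudakov factor gives the probability that a parton evolves from a scale \(q_1^2\) down to \(q_2^2\) without any resolvable branching:

\[
\Delta(q_1^2, q_2^2) = \exp\left(-\int_{q_2^2}^{q_1^2} \frac{\mathrm{d}q^2}{q^2} \int_{z_{\min}}^{z_{\max}} \mathrm{d}z\, \frac{\alpha_s}{2\pi}\, P(z)\right),
\]

where \(P(z)\) is the relevant branching probability (splitting kernel). The pragmatic choices just mentioned concern precisely the definition of the evolution variable \(q^2\), the concrete form of \(P(z)\), and the resolution bounds \(z_{\min}\), \(z_{\max}\).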
To accurately model initial-state radiation (gluons emitted before the hard process), PDFs from global fits are needed for showers as well. However, unlike in the hard process, they should here be understood as (providing information on) initial data, since they are solely used to achieve accurate normalization in the backwards evolution from the initial state (Seymour and Marx 2015, p. 297). A similar function attaches to the highest value of the Sudakov evolution variable, \(q^2\), gathered from one’s model of the hard process, as well as to the lower cutoff, gathered from one’s knowledge of the detector resolution. Both of these deliver boundary values for the evolution. In this way, one obtains a dynamical model in Winsberg’s sense.
The resulting models are then implemented in the form of a Markov chain. The approximation of the integrals involved in the Sudakov factors amounts to a discretization, and the assumption of (the very existence of) a cutoff for the evolution variable, as induced by experimental constraints, may be considered part of the ad hoc modeling assumptions mentioned by Winsberg.
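To make the Markov-chain structure concrete, the following is a minimal, deliberately simplified Python sketch. It assumes a fixed effective coupling and a pure \(1/q^2\) emission density, so that the Sudakov factor can be inverted analytically; real shower codes use running couplings, full splitting kernels, and the veto algorithm instead. All names and parameter values are illustrative.

```python
import random

ALPHA_EFF = 0.2   # crude fixed effective coupling (illustrative assumption)

def next_emission(q2_start, q2_cut):
    """Draw the scale of the next emission from the Sudakov factor.

    For an emission density ALPHA_EFF/q^2 in the evolution variable q^2,
    the no-emission probability between q2_start and q2 is
    Delta = (q2/q2_start)**ALPHA_EFF, which we invert analytically by
    setting Delta equal to a uniform random number r.
    """
    r = random.random()
    q2_next = q2_start * r ** (1.0 / ALPHA_EFF)
    # Evolution stops once the resolution-induced cutoff is reached.
    return q2_next if q2_next > q2_cut else None

def shower(q2_hard, q2_cut):
    """Markov chain of successive emissions at strictly decreasing scales,
    started at the hard-process scale (upper boundary value) and stopped
    at the detector-resolution cutoff (lower boundary value)."""
    scales, q2 = [], q2_hard
    while (q2 := next_emission(q2, q2_cut)) is not None:
        scales.append(q2)
    return scales

print(shower(q2_hard=1.0e4, q2_cut=1.0))   # scales in GeV^2
```

Note how the two boundary values discussed above, the hard-process scale and the detector-induced cutoff, enter as inputs, while the theory-derived Sudakov factor drives each step of the chain.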
This is a straightforward application of the steps involved in Winsberg’s hierarchy to a HEP example. However, taking a closer look, we can already see some limitations: Sudakov factors themselves already amount to a discretization of the probabilistic dynamics into (time) steps. This means that some amount of discretization already figures at the conceptually prior level of devising a simulation model, and that discretization and coding can come apart. Moreover, somewhat ad hoc pragmatic considerations, such as the instrumentation-based definition of a cutoff or the choice of ordering (angular vs. momentum) of the evolution steps, also feature already in the formation of the relevant simulation model. They do not occur only as part of the coding process or, more generally, in the process of devising a computational model. Hence, we suggest adapting Winsberg’s hierarchy to the case of parton showers as displayed in Fig. 4.
These adaptations, like those discussed already in Sect. 2.2, are so far rather benign, but they do underscore that simulation modeling is not easily fit into a tight, preconceived format. Moreover, HEP simulations may also urge some less benign adaptations, as the example of string models of hadronization shows.
String models of hadronization are generally based on an interpretation of the data retrieved from showering algorithms in terms of the behavior of a (semi-classical) stretchable, tearable string. Essentially all string models are also based on a particular mathematical relation, the area-decay law, but the specific implementation of this law will ultimately depend on the precise properties assumed for the strings. Typically, these will be conceptualized in terms of diagrams and illustrations depicting the underlying physical intuitions in the modeling process, or even by means of qualitative textual descriptions. Insofar as this is part of the concrete physics modeling, basic string models hence amount to models of phenomena.
However, only the inclusion of further theoretical knowledge will lead to workable models that can be translated into an algorithm. This will include all sorts of relations oriented toward features of QCD, but it will not provide a fundamental treatment of hadronizing partons. Selecting, for instance, a precise form of the decay law by appeal to the relation between quark and hadron masses as given by theory (cf. Dissertori et al. 2003, p. 167), one retrieves an enriched, workable model that can serve as the basis for an implementable algorithm and is phenomenological in the sense established in Sect. 2.3: it is non-fundamental, not too closely connected in construction to either theory or observation, displays exploitable mathematical relations, and includes fittable parameters. Unlike a proper model of the phenomena, it will be deprived of other conceptual devices, such as illustrations or descriptions.
Hence, it seems incorrect to say that we straightforwardly obtain a simulation model from QCD, together with a set of general physical modeling assumptions. Rather, the top level of our hierarchy should be populated by two entities here: the underlying theory, as well as a genuine model of (secondary) phenomena.
Retrieving from these a phenomenological simulation model properly so called requires additionally taking into account relevance criteria for the selection of particular elements of QCD, empirical (prior data sets and measurement results) and mathematical (combinatorics, probability theory) background knowledge, as well as estimates of parameters (e.g., \(\kappa \approx 1\) GeV/fm).
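As an illustration of what such an enriched, workable model looks like at the algorithmic level, here is a schematic Python sketch of string fragmentation in the style of the Lund model. It assumes the Lund symmetric fragmentation function \(f(z) \propto z^{-1}(1-z)^a \exp(-b\, m_T^2/z)\), which implements the area-decay law for string breakups; the parameter values, the restriction to a single hadron species, and the one-dimensional kinematics are all simplifications for illustration only.

```python
import math
import random

A_PAR = 0.7   # Lund 'a' parameter (illustrative value; tuned in practice)
B_PAR = 1.0   # Lund 'b' parameter in GeV^-2 (illustrative; tuned in practice)

def lund_f(z, m_T):
    """Lund symmetric fragmentation function f(z) ~ (1/z)(1-z)^a exp(-b m_T^2/z)."""
    return (1.0 / z) * (1.0 - z) ** A_PAR * math.exp(-B_PAR * m_T ** 2 / z)

def sample_z(m_T):
    """Rejection-sample the light-cone momentum fraction z taken by a hadron."""
    f_max = max(lund_f(0.001 * i, m_T) for i in range(1, 1000))  # crude envelope
    while True:
        z = random.uniform(1e-3, 1.0 - 1e-3)
        if random.random() * f_max <= lund_f(z, m_T):
            return z

def fragment_string(w_plus, m_hadron=0.14, w_stop=1.0):
    """Iteratively split hadrons off a string until the remaining light-cone
    momentum (GeV) drops below w_stop; pions serve as stand-in hadrons."""
    hadron_momenta = []
    while w_plus > w_stop:
        z = sample_z(m_hadron)
        hadron_momenta.append(z * w_plus)   # the new hadron takes fraction z
        w_plus *= (1.0 - z)                 # the remainder fragments further
    return hadron_momenta

print(fragment_string(50.0))
```

The free parameters \(a\) and \(b\) are exactly the kind of fittable parameters mentioned above; in actual generators they are tuned to data rather than fixed by theory.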
From there on, Winsberg’s hierarchy continues: The free parameters of a given string model have to be fitted to data, and the entire model must be discretized and translated into computer code for the execution of a simulation. Interpreting the output, one obtains a model of further secondary phenomena: hadrons that are either stable or decay into ‘jets’.
The structure adequately representing the relations between models for CSs of hadronization is thus still hierarchical, as displayed in Fig. 5. However, in contrast to Winsberg’s proposal, theory is here not the only entity at the top of the hierarchy: the top level is populated by theory together with models of phenomena. The resulting hierarchy is thus poly- rather than autocratic: several distinct entities jointly govern the subsequent modeling steps.
A similar diagram can be drawn for the hard process, but the joint governance will here be different. Recall that the PDFs, which could not be obtained from theory, constitute a fundamental ingredient for modeling the relevant scatterings in the hard process. Hence, it is not a model of the phenomena but rather the PDFs gathered from fitting to a large range of experimental data that, together with QCD and (in most cases) the assumption of a factorization into an elementary cross section plus PDFs, give rise to a simulation model (cf. Fig. 6). Since both the existence of PDFs and the perturbative calculations of elementary cross sections derive directly from QCD, theory should still be considered the driving force. Everything else then repeats as above.
Quite different modifications of Winsberg’s hierarchy can be gathered from hybrid models in the detector simulation. Consider, for instance, models of nucleon-nucleon interactions. Here, one proceeds quite directly from existing data (from the so-called SAID database; cf. GEANT Collaboration 2016, p. 255), interpolated in the way most convenient relative to one’s needs of fit and speed, to a basic simulation model. Geant4 references a linear interpolation in both the values of the cumulative distribution retrieved from the cross-section data and the energies. This is the simplest interpolation method available, and its use here is presumably connected to reduced computation times.
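A minimal Python sketch of the kind of procedure at issue, assuming hypothetical tabulated values (the real tables are far finer-grained and drawn from the SAID database): scattering angles are sampled by interpolating linearly both between the cumulative distributions tabulated at neighboring energies and within the bracketing bin of the cumulative distribution.

```python
import bisect
import random

# Hypothetical tabulated data (stand-ins for the SAID-derived tables):
# scattering angles (deg) with cumulative probabilities at two energies (MeV).
angles = [0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0]
cdf_at = {
    100.0: [0.0, 0.35, 0.60, 0.75, 0.85, 0.94, 1.0],
    200.0: [0.0, 0.45, 0.70, 0.82, 0.90, 0.96, 1.0],
}

def sample_angle(energy, e_lo=100.0, e_hi=200.0):
    """Sample a scattering angle by linear interpolation in both the
    energies and the values of the cumulative distribution."""
    t = (energy - e_lo) / (e_hi - e_lo)
    # Step 1: interpolate the tabulated CDFs linearly in energy.
    cdf = [(1 - t) * lo + t * hi for lo, hi in zip(cdf_at[e_lo], cdf_at[e_hi])]
    # Step 2: invert the interpolated CDF, again by linear interpolation.
    u = random.random()
    i = bisect.bisect_right(cdf, u)     # bracketing bin: cdf[i-1] <= u < cdf[i]
    frac = (u - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
    return angles[i - 1] + frac * (angles[i] - angles[i - 1])

print(sample_angle(150.0))
```

Nothing in this procedure models the dynamics of the interaction; it merely reproduces the empirical distribution, which is exactly the point made below.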
Because data points are being interpolated, it makes sense to speak of a (primitive) model at all, not just of a data set called up in the overall simulation program. Since data are the driving force here, moreover, we may call this simulation model observation driven. From this, one then proceeds directly to a computational model whose output can ultimately be interpreted in terms of a model of the (secondary) phenomena occurring inside the detector (Fig. 7). No dynamical model will be created; there simply is no model of the dynamics, and the whole situation is effectively (though by no means explicitly) treated as one of instantaneous transition from interacting input nucleons to output particles. Thus, Winsberg’s hierarchy can be fitted to this example only with the omission of one intermediate step.
We admit that one may question the appropriateness of the term ‘simulation’ for this part, due to the term’s dynamical connotation (cf. Hartmann 1996). Moreover, simply sampling some distribution according to some MC prescription is obviously borderline between ‘simulation proper’ and ‘mere computation’. However, Hartmann (1996, p. 83) emphasizes in particular the dynamical character of the simulation step itself, and the execution of the sampling by a computer is, of course, still dynamic. In other words, one might equally take it that the sampling prescription contained in the computational model constitutes a ‘covert’ dynamical model, and that two steps have here been merged into one.
Consider also what it would mean to think of this part as not being a simulation. It would mean that, when a detector simulation is run and nucleon-nucleon interaction models are called up, the simulation pauses to make room for a computation. This is highly counter-intuitive, and provides another prima facie reason to consider the term ‘simulation’ appropriate in this context.
More importantly, though, whether the hard process contains a dynamical model properly so called is debatable on essentially the same grounds: here, too, one simply samples a differential cross section, even if it is provided by the theory in this case. Given that the matrix element models the probability of a transition between two well-defined momentum states, and given that the kinds of constraints recognized by Winsberg will have to be applied here in order to create individual events from this matrix element, we think that a dynamical model can be said to exist in the latter case. The main difference between the two kinds of sampling thus largely resides in the (un)availability of an underlying theory.
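To see how structurally similar the two cases are, compare the following Python sketch of sampling a theory-given differential cross section by the standard hit-or-miss method. We use the leading-order angular distribution \(d\sigma/d\cos\theta \propto 1 + \cos^2\theta\) for \(e^+e^- \to \mu^+\mu^-\) purely as a simple, well-known stand-in for a matrix element.

```python
import random

def sample_cos_theta():
    """Hit-or-miss sampling of d(sigma)/d(cos theta) ~ 1 + cos^2(theta),
    the leading-order angular distribution for e+e- -> mu+mu- (a simple
    stand-in for a theory-given matrix element)."""
    while True:
        c = random.uniform(-1.0, 1.0)             # propose a scattering angle
        if random.random() * 2.0 <= 1.0 + c * c:  # envelope: f_max = 2
            return c                              # accept: one 'event'

events = [sample_cos_theta() for _ in range(5)]
print(events)
```

Apart from where the sampled distribution comes from, the acceptance loop is of the same kind as in the interpolation example above.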
This seems like an ill-posed criterion for distinguishing simulation from mere computation, and so the boundary between both examples, when it comes to their being simulations or not, is fuzzy rather than well-defined. If one disputes, in other words, that nucleon-nucleon interactions are properly simulated, this immediately raises doubts about other parts being properly simulated as well. For these reasons, we think that nucleon-nucleon interactions remain a valid example of simulation without a proper (or at least overt) dynamical model.
A network of simulation models
The previous section demonstrates that even in the case of CSs used by large-scale HEP experiments such as ATLAS, hierarchical accounts can, with some modifications, be successfully applied. These modifications include (i) the replacement of theory by data on the highest level (cf. Fig. 7), (ii) the omission of intermediate steps such as a proper dynamical model (cf. ibid.), (iii) the distribution of crucial techniques such as discretization across several consecutive (re)modeling steps (cf. Fig. 4), and, most importantly, (iv) the weakening of autocratic hierarchical structures to polycratic ones, wherein (e.g.) theory and data figure jointly on the top level (cf. Figs. 5 and 6).
There are lessons to be drawn from this already. For instance, the fact that data alone can populate the top level of the hierarchy undercuts Winsberg’s point that simulation modeling is “a case of theory articulation in the spirit of Kuhn.” However, we already conceded that Winsberg is concerned with the special case in which CSs are used to cope with the intractability of theory. It is hence an extension rather than a refutation of his account to say that simulation modeling can also be an exercise in data articulation: it can allow us to tease out the information contained in a set of data, by embedding them into a larger context and ‘bringing them alive’.
Point (ii) was sufficiently addressed in Sect. 4.1, and there is no need to repeat the arguments here. Points (iii) and (iv), on the other hand, may be judged to reflect what Winsberg (1999, p. 266) calls the “motley character of simulation inferences”; that
our theoretical knowledge is just one of several ingredients that simulationists use to produce their results. All of these sources and their influences need to be considered when justifying simulation results. If the influences and possible pitfalls of each element are not properly understood and managed by the simulationist, they may potentially threaten the credibility of simulation results. Doing so, moreover, requires reliance upon an equally diverse range of sources of knowledge and skills. A great deal of this knowledge is not contained in the theoretical knowledge that formed the original basis for the simulation. (Winsberg 2001, p. S448)
As we can see, the inferences in question turn out to be even more motley than reflected in Winsberg’s account. Sometimes extra-theoretic knowledge will not just contribute to the articulation of some theory in terms of a CS: it may be so important as to contribute, together with theoretical results, to the very basis of a simulation model. Moreover, the same kind of extra-theoretic knowledge may have to be used and re-used at multiple junctures in the modeling process.
All of this really extends Winsberg’s account, instead of impairing it; but the examples discussed in the last section were all concerned with only one part of the overall simulation. MC simulations actually used in experimental procedures in HEP will typically make use of the whole palette, or otherwise patch the linkage between primary phenomenon and reconstructed data by means of calculations and subsidiary data. So what happens when everything is chained together?
Building on the discussion in Sect. 3, we have isolated a number of relations obtaining between different simulation models. The result is an outright network of simulation models, depicted in Fig. 8. It is clear that these constitute only a proper subset of all relations between the component models, so an even more complex network may in fact be assumed.
Given the details provided in Sect. 3, the diagram should be largely self-explanatory. Still, a few details are worth discussing. The diagram, for one, includes a number of deliberate simplifications: for instance, details of models going into the detector simulation are not resolved in the same way as those of the event generator. Similarly, partons created in the underlying event will shower off further partons before contributing to hadronization, so the solid blue arrow going from underlying event to hadronization includes a ‘tacit’ parton shower.
Another issue is the apparent loop between the experimental constraints implied by the detector and models of the parton shower in generators with cluster models of hadronization, mediated by the influence of the cutoff value on the mass spectrum of the clusters.
Per se, loops are nothing worrisome in simulation modeling. For instance, Lenhard (2016, p. 728) stresses that, when implementing a model on a computer, “iterations can be performed in great numbers and at high speed[...]. Modeling hence can proceed via an iterated feedback loop according to criteria of model performance.” Loops of this kind, as similarly recognized by Boge (2019), concern modifications of a single implemented, computational model, according to its performance in comparison to benchmarks. The loop discerned in the diagram, however, concerns a modificatory feedback between two different, coupled simulation models, prima facie independently of any comparison to experiment. One might hence suspect a vicious circularity between the models here, ultimately leading to the models’ sanctioning themselves.
Fortunately, this would be mistaken: First off, similar constraints on the possibility of resolving two showered partons would be implied by any detector model. A given cutoff should hence not be interpreted as strictly reflecting a feature of the specific model of ATLAS’ geometric properties and material composition, but rather as a generic feature following from our best understanding of present-day detector technology. Secondly, geometry and material models are fully empirical and only include minor ad hoc corrections for simulation artifacts (e.g., stuck tracks due to infinitely dense spacing). Instead of entering into a vicious circle, empirical information about the experimental setup thus effectively enters in two places: in defining the models used to simulate the detector (response) itself, and in defining what sorts of information from the phenomena of (secondary) interest can possibly influence the observable behavior of the detector in response to the interaction.
Justification (i): holistic validation
What is the significance of our network for the EoS? Recall the basic purpose that Winsberg had originally assigned to his hierarchical account: Through (re)modeling steps that preserve some (though usually not all) of what is known about a targeted system, but are also creative enough to provide new kinds of access, we come to obtain genuinely new knowledge of the given target. Moreover, because of their largely non-deductive nature, the steps need to be executed in such a way that we can justify simulation results as providing genuine knowledge.
However, it appears that this justification is not possible in HEP CSs, for most of the individual hierarchies recognized above do not concern the target phenomena: the LHC was designed to find the Higgs, determine its properties as well as those of certain quarks, and possibly gather evidence of particles beyond those of the SM. These are all particles reliably created only in hard (high-energy, elementary) scattering processes. But only if it is known what happens inside the beam pipe besides the elementary scattering process, and only if one understands the role of the detector in creating the characteristic experimental signatures, can one infer the existence and properties of specific kinds of particles from these signatures.
In other words: We do not deny that it would be possible to draw some inferences from the simulation of, e.g., a hard process alone; but these inferences would not reach beyond the beam pipe. There would hence be no possibility of directly validating corresponding simulation results, as our empirical access is inevitably mediated by the experimental setup, including the detector.
Compare our findings to those of Lenhard and Winsberg (2010, 2011), who recognize a kludgy methodology and a messy overall structure in the simulation modeling of climate science. Salient features they highlight for climate simulations are a fuzzy modularity, a generative entrenchment of the different modules, the inclusion of kludges, and, related to all this, a holism regarding validation.
For now, let us focus on the ‘fuzzy modularity’, by which Lenhard and Winsberg (2010, p. 256) mean the following: “normally, modules are thought to stand on their own. In this way, modularity should have the virtue of reducing complexity.” However, in contexts such as climate science, “the modules [...] are interdependent and therefore lack this virtue” (ibid.).
As we have seen, this is certainly also true to a large extent of HEP. The assumed factorization of the event, as well as the neat separation between events in the beam pipe and in the detector, have turned out to be necessary but ultimately idealizing simplifications: it is impossible to strictly separate all these different phases, as vividly illustrated in Fig. 8. Hence, a hierarchical account cannot possibly deliver the ways in which HEP researchers justify their simulation-driven inferences to target systems.
This holistic aspect is actually well-known to physicists. For instance, Dissertori et al. (2003, p. 172; emph. added) write:
Monte Carlo event generators[...] inextricably couple the perturbative parton shower and non-perturbative hadronization model. Furthermore, both components contain free parameters which can be simultaneously tuned to the data. As a consequence, it is often difficult to correlate the inherent properties of the hadronization model with the quality of a Monte Carlo’s description of the data.
Another reference discussing the subtleties of validation in HEP in a philosophical context is Mättig (2019), who lays some emphasis on the comparison of elements from generators and detector simulations to data taken across different experiments (i.e., experiments other than those at the LHC). But Mättig (2019, p. 647) really only invokes these cross-checks as a means of “avoiding circularity”, and so does not touch on the holistic aspect.
As mentioned earlier, new releases will, in fact, be treated with much caution in the community, and will not be applied in analyses until they have survived a whole host of benchmarking procedures over many years. But this of course means that all the (computational) models contained in the new release will have to undergo validation in concert, and possibly even in concert with other, already established models (e.g., in the case of special-purpose generators that cannot stand alone).
There is an important difference, however, between generators and detector simulation. After the hadronization stage, the remaining particles will all be known particles that can be produced and experimented on in contexts other than the LHC. Accordingly, the GEANT Validation Portal includes the possibility of comparing predictions from various models included in the simulation toolkit individually (and also in different versions) to reference data, produced in experiments that are relevantly similar to the respective conditions in the ATLAS detector. For example, thin foils of certain materials allow for a more direct assessment of the responses of individual materials used in the construction of ATLAS, because scattering hadrons and leptons on them produces data “without the effect of other processes like particle propagation or electromagnetic physics effects” (Banerjee et al. 2011, p. 2).
In sum, a first result is that, while checks across experiments may safeguard at least against vicious circularities spoiling the possibility of validation, ‘fuzzy modularity’ is present in HEP as well, and it implies that justification by comparison to empirical data is in part possible only in a holistic fashion, much like in climate science. A difference arises between generators and detector simulation, though, and this may somewhat lessen the impact of this kind of holism.
Justification (ii): model coherence
Another key difference between the two fields lies in the details of how (and even why) models are coupled in each. In particular, the ‘generative entrenchment’ Lenhard and Winsberg (2010, 2011) recognize in climate science means
that climate models are, in interesting ways, products of their specific histories. Climate models are developed and adapted to specific sets of circumstances, and under specific sets of constraints, and their histories leave indelible and sometimes inscrutable imprints on these models. (Lenhard and Winsberg 2011, p. 116)
Certainly, there is some historical component to the development of HEP simulations as well, and it does play a role in how simulation models and their implementations have turned out. For instance, the GEANT detector simulation tool has been in continuous development since 1974, when it “initially emphasised tracking of a few particles per event through relatively simple detectors” (Brun et al. 1993, p. 5). Similarly, the need for matching and merging arose from developments in the ability to compute matrix elements to higher orders.
Various elements of detector simulations, as well as such things as matching and merging algorithms, moreover arguably fall under the category of kludges, which Lenhard and Winsberg (2010, cf. p. 257) hold to be among the main reasons for entrenchment.
By ‘kludge’, one means “an inelegant, ‘botched together’ piece of program; something functional but somehow messy and unsatisfying” (Clark 1987, p. 278). One may dispute the ‘inelegance’ of matching and merging algorithms themselves, but it certainly seems unsatisfying that partons have to be identified and merged or removed ex post, and cannot be produced in a coherent fashion right away, when higher-order contributions from matrix elements are taken into account.
However, this is only part of the story in HEP, because many of the holism-promoting connections are established on purely theoretical grounds. For instance, asymptotic freedom implies that questions of hadronization can only play a role when the energy is sufficiently low, and so the dependence of the hadronization algorithm on the cutoff is at least partly theory-induced. Accordingly, it can be assessed also on theoretical grounds. Similarly, color reconnections come into play because of a more detailed treatment of the proton remnants and the underlying event, and the expectation that they do play a role is hence rather theory-driven as well.
Certainly, some of the connections between climate simulation modules are stimulated by theoretical considerations in similar ways. But the general theoretical basis of climate science is far less crisp than that of HEP, where it consists mostly of our current QFTs (with the addition of atomic, solid-state, and nuclear physics for detector simulation).
In particular, Lenhard and Winsberg (2010, p. 256) discuss general circulation models in climate science, which were the primary source of theoretical ideas in earlier decades, and have as their “theoretical core [...] the so-called fundamental equations, a system of partial differential equations motivated by fluid mechanics and thermodynamics.” However: “Today, atmospheric [general circulation models] have lost their central place and given way to a deliberately modular architecture of coupled models that comprise a number of highly interactive sub-models, like atmosphere, oceans, or ice-cover.” (ibid.)
In a more recent assessment of climate science’s theoretical foundations, Katzav and Parker (2018) no longer even mention general circulation models, but rather discern “a number of outstanding issues in the theoretical foundations of climate science”, such as “how to draw the boundaries of the climate system; whether to pursue fully reductive notions of Earth’s climate system and its states; whether climate states should be characterized statistically or in a combined physical-statistical way; which quantities [...] should be used in characterizing climate change” (ibid., pp. 7–8).
This observation makes for an important difference between the two fields. Consider how CS results are sanctioned by the ATLAS group, apart from (holistic) validation. When a new piece of code is introduced, efforts will, of course, be undertaken to benchmark it against existing results (such as predictions from the older code considered valid). In addition, however, a great deal of small- and large-scale integration testing will be performed, in order to ensure that the new code does not compromise the entire simulation’s ability to produce the desired results (cf. ATLAS Computing Group 2005, p. 78 ff.).
It is, in other words, well-acknowledged in the community that the knowledge-generating function associated with a given CS can only be established if the manifold connections among individual models are taken into account and everything is well calibrated also in the sense of model coherence, i.e., through integration testing and pre-integration assessment.
Assessments of integration are discussed in the environmental-modeling literature as well, but the challenges there appear to be very different (and arguably greater), and there also appears to be no generally accepted approach to pursuing integration tests. A review of existing approaches by Belete et al. (2017), for instance, considers a large (but non-exhaustive) list of integration frameworks (p. 51) and discerns (p. 61) “wide variation in strategies for implementing and documenting model integration across the development community”, as well as “wide variation in strategies for orchestrating execution of workflows within the frameworks, and preparing modeling components for assimilation into the frameworks.” In consequence, the paper offers a number of recommendations for developing a unified approach, such as the inclusion of an explicit interface for communication with other modules, or regarding dataset conversion, semantic mediation, error handling, and similar factors.
These factors appear to be largely under control in HEP experiments at the LHC. ATLAS, for instance, has only three testing frameworks, called AtNight (ATN), Kit Validation (KV) and Run Time Tester (RTT), with “differences in the intention as to how these frameworks are to be used” (ATLAS Computing Group 2005, p. 79):
ATN is run on the machines used to run the nightly builds. This limits the resources available for testing, and tests run under ATN are usually short. KV testing is usually run ‘by hand’ as part of kit building, and once again the tests are usually short. The RTT is currently run daily on a farm at UCL. With these resources, more extensive testing is possible. The full set of RTT results currently take a number of hours to generate. (ibid.)
Thus, in contrast to testing frameworks in climate science, the different frameworks used in HEP fulfill complementary roles, and together constitute a rather comprehensive means for testing the integration properties of new pieces of code.
There are also interesting differences on the pre-implementational level, as can be gathered from the following remark by Belete et al. (2017, p. 51):
On the science side, pre-integration assessment includes a problem statement; articulation of the purpose and goals of the science-based modeling effort; conceptualization of the relevant system (including major components and interactions); characterization of scenarios and use cases; analysis of alternatives; design of a science-based approach; and a statement of any project resource constraints and solution criteria.
In HEP, these steps are all largely fixed in advance, in virtue of either the underlying theory (problems, use cases, science-based approach, alternatives, and solution criteria), the conditions of the experiment (goals, project resource constraints), or sometimes both in concert (problems, use cases, science-based approach).
The bottom line is that HEP scientists seem to have a much better handle on establishing model coherence than do climate scientists. But (A) what is the significance of that, and (B) what is this difference due to?
Regarding (A), we believe that model coherence allows a respectable amount of (error) attribution, which Lenhard and Winsberg (2010, p. 259) find to be a major problem for climate simulations. Testing the integration properties of a certain piece of code—after having taken into account the connections that already exist across the underlying physics models (pre-integration assessment)—means checking whether including this code (or the model it represents) will spoil the validity established for the integrated whole. When this is the case, it is possible to attribute the failure at least to the new piece of code or to its connection to other pieces, and to re-assess it on the physics and implementation level.
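The logic of this kind of attribution can be made vivid with a toy Python sketch. Everything here is hypothetical and drastically simplified: the ‘chain’ is just a composition of numerical stand-ins for simulation components, and the ‘benchmark’ is a single validated number; real integration testing (e.g., in the ATN/KV/RTT frameworks mentioned above) operates on full software releases and physics observables.

```python
def run_chain(components, x=1.0):
    """Toy 'simulation chain': compose the components' effects on a single
    number, standing in for a benchmark observable of the integrated whole."""
    for component in components:
        x = component(x)
    return x

# Hypothetical validated chain (stand-ins for shower, hadronization, detector).
validated = [lambda x: 2.0 * x, lambda x: x + 0.5, lambda x: 0.9 * x]
benchmark = run_chain(validated)   # reference value of the integrated whole

def integration_test(new_component, slot, tol=0.05):
    """Swap a new component into the chain and re-check the benchmark.
    If the check fails, the failure is attributable to the new component
    (or to its connection to the rest), since nothing else has changed."""
    candidate = list(validated)
    candidate[slot] = new_component
    deviation = abs(run_chain(candidate) - benchmark) / benchmark
    return deviation <= tol

print(integration_test(lambda x: 2.01 * x, slot=0))  # minor change: passes
print(integration_test(lambda x: 3.0 * x, slot=0))   # major change: fails
```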
As our discussion indicates, the solid theoretical basis of HEP often allows for the kind of “fundamental analysis of how the unexpected behavior occurred” that Lenhard (2019, p. 955) finds to be sparse in software development. But it certainly also sometimes results in the “adapting [of] interfaces so that the joint model will work as anticipated in the given circumstances”, which he finds to be far more common (ibid.).
Regarding (B), the theoretical differences between HEP and climate science were already outlined, and the far better understanding of relevant theory in HEP contributes to the possibility of re-assessment (and improvement) upon attribution of failure. However, this is clearly only part of the story, for why are integration frameworks in HEP complementary, whereas they appear to largely constitute rival approaches in climate science? The reason, we suspect, lies in the nature of the subject matter, and the related fact that HEP can be considered an experimental discipline, whereas climate science relies almost exclusively on field observations.
We do not have an elaborate account of experimentation in mind here, but we do maintain, with, e.g., Schurz (2014, p. 35) or Radder (2009, p. 3), that experiments crucially involve an element of control that is absent in field observations. This control is evident at the LHC, for instance, in the preparation of protons in bunches accelerated to a specified center-of-mass energy, and in their colliding at pre-defined angles and at pre-defined interaction points inside the different detectors.
Why does this difference between the two fields exist? We believe that it directly relates to the nature of the subject matter and the different origins of (messiness-inducing) complexity. In HEP, the complexity is entirely conditioned on the remoteness of the target phenomena: quarks, gluons, or Higgs bosons are not directly detectable in the same sense as electrons, photons, or neutrons are. Moreover, they decay (or hadronize) so quickly that even, say, a hypothetical ‘quark detector’ would have a hard time catching free quarks (which is to say that this is pretty much inconceivable). The entire complexity of the simulation is conditioned on this and other (theoretically established) facts, such as the coloredness of gluons, which sparks the parton shower, or even the (expected) rarity of relevant events, which requires detectors to be flexible enough to recognize the diverse and subtle signatures attached to the complementary decay channels for relevant particles.
This is very different from climate science, where the target phenomena (local temperature, cloud patterns, oceanic streams, rainfall, and so forth) are, by and large, directly observable. But it is here totally out of the question to devise a controlled laboratory experiment that captures them all. Hence, in contrast to HEP, complexity is inherent in the subject matter of climate science.
As it turns out, the higher degree of control over observational conditions and the better theoretical understanding pave the way to better model coherence in HEP, as an additional means of justification. The theoretical understanding makes pre-integration assessment easier (maybe even largely superfluous), and the experimental design can include, from the outset, a division of labor in which dedicated collaborations assess software properties such as small- and large-scale integration according to highly organized schemes or even pre-defined strategies.
A socio-historical point surfaces in this context: whereas climate science had a continuous, integrated development as an entire field, in which some of the entrenchment is ultimately rooted (Lenhard and Küppers 2006; Lenhard and Winsberg 2010, 2011), experimental collaborations in HEP each have their own, somewhat secluded histories and social development. This allows HEP researchers to build on the successes and failures of previous experiments, at other colliders, from the very design stages on.