Models are designed by the practitioner to answer specific scientific questions. There is no such thing as a “universal model” that is able to emulate the natural world in all of its detail and provide an answer to every question the practitioner asks. Even if one could, in principle, write down some universal wave function of the Universe, it would be useless for the practising astrophysicist seeking to understand the natural world by confronting models with data—it is computationally intractable if one wishes to understand complex, non-linear systems, where the interplay between the various components of the system is often the most interesting outcome. If buying a larger computer were the solution to understanding these complex systems, we would have solved biology and economics by now. One example of a spectacular failed attempt at constructing an emulation is the billion-euro Human Brain Project, which attempted to replace laboratory experiments aimed at studying the human brain with all-encompassing computer simulations. Another fundamental obstacle with emulations, even if we could construct them, is that they merely produce correlations. Transforming these correlations into statements about cause and effect requires theoretical understanding.

One of the first skills a competent theorist learns is to ask when her or his model breaks. What are the assumptions made? What is the physical regime beyond which the model simply becomes invalid? What are the scientific questions one may (or may not) ask of the model? It is the job of the theorist to be keenly aware of these caveats. To give concrete examples: if one’s scientific question is, “What is the structure of the water molecule?”, then one solves the Schrödinger equation. If one’s scientific question is instead, “What is the behaviour of waves within a body of water?”, then one solves the Navier-Stokes equation. For the latter, it is understood that one cannot ask questions on length scales shorter than the mean free path of collisions between water molecules—or on time scales shorter than the collisional time between water molecules. A similar reasoning applies to why one is able to simulate the behaviour of dark matter on large scales—without knowing what dark matter actually is. This is because, in these simulations, one is forbidden from actually querying the nature of dark matter—it is an input, rather than an output, of the simulation. Another rule-of-thumb that all competent theoretical astrophysicists who run simulations know well is: one gets out what one puts in. Or, to put it more colloquially: garbage in, garbage out. This rule-of-thumb bears some resemblance to what philosophers of science term “robustness analysis”.
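One way to make that restriction quantitative (the Knudsen-number framing here is my own illustration, not part of the argument above) is to demand that

\[
\mathrm{Kn} \equiv \frac{\lambda}{L} \ll 1
\qquad \text{and} \qquad
t \gg \tau_{\rm c},
\]

where $\lambda$ is the mean free path, $\tau_{\rm c}$ is the collisional time, and $L$ and $t$ are the length and time scales one wishes to interrogate. When these conditions fail, the Navier-Stokes equation no longer answers the question being asked of it.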

While it is tempting to separate the practice of science from the science itself, the skill level of the practitioner is an aspect that philosophers of science cannot ignore. Not all theorists or modellers should be placed on the same footing. For example, questioning what philosophers term the “theory-ladenness” of an observation one is interpreting is a skill that is honed over years of practice. There are time-honoured “best practices” in astronomy and astrophysics that do not always make their way into the peer-reviewed literature. Only by interacting with practitioners will philosophers of science uncover them. On short time scales, sensationalism and frivolity may enter our peer-reviewed literature. On longer time scales, our peer-reviewed literature tends to self-correct; practitioners are fairly conservative about what we term “standard” (methods or approaches).

It is instructive to elucidate the intention of the practitioner when constructing simulations. Not all simulations are constructed with the same goals. In the grandest sense, one would like to simulate the full temporal and spatial evolution of some system or phenomenon. But sometimes the goals are more modest. The practitioner starts by studying the system on paper and pondering how various physical (or chemical) effects interact with one another. If all of these effects have comparable time scales (or length scales), it implies that they exert comparable influences on the outcome. One is then solving for a complex steady state produced by the interplay between different physical effects, which are often highly non-linear. Simulating the long-term climate of a planet is one such example. If sufficient empirical data are present, one may also incorporate them as initial or boundary conditions in order to predict the future, short-term behaviour of a system. Weather prediction simulations are such an example. In astrophysics, a common goal is to study trends in the predicted observables and how they respond as the various input parameters are varied—what philosophers call “intervention”, and what we simply term a parameter study or sweep. As simulations are often computationally expensive, few practitioners would claim that any suite of simulations is complete. Rather, the goal is to elucidate trends and (hopefully) understand the underlying physical mechanisms.
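A minimal sketch of what such a parameter study looks like in code (the model, parameter names and grids below are hypothetical placeholders, not a description of any particular simulation):

```python
import itertools
import numpy as np

def run_model(opacity, metallicity, internal_heat):
    """Hypothetical stand-in for an expensive simulation: returns a
    single predicted observable for one combination of input parameters."""
    return internal_heat * np.sqrt(opacity) * (1.0 + 0.1 * metallicity)

# Coarse grids of input parameters ("intervention" to the philosopher,
# a parameter study or sweep to the practitioner).
opacities = np.logspace(-2, 0, 5)
metallicities = np.linspace(0.0, 2.0, 5)
heats = np.linspace(1.0, 10.0, 5)

results = {}
for combo in itertools.product(opacities, metallicities, heats):
    results[combo] = run_model(*combo)

# The goal is not completeness but trends: here, how the observable
# responds to internal heat with the other two parameters held fixed.
trend = [results[(opacities[2], metallicities[2], h)] for h in heats]
print(trend)
```

The sweep is never exhaustive; its purpose is to reveal how the predicted observable responds when one input is varied with the others held fixed.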

As a practitioner of simulations, I consider the “Verification and Validation” framework to be an unattainable dream. Verification embodies the ideal that one should compare the simulations against all possible analytical solutions in order to establish their accuracy. The fundamental obstacle is that non-linear analytical solutions are rare, e.g., the solution for solitons. Unfortunately, one often runs a simulation precisely because one is interested in the non-linear outcome! If one adheres to this ideal of verification, no simulation will ever be fully verified—and hence such an ideal is irrelevant to the practitioner (and will thus be ignored in practice). Rather, the practitioner often speaks of benchmarking, where one agrees on an imperfect test that multiple practitioners should attempt to reproduce. Agreement simply implies consistency—but there is the possibility that these practitioners have all consistently obtained the wrong answer. By contrast, when a practitioner uses the term “validation”, it means that one is comparing the simulation to an absolute ground truth—provided either by data or by mathematics. In astrophysics, these ground truths are hard to come by. One example of validation is the Held-Suarez test for producing a simple climate state of Earth (without seasons), which was motivated by climate scientists wishing to verify the consistency of simulation codes operated by different laboratories (Held and Suarez 1994).
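To illustrate what verification against an analytical solution looks like in the rare cases where one exists (a toy example of my own, not one drawn from the text): one can evolve linear advection numerically and measure the error against the exact, translated solution.

```python
import numpy as np

def advect_upwind(u0, c, dx, dt, n_steps):
    """First-order upwind scheme for du/dt + c*du/dx = 0 on a periodic domain."""
    u = u0.copy()
    for _ in range(n_steps):
        u = u - c * dt / dx * (u - np.roll(u, 1))
    return u

n, c = 200, 1.0
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = x[1] - x[0]
dt = 0.4 * dx / c                      # respect the CFL stability condition
n_steps = 250
u0 = np.exp(-100.0 * (x - 0.5) ** 2)   # Gaussian pulse

u_numerical = advect_upwind(u0, c, dx, dt, n_steps)
# Analytical solution: the initial pulse simply translates at speed c.
shift = (x - c * n_steps * dt) % 1.0
u_analytical = np.exp(-100.0 * (shift - 0.5) ** 2)

print("mean absolute error:", np.mean(np.abs(u_numerical - u_analytical)))
```

Halving the grid spacing and time step, and checking that the error shrinks at the expected rate, is the practitioner's everyday, partial version of verification; it says nothing about the non-linear regimes for which no such exact solution exists.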

As a professional maker of models, I find the debate about “fictions” to be puzzling. All governing equations of physics involve approximations—even if one is unaware of them being built into the equations. Rather than visualise a universal model, it is much more useful to think of a hierarchy of models of varying sophistication, which is standard practice in climate science (Held 2005) and used widely in astrophysics. Each model in the hierarchy incorporates a different set of simplifying assumptions designed to answer specific questions. If one’s scientific question is to understand the evolution of stars over cosmic time scales, then approximations such as spherical symmetry are not unreasonable. However, if one’s scientific question is to understand the density structure of stars by studying how sound waves propagate across them over comparatively shorter time scales, then more elaborate models need to be constructed. The question is not whether stars are perfectly spherical—they certainly are not. The real question is: what is the magnitude of the correction to spherical symmetry and how does this affect the accuracy of one’s answer for addressing a specific scientific question? Simplicity is intentionally built into these models, because it allows one to more cleanly identify cause and effect, rather than simply recording correlated outcomes in a simulation.
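To illustrate what “the magnitude of the correction” means in practice (an order-of-magnitude estimate of my own, offered purely as an illustration): for a star rotating with angular frequency $\Omega$, the fractional departure from sphericity is controlled by the ratio of centrifugal to gravitational acceleration at the equator,

\[
q \equiv \frac{\Omega^2 R^3}{G M} \approx 2 \times 10^{-5} \quad \text{for the Sun},
\]

so spherical symmetry is an excellent approximation when modelling the Sun's evolution over cosmic time scales, whereas for a rapidly rotating star the same approximation may need to be revisited.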

While Einstein’s equations of relativity supersede Newton’s equations in principle, it is sufficient to solve the latter if one wishes to understand the orbits of exoplanets. While the theoretical foundation of thermodynamics is provided by statistical mechanics, implementing thermodynamics in one’s model or simulation is often sufficiently accurate for the scientific question being asked. In the earlier example of water waves, it would be unnecessary (and infeasible) to simulate large-scale fluid behaviour by numerically solving the Schrödinger equation. Models are not constructed in an absolute sense. In addition to addressing specific scientific questions, they are constructed to facilitate effective comparison with data—at the quality and precision available at that time. In other words, one cannot discuss models without also discussing the associated errors in comparison to data. Speaking in generalities, without quantitative estimates of the approximations and without tying them to the specific scientific question being addressed, is not useful for the practising theoretical astrophysicist. If one approaches modelling from the perspective of a model hierarchy, what philosophers of science term “de-idealisation” is simply irrelevant, because each member of the hierarchy employs a different degree of idealisation.
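To attach a number to the Newton-versus-Einstein example at the start of this paragraph (a back-of-the-envelope illustration of my own, not a calculation from the text): the leading relativistic correction to an orbit is of order the compactness parameter,

\[
\epsilon \sim \frac{G M_\star}{a\, c^2} \approx 10^{-8}
\left( \frac{M_\star}{M_\odot} \right) \left( \frac{1\ \mathrm{au}}{a} \right),
\]

so for an exoplanet orbiting a Sun-like star at one astronomical unit, Newtonian gravity errs at the level of roughly one part in a hundred million, which is negligible for most questions about orbital architectures, although it can matter for precise measurements of apsidal precession in very compact orbits.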

In the use of similarity arguments to justify how terrestrial experiments may mimic celestial systems, one should note that similarity may be broken by introducing physical effects that encode intrinsic length scales. To use the well-known Rayleigh-Taylor instability as an example: if one introduces surface tension into the calculation, then a minimum length scale for features in the flow appears; if one introduces gravity, then a maximum length scale appears. Similarity only holds when the scientific question being asked justifies treating the system purely as a fluid, yet radiation, chemistry and other effects exert non-negligible influences in real astrophysical systems.
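The standard linear theory makes this concrete (the dispersion relation below is the textbook result for two incompressible fluids, quoted as an illustration rather than taken from the text above): for a heavy fluid of density $\rho_{\rm h}$ resting on top of a light fluid of density $\rho_{\rm l}$ in a gravitational field $g$, with surface tension $\sigma$ acting at the interface, a perturbation of wavenumber $k$ grows at a rate $\gamma$ given by

\[
\gamma^2 = \frac{\left( \rho_{\rm h} - \rho_{\rm l} \right) g k - \sigma k^3}{\rho_{\rm h} + \rho_{\rm l}},
\]

so growth is suppressed for $k > k_{\rm c} = \sqrt{(\rho_{\rm h} - \rho_{\rm l})\, g / \sigma}$. The surface tension term introduces an intrinsic length scale, $2\pi/k_{\rm c}$, below which a laboratory fluid no longer behaves like its scale-free astrophysical counterpart.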

In confronting models with data, the modern approach is to use Bayesian inference. When multiple combinations of parameter values yield the same observable outcome, this is known as a model degeneracy. Degeneracies are a feature—and not a bug—of models. The formal way of quantifying degeneracies is to compute the joint posterior distributions between parameters—a standard feature of Bayesian inference. Testing if the data may be explained by families of models and penalising models that are too complex for the quality and precision of data available is a natural outcome of Bayesian model comparison (Trotta 2008). In other words, Bayesian model comparison is the practitioner’s quantitative method for implementing Occam’s Razor. Combining the use of Bayesian model comparison with the construction of a model hierarchy is how modern astrophysics approaches problem solving and the confrontation of models with data. Another feature of Bayesian inference is the specification of prior distributions, which reflect one’s state of knowledge of the system or phenomenon at that point in time. A skilled theorist is keenly aware of when the answer to a scientific question is prior-dominated—again, one gets out what one puts in.
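A minimal sketch of what Bayesian model comparison looks like in practice, using a deliberately simple toy problem (the data, models, priors and grids below are all invented for illustration and are not drawn from the text):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: a constant signal plus Gaussian noise, so the simpler
# model is the "true" one.
x = np.linspace(0.0, 1.0, 20)
sigma = 0.5
y = 1.0 + sigma * rng.standard_normal(x.size)

def log_likelihood(y_model):
    """Gaussian log-likelihood of the data given a model prediction."""
    return -0.5 * np.sum(((y - y_model) / sigma) ** 2
                         + np.log(2.0 * np.pi * sigma ** 2))

# Model 1: constant, y = a.  Model 2: line, y = a + b*x (one extra parameter).
a_grid = np.linspace(-5.0, 5.0, 300)   # uniform prior on a
b_grid = np.linspace(-5.0, 5.0, 300)   # uniform prior on b
da = a_grid[1] - a_grid[0]
db = b_grid[1] - b_grid[0]
prior_a = 1.0 / (a_grid[-1] - a_grid[0])
prior_b = 1.0 / (b_grid[-1] - b_grid[0])

# Evidence (marginal likelihood) = integral of likelihood x prior.
like1 = np.array([np.exp(log_likelihood(np.full_like(x, a))) for a in a_grid])
Z1 = np.sum(like1) * prior_a * da

like2 = np.array([[np.exp(log_likelihood(a + b * x)) for b in b_grid]
                  for a in a_grid])
Z2 = np.sum(like2) * prior_a * prior_b * da * db

# A Bayes factor Z1/Z2 greater than unity favours the simpler model:
# the extra parameter is penalised unless the data demand it.
print("Bayes factor Z1/Z2 =", Z1 / Z2)
```

Because the evidence integrates the likelihood over the prior volume, the extra parameter of the second model is automatically penalised unless the data genuinely require it; this is the quantitative Occam's Razor referred to above, and it also makes explicit how the choice of priors shapes the answer.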

What is the over-arching goal of the theoretical astrophysicist? Certainly, Nature has laws and our models need to abide by them. The construction of models always has unification as a goal—if I observe N phenomena and I need N classes of models to describe them, then I have failed. Our goal is to advance our understanding of Nature on celestial scales—whether by the use of theory, simulation, observation or experiment. The most useful models are the ones we can falsify using data, because they teach us important lessons about the system we are studying. The sparseness of data in astronomy for any single object is not a bug, but a feature—it is a reality of astronomical data that we have to live with. This requires us to adjust our thinking: instead of asking intricate questions about a single object, we often have to ask questions of the ensemble of objects. Instead of tracking a single object or system across time, we have to contend with studying an ensemble of objects at a specific point in time—akin to an astronomical version of the ergodic principle. Such a property distinguishes astrophysics from the rest of physics. I would argue that questions of the ensemble are no less interesting or fundamental, e.g., what fraction of stars host exoplanets and civilisations? A potentially fruitful future direction for astrophysicists and philosophers of science to collaborate on is to combine ensemble thinking and model hierarchy building with thinking deeply about the detection versus auxiliary properties of a system or phenomenon.

Some fundamental issues, often dismissed by philosophers of science as belonging to the realm of practice or implementation, are missing from the debate about simulations. To set up any simulation, the governing equations of physics, which describe continuous phenomena, need to be discretised before they are written into computer code. The very act of discretisation introduces challenges that are ubiquitous in computer simulations, such as an artificial, unphysical form of dissipation that cannot be specified from first principles. Such numerical “hyper-parameters” severely impact the predictive power of simulations, further casting doubt on the analogy between simulations and experiments. Furthermore, simulations often suffer from a “dynamic range” problem—Nature has infinite resolution, but in order to run any simulation within one’s lifetime one has to specify minimum and maximum length scales of the simulated system. The practitioner can never implement an emulation, in which all relevant length and time scales are captured in the simulation. There is often crucial physics (e.g., turbulence) occurring below the smallest length scale simulated: so-called “sub-grid physics”. It is not uncommon for simulated outcomes to be driven by one’s prescription of sub-grid physics. One example that affects the study of brown dwarfs, exoplanets, climate science, etc., is how clouds form on small length scales. The problems of dynamic range and numerical hyper-parameters are widely debated by practitioners and are relevant to the debate on the epistemic value of computer simulations.
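A well-known concrete example of such unphysical dissipation (the standard “modified equation” result, quoted here as an illustration; the text above does not single out any particular scheme): if one discretises the one-dimensional advection equation, $\partial u/\partial t + c\, \partial u/\partial x = 0$, with a first-order upwind scheme on a grid of spacing $\Delta x$ and time step $\Delta t$, the numerical solution obeys, to leading order,

\[
\frac{\partial u}{\partial t} + c\, \frac{\partial u}{\partial x}
= D_{\rm num}\, \frac{\partial^2 u}{\partial x^2},
\qquad
D_{\rm num} = \frac{c\, \Delta x}{2} \left( 1 - \frac{c\, \Delta t}{\Delta x} \right).
\]

The effective diffusion coefficient $D_{\rm num}$ depends on the grid spacing and the time step rather than on any material property of the fluid; the dissipation is an artefact of the discretisation, and its magnitude is controlled by numerical hyper-parameters.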

I would like to end with an unsolved problem in physics (and astrophysics) that I find fascinating but that, to date, has not received much attention from philosophers of science. It concerns our incomplete understanding of turbulence, which is considered to be an important subfield of modern astrophysics. The Nobel laureate and physicist Werner Heisenberg once allegedly remarked, “When I meet God, I am going to ask him two questions: Why relativity? Why turbulence? I really believe he will have an answer for the first.” The fascinating thing about turbulence is that we have all of the tools and data at our disposal: we have the Navier-Stokes equation, the ability to perform laboratory experiments, astronomical observations of turbulence on a dazzling range of length scales and all of the computational power to simulate it in computers. Yet, despite decades of research, we do not have a complete theory of turbulence. If we did, then we would be able to calculate exactly the threshold Reynolds number at which any flow transitions from being laminar to turbulent—and to calculate how this dimensionless number varies as the geometry and boundary conditions of the system change. We would also be able to understand why some turbulent flows are intermittent. Currently, determining these thresholds and behaviours remains an engineering exercise. Studying why we are unable to understand turbulence will potentially yield valuable insights for philosophers of science on the epistemic value of theory, simulation, observation and experiment—and on how these different approaches need one another in order to advance our understanding.
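For the reader unfamiliar with the terminology (a standard definition, not something specific to the argument above): the Reynolds number is

\[
\mathrm{Re} \equiv \frac{U L}{\nu},
\]

where $U$ and $L$ are a characteristic velocity and length scale of the flow and $\nu$ is the kinematic viscosity. The unsolved problem referred to above is that the critical value of $\mathrm{Re}$ at which a given flow becomes turbulent cannot yet be derived from first principles for arbitrary geometries and boundary conditions.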