1 Introduction

The current standard cosmological model describes a Universe accelerated by a cosmological constant (\(\varLambda \)) and dominated by cold dark matter (CDM), where structure arose from minute initial perturbations—seeded in the primordial quantum Universe—which collapsed on ever larger scales over cosmic time (Planck Collaboration 2020; Alam et al. 2017; Betoule et al. 2014). The nonlinear interplay of these ingredients has formed the cosmic web and the intricate network of haloes harbouring galaxies and quasars.

Over the last decades, numerical simulations have played a decisive role in establishing and testing this \(\varLambda \)CDM paradigm. Following pioneering work in the 1980s, numerical simulations steadily grew in realism and precision thanks to major advances in algorithms, computational power, and the work of hundreds of scientists. As a result, various competing hypotheses and theories could be compared with observations, guiding further development over the years. Ultimately, \(\varLambda \)CDM was shown to be quantitatively compatible with virtually all observations of the large-scale structure of the Universe, even those that involve nonlinear physics and that are inaccessible to any method other than computer simulations (see e.g., Springel et al. 2006; Vogelsberger et al. 2020).

Nowadays, simulations have become the go-to tool in cosmology for a number of tasks: (i) the interpretation of observations in terms of the underlying physics and cosmological parameters; (ii) the testing and development of perturbative approaches and analytic models for structure formation; (iii) the production of reliable input (training) data for data-driven approaches and emulators; (iv) the creation of mock universes for current and future large-scale extragalactic cosmological surveys, from which we can quantify statistical and systematic errors; (v) the study of the importance of various aspects of the cosmological model and physical processes, and the determination of their observability.

Despite the remarkable agreement between simulations and the observed Universe, there are indications that \(\varLambda \)CDM might ultimately not be the correct theory (Planck Collaboration 2020; Riess et al. 2019; Asgari et al. 2021; DES Collaboration 2021). Future cosmological observations will provide enough data to test competing explanations by probing increasingly large sub-volumes of the Universe at virtually all electromagnetic wavelengths, and by including ever fainter objects and smaller structures (e.g. Laureijs et al. 2011; Bonoli et al. 2021; DESI Collaboration 2016; Ivezić et al. 2019; Ade et al. 2019; Merloni et al. 2012). These observations will be able to test the physics of our Universe beyond the standard model: from neutrino masses, through the nature of dark matter and dark energy, to the inflationary mechanism. Since these observations are intimately connected to the nonlinear regime of structure formation, an optimal exploitation of the cosmological information will increasingly rely on numerical simulations. This will put numerical simulations in the spotlight of modern cosmology: they can either confirm or break the standard \(\varLambda \)CDM paradigm, and therefore will play a key role in the potential discovery of new physics.

The accuracy and precision required to make predictions of the necessary quality pose many important challenges and demand a careful assessment of all the underlying assumptions. This is the main topic of this review: we cover the broad field of cosmological simulations, from the fundamental equations to their connection with astrophysical observables, highlighting the areas where research is currently being conducted.

1.1 Large-scale simulations

The spatial distribution and dynamics of the large-scale structure give us access to fundamental aspects of the Universe: its composition, the physical laws, and its initial conditions. For instance, the overall shape of the galaxy power spectrum is sensitive to the abundance of baryons and dark matter; the anisotropy induced by redshift-space distortions can constrain the cosmic density-velocity relation which in turn is set by the gravity law; and high-order cumulants of the galaxy field can encode non-Gaussian signatures inherited from an early inflationary period.

To extract this information from observations of the large-scale distribution of galaxies, quasars or other tracers, we must rely on a model that predicts their observed distribution as a function of cosmic time for a given cosmological model. That is, we require a prediction for the distribution of the mass density and the velocity field together with the properties and abundance of collapsed objects. Furthermore, we need to predict how galaxies or other astronomical objects will populate these fields. This is a well-posed but very challenging problem, due to the complexity and nonlinear nature of the underlying physics.

On large scales and/or at early times, density fluctuations are small and the problem can be tackled analytically by perturbatively expanding the relevant evolution equations. However, a rigorous perturbation theory can only be carried out on scales unaffected by shell-crossing effects. On intermediate scales, so far only effective models with several free parameters exist (which are themselves either fitted to or tested against simulations). On smaller, highly nonlinear scales, the only way to solve the problem accurately is numerically.

We illustrate the complicated nature of the nonlinear universe in Fig. 1, which shows the simulated matter field in a region \(40 h^{-1}\mathrm{Mpc}\) wide. In the top panel we can appreciate the distribution of dark matter halos and their variety in terms of sizes, masses, and shapes. In the middle and bottom panels we show the same region but on much thinner slices which emphasise the small-scale anisotropies and ubiquitous filamentary structure.

Fig. 1

The distribution of dark matter in thin slices as predicted by a cosmological N-body simulation. Each panel shows a region \(40 h^{-1}\mathrm{Mpc}\) wide with different levels of thickness—40, 2, and \(0.1 h^{-1}\mathrm{Mpc}\), from top to bottom—which highlight different aspects of the simulated density field, from the distribution of dark matter halos in the top panel, to the filamentary nature of the nonlinear structure in the bottom panel. Image adapted from Stücker et al. (2018)

When one is concerned with the large-scale structure of the Universe, the dynamics is dominated by gravity, and baryons and dark matter can be considered as a single pressureless (cold) fluid. This constitutes an ideal situation for computer simulations: the initial conditions as well as the relevant physical laws are known, and the equations can be solved numerically considering only gravitational interactions. We review in detail the foundations of such numerical simulations in Sects. 2, 3, 4, 5, and 6. Specifically, in Sect. 2 we describe the derivation of the relevant equations solved by numerical simulations, and in Sect. 3 how they can be discretised, either with the N-body approach or with alternative methods. As we will discuss later, considering different discretisations will be crucial to test the robustness of the predictions from N-body simulations. In Sect. 4 we discuss how to evolve the discretised equations in time, and pay attention to common approaches for computational optimization, whereas in Sect. 5 we discuss various numerical techniques to compute gravitational interactions. Finishing our recap of the main numerical techniques underlying large-scale simulations, in Sect. 6 we discuss several aspects of how to set their initial conditions.

The importance of numerical solutions for structure formation is that they currently provide the only way to predict the highly nonlinear universe, and thus to potentially extract cosmological information from observations at all scales. In contrast, if one is restricted to scales where perturbative approaches are valid and shell-crossing effects are negligible (i.e., have only a sub-per cent impact on summary statistics), then a huge amount of cosmological information is potentially lost.

The primary role of numerical simulations in deriving cosmological constraints has already been demonstrated for several probes, mostly focused on small scales and on the properties of the dark matter particle, as exemplified by constraints from the Ly-\(\alpha \) forest, the abundance and properties of Milky-Way satellites, and strong-lensing modelling. They all rely on a direct comparison of observations with numerical simulations, or with models calibrated and/or validated using numerical simulations. In Sect. 7 we discuss several ways in which the distinctive properties of various potential dark matter candidates can be modelled numerically, including neutralinos, warm dark matter, axions, wave dark matter, and decaying and self-interacting dark matter. In the future, this will also be the case for large-scale clustering, and we discuss the current and potential challenges to be addressed in Sect. 8.

1.2 Upcoming challenges

Given that simulations are increasingly used for the inference of cosmological parameters, a question of growing importance is how one can demonstrate the correctness of a given numerical solution. For instance, any simulation-based evidence—of massive neutrinos or a departure from GR—would necessarily require demonstrating that the relevant physics is modelled accurately in the nonlinear regime. Such an understanding is of paramount importance: simulators need to show that a potential discovery relying on simulations is not simply a numerical inaccuracy, nor something that could be explained by uncertain or poorly understood model parameters (i.e. the uncertainties due to all “free” parameters must be quantified). We devote Sect. 8 to this topic.

Unfortunately, only a limited set of exact solutions is known for which strong convergence can be demonstrated. For useful predictions, however, the correctness of the solution has to be established in a very different, much more non-linear regime, far from these relatively simplistic known solutions. There has been significant progress in this direction over the last decade: from large code comparisons, to a better understanding of the effect of numerical noise and discreteness errors, of the impact of code parameters that control the accuracy of the solution, and of the quality of the initial conditions used to set up the simulations. These tests, however, presuppose that the N-body method itself converges to the right solution for a relatively small number of particles. Clear examples where the N-body approach fails have emerged: most notably near the free-streaming scale in warm dark matter simulations, and in the relative growth of multi-fluid simulations. Very recently, methods that do not rely on the N-body approach have become available, allowing such errors to be tested for. Although so far no significant biases have been measured for the statistical properties of CDM simulations, many more careful and systematic comparisons will be needed in the future.

In parallel, there has been significant progress in the improved modelling of the multi-physics aspect of modern cosmological simulations. For instance, accurate two-fluid simulations capturing the distinct evolution of baryons and cold dark matter are now possible, as are simulations that quantify the non-linear impact of massive neutrinos with sophisticated methods. Further, the use of Newtonian dynamics has been tested against simulations solving general relativistic equations in the weak field limit, and schemes to include relativistic corrections have been devised. We discuss all these extensions, which seek to improve the realism of numerical simulations, in Sect. 7.

An important aspect for confirming or ruling out the \(\varLambda \)CDM model will be the design of new cosmic probes that are particularly sensitive to departures from \(\varLambda \)CDM. For this it is important to understand the impact of non-standard ingredients on various observables and on structure formation in general. This is an area of rapid progress, with advances in the variety and sophistication of models, for instance regarding the actual nature of dark matter, with simulations assuming neutralinos, axions, primordial black holes, or wave dark matter. Likewise, simulations with modifications of general relativity as the gravity law, and with primordial non-Gaussianity, have also reached maturity.

To achieve the accuracy and precision necessary to make optimal use of upcoming observational data, it is clear that, ultimately, all systematic errors in simulations must be understood and quantified, and the impact of all the approximations made must be under control. One such approximation is the neglect of baryonic effects, which become important on nonlinear scales, where collisionless (dark) and collisional (baryonic) matter start to separate and baryons become affected by astrophysical processes. Hence, it becomes important to enhance numerical simulations with models for the baryonic components, either for the formation of galaxies or for the effects of gas pressure and feedback from supernovae or supermassive black holes. We discuss different approaches to this problem in Sect. 9.

In parallel to increasing the accuracy of simulations, the community is focusing on improving their “precision”. Cosmological simulations are typically among the largest calculations carried out at international supercomputing centres. New technologies and algorithmic advances are an important part of the field, and we review them in Sect. 10. We have seen important advances in the adoption of GPUs and hardware accelerators, and in new algorithms for force calculations with improved precision, computational efficiency, and parallelism. Thanks to these, state-of-the-art simulations follow regions of dozens of gigaparsecs with trillions of particles, steadily approaching the level required by upcoming extragalactic surveys.

The field of cosmological simulations has also been reviewed in the last decade by other authors, who have focused on different aspects and perspectives. For more details, we refer the reader to the excellent reviews by Kuhlen et al. (2012), Frenk and White (2012), Dehnen and Read (2011), and Vogelsberger et al. (2020).

1.3 Outline

In the following we briefly outline the contents of each section of this review.

Section 2: This section provides the basic set of equations solved by cosmological dark matter simulations. We emphasise the approximations usually adopted by most simulations regarding the weak-field limit of General Relativity and the properties of dark matter. This section sets the stage for various kinds of simulations we discuss afterwards.

Section 3: This section discusses the possible numerical approaches for solving the equations presented in Section 2. Explicitly, we discuss the N-body approach and alternative methods such as the Lagrangian submanifold, full phase-space techniques, and Schrödinger–Poisson.

Section 4: This section derives the time integration of the relevant equations of motion. We discuss the symplectic integration of the dynamics at second order. We also review individual and adaptive timestepping, and the integration of quantum Hamiltonians.

Section 5: We review various methods for computing the gravitational forces exerted by the simulated mass distribution. Explicitly, we discuss the Particle-Mesh method solved by Fourier and mesh-relaxation algorithms, Trees in combination with traditional multipole expansions and Fast multipole methods, and their combination.

Section 6: In this section we outline the method for setting the initial conditions for the various types of numerical simulations considered. Explicitly, we review the numerical algorithms employed (Zel’dovich approximation and higher-order formulations) as formulated in Fourier or configuration space.

Section 7: This section is focused on simulations relaxing the assumption that all mass in the Universe is made out of a single cold collisionless fluid. That is, we discuss simulations including both baryons and dark matter; including neutrinos; assuming dark matter is warm; self-interacting; made out of primordial black holes; and cases where its quantum pressure cannot be neglected on macroscopic scales. We also discuss simulations easing the restrictions that the gravitational law is given by the Newtonian limit of General Relativity, and that the primordial fluctuations were Gaussian.

Section 8: This section discusses the current challenges for high accuracy in cosmological simulations. We consider the role of the softening length, cosmic variance, and mass resolution, among other numerical parameters. We also review comparisons of N-body codes and discuss the validity of the N-body discretization itself.

Section 9: This section covers the connection between simulation predictions and cosmological observations. We discuss halo finder algorithms, the building of merger trees, and the construction of lightcones. We also briefly review halo occupation distribution, subhalo abundance matching, and semi-analytic galaxy formation methods.

Section 10: This section provides a list of state-of-the-art large-scale numerical simulations. We put emphasis on the computational challenges they face in connection with current and future computational trends and observational efforts.

In the final section, we conclude with an outlook for the future of cosmological dark matter simulations.

2 Gravity and dynamics of matter in an expanding Universe

Large-scale dark matter simulations are (mostly) carried out in the weak-field, non-relativistic, and collisionless limit of a more fundamental Einstein–Boltzmann theory. Additionally, since these simulations neglect any microscopic interactions right from the start (we will not consider them until we discuss self-interacting dark matter), one operates in the Vlasov–Einstein limit. This is essentially a continuum description of the geodesic motion of microscopic particles, since only (self-)gravity is allowed to affect their trajectories. Despite these simplifications, this approach keeps the essential non-linearities of the system, which give rise to its phenomenological complexity.

In this section, we first derive the relevant relativistic equations of motion in a Hamiltonian formalism. We then take the non-relativistic weak-field limit by considering perturbations around the homogeneous and isotropic FLRW (Friedmann–Lemaître–Robertson–Walker) metric. This Vlasov–Poisson limit yields ordinary non-relativistic equations of motion, with the twist of a non-standard time dependence due to the expansion of space in a general FLRW spacetime. Because symplectic invariants are preserved, the expansion of space is accompanied by an intimately related contraction of momentum space that keeps the overall phase-space volume constant.

With the general equations of motion at hand, we consider the cold limit (motivated by the observational evidence that cold dark matter (CDM) is indeed cold), which arises naturally since the expansion of space (or rather the compression of momentum space) reduces any intrinsic momentum dispersion of the particle distribution over time. In the cold limit, the distribution function of dark matter takes the particularly simple form of a low-dimensional submanifold of phase space. These discussions aim to provide a formal foundation for the equations of motion as well as a motivation for many of the techniques and approximations reviewed in later sections.

2.1 Equations of motion and the Vlasov equation

Since we are, due to the very weak interactions of dark matter, interested mostly in the collisionless limit, we are essentially looking at freely moving particles in a curved spacetime. To describe the motion of these particles, it is much easier to work with Lagrangian coordinates in phase space, i.e., the positions and momenta of particles. In general relativity, we have in full generality an eight-dimensional relativistic extended phase space of positions \(x^\mu \) and their conjugate momentaFootnote 1 \(p_\mu \). Kinetic theory in curved spacetime is discussed in many introductory texts on general relativity in great detail (e.g. Ehlers 1971; Misner et al. 1973; Straumann 2013; Choquet-Bruhat 2015 for the curious reader), but mostly without connection to a Hamiltonian structure. For our purposes, we can eliminate one degree of freedom by considering massive particles and neglecting processes that alter the mass of the particles. In that case the mass-shell condition \(p^\mu p_\mu = -m^2 = \mathrm{const}\) holdsFootnote 2 and allows us to reduce the dynamics to a 3+3 dimensional phase space with one parameter (e.g. time) that can be used to parameterise the motion. Note that we employ throughout Einstein’s summation convention, where repeated indices are implicitly summed over, unless explicitly stated otherwise.

Geodesic motion of massive particles In the presence of only gravitational interactions, the motion of particles in general relativity is purely geodesic by definition. Let us therefore begin by considering the geodesic motion of a particle moving between two points A and B. The action for the motion along a trajectoryFootnote 3 (X(t), P(t)), parametrised by coordinate time t, between the spacetime points A and B is

$$\begin{aligned} S(A,B) = \int _{A}^{B} P_\mu \mathrm{d}X^\mu = \int _A^B \left[ P_0 \mathrm{d}t + P_i \mathrm{d}X^i \right] = \int _A^B\left[ P_i \frac{\mathrm{d}X^i}{\mathrm{d}t} + P_0 \right] \mathrm{d}t. \end{aligned}$$
(1)

From Eq. (1), one can immediately read off that the Lagrangian \(\mathscr {L}\) and Hamiltonian \(\mathscr {H}\) of geodesic motion are given by the usual Legendre transform pair (e.g. Goldstein et al. 2002)

$$\begin{aligned} \mathscr {L}:=P_i \frac{\mathrm{d}X^i}{\mathrm{d}t} - \mathscr {H},\qquad \text {with}\qquad \mathscr {H}:=-P_0,\quad \mathrm{d}t := \mathrm{d}X^0 \end{aligned}$$
(2)

respectively, meaning that \(-P_0\) plays the role of the Hamiltonian itself [as one finds also generally in extended phase space, cf. Lanczos (1986)]. It is easy to show in a few lines of calculation that the coordinate-time canonical equations of motion in curved spacetime are then given by two dynamical equationsFootnote 4 (e.g. Choquet-Bruhat 2015)

$$\begin{aligned} \frac{\mathrm{d}X^\mu }{\mathrm{d}t} = \frac{P^\mu }{P^0} \qquad \text {and}\qquad \frac{\mathrm{d}P_\mu }{\mathrm{d}t} = F_\mu \qquad \text {with}\quad F_\mu := -g^{\alpha \beta }{}_{,\mu }(X)\, \frac{P_\alpha P_\beta }{2P^0}. \end{aligned}$$
(3)

Thanks to the mass-shell condition, the Christoffel symbols of the metric reduce here to a single partial derivative of the (inverse) metric, but otherwise these two equations are equivalent to the geodesic equation. Note the formal similarity of these equations to the non-relativistic equations, with the ‘gravitational interaction’ absorbed into the derivative of the metric. Eqs. (3) determine the particle motion given the metric. The metric in turn is determined by the collection of all particles in the Universe through Einstein’s field equations, which we will address in the next section.
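
For completeness, the few lines of calculation alluded to above amount to implicit differentiation of the mass-shell condition; a compressed sketch in the conventions of this section reads

$$\begin{aligned} g^{\alpha \beta }P_\alpha P_\beta = -m^2 \quad \Rightarrow \quad \frac{\partial P_0}{\partial P_i} = -\frac{P^i}{P^0}, \qquad \frac{\partial P_0}{\partial X^\mu } = -\frac{g^{\alpha \beta }{}_{,\mu }\, P_\alpha P_\beta }{2P^0}, \end{aligned}$$

so that Hamilton’s equations, \(\mathrm{d}X^i/\mathrm{d}t = \partial \mathscr {H}/\partial P_i\) and \(\mathrm{d}P_i/\mathrm{d}t = -\partial \mathscr {H}/\partial X^i\) with \(\mathscr {H}=-P_0\), reproduce Eqs. (3).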

Statistical Mechanics When considering a large number of particles, it is necessary to transition to a statistical description and consider the phase-space distribution (or density) function of particles in phase-space over time, i.e. on \((\varvec{x},\varvec{p},t) \in \mathbb {R}^{3+3+1}\), rather than individual microscopic particle trajectories \((\varvec{X}(t),\varvec{P}(t))\). The on-shell phase space distribution function \(f_{\mathrm{m}}(\varvec{x},\varvec{p},t)\) can be defined e.g. through the particle number, which is a Lorentz scalar, per unit phase space volume. This phase-space density is then related to the energy-momentum tensor as

$$\begin{aligned} T^{\mu \nu } := \frac{1}{\sqrt{|g|}} \int _{\mathbb {R}^3} \mathrm{d}^3p\;f_{\mathrm{m}}(\varvec{x},\varvec{p},t) \,\frac{p^\mu p^\nu }{p^0} , \end{aligned}$$
(4)

where g is the determinant of the metric. For purely collisionless dynamics, the evolution of \(f_{\mathrm{m}}\) is therefore determined by the on-shell Einstein–Vlasov equation (e.g. Choquet-Bruhat 1971; Ehlers 1971)

$$\begin{aligned} \widehat{L}_{\mathrm{m}} \,f_{\mathrm{m}} = 0,\qquad \text {with}\qquad \widehat{L}_{\mathrm{m}} := \frac{\partial }{\partial t} + \frac{p^i}{p^0}\frac{\partial }{\partial x^i} + \frac{F_i}{p^0} \frac{\partial }{\partial p_i} \end{aligned}$$
(5)

where \(\widehat{L}_{\mathrm{m}}\) is the on-shell Liouville operator in coordinate time. This equation relates Hamiltonian dynamics and incompressibility in phase space: the Vlasov equation is simply the continuum limit of Hamiltonian mechanics with only long-range gravitational interactions (i.e., geodesic motion). This can be seen by observing that particle trajectories \(\left( X^i(t),P_i(t)\right) \) following Eqs. (3) solve the Einstein–Vlasov equation as characteristic curves, i.e. \(\mathrm{d}f_\mathrm{m}\left( X^i(t),P_i(t),t\right) /\mathrm{d}t=0\).

2.2 Scalar metric fluctuations and the weak field limit

Metric perturbations The final step needed to close the equations is made through Einstein’s field equations \(G_{\mu \nu } = 8\pi G T_{\mu \nu }\). The field equations connect the evolution of the phase-space density \(f_{\mathrm{m}}\), which determines the stress-energy tensor \(T^{\mu \nu }\), with the force 1-form \(F_i\), which is determined by the metric. The results presented above are valid non-perturbatively for an arbitrary metric. Here, we shall make a first approximation by considering only scalar fluctuations using two scalar potentials \(\phi \) and \(\psi \)Footnote 5. This approximation is valid if velocities are non-relativistic, i.e. \(\left| P_i/P^0\right| \ll 1\). In this case, the only dynamically relevant component of \(T^{\mu \nu }\) is the time-time component. Let us thus consider the metric (which corresponds to the “Newtonian gauge” with conformal time), following largely the notation of Bartolo et al. (2007),

$$\begin{aligned} \mathrm{d}s^2=a^2(\tau )\left[ -\exp (2\psi )\,\mathrm{d}\tau ^2 + \exp (-2\phi )\,\mathrm{d}x^i \mathrm{d}x_i \right] \end{aligned}$$
(6)

where x are co-moving coordinates. The metric determinant is given by \(\sqrt{|g|}=a^4\exp \left( \psi -3\phi \right) \).

The kinetic equation in GR is simply a geodesic transport equation and will thus only depend on the gravitational “force” 1-form \(F_i\), which can be readily computed for this metric to be

$$\begin{aligned} F_i = -a^2 p^0 \exp \left( 2\psi \right) \left( \psi _{,i}+\phi _{,i}\right) + \frac{m^2}{p^0} \phi _{,i} \end{aligned}$$
(7)

If the vector and tensor components are non-relativistic [see e.g. Kopp et al. (2014), Milillo et al. (2015) for a rigorous derivation of the Newtonian limit], we are left only with a constraint equation from the time-time component of the field equations. The time-time component of the Einstein tensor \(G_{\mu \nu }\) is found to be

$$\begin{aligned} G^0{}_{0} = \frac{\exp (2\phi )}{a^2} \left( (\nabla \phi )^2 -2\nabla ^2\phi \right) - 3 \frac{\exp (-2\psi )}{a^2} \left( \frac{a^{\prime}}{a}-\phi ^{\prime}\right) ^2, \end{aligned}$$
(8)

where a prime indicates a derivative w.r.t. \(\tau \). Inserting this in the respective field equation and performing the weak field limit (i.e. keeping only terms up to linear order in the potentials) one finds the following constraint equation

$$\begin{aligned} -3\mathcal {H}\left( \phi ^{\prime}+\mathcal {H}\psi \right) + \nabla ^2 \phi + \frac{3}{2}\mathcal {H}^2 = 4\pi G a^2 \rho , \end{aligned}$$
(9)

where \(\rho :=T^0{}_{0}\), \(\mathcal {H}:= a^{\prime}/a\), and G is Newton’s gravitational constant. Note that this equation alone does not close the system, since we have no evolution equation for \(a(\tau )\) yet.

Separation of background and perturbations The usual assumption is that backreaction can be neglected, i.e. the homogeneous and isotropic FLRW case is recovered with \(\phi ,\psi \rightarrow 0\) and density \(\rho \rightarrow \overline{\rho }(\tau )\). In this case, \(a(\tau )\) is given by the solution of this equation in the absence of perturbations which becomes the usual Friedmann equation

$$\begin{aligned} \mathcal {H}^2 = \frac{8\pi G}{3} a^2 \overline{\rho }\qquad \text {with }\qquad \overline{\rho } =: \left( \varOmega _{\mathrm{r}} a^{-4}+\varOmega _\nu (a)+\varOmega _{\mathrm{m}} a^{-3} + \varOmega _{\mathrm{k}} a^{-2} + \varOmega _\varLambda \right) \rho _{\mathrm{c},0}, \end{aligned}$$
(10)

where \(\rho _{\mathrm{c},0} := \frac{3H_0^2}{8 \pi G}\) is the critical density of the Universe today, \(H_0\) is the Hubble constant, and the \(\varOmega _{X\in \{\mathrm{r},\nu ,\mathrm{m},\mathrm{k},\varLambda \}}\) are the respective density parameters of the various species in units of this critical density (at \(a=1\)). Note that massive neutrinos \(\varOmega _\nu (a)\) have a non-trivial scaling with a (see Sect. 7.8.2 for details). In the inhomogeneous case one can subtract out this FLRW evolution—neglecting by doing so any non-linear coupling, or ‘backreaction’, between the evolution of a and the inhomogeneities—and finds finally

$$\begin{aligned} -3\mathcal {H}\left( \phi ^{\prime}+\mathcal {H}\psi \right) + \nabla ^2 \phi = 4\pi G a^2 (\rho -\overline{\rho }), \end{aligned}$$
(11)

This is an inhomogeneous diffusion equation (cf. e.g., Chisari and Zaldarriaga 2011; Hahn and Paranjape 2016) reflecting the fact that the gravitational potential does not propagate instantaneously in an expanding Universe so that super-horizon scales, where density evolution is gauge-dependent, are screened.
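
For later reference (the background expansion enters essentially every equation that follows), the Friedmann equation (10) is straightforward to evaluate numerically. The sketch below is a minimal illustration that neglects the non-trivial massive-neutrino term \(\varOmega _\nu (a)\); the parameter values are arbitrary examples rather than recommendations.

```python
# Minimal sketch: conformal Hubble rate from the Friedmann equation (10),
# neglecting the non-trivial massive-neutrino contribution Omega_nu(a).
import numpy as np

def conformal_hubble(a, H0=67.7, Omega_m=0.31, Omega_r=8.5e-5, Omega_k=0.0):
    """Return curly-H = a'(tau)/a = a * H(a), with H0 in km/s/Mpc."""
    Omega_L = 1.0 - Omega_m - Omega_r - Omega_k      # flatness fixes Omega_Lambda
    E2 = Omega_r / a**4 + Omega_m / a**3 + Omega_k / a**2 + Omega_L
    return a * H0 * np.sqrt(E2)
```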

2.3 Newtonian cosmological simulations

Newtonian gravity In the absence of anisotropic stress the two scalar potentials in the metric (6) coincide and one has \(\psi =\phi \). One can further show that on sub-horizon scales [where \(\rho \) must be gauge independent, see e.g. Appendix A of Hahn and Paranjape (2016)] one then recovers from Eq. (11) the non-relativistic Poisson equation,

$$\begin{aligned} \nabla ^2 \phi = 4\pi G a^2 (\rho -\overline{\rho }). \end{aligned}$$
(12)

Note that this Poisson equation is however a priori invalid on super-horizon scales. Formally, when carrying out the transformation that removed the extra terms from Eq. (11), the Poisson source \(\rho \) has been gauge transformed to the synchronous co-moving gauge. If a simulation is initialised with density perturbations in the synchronous gauge and other quantities are interpreted in the Newtonian gauge, then the Poisson equation consistently links the two. In addition, we have in the non-relativistic weak field limit that \(p^0=a^{-1} m\) so that we also recover the Newtonian force law

$$\begin{aligned} F_i \rightarrow -am\phi _{,i}\,. \end{aligned}$$
(13)

Note that such gauge mixing can be avoided and horizon-scale effects can be rigorously accounted for by choosing a more sophisticated gauge (Fidler et al. 2016, 2017b) in which the force law is required to take the form of Eq. (13) and coordinates and momenta are interpreted self-consistently in this ‘Newtonian motion’ gauge to account for leading order relativistic effects. A posteriori gauge transformations exist to relate gauge-dependent quantities, but remember that observables can never be gauge dependent.

Non-relativistic moments For completeness and reference, we also give the components of the energy-momentum tensor (4) as moments of the distribution function \(f_{\mathrm{m}}\) in the non-relativistic limit

$$\begin{aligned} \rho := T^0{}_{0} = \frac{m}{a^{3}} \int _{\mathbb {R}^3} \mathrm{d}^3p\,f_{\mathrm{m}}(\varvec{x},\varvec{p},t)\, \end{aligned}$$
(14a)
$$\begin{aligned} \pi _i := T^0{}_{i} = \frac{1}{a^4}\int _{\mathbb {R}^3} \mathrm{d}^3p\,f_{\mathrm{m}}(\varvec{x},\varvec{p},t) \,p_i\, \end{aligned}$$
(14b)
$$\begin{aligned} \varPi _{ij} := T_{ij} = \frac{1}{ma^5} \int _{\mathbb {R}^3} \mathrm{d}^3p\,f_{\mathrm{m}}(\varvec{x},\varvec{p},t) \,p_i \,p_j\,, \end{aligned}$$
(14c)

defining the mass density \(\rho \), the momentum density \(\varvec{\pi }\), and the second moment \(\varPi _{ij}\); the stress tensor is then given by \(\varPi _{ij} - \pi _i \pi _j / \rho \).

The equations solved by standard N-body codes. Finally, the equations of motion in cosmic time \(\mathrm{d}t = a\, \mathrm{d}\tau \), assuming the weak-field non-relativistic limit, are

$$\begin{aligned} \frac{\mathrm{d}{X}^i}{\mathrm{d}t} = \frac{P^i}{m}=\frac{{P}_i}{ma^2}\qquad \text {and}\qquad \frac{\mathrm{d}{P}_i}{\mathrm{d}t} = -m\frac{\partial \phi }{\partial X^i} \end{aligned}$$
(15)

with the associated Vlasov–Poisson system

$$\begin{aligned}&\frac{\partial f_{\mathrm{m}}}{\partial t} + \frac{{p}_i}{ma^2} \,\frac{\partial f_{\mathrm{m}}}{\partial {x}^i} - m\,\frac{\partial \phi }{\partial x^i} \,\frac{\partial f_\mathrm{m}}{\partial p_i} =0 \end{aligned}$$
(16a)
$$\begin{aligned}&\nabla ^2\phi = 4\pi G a^2 (\rho - \overline{\rho }) \end{aligned}$$
(16b)
$$\begin{aligned}&\rho = \frac{m}{a^{3}} \int _{\mathbb {R}^3} \mathrm{d}^3p\,\,f_\mathrm{m}(\varvec{x},\varvec{p},t) \end{aligned}$$
(16c)

where \(\overline{\rho }(t)\) is the spatial mean of \(\rho \), which also enters the Friedmann equation \(\mathcal {H}^2 = \frac{8\pi G}{3} a^2 \overline{\rho }\) that determines the evolution of a(t). It is convenient to change to the co-moving matter density \(a^{3}\rho \), eliminating several factors of a in these equations. In particular, the Poisson equation can be written as

$$\begin{aligned} \nabla ^2 \phi = \frac{3}{2}H_0^2 \varOmega _{\mathrm{m}} \delta / a, \end{aligned}$$
(17)

if gravity is sourced by matter perturbations alone so that \(\rho (\varvec{x},t) = (1+\delta (\varvec{x},t))\,\varOmega _{\mathrm{m}} \rho _{\mathrm{c},0} a^{-3}\). Note that we have also introduced here the fractional overdensity \(\delta := \rho /\overline{\rho }-1\).
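
In practice, Eq. (17) is most easily solved on a periodic grid with fast Fourier transforms, since in Fourier space \(\nabla ^2\rightarrow -k^2\). The snippet below is a minimal sketch of this kernel (it is also at the core of the particle-mesh force calculation discussed in Sect. 5); the function name is an illustrative assumption and consistent units for \(H_0\) and the box size are left to the caller.

```python
# Minimal sketch: FFT solution of the comoving Poisson equation (17),
# phi_k = -(3/2) H0^2 Omega_m delta_k / (a k^2), on a periodic cubic grid.
import numpy as np

def poisson_solve(delta, L, a, H0, Omega_m):
    n = delta.shape[0]
    kx = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
    kz = 2.0 * np.pi * np.fft.rfftfreq(n, d=L / n)   # half-spectrum of the real FFT
    KX, KY, KZ = np.meshgrid(kx, kx, kz, indexing="ij")
    k2 = KX**2 + KY**2 + KZ**2
    k2[0, 0, 0] = 1.0        # avoid 0/0; the k=0 mode is unsourced (delta has zero mean)
    phi_k = -1.5 * H0**2 * Omega_m / a * np.fft.rfftn(delta) / k2
    phi_k[0, 0, 0] = 0.0
    return np.fft.irfftn(phi_k, s=delta.shape)
```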

2.4 Post-Newtonian simulations

While traditionally all cosmological simulations were carried out in the non-relativistic weak-field limit, neglecting any back-reaction effects on the metric, the validity and limits of this approach have been questioned (Ellis and Buchert 2005; Heinesen and Buchert 2020). In addition, with upcoming surveys reaching horizon scales, relativistic effects need to be quantified and accounted for correctly. Since such effects are only relevant on very large scales, where perturbations are assumed to be small, various frameworks have been devised to interpret the outcome of Newtonian simulations in relativistic context (Chisari and Zaldarriaga 2011; Hahn and Paranjape 2016), which suggested in particular that some care is necessary in the choice of gauge when setting up initial conditions. Going even further, it turned out to be possible to define specific fine-tuned gauges, in which the gauge-freedom is used to absorb relativistic corrections, so that the equations of motion are strictly of the Newtonian form (Fidler et al. 2015; Adamek et al. 2017a; Fidler et al. 2017a). This approach requires only a modification of the initial conditions and a re-interpretation of the simulation outcome. Alternatively, relativistic corrections can also be included at the linear level by adding additional large-scale contributions computed using linear perturbation theory to the gravitational force computed in non-linear simulations (Brandbyge et al. 2017).

Going beyond such linear corrections, recently the first Post-Newtonian cosmological simulations have been carried out (Adamek et al. 2013, 2016), which indicated, however, that back-reaction effects are small and likely irrelevant for the next generation of surveys. Most recently, full GR simulations are now becoming possible (Giblin et al. 2016; East et al. 2018; Macpherson et al. 2019; Daverio et al. 2019) and seem to confirm the smallness of relativistic effects. The main advantage of relativistic simulations is that relativistic species, such as neutrinos, can be included self-consistently. In all cases investigated so far, non-linear relativistic effects have however appeared to be negligibly small on cosmological scales. Nevertheless, such simulations will be very important in the future to verify the robustness of standard simulations regarding relativistic effects on LSS observables (e.g. gravitational lensing, gravitational redshifts, e.g. Cai et al. 2017, or the clustering of galaxies on the past lightcone, e.g. Breton et al. 2019; Guandalin et al. 2021; Lepori et al. 2021; Coates et al. 2021, all of which have been proposed as tests of gravitational physics on large scales).

2.5 Cold limit: the phase-space sheet and perturbation theory

The cold limit All observational evidence points to the colder flavours of dark matter (but see Sect. 7 for an overview of various dark matter models). A key limiting case for cosmological structure formation is therefore that of an initially perfectly cold scalar fluid (i.e. vanishing stress and vorticity). In this case, the dark matter fluid is at early enough times fully described by its density and (mean) velocity field, which is of potential nature. The higher order moments (14c) are then fully determined by the lower order moments (14a, 14b) and the momentum distribution function at any given spatial point is a Dirac distribution so that \(f_{\mathrm{m}}\) is fully specified by only two scalar degrees of freedom, a density \(n(\varvec{x})\), and a velocity potential, \(S(\varvec{x})\), at some initial time, i.e.

$$\begin{aligned} f_{\mathrm{m}}(\varvec{x},\varvec{p},t) = n(\varvec{x},t)\;\delta _D\left( \varvec{p} - m \nabla _x S(\varvec{x},t) \right) . \end{aligned}$$
(18)

Since S is differentiable, it endows phase space with a manifold structure and the three-dimensional hypersurface of six-dimensional phase space on which f is non-zero is called the ‘Lagrangian submanifold’. In fact, if at any time one can write \(\varvec{p}=m\varvec{\nabla }S\), then Hamiltonian mechanics guarantees that the Lagrangian submanifold preserves its manifold structure, i.e. it never tears or self-intersects. It can however fold up, i.e. lead to a multi-valued field \(S(\varvec{x},t)\), invalidating the functional form (18). Prior to such shell-crossing events (as is the case at the starting time of numerical simulations) this form is, however, perfectly meaningful and by taking moments of the Vlasov equation for this distribution function, one obtains a Bernoulli–Poisson system which truncates the infinite Boltzmann hierarchy already at the first moment, leaving only two equations (Peebles 1980) in terms of the density contrast \(\delta = n/\overline{n}-1\) and the velocity potential S,

$$\begin{aligned} \frac{\partial \delta }{\partial t} + a^{-2} \varvec{\nabla }\cdot \left( (1+\delta ) \varvec{\nabla }S\right) =0 \qquad \text {and}\qquad \frac{\partial S}{\partial t} + \frac{1}{2a^2} \left( \varvec{\nabla }S\right) ^2 + \phi = 0, \end{aligned}$$
(19)

supplemented with Poisson’s equation (Eq. 17). Note that this form brings out also the connection to Hamilton-Jacobi theory. After shell-crossing, this description breaks down, and all moments in the Boltzmann hierarchy become important.

Eulerian perturbation theory For small density perturbations \(|\delta |\ll 1\), it is possible to linearise the set of equations (19). One then obtains the ODE governing the linear instability of density fluctuations

$$\begin{aligned} \delta ^{\prime \prime } + \mathcal {H} \delta ^\prime - \frac{3}{2}H_0^2 \varOmega _{\mathrm{m}} a^{-1} \delta = 0. \end{aligned}$$
(20)

The solutions can be written as \(\delta (\varvec{x},\tau ) = D_+(\tau ) \delta _+(\varvec{x}) + D_-(\tau ) \delta _-(\varvec{x})\) and in \(\varLambda \)CDM cosmologies given in closed form (Chernin et al. 2003; Demianski et al. 2005) as

$$\begin{aligned} D_+(a) = a \, {}_{2}F_1\left( \frac{1}{3},\, 1,\, \frac{11}{6};\,-f_\varLambda (a)\right) \qquad \text {and}\qquad D_-(a)=\sqrt{1+f_\varLambda (a)} \,a^{-\frac{3}{2}}, \end{aligned}$$
(21)

where \( f_\varLambda := \varOmega _\varLambda / (\varOmega _{\mathrm{m}} a^{-3})\), and \({}_2F_1\) is Gauss’ hypergeometric function. In more general cases, however, especially in the presence of trans-relativistic species such as neutrinos, Eq. (20) needs to be integrated numerically. Moving beyond linear order, recursion relations to all orders in perturbations of Eqs. (19) were obtained in the 1980s (Goroff et al. 1986) and provide the foundation of standard Eulerian cosmological perturbation theory [SPT; cf. Bernardeau et al. (2002) for a review].
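
Returning briefly to Eq. (21): since it is expressed through a standard special function, it is straightforward to evaluate. The following minimal sketch (with an illustrative value of \(\varOmega _{\mathrm{m}}\), and assuming a flat matter-plus-\(\varLambda \) background without radiation or massive neutrinos) shows how:

```python
# Minimal sketch: the closed-form linear growth factors of Eq. (21)
# for a flat matter + Lambda cosmology (radiation and neutrinos neglected).
import numpy as np
from scipy.special import hyp2f1

def growth_factors(a, Omega_m=0.31):
    Omega_L = 1.0 - Omega_m
    f_L = (Omega_L / Omega_m) * a**3                     # f_Lambda = Omega_L / (Omega_m a^-3)
    D_plus = a * hyp2f1(1.0 / 3.0, 1.0, 11.0 / 6.0, -f_L)
    D_minus = np.sqrt(1.0 + f_L) * a**-1.5
    return D_plus, D_minus

# Early-time check: D_+ -> a and D_- -> a^(-3/2), the Einstein-de Sitter limits.
a = np.logspace(-2.0, 0.0, 5)
D_plus, D_minus = growth_factors(a)
```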

Lagrangian perturbation theory As an alternative to the Eulerian fields, the dynamics can also be described through the Lagrangian map, i.e. by considering trajectories \(\varvec{x}(\varvec{q},t) = \varvec{q} + \varvec{\varPsi }(\varvec{q},t)\) starting from Lagrangian coordinates \(\varvec{q} = \varvec{x}(\varvec{q},t=0)\). It is then more convenient to write the distribution function (18) in terms of the Lagrangian map, i.e.

$$\begin{aligned} f_{\mathrm{m}}(\varvec{x},\varvec{p},\tau ) = \delta _D\left( \varvec{x} - \varvec{q}-\varvec{\varPsi }(\varvec{q},\tau )\right) \;\delta _D\left( \varvec{p} - m a \varvec{\varPsi }^\prime (\varvec{q},t) \right) . \end{aligned}$$
(22)

Mass conservation then implies that the density is given by the Jacobian \(\mathrm{J} := \det J_{ij} := \det \partial x_i/\partial q_j\) as

$$\begin{aligned} 1+\delta (\varvec{q},\tau ) = \left| \mathrm{J}\right| ^{-1} := \left| \det \varvec{\nabla }_q \otimes \varvec{x} \right| ^{-1} = \left| \det \delta _{ij} + \partial \varPsi _i / \partial q_j \right| ^{-1}, \end{aligned}$$
(23)

which is singular if any eigenvalue of \(\varvec{\nabla }_q\otimes \varvec{x}\) vanishes. This is precisely the case when shell crossing occurs. The canonical equations of motion (15) can be combined into a single second order equation, which in conformal time reads

$$\begin{aligned} \varvec{x}^{\prime \prime } +\mathcal {H}\varvec{x}^\prime + \varvec{\nabla }_x \phi (\varvec{x}) = 0, \end{aligned}$$
(24)

where we now consider trajectories not for single particles but for the Lagrangian map, i.e. \(\varvec{x}=\varvec{x}(\varvec{q},t)\). By taking its divergence, this can be rewritten as an equation including only derivatives w.r.t. Lagrangian coordinates

$$\begin{aligned} \mathrm{J}\left( \delta _{ij}+\varPsi _{i,j}\right) ^{-1} \left( \varPsi _{i,j}^{\prime \prime }+\mathcal {H}\varPsi _{i,j}^\prime \right) = \frac{3}{2}\mathcal {H}^2 \varOmega _{\mathrm{m}}\left( \mathrm{J} - 1 \right) . \end{aligned}$$
(25)

In Lagrangian perturbation theory (LPT), this equation is then solved perturbatively using a truncated time-Taylor expansion of the form \(\varvec{\varPsi }(\varvec{q},\tau ) = \sum _{n=1}^\infty D(\tau )^n \varvec{\varPsi }^{(n)}(\varvec{q})\) (Buchert 1989, 1994; Bouchet et al. 1995; Catelan 1995). At first order \(n=1\), restricting to only the growing mode, one finds the famous Zel’dovich approximation (Zel’dovich 1970)

$$\begin{aligned} \varvec{x}(\varvec{q},\tau ) = \varvec{q} - D_+(\tau ) \varvec{\nabla }_q \nabla _q^{-2} \delta _+(\varvec{q}), \end{aligned}$$
(26)

where \(\delta _+(\varvec{q})\) is, as above, the growing mode spatial fluctuation part of SPT. All-order recursion relations have also been obtained for LPT (Rampf 2012; Zheligovsky and Frisch 2014; Matsubara 2015). LPT solutions are of particular importance for setting up initial conditions for simulations. Both SPT and LPT are valid only prior to the first shell-crossing since the (pressureless) Euler–Poisson limit of Vlasov–Poisson ceases to be valid after. This can be easily seen by considering the evolution of a single-mode perturbation in the cold self-gravitating case, as shown in Fig. 2. Prior to shell-crossing, the mean-field velocity \(\langle \varvec{v}\rangle (\varvec{x},\,t)\) coincides with the full phase-space description \(\varvec{p}(\varvec{x}(\varvec{q};\,t);\,t)/m\). Then the DF from Eq. (18) guarantees that the (Euler/Bernoulli-) truncated mean field fluid equations describe the full evolution of the system. This is no longer valid after shell-crossing, accompanied by infinite densities where \(\det \partial x_i/\partial q_j=0\), when the velocity becomes multi-valued. Nevertheless, LPT provides an accurate bridge between the early Universe and that at the starting redshift of cosmological simulations, which can then be evolved further deep into the nonlinear regime. This procedure will be discussed in detail in Sect. 6.
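
As an aside, because Eq. (26) is a purely local operation in Fourier space, the Zel’dovich displacement field is very cheap to compute on a periodic grid. The following minimal sketch illustrates this; the input array for \(\delta _+(\varvec{q})\), the box size, and the function name are assumptions of the illustration, and interpolation or deconvolution details are omitted.

```python
# Minimal sketch: Zel'dovich displacements of Eq. (26) on a periodic grid,
# Psi_k = i k D_+ delta_k / k^2 (the Fourier form of -D_+ grad nabla^-2 delta_+).
import numpy as np

def zeldovich_displacement(delta_plus, L, D_plus):
    n = delta_plus.shape[0]
    k1d  = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
    k1dr = 2.0 * np.pi * np.fft.rfftfreq(n, d=L / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1dr, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                         # avoid 0/0; the k=0 mode is zeroed below
    delta_k = np.fft.rfftn(delta_plus)
    psi = []
    for ki in (kx, ky, kz):
        psi_k = 1j * ki * delta_k / k2
        psi_k[0, 0, 0] = 0.0
        psi.append(D_plus * np.fft.irfftn(psi_k, s=delta_plus.shape))
    return psi                                # [Psi_x, Psi_y, Psi_z] on the q-grid
```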

Heuristic models based on PT Additionally, perturbation theory has been used as the backbone for approximate, but computationally extremely fast, descriptions of the nonlinear structure [see Monaco (2016) for a review and Chuang et al. (2015b) for a comparison of approaches]. These methods overcome the shell-crossing limitation of perturbation theory in different ways. The introduction of a viscous term in the adhesion model (Gurbatov et al. 1985; Kofman and Shandarin 1988; Kofman et al. 1992) prevents the crossing of particle trajectories. In an alternative approach, Kitaura and Hess (2013) replaced the LPT displacement field on small scales by values motivated by spherical collapse (Bernardeau 1994; Mohayaee et al. 2006). A similar idea is implemented in Muscle (Neyrinck 2016; Tosone et al. 2021). Numerous models have been developed based on an extension of the predictions of LPT via various small-scale models that aim to capture the collapse into halos or implement empirical corrections: Patchy (Kitaura et al. 2014), PTHalos (Scoccimarro and Sheth 2002; Manera et al. 2013), EZHalos (Chuang et al. 2015a), HaloGen (Avila et al. 2015). Finally, Pinocchio (Monaco et al. 2002, 2013) and WebSky (Stein et al. 2020) both combine LPT displacements with an ellipsoidal collapse model and excursion set theory to predict the abundance, mass accretion history, and spatial distribution of halos. The low computational cost of these approaches makes them useful for the creation of large ensembles of “simulations” aimed at constructing covariance matrices for large-scale structure observations, or for a direct modelling of the position and redshift of galaxies. However, due to their heuristic character, their predictions need to be constantly quantified and validated with full N-body simulations.

Fig. 2

Evolution of a single-mode perturbation from early times (\(a=0.1\), left panels), through shell-crossing (at \(a=1\), middle panels), to late times (\(a=10\), right panels). The top row shows the phase space, where the cold distribution function occupies only a one-dimensional line. Density is shown in the middle row, with singularities of formally infinite density appearing at and after the first shell-crossing. The bottom panels show the mean fluid velocity (\(\langle \varvec{v}\rangle = \varvec{\pi }/\rho \), Eq. 14b), which is identical to the phase-space diagram up to the first shell-crossing, but develops a complicated structure with discontinuities in the multi-stream region (indicated by green shading). Since the distribution function has a manifold structure, its tangent space (indicated in orange) can be evolved in a “geodesic deviation” equation, or it can be approximated by tessellations. Caustics appear where \(\partial x/\partial q = 0\). N-body simulations do not track this manifold structure, and only sample the distribution function

2.6 Deformation and evolution in phase space

The canonical equations of motion describe the motion of points in phase space over time. Moving further, it is also interesting to consider the evolution of an infinitesimal phase space volume, spanned by \((\mathrm{d}\varvec{x},\mathrm{d}\varvec{p})\), which is captured by the “geodesic deviation equation”.

As we have just discussed, in the cold case, the continuum limit leads to all mass occupying a thin phase sheet in phase space and one can think of the evolution of the system as the mapping \(\varvec{q}\mapsto (\varvec{x},\varvec{p})\) between Lagrangian and Eulerian space (cf. Fig. 2). One can take the analysis of deformations of pieces of phase space one level further beyond the cold case by considering a general mapping of phase space onto itself, i.e. \((\varvec{q},\varvec{w})\mapsto (\varvec{x},\varvec{p})\) (but note that this definition is formally not valid as \(a\rightarrow 0\) since in the canonical cosmological case the momentum space blows up in this limit). The associated phase-space Jacobian matrix \(\mathsf{\varvec {D}}\), which reflects the effect in Eulerian space of infinitesimal changes to the Lagrangian coordinates, is

$$\begin{aligned} \mathsf{\varvec {D}} := \frac{\partial (\varvec{x},\varvec{p})}{\partial (\varvec{q},\varvec{w})} = \begin{bmatrix} \frac{\partial \varvec{x}}{\partial \varvec{q}} &{} \frac{\partial \varvec{x}}{\partial \varvec{w}} \\ \frac{\partial \varvec{p}}{\partial \varvec{q}} &{} \frac{\partial \varvec{p}}{\partial \varvec{w}} \end{bmatrix} =: \begin{bmatrix} \mathsf{\varvec {D}}_{\mathrm{xq}} &{} \mathsf{\varvec {D}}_{\mathrm{xw}} \\ \mathsf{\varvec {D}}_{\mathrm{pq}} &{} \mathsf{\varvec {D}}_{\mathrm{pw}} \end{bmatrix}, \end{aligned}$$
(27)

where in the last equality, we have split the 6D tensor into four blocks. Its dynamics are fully determined by the canonical equations of motion, which we can obtain after a few steps as (Habib and Ryne 1995; Vogelsberger et al. 2008).

$$\begin{aligned} \dot{\mathsf{\varvec {D}}} = \frac{\partial (\dot{\varvec{x}},\dot{\varvec{p}})}{\partial (\varvec{q},\varvec{w})} = \left( \begin{bmatrix} \varvec{\nabla }_x\otimes \varvec{\nabla }_p &{} \varvec{\nabla }_p\otimes \varvec{\nabla }_p \\ - \varvec{\nabla }_x\otimes \varvec{\nabla }_x &{} - \varvec{\nabla }_p\otimes \varvec{\nabla }_x \end{bmatrix} \mathscr{H}\right) \cdot \mathsf{\varvec {D}} =: \mathsf{\varvec {H}}\cdot \mathsf{\varvec {D}}. \end{aligned}$$
(28)

This equation is called the “geodesic deviation equation” (GDE) in the literature and it quantifies the relative motion in phase space along the Hamiltonian flow. For separable Hamiltonians \(\mathscr {H}=T(\varvec{p},t)+V(\varvec{x},t)\), the coupling matrix \(\mathsf{\varvec {H}}\) becomes

$$\begin{aligned} \mathsf{\varvec {H}} = \begin{bmatrix} \mathsf{\varvec {0}}&{} (\varvec{\nabla }_p\otimes \varvec{\nabla }_p)T \\ -(\varvec{\nabla }_x\otimes \varvec{\nabla }_x)V &{} \mathsf{\varvec {0}} \end{bmatrix}, \quad \text {and in cosmic time:}\quad \mathsf{\varvec {H}} = \begin{bmatrix} \mathsf{\varvec {0}} &{} \frac{1}{ma^{2}} \delta _{ij} \\ -m\phi _{,ij} &{} \mathsf{\varvec {0}} \end{bmatrix}, \end{aligned}$$
(29)

and shows a coupling to the gravitational tidal tensor (Vogelsberger et al. 2008; Vogelsberger and White 2011). The evolution of \(\mathsf{\varvec {D}}\) can be used to track the evolution of an infinitesimal environment in phase space around a trajectory \((\varvec{x}(\varvec{q},\varvec{w};\,t),\,\varvec{p}(\varvec{q},\varvec{w};\,t))\). In particular, it follows from Eq. (23) that zero-crossings of the determinant of the \(\mathsf{\varvec {D}}_{\mathrm{xq}}\) block correspond to infinite-density caustics, so that it can be used to estimate the local (single) stream density, and count the number of caustic crossings. Infinite density caustics would cause singular behaviour in the evolution of \(\mathsf{\varvec {D}}\), so that its numerical evolution has to be carried out with sufficient softening (Vogelsberger and White 2011; Stücker et al. 2020). Since it is sensitive to caustic crossings, the GDE can be used to quantify the distinct components of the cosmic web (Stücker et al. 2020, see also Sect. 9.5). The GDE is also intimately connected to studies of the emergence of chaos in gravitationally collapsed structures, since it quantifies the divergence of orbits in phase space and is directly related to Lyapunov exponents (Habib and Ryne 1995). An open problem is how rapidly a collapsed system achieves efficient phase space mixing since discreteness noise in N-body simulations could be dominant in driving phase space diffusion if not properly controlled (Stücker et al. 2020; Colombi 2021).
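
As an illustration of how Eq. (28) with the coupling matrix (29) can be integrated in practice, the following minimal sketch evolves \(\mathsf{\varvec {D}}\) along a pre-computed trajectory; the tidal tensor \(\phi _{,ij}(t)\) and the scale factor a(t) along the orbit are assumed to be supplied by the user (in an N-body code they are provided by the force calculation itself), and the simple forward-Euler step is purely illustrative.

```python
# Minimal sketch: integrating the GDE, Eqs. (28)-(29), for the 6x6 tensor D
# in cosmic time, given the tidal tensor and scale factor along an orbit.
import numpy as np

def evolve_gde(D, phi_ij_of_t, a_of_t, times, m=1.0):
    for t0, t1 in zip(times[:-1], times[1:]):
        dt = t1 - t0
        H = np.zeros((6, 6))
        H[:3, 3:] = np.eye(3) / (m * a_of_t(t0)**2)   # position block couples to momenta
        H[3:, :3] = -m * phi_ij_of_t(t0)              # momentum block couples to the tidal tensor
        D = D + dt * (H @ D)                          # forward-Euler step (illustrative only)
    return D

# det(D[:3, :3]) is the Jacobian det(dx/dq) of Eq. (23): its inverse magnitude gives the
# local (single) stream density, and it changes sign at every caustic crossing.
```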

3 Discretization techniques for Vlasov–Poisson systems

The macroscopic collisionless evolution of non-relativistic self-gravitating classical matter in an expanding universe is governed by the cosmological Vlasov–Poisson (VP) set of Eqs. (16a, 16b) derived above. VP describes the evolution of the distribution function \(f(\varvec{x},\varvec{p},t)\) in six-dimensional phase space over time. Due to the non-linear character of the equations and the attractive (focusing) nature of the gravitational interaction, intricate small-scale structures (filamentation) emerge in phase space already in 1+1 dimensions as shown in Fig. 2, and chaotic dynamics can arise in higher dimensional phase space. Various numerical methods to solve VP dynamics have been devised, with intimate connections also to related techniques in plasma physics. The N-body approach is clearly the most prominent and important technique today; however, other techniques have been developed to overcome its shortcomings in certain regimes and to test the validity of its results. A visual representation of the various approaches to discretise either phase space or the distribution function is shown in Fig. 3.

3.1 The N-body technique

The most commonly used discretisation technique for dark matter simulations is the N-body method, which has been used since the 1960s as a numerical tool to study the Hamiltonian dynamics of gravitationally bound systems such as star and galaxy clusters (von Hoerner 1960; Aarseth 1963; Hénon 1964) by Monte-Carlo sampling the phase space of the system. N-body simulations began to be used to study cosmological structure formation in the early 1970s (Peebles 1971; Press and Schechter 1974; Miyoshi and Kihara 1975), followed by an explosion of the field in the first half of the 1980s (Doroshkevich et al. 1980; Klypin and Shandarin 1983; White et al. 1983; Centrella and Melott 1983; Shapiro et al. 1983; Miller 1983). These works demonstrated the web-like structure of the distribution of matter in the Universe and established that cold dark matter (rather than massive neutrinos) likely provides the “missing” (dark) matter. By the late 1990s, the resolution and dynamic range had increased sufficiently so that it became possible to study the inner structure of dark matter haloes, leading to the discovery of universal halo profiles (Navarro et al. 1997), the large abundance of substructure in CDM haloes (Moore et al. 1999; Klypin et al. 1999b), and predictions of the mass function of collapsed structures in the Universe over a large range of masses and cosmic time (Jenkins et al. 2001). The N-body method is now being used in virtually all large state-of-the-art cosmological simulations as the method of choice to simulate the gravitational collapse of cold collisionless matter (cf. Sect. 10 for a review of state-of-the-art simulations). In “total matter” (often somewhat falsely called “dark matter only”) simulations, the N-body mass distribution serves as a proxy of the potential landscape in which galaxy formation takes place. Also in multi-physics simulations that follow the distinct evolution of dark matter and baryons (see also Sect. 7.8.1), collisionless dark matter is evolved via the N-body method, while a larger diversity of methods is employed to evolve the collisional baryonic component (see e.g. Vogelsberger et al. 2020, for a review).

The N-body discretisation Underlying all these simulations is the fundamental N-body idea: the Vlasov equation is the continuum version of the Hamiltonian equations of motion, which implies that phase-space density is conserved along Hamiltonian trajectories. The non-linear coupling in Hamilton’s equations arises through the coupling of particles with gravity via Poisson’s equation, which is only sourced by the density field. Therefore, as long as a finite number N of particles is able to fairly sample the density field, the evolution of the system can be approximated by these discrete trajectories.

A practical complication is that the (formally) infinitely extended mass distribution has to be taken into account. Most commonly, this complication is solved by restricting the simulation to a finite cubical volume \(V=L^3\) of co-moving linear extent L with periodic boundary conditions. This can formally be written as considering infinite copies of this fundamental cubical box. The effective N-body distribution function is then given by a set of discrete macroscopic particle locations and momenta \(\left\{ \left( \varvec{X}_i(t),\varvec{P}_i(t)\right) ,\,i=1\dots N\right\} \) along with the infinite set of periodic copies, so that

$$\begin{aligned} f_N(\varvec{x}, \varvec{p}, t) = \sum _{\varvec{n}\in \mathbb {Z}^3} \sum _{i=1}^N \frac{M_i}{m}\,\delta _D(\varvec{x}-\varvec{X}_i(t)-\varvec{n} L )\,\delta _D(\varvec{p}-\varvec{P}_i(t)), \end{aligned}$$
(30)

is an unbiased sampling of the true distribution function. Here \(M_i\) is the effective particle mass assigned to an N-body particle, m the actual microscopic particle mass, and \(\varvec{X}_i(t)\) and \(\varvec{P}_i(t)\) are the position and momentum of particle i at time t. The most widespread choice of discretisation is one in which all particles are assumed to have equal mass, \(M_i = \overline{M} = \varOmega _m \rho _{\mathrm{c,0}} V / N\). Note, however, that using different masses is also possible and sometimes desirable (e.g., for multi-resolution simulations, see Sect. 6.3.4).
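To make the bookkeeping concrete, the following minimal Python sketch evaluates the equal-mass choice \(\overline{M} = \varOmega _m \rho _{\mathrm{c,0}} V / N\); the numerical value of the critical density and the example box parameters are illustrative assumptions of this sketch, not taken from the text.

```python
# Minimal sketch: effective N-body particle mass for the equal-mass choice
# M_i = Omega_m * rho_crit,0 * V / N. The value of rho_crit,0 below is the
# standard ~2.775e11 h^2 M_sun / Mpc^3 and is an assumption of this example.
RHO_CRIT0 = 2.775e11  # critical density today in (M_sun/h) / (Mpc/h)^3

def particle_mass(omega_m, boxsize_mpc_h, n_particles):
    """Effective mass per N-body particle (in M_sun/h) for a periodic box."""
    volume = boxsize_mpc_h**3          # co-moving volume V = L^3
    return omega_m * RHO_CRIT0 * volume / n_particles

# e.g. a (1 Gpc/h)^3 box sampled with 2048^3 equal-mass particles
print(f"{particle_mass(0.31, 1000.0, 2048**3):.2e} M_sun/h")   # ~1e10 M_sun/h
```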

Initial conditions Since the particles are to sample the full six-dimensional distribution function, a key question is how the initial positions \(\varvec{X}_i(t_0)\) and momenta \(\varvec{P}_i(t_0)\) should be chosen. For cold systems, a consistent approach was derived above and is given by Eq. (18), which can be readily evaluated from a discrete sampling of the Lagrangian manifold alone, i.e. by choosing an (ideally homogeneous and isotropic) uniform sampling in terms of Lagrangian coordinates \(\varvec{Q}_i\) for each N-body particle (discussed in more detail in Sect. 6.4) and then obtaining the Eulerian position \(\varvec{X}_i\) and momentum \(\varvec{P}_i\) at some initial time \(t_0\) from the Lagrangian map \(\varvec{\varPsi }(\varvec{Q}_i,t_0)\) (see Sect. 6.2 for more details). This means that, in the case of a cold fluid, the particles sample the mean flow velocity exactly. The situation is considerably more involved if the system has a finite temperature, which requires sampling not only the three-dimensional Lagrangian submanifold but the full six-dimensional phase-space density. This implies that, in some sense, each particle of the cold case needs to be sub-divided into many particles that sample the momentum part of the distribution function. Particularly for hot distribution functions, where the momentum spread is large compared to the mean momenta arising from gravitational instability (such as in the case of neutrinos), this poses a formidable challenge due to the large associated sampling noise. To circumvent these problems, various solutions have been proposed, e.g. a careful sampling of momentum space based on shells in momentum modulus and an angular sampling based on the HEALPix sphere decomposition (Banerjee et al. 2018), or reduced-variance sampling based on the control variates method (Elbers et al. 2021). Such avenues are discussed in more detail in the context of massive neutrino simulations in Sect. 7.8.2 below.

Equations of motion Once the initial particle sampling has been determined, the subsequent evolution is fully governed by VP dynamics. Moving along characteristics of the VP system, the canonical equations of motion for particle \(i=1\dots N\) in cosmic time are obtained from \(\mathrm{d}f(\varvec{X}_i(t),\,\varvec{P}_i(t),\,t)/\mathrm{d}t=0\) as

$$\begin{aligned} \dot{\varvec{X}}_i = \frac{{\varvec{P}}_i}{M_i a^2}\qquad \text {and}\qquad \dot{\varvec{P}}_i = -M_i \left. \varvec{\nabla }_{x} \phi \right| _{\varvec{X}_i}. \end{aligned}$$
(31)

These are consistent with a cosmic-time Hamiltonian system with a pair interaction potential \(I(\varvec{x},\varvec{x}^{\prime})\) of the form

$$\begin{aligned} \mathscr {H}&= \sum _{i=1}^N \left[ \frac{P_i^2}{2M_i a^2} + \frac{1}{2} M_i \sum _{j\ne i} I(\varvec{X}_i,\varvec{X}_j) \right] \end{aligned}$$
(32a)
$$\begin{aligned} \text {where}&\quad \sum _{j\ne i} I(\varvec{X}_i,\varvec{X}_j) = \phi (\varvec{X}_i) \quad \text {with}\quad \nabla _x^2\phi = \frac{3H_0^2\varOmega _{\mathrm{m}}}{2a} \left( \frac{\rho }{\overline{\rho }}-1\right) . \end{aligned}$$
(32b)

The resulting acceleration term is given by

$$\begin{aligned} \varvec{g}(\varvec{x}) := -\left. \varvec{\nabla }_x \phi \right| _{\varvec{x}} = \frac{G}{a} \int _{\mathbb {R}^3} \mathrm{d}^3x^{\prime} \, \rho (\varvec{x^{\prime}},t) \frac{\varvec{x}^{\prime}-\varvec{x}}{\left\| \varvec{x}^{\prime}-\varvec{x}\right\| ^3}, \end{aligned}$$
(33)

which has no contribution from the background \(\overline{\rho }\) for symmetry reasons (Peebles 1980). The co-moving configuration space density \(\rho \) that provides the Poisson source arises from Eq. (30) and is given by

$$\begin{aligned} \rho (\varvec{x},t)= & {} \int _{\mathbb {R}^3} \mathrm{d}^3p\,\int _{\mathbb {R}^3} \mathrm{d}^3x^{\prime}\,m\,f_N(\varvec{x}^{\prime},\varvec{p},t) \, W(\varvec{x}^{\prime}-\varvec{x}) \nonumber \\= & {} \sum _{\varvec{n}\in \mathbb {Z}^3} \sum _{i=1}^NM_i\,\,W\left( \varvec{x}-\varvec{X}_i(t)-\varvec{n} L \right) . \end{aligned}$$
(34)

Here, we additionally allowed for a regularisation kernel \(W(\varvec{r})\) that is convolved with the discrete N-body density in order to improve the regularity of the density field and speed up convergence to the collisionless limit (or so one hopes). It represents a softening kernel (also called ‘assignment function’ depending on context) that regularises gravity and accounts for the fact that each particle is not a point mass (like a star or black hole), but corresponds to an extended piece of phase space, so that two-body scattering between the effective particles is always artificial and must be suppressed. We discuss how the acceleration due to this infinite sum is computed in practice in Sect. 5.
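As an illustration of such a kernel, the sketch below deposits particle masses onto a periodic mesh with the cloud-in-cell (CIC) assignment function, one common concrete choice for W (the text does not prescribe a specific kernel); it is a simplified example, not taken from any particular code.

```python
import numpy as np

def cic_deposit(positions, masses, boxsize, n_grid):
    """Deposit particle masses onto a periodic mesh with the cloud-in-cell (CIC)
    assignment function and return the resulting density field, i.e. a simple
    realisation of the smoothed density of Eq. (34)."""
    h = boxsize / n_grid
    rho = np.zeros((n_grid, n_grid, n_grid))
    s = positions / h - 0.5                     # cell-centred coordinates
    i0 = np.floor(s).astype(int)                # index of the lower neighbouring cell
    f = s - i0                                  # fractional distance to its centre
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = f[:, 0] if dx else 1.0 - f[:, 0]
                wy = f[:, 1] if dy else 1.0 - f[:, 1]
                wz = f[:, 2] if dz else 1.0 - f[:, 2]
                idx = (i0 + np.array([dx, dy, dz])) % n_grid   # periodic wrap
                np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]),
                          masses * wx * wy * wz)
    return rho / h**3                           # mass per cell -> density

# usage: 10^4 equal-mass particles in a unit box, deposited onto a 64^3 mesh
rng = np.random.default_rng(42)
pos = rng.uniform(0.0, 1.0, size=(10_000, 3))
rho = cic_deposit(pos, np.full(10_000, 1.0 / 10_000), boxsize=1.0, n_grid=64)
print(rho.mean())                               # ~1: the total mass is conserved
```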

Discreteness effects The quality of the force calculation rests on how well the force associated with the density from Eq. (34) approximates the true collisionless force. The hope is that, by an appropriate choice of W and as large a number N of particles as possible, the evolution remains close to the true collisionless dynamics and microscopic collisions remain subdominant. Due to the discrete nature of the particles, problems of the N-body approach are known to arise when force and mass resolution are not matched, in which case the evolution of the discrete system can deviate strongly from that of the continuous limit (Centrella and Melott 1983; Centrella et al. 1988; Peebles et al. 1989; Melott and Shandarin 1989; Diemand et al. 2004b; Melott et al. 1997; Splinter et al. 1998; Wang and White 2007; Melott 2007; Marcos 2008; Bagla and Khandai 2009). A slow convergence to the correct physical solution can, however, usually be achieved by keeping the softening so large that individual particles are never resolved. At the same time, if the force resolution in CDM simulations is not high enough at late times, then sub-haloes are comparatively loosely bound and prone to premature tidal disruption, leading to the ‘overmerging’ effect and the resulting orphaned galaxies (i.e., galaxies whose host subhalo has been artificially disrupted even though it should have survived as a distinct system rather than merging with the host), e.g. Klypin et al. (1999a), Diemand et al. (2004a), van den Bosch et al. (2018). In this case, one would want to choose the softening as small as possible. We discuss this in more detail in Sect. 8.2.

More sophisticated choices of W beyond a global (possibly time-dependent) softening scale are possible, for instance, the scale can depend on properties of particles, such as the local density (leading to what is called “adaptive softening”). We discuss the aspect of force regularisation by softening in more detail in Sect. 8.2. The gravitational acceleration that follows from Eq. (34) naturally has to take into account the cosmological Poisson equation, i.e., include the subtraction of the mean density and assume periodic boundary conditions. All aspects related to the time integration of cosmological Hamiltonians will be discussed in Sect. 4, those related to computing and evaluating gravitational interactions efficiently in Sect. 5 below.

Fig. 3 Discretisations used in the numerical solution of Vlasov–Poisson: a the N-body method, which samples the fine-grained distribution function (light gray line) at discrete locations; b the ‘GDE’ method, which can evolve the local manifold structure along with the particles (the green eigenvectors of \(\mathsf {D}_{\mathsf {xq}}\) are tangential to the Lagrangian submanifold); c the sheet tessellation method, which uses interpolation (here linear) between particles to approximate the Lagrangian submanifold with a tessellation; d a finite-volume discretisation of the full phase space with uniform resolution \(\varDelta x\) in configuration space and \(\varDelta p\) in momentum space

3.2 Phase space deformation tracking and Lagrangian submanifold approximations

For large enough N, the N-body method is expected to converge to the collisionless limit. Nonetheless, an obvious limitation of this approach is that the underlying manifold structure is entirely lost, as the particles retain only knowledge of positions and momenta, and all other quantities (e.g. density, as well as other mean-field properties) can only be recovered by coarse-graining over a larger number of particles. Two different classes of methods, which we shall discuss next, have been developed over recent years to overcome this key limitation in various ways. The first class is based on promoting particles (which are essentially vectors) to tensors and re-writing the canonical equations of motion to evolve them accordingly, resulting in equations of motion reminiscent of the geodesic deviation equation (GDE) in general relativity. The second class retains the particles but promotes them to vertices of a tessellation whose cells provide a discretisation of the manifold.

3.2.1 Tracking deformation in phase space—the GDE approach

We already discussed in Sect. 2.6 how infinitesimal volume elements of phase space evolve under a Hamiltonian flow. In particular, Eq. (28) is the canonical equation of motion for the phase-space Jacobian matrix. In the ‘GDE’ approach, instead of evolving only the vector \((\varvec{X}_i,\varvec{P}_i)\) for each N-body particle, one evolves in addition the tensor \(\mathsf{\varvec {D}}_i\) for each particle [cf. Vogelsberger et al. (2008), White and Vogelsberger (2009), but see also Habib and Ryne (1995) who derive a method to compute Lyapunov exponents based on the same equations]. Of particular interest is the \(\mathsf{\varvec {D}}_{\mathrm{xq}}\) sub-block of the Jacobian matrix since it directly tracks the local (stream) density associated with each N-body particle through \(\delta _i+1 = \left( \det \mathsf{\varvec {D}}_{\mathrm{xq},i}\right) ^{-1}\). The equations of motion for the relevant tensors associated to particle i are

$$\begin{aligned} \dot{\mathsf{\varvec {D}}}_{\mathrm{xq},i}&= \frac{1}{m a^{2}}\,\mathsf{\varvec {D}}_{\mathrm{pq},i} \end{aligned}$$
(35a)
$$\begin{aligned} \dot{\mathsf{\varvec {D}}}_{\mathrm{pq},i}&=- m\,\mathsf{\varvec {D}}_{\mathrm{xq},i} \cdot \left. \mathsf{\varvec {T}}\right| _{\varvec{X}_i}\quad \text {where}\quad \mathsf{\varvec {T}} := \nabla _x\otimes \nabla _x \phi , \end{aligned}$$
(35b)

and are solved alongside the N-body equations of motion (31) by computing the tidal tensor \(\mathsf{\varvec {T}}\). One caveat with the GDE approach is that the evolution of \(\mathsf{\varvec {D}}_\mathrm{xq}\) is determined not by the force but by the tidal field—which contains one higher spatial derivative of the potential than the force—and therefore is significantly less regular than the force field (see the detailed discussion and analysis in Stücker et al. (2021c) who have also studied the stream density evolution in virialised halos, based on a novel low-noise force calculation). This approach thus requires larger softening to achieve converged answers than a usual N-body simulation, and possibly cannot be shown to converge in the limit of infinite density caustics.
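A minimal sketch of how Eqs. (35a)–(35b) can be advanced alongside the particles is given below; the kick-drift ordering, the constant toy tidal tensor, and all numerical values are illustrative assumptions, and in a real code the update would share the particle's time-stepping scheme and use the tidal tensor measured at its position.

```python
import numpy as np

def gde_kick_drift(D_xq, D_pq, tidal, m, a, dt):
    """One kick-drift update of the GDE tensors of Eqs. (35a)-(35b) for a single
    particle, given the 3x3 tidal tensor T at its position."""
    D_pq = D_pq - dt * m * D_xq @ tidal        # Eq. (35b): tidal 'kick' of D_pq
    D_xq = D_xq + dt * D_pq / (m * a**2)       # Eq. (35a): 'drift' of D_xq
    return D_xq, D_pq

# usage: start from the identity (undeformed Lagrangian patch) and apply a
# purely illustrative, constant trace-free tidal tensor
D_xq, D_pq = np.eye(3), np.zeros((3, 3))
T = np.diag([1e-4, -5e-5, -5e-5])
for _ in range(100):
    D_xq, D_pq = gde_kick_drift(D_xq, D_pq, T, m=1.0, a=1.0, dt=0.1)
print(1.0 / np.linalg.det(D_xq) - 1.0)         # stream overdensity, 1/det(D_xq) - 1
```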

Evolving \(\mathsf{\varvec {D}}_{\mathrm{xq}}\) provides additional information about cosmic structure that is not accessible to standard N-body simulations. For instance, solving for the GDE enabled Vogelsberger et al. (2008) and Vogelsberger and White (2011) to estimate the number of caustics in dark matter haloes, which might boost the self-annihilation rate of CDM particles, as well as the amount of chaos and mixing in haloes. A key result of Vogelsberger and White (2011) was that, despite the large over-densities reached in collapsed structures, each particle is nonetheless inside a stream with a density not too different from the cosmic mean density. This is possible since haloes are built like pâte feuilletée, as a layered structure of many stretched and folded streams, as can be seen in panel a) of Fig. 5.

Fig. 4 a, b The density field obtained from the same set of N-body particles, shown as a simple particle N-body density in (a) and in terms of a phase-space sheet interpolation using tetrahedral cells in (b). c, d The GDE method and the sheet tessellation method provide direct access to the stream density, which is shown in Lagrangian \(\varvec{q}\)-space for c the GDE approach and d the sheet tessellation approach. Images reproduced with permission from (a, b) Abel et al. (2012) and (c, d) Stücker et al. (2020), copyright by the authors

3.2.2 The dark matter sheet and phase space interpolation

A different idea to reconstruct the Lagrangian submanifold from existing N-body simulations was proposed by Abel et al. (2012) and Shandarin et al. (2012), who noted that a tessellation of Lagrangian space, constructed by using the initial positions of the N-body particles at very early times as vertices (i.e., the particles generate a simplicial complex on the Lagrangian submanifold), is topologically preserved under Hamiltonian evolution. This means that initially neighbouring particles can be connected up as non-overlapping tetrahedra (in the case of a 3D submanifold of 6D phase space). Their deformation and change of position and volume reflect the evolution of the phase-space distribution (and thus changes in the density field). A visual impression of the difference between an N-body density and this tessellation-based density is given in Fig. 4. No holes can appear through Hamiltonian dynamics, but since the divergence of initially neighbouring points depends on the specific dynamics (notably the Lyapunov exponents that describe the divergence of such trajectories), the edge connecting two vertices can become a tangled curve due to the complex dynamics in bound systems, e.g., Laskar (1993), Habib and Ryne (1995). As long as the tetrahedra edges still approximate the submanifold well, the simplicial complex provides access to a vast amount of information about the distribution of matter in phase space in an evolved system that is difficult or even impossible to reconstruct from N-body simulations. Most notably, it yields an estimate of density that is local but defined everywhere in space, shot-noise free, and produces sharply delineated caustics of dark matter after shell-crossing (Abel et al. 2012), leading also to new rendering techniques for 3D visualisation of the cosmic density field (Kähler et al. 2012; Igouchkine et al. 2016; Kaehler 2017), and very accurate estimators of the cosmic velocity field (Hahn et al. 2015; Buehlmann and Hahn 2019).

Since the density is well defined everywhere in space just from the vertices, and reflects well the anisotropic motions in gravitational collapse, Hahn et al. (2013) have proposed that this density field can be used as the source density field when solving Poisson’s equation as part of the dynamical evolution of the system. The resulting method, in which a comparatively small number of N-body particles defines the simplicial complex that determines the density field, solves the artificial fragmentation problem of the N-body method for WDM initial conditions (Hahn et al. 2013). The complex dynamics in late stages of collapse, however, limits the applicability of a method with a fixed number of vertices. This problem was later addressed by allowing for higher-order reconstructions of the Lagrangian manifold from N-body particles (corresponding in some sense to non-constant-metric finite elements in Lagrangian space) and dynamical refinement (Hahn and Angulo 2016; Sousbie and Colombi 2016). For systems that exhibit strong mixing (phase mixing or even chaotic mixing, such as dark matter haloes), following the increasingly complex dynamics by inserting new vertices quickly becomes prohibitive (e.g., Sousbie and Colombi 2016; Colombi 2021 report an extremely rapid growth of the number of vertices over time in a cosmological simulation with only moderate force resolution). Stücker et al. (2020) have carried out a comparison of the density estimated from phase-space interpolation and that obtained from the GDE and found excellent agreement between the two except in the centres of haloes. The comparison between the two density estimates, shown in Lagrangian space, is reproduced in the bottom panels of Fig. 4.
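The basic density estimate of the sheet approach can be sketched in a few lines: particles that started on a regular Lagrangian lattice are connected into tetrahedra whose Eulerian volumes give a local density. The 6-tetrahedra-per-cube connectivity, the assignment of one particle mass per Lagrangian cell, and the neglect of periodic wrapping are simplifying assumptions of this illustration, not requirements of the method.

```python
import numpy as np

# one common decomposition of a Lagrangian cube into 6 tetrahedra, all sharing
# the main diagonal; corner indices use the binary ordering (a, b, c) -> 4a+2b+c
CUBE_TETS = [(0, 1, 3, 7), (0, 1, 5, 7), (0, 2, 3, 7),
             (0, 2, 6, 7), (0, 4, 5, 7), (0, 4, 6, 7)]

def tet_volume(p0, p1, p2, p3):
    """Signed volume of a tetrahedron from its vertex positions."""
    return np.dot(np.cross(p1 - p0, p2 - p0), p3 - p0) / 6.0

def sheet_densities(pos, n_side, particle_mass):
    """Density of every tetrahedron of the tessellated Lagrangian lattice;
    `pos` has shape (n_side**3, 3) and must be ordered as the initial lattice.
    Volumes vanish exactly at caustics, where the density formally diverges."""
    pos = pos.reshape(n_side, n_side, n_side, 3)
    dens = []
    for i in range(n_side - 1):                # periodic wrap ignored for brevity
        for j in range(n_side - 1):
            for k in range(n_side - 1):
                corners = [pos[i + a, j + b, k + c]
                           for a in (0, 1) for b in (0, 1) for c in (0, 1)]
                for t in CUBE_TETS:
                    vol = abs(tet_volume(*[corners[v] for v in t]))
                    dens.append(particle_mass / 6.0 / vol)   # each tet carries M/6
    return np.array(dens)

# usage: a 4^3 unit lattice with a small random perturbation of the positions
n = 4
q = np.stack(np.meshgrid(*3 * [np.arange(n, dtype=float)], indexing="ij"), axis=-1)
x = q + 0.05 * np.random.default_rng(1).standard_normal(q.shape)
print(sheet_densities(x.reshape(-1, 3), n, particle_mass=1.0).mean())  # ~1
```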

The path forward in this direction likely lies in the use of hybrid N-body/sheet methods that exploit the best of both worlds, as proposed by Stücker et al. (2020). Panel a) of Fig. 5 shows a 1+1D cut through 3+3D phase space for the case of a CDM halo, comparing the result of a sheet-based simulation, where the cut results in a finite number of continuous lines (top), with the equivalent result for a thin slice from an N-body simulation (bottom). The general impact of spurious phase-space diffusion driven by the N-body method is still not very well understood, with detailed comparisons between various solvers under way (e.g., Halle et al. 2019; Stücker et al. 2020; Colombi 2021).

For hot distribution functions, such as e.g. neutrinos, the phase space distribution of matter is not fully described by the Lagrangian submanifold. While a 6D tessellation is feasible in principle, it has undesirable properties due to the inherent shearing along the momentum dimensions. However, Lagrangian submanifolds can still be singled out to provide a foliation of general six-dimensional phase space by selecting multiple Lagrangian submanifolds that are offset from each other initially by constant momentum vectors as proposed by Dupuy and Bernardeau (2014), Kates-Harbeck et al. (2016).

Fig. 5 Comparison of evolved structures from N-body simulations with other discretisation approaches. a Comparison of a 1+1 dimensional phase space cut from simulations of the three-dimensional collapse of a CDM halo using a sheet tessellation with refinement (top, cf. Sousbie and Colombi 2016), and a reference N-body particle-mesh simulation (bottom). The panels show an infinitely thin slice in the sheet case, and a finitely thin projection in the N-body case. b Simulations of collapse in 1+1D phase space with a particle-mesh N-body method, the integer lattice method proposed by Mocz and Succi (2017), as well as two finite volume approaches, one where slabs in velocity space are allowed to continuously move against each other (‘moving mesh’). Images reproduced with permission from (a) Colombi (2021), copyright by the author; and (b) Mocz and Succi (2017)

Another version of phase-space interpolation, the so-called ‘waterbag’ method, has been discussed in the context of collisionless dynamics by Colombi and Touma (2008, 2014); it allows for general non-cold initial data but is restricted to 1+1 dimensional phase space. In this approach one exploits that the value of the distribution function is conserved along characteristics. If one traces out isodensity contours of f in phase space, one finds a sequence of n level sets \(\left\{ (x,p)\;|\;f(x,p,t_0)=f_i\right\} \) with \(i=1\dots n\) at the initial time \(t_0\); in 1+1D, these are closed curves. The curves can then be approximated using a number of vertices and interpolation between them. Moving the vertices along characteristics guarantees that they remain part of the level set at all times, since phase-space density is conserved along characteristics. The number of vertices can be adaptively increased in order to maintain a high-quality representation of the contour interpolation at all times. The acceleration of the vertices can be conveniently defined in terms of integrals over the contours (cf. Colombi and Touma 2014).

3.3 Full phase-space techniques

Almost as old as the N-body approach to solve gravitational Vlasov–Poisson dynamics are approaches to directly solve the continuous problem for an incompressible fluid in phase space (cf. Fujiwara 1981 for 1+1 dimensions). By discretising phase space into cells of finite size in configuration and momentum space (\(\varDelta x\) and \(\varDelta p\) respectively), standard finite-volume, finite-difference, or semi-Lagrangian techniques for incompressible flow can be employed. The main disadvantage of this approach is that memory requirements can be prohibitive since, without adaptive techniques or additional sophistications, the memory needed to evolve a three-dimensional system scales as \(\mathcal {O}(N_x^3\times N_p^3)\) to achieve a linear resolution of \(N_x\) cells per configuration space dimension and \(N_p\) cells per momentum space dimension. Only rather recently has this become possible at all, as demonstrated for gravitational interactions by Yoshikawa et al. (2013) and Tanaka et al. (2017). The limited resolution that can be afforded in 3+3 dimensions leads to non-negligible diffusion errors even with high order methods, so that this direct approach is arguably best suited for hot, mildly non-linear systems such as neutrinos (Yoshikawa et al. 2020, 2021), as the resolution required for colder systems is prohibitive. As a way to reduce such errors, Colombi and Alard (2017) proposed a semi-Lagrangian ‘metric’ method that uses a generalisation of the ‘GDE’ deformation discussed above to improve the interpolation step and reduce the diffusion error in such schemes.
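To give a feeling for the numbers quoted above, the following short estimate (with an arbitrary, illustrative grid size) shows how quickly the \(\mathcal {O}(N_x^3\times N_p^3)\) memory requirement grows:

```python
def phase_space_memory_tib(n_x, n_p, bytes_per_cell=8):
    """Memory (in TiB) of a uniform 3+3D phase-space grid storing one
    double-precision value of f per cell, i.e. the O(N_x^3 N_p^3) scaling."""
    return n_x**3 * n_p**3 * bytes_per_cell / 2**40

print(phase_space_memory_tib(128, 128))   # 32.0 TiB already for a modest 128^6 grid
```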

As another way to overcome the diffusion problem, integer lattice techniques have been discussed (cf. Earn and Tremaine 1992), which exploit the fact that if the time step is matched to the phase-space discretisation, i.e., \(\varDelta t = m (\varDelta x / \varDelta p)\), then the configuration space advection is exact and a reversible Hamiltonian system can be obtained for the lattice model discretisation. While this approach does not overcome the \(\mathcal {O}(N^6)\) memory scaling problem of a full phase-space discretisation technique, recently Mocz et al. (2017) have proposed important optimisations that might allow \(\approx \mathcal {O}(N^4)\) scaling by overcomputing, but that, to our knowledge, have not yet been demonstrated in 3+3 dimensional simulations. Results obtained by Mocz et al. (2017) comparing the various techniques are shown in Fig. 5.

3.4 Schrödinger–Poisson as a discretisation of Vlasov–Poisson

An entirely different approach to discretising the Vlasov–Poisson system, exploiting the quantum-classical correspondence, was proposed by Widrow and Kaiser (1993) in the 1990s. Here one exploits that full information about the system, such as density, velocity, etc., can be recovered from the (complex) wave function, and phase space is discretised by a (here tuneable, not physical) quantisation scale \(\hbar =2\varDelta x\varDelta p\). Since the Schrödinger–Poisson system converges to Vlasov–Poisson in the limit \(\hbar \rightarrow 0\) (Zhang et al. 2002), it can be used as a UV-modified analogue model also for classical dynamics if one restricts attention to (i.e. smooths over) scales larger than \(\hbar \). It is important to note that the phase of the wave function is intimately related to the Lagrangian submanifold: both are given by a single scalar degree of freedom. For this reason, the Schrödinger–Poisson analogue has the advantage that it provides a full phase-space theory in which only a three-dimensional field (the wave function) needs to be evolved. Following the first implementation by Widrow and Kaiser (1993), this model has found renewed interest recently (Uhlemann et al. 2014; Schaller et al. 2014; Kopp et al. 2017; Eberhardt et al. 2020; Garny et al. 2020). Note that, in the absence of a self-interaction term, the underlying equations are identical to those of ‘fuzzy dark matter’ (FDM) models of ultralight axion-like particles, which we discuss in more detail in Sect. 7.4. In the case of FDM, the quantum scale \(\hbar /m_{\mathrm{FDM}}\) is set by the mass \(m_{\mathrm{FDM}}\) of the microscopic particle and is thus (presumably) a physical scale, not a numerical discretisation scale dictated by finite memory.

4 Time evolution

As we have shown above, large-scale dark matter simulations have an underlying Hamiltonian structure, usually with a time-dependent Hamiltonian. Mathematically, such Hamiltonian systems have a very rigid structure, in which the phase-space area spanned by canonically conjugate coordinates and momenta is conserved over time. Consequently, specific techniques for the integration of Hamiltonian dynamical systems exist that preserve this underlying structure even in a numerical setting. For this reason, this section focuses almost exclusively on integration techniques for Hamiltonian systems as they arise in the context of cosmological simulations.

4.1 Symplectic integration of cosmological Hamiltonian dynamics

In the cosmological N-body problem, Hamiltonians arising in the Newtonian limit are typically of the non-autonomous but separable type, i.e. can be written

$$\begin{aligned} \mathscr {H} = \alpha (t) \, T(\varvec{P}_1,\dots ,\varvec{P}_N) + \beta (t)\, V(\varvec{X}_1,\dots ,\varvec{X}_N), \end{aligned}$$
(36)

where \(\varvec{X}_i\) and \(\varvec{P}_i\) are canonically conjugate, and \(\alpha (t)\) and \(\beta (t)\) are time-dependent functions that absorb all explicit time dependence (i.e. all factors of ‘a’ are pulled out of the Poisson equation for V). In cosmic time t one has \(\alpha =a(t)^{-2}\) and \(\beta =a(t)^{-1}\), which is not a convenient choice of time coordinate, since the time dependence then appears in both terms, which complicates higher-order symplectic integration schemes, as we discuss below. Arguably the best choice is to remember the relativistic origin of this Hamiltonian and consider time as a coordinate in extended phase space (cf. Lanczos 1986), using a parametric time \(\tilde{t}\) with \(\mathrm{d}\tilde{t} = a^{-2} \mathrm{d}t\) so that \(\alpha =1\) and \(\beta =a(t)\). This coincides with the “super-conformal time” first introduced by Doroshkevich et al. (1973) and extensively discussed by Martel and Shapiro (1998) under the name “super-comoving” coordinates.

Grouping coordinates and momenta together as \(\varvec{\xi }_j:=(\varvec{X},\varvec{P})_j\) and remembering that the equations of motion can be written in terms of Poisson brackets as \(\dot{\varvec{P}}_j = \left\{ \varvec{P}_j,\,\mathscr {H} \right\} \) and \(\dot{\varvec{X}}_j = \left\{ \varvec{X}_j,\,\mathscr {H} \right\} \), one can write the canonical equations as a first order operator equation

$$\begin{aligned} \dot{\varvec{\xi }}_j = \hat{\mathscr {H}}(t)\, \varvec{\xi }_j\quad \text {with}\quad \hat{\mathscr {H}}(t):= \left\{ \cdot ,\, \mathscr {H}(t)\right\} = \left\{ \cdot ,\, \alpha T\right\} + \left\{ \cdot ,\, \beta V\right\} =: \hat{D}(t) + \hat{K}(t), \end{aligned}$$
(37)

which defines the drift and kick operators \(\hat{D}\) and \(\hat{K}\), respectively. This first order operator equation has the formal solution

$$\begin{aligned} \varvec{\xi }_j(t) = \mathcal {T} \exp \left[ \int _0^t \mathrm{d}t^{\prime} \hat{\mathscr {H}}(t^{\prime})\right] \varvec{\xi }_j(0), \end{aligned}$$
(38)

where Dyson’s time-ordering operator \(\mathcal {T}\) is needed because the operator \(\hat{\mathscr {H}}\) is time-dependent. Upon noticing that the kick acts only on the momenta and depends only on V (and therefore on the positions), and that the drift acts only on the positions and depends only on the momenta, one can seek time-explicit operator factorisations that split the coordinate and momentum updates in the form

$$\begin{aligned} \mathcal {T} \exp \left[ \int _t^{t+\epsilon } \mathrm{d}t^{\prime} \hat{\mathscr {H}}(t^{\prime})\right] \simeq \exp \left[ \epsilon _n \hat{K}\right] \cdots \exp \left[ \epsilon _3 \hat{D} \right] \,\exp \left[ \epsilon _2 \hat{K} \right] \,\exp \left[ \epsilon _1 \hat{D} \right] + \mathcal {O}(\epsilon ^m). \end{aligned}$$
(39)

with appropriately chosen coefficients \(\epsilon _j\) that in general depend on (multiple) time integrals of \(\alpha \) and \(\beta \) (Magnus 1954; Oteo and Ros 1991; Blanes et al. 2009). This is a higher-order generalisation of the Baker–Campbell–Hausdorff (BCH) expansion in the case that \(\alpha \) and \(\beta \) are constants (Yoshida 1990). The cancellation of commutators in the BCH expansion by tuning of the coefficients \(\epsilon _j\) determines the order of the error exponent m on the right hand side of Eq. (39). It is important to note that if both \(\alpha \) and \(\beta \) are time-dependent then the generalised BCH expansion contains unequal-time commutators and the error is typically at best \(\mathcal {O}(\epsilon ^3)\). It is therefore much simpler to consider only the integration in extended phase space in super-conformal time, in which no unequal-time commutators appear and standard higher order BCH expansion formulae can be used. While some N-body codes (e.g., Ramses, Teyssier 2002) use super-conformal time, one finds numerous other choices of integration time for second order accurate integrators in the literature (e.g., Quinn et al. 1997; Springel 2005). In order to allow for generalisations to higher orders, we discuss here how to construct an extended phase-space integrator. Consider the set of coordinates \((\varvec{X}_j,\,a)\), \(j=1\dots N\), including the cosmic expansion factor, with conjugate momenta \((\varvec{P}_j,\,p_a)\) along with the new extended phase-space Hamiltonian in super-conformal time

$$\begin{aligned} \tilde{\mathscr {H}} := \sum _j \frac{{P}_j^2}{2M} + a V(\varvec{X}_{1\dots N}) + a^2\mathcal {H}(a) p_a. \end{aligned}$$
(40)

Then the second order accurate “leap-frog” integrator is found when \(\epsilon _1=\epsilon _3=\epsilon /2\) and \(\epsilon _2=\epsilon \) in Eq. (39) (all higher coefficients \(\epsilon _{4\dots n}=0\)) after expanding the operator exponentials to first order into their generators \(\exp [\epsilon \hat{D}]\simeq I + \epsilon \hat{D}\). The final integrator takes the form

$$\begin{aligned} \varvec{\xi }_j(\tilde{t}+\epsilon ) = \left( I+\frac{\epsilon }{2}\hat{D}\right) \left( I+\epsilon \hat{K}\right) \left( I+\frac{\epsilon }{2}\hat{D}\right) \varvec{\xi }_j(\tilde{t}) \end{aligned}$$
(41)

or explicitly as it could be implemented in code

$$\begin{aligned} \varvec{X}_j(\tilde{t}+\epsilon /2)&= \varvec{X}_j(\tilde{t}) + \frac{\epsilon }{2M} \; \varvec{P}_j(\tilde{t}) \end{aligned}$$
(42a)
$$\begin{aligned} a(\tilde{t}+\epsilon /2)&= a(\tilde{t}) + \frac{\epsilon }{2} \; a(\tilde{t})^2 \, \mathcal {H}(a(\tilde{t})) \end{aligned}$$
(42b)
$$\begin{aligned} \varvec{P}_j(\tilde{t}+\epsilon )&= \varvec{P}_j(\tilde{t}) - \epsilon \,a(\tilde{t}+\epsilon /2)\; \varvec{\nabla }_{\varvec{X}_j} V\left( \varvec{X}_{1\dots N}(\tilde{t}+\epsilon /2)\right) \end{aligned}$$
(42c)
$$\begin{aligned} \varvec{X}_j(\tilde{t}+\epsilon )&= \varvec{X}_j(\tilde{t}+\epsilon /2) + \frac{\epsilon }{2M} \; \varvec{P}_j(\tilde{t}+\epsilon ) \end{aligned}$$
(42d)
$$\begin{aligned} a(\tilde{t}+\epsilon )&= a(\tilde{t}+\epsilon /2) + \frac{\epsilon }{2}\; a(\tilde{t}+\epsilon /2)^2 \, \mathcal {H}(a(\tilde{t}+\epsilon /2))\;. \end{aligned}$$
(42e)

Note that the supplementary equation \(\mathrm{d} a/\mathrm{d}\tilde{t} = \partial \tilde{\mathscr {H}}/\partial p_a = a^2\mathcal {H}(a)\) can in principle be integrated inexpensively to arbitrarily high precision in general cases; for EdS one has \(\tilde{t} = -2/(H_0 \sqrt{a})\), and in \(\varLambda \)CDM

$$\begin{aligned} \tilde{t} = -\frac{2}{H_0 \sqrt{\varOmega _m a}} {}_2F_1\left( -\frac{1}{6},\frac{1}{2},\frac{5}{6};-f_\varLambda (a)\right) , \end{aligned}$$
(43)

where \(f_\varLambda := \varOmega _\varLambda / (\varOmega _{\mathrm{m}} a^{-3})\) as in Eq. (21), which has to be inverted numerically to yield \(a(\tilde{t})\). Since this is a symplectic integration scheme for the (now time-independent) extended phase-space Hamiltonian \(\tilde{\mathscr {H}}\), it conserves the associated energy up to bounded errors.
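A direct transcription of Eqs. (42a)–(42e) into code is given below as a minimal sketch; `grad_V` and `hubble` are hypothetical user-supplied callables (the gradient of the potential at the particle positions and the Hubble rate H(a)), and in a production code the expansion factor would typically be obtained from the exact relation \(a(\tilde{t})\) rather than leap-frogged along.

```python
import numpy as np

def dkd_step(X, P, a, dt, M, grad_V, hubble):
    """One second-order DKD leap-frog step in super-conformal time,
    following Eqs. (42a)-(42e)."""
    X = X + 0.5 * dt * P / M                       # Eq. (42a): half drift
    a = a + 0.5 * dt * a**2 * hubble(a)            # Eq. (42b): advance a to the half step
    P = P - dt * a * grad_V(X)                     # Eq. (42c): full kick at the half step
    X = X + 0.5 * dt * P / M                       # Eq. (42d): second half drift
    a = a + 0.5 * dt * a**2 * hubble(a)            # Eq. (42e): advance a to the full step
    return X, P, a

# usage with toy callables: force-free particles in an EdS background (H = H0 a^-3/2)
X = np.zeros((4, 3))
P = np.random.default_rng(0).normal(size=(4, 3))
X, P, a = dkd_step(X, P, a=0.02, dt=1e-4, M=1.0,
                   grad_V=lambda X: np.zeros_like(X),
                   hubble=lambda a: a**-1.5)
```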

Equations (42a)–(42e) represent the drift-kick-drift (DKD) form of a second order integrator. It is trivial to derive also the respective kick-drift-kick (KDK) form. Based on this, it is possible to construct also higher order integrators; see e.g., Yoshida (1990) for a derivation of operators up to 8th order that, however, involve both positive and negative time coefficients. Alternative symplectic formulations with purely positive coefficients are also possible; see Chin and Chen (2001) for a 4th order method. An exhaustive discussion of symplectic and other geometric integrators and their properties can be found in Hairer et al. (2006) and Blanes and Casas (2016). In cosmological simulations, the second order leap frog is, however, the most commonly used integrator to date, arguably due to its robustness, simplicity, slim memory footprint, and easy integration with hierarchical time-stepping schemes (see below). We are not aware of production implementations of higher-order symplectic integrators used in cosmological simulations.

A long-time evolution operator can be constructed by many successive applications of the KDK or DKD propagators. Writing out the product, it is easy to see that the last and first half-step operators from two successive steps can often be combined into a single operator (if \(\hat{A}\) below is time independent, otherwise usually at second order). Then, in the long product of operators, combining

$$\begin{aligned} \dots \exp (\epsilon _B \hat{B})\exp (\frac{\epsilon _A}{2} \hat{A})\exp (\frac{\epsilon _A}{2} \hat{A})\exp (\epsilon _B \hat{B})\dots =\dots \exp (\epsilon _B \hat{B})\exp (\epsilon _A \hat{A})\exp (\epsilon _B \hat{B})\dots , \end{aligned}$$

implies that in continued stepping only two interleaved steps have to be made per time step, not three, and that the splitting into three sub-steps just serves to symmetrise the scheme and interleave the steps. Half-steps are only made at the very beginning and end of the stepping—or whenever one needs synchronous dynamical variables (e.g. for output or analysis).

4.2 Multi-stepping, adaptive steps and separation of time-scales

4.2.1 Time step criteria

A challenge in cosmological simulations is the large dynamic range from quasilinear flow on large scales to very short dynamical times in the centres of dark matter haloes: we seek to simultaneously simulate large underdense regions of the universe, where particles move on long timescales, together with massive clusters, where density contrasts can reach \(10^4\)–\(10^5\) times the average density and dynamical timescales are very short. In the absence of an adaptive or hierarchical time-stepping scheme, the criteria discussed below yield a global timestep \(\varDelta t = \min _i \varDelta t_i\) dictated by the N-body particle with the smallest time step.

A simple condition for choosing a time step is the Courant–Friedrichs–Lewy (CFL) criterion, which requires that particles travel less than a fraction of one force resolution element, \(\varDelta x\), over the time step. Specifically

$$\begin{aligned} \varDelta t_i = C \frac{\varDelta x}{ \Vert \varvec{P}_i / M_i \Vert } \,, \end{aligned}$$
(44)

where \(0<C<1\) is a free parameter, usually \(C \sim 0.25\). While this criterion is commonly used in the case of Vlasov–Poisson, we are not aware of explicit derivations of this value from stability criteria as in the case of hyperbolic conservation laws. A closely related criterion is \(\varDelta t_i = C \sqrt{\varDelta x/\Vert \varvec{A}_i\Vert }\), where \(\varvec{A}_i := -\varvec{\nabla }\phi |_{\varvec{X}_i}\) is the acceleration. This condition sets a global timestep that is commonly used in simulations where forces are computed via the PM algorithm (e.g., Merz et al. 2005). Other criteria are also possible; for instance, the ABACUS code (Garrison et al. 2016), in addition to Eq. (44), uses a heuristic condition based on the global maximum of the ratio of the RMS velocity to the maximum acceleration of particles in a volume element.

In the case of tree- or direct-summation-based methods, the role played by the mesh resolution is taken over by the softening length. Therefore, a simple criterion is to estimate for each particle a time-scale

$$\begin{aligned} \varDelta t_i \simeq \eta \sqrt{\frac{\varepsilon }{\Vert \varvec{A}_i\Vert }} \,, \end{aligned}$$
(45)

where \(\varepsilon \) is the gravitational force softening scale, and \(\eta \) is a dimensionless accuracy parameter. This is the most common time-stepping criterion adopted in large-scale simulations and it is used, for instance, in PKDGRAV-3 (with \(\eta = 0.2\)) and GADGET.

Several authors have argued that a better timestep criterion should be based on the tidal field rather than the acceleration (Dehnen and Read 2011; Stücker et al. 2020; Grudić and Hopkins 2020). This is also motivated by the fact that a constant (global) acceleration does not change the local dynamics of a system, see e.g. Stücker et al. (2021a). Additionally, using tides avoids invoking the non-physical scale \(\varepsilon \), and one has

$$\begin{aligned} \varDelta t_i \simeq \frac{\eta }{ \sqrt{\left\| \; \mathsf{\varvec {T}}(\varvec{X}_i) \; \right\| }} \,, \end{aligned}$$
(46)

where \(\mathsf{\varvec {T}}=\varvec{\nabla }\otimes \varvec{\nabla }\phi \) is the tidal field tensor, and \(\Vert \cdot \Vert \) is e.g. the Frobenius matrix norm. This tidal criterion typically yields shorter timesteps in the innermost parts and longer timesteps in the outer parts of haloes compared to the standard criterion (45). A caveat is that it is not trivial to get a robust estimate of the tidal field, since it is one spatial order less smooth than the acceleration field entering (45) and so in principle time-step fluctuations due to noise might amplify integration errors.
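The three local criteria of Eqs. (44)–(46) can be evaluated per particle as in the short sketch below, which is not taken from any specific code; the parameter values are the illustrative ones quoted in the text, `vel` stands for \(\varvec{P}_i/M_i\), and `tidal` holds the \(3\times 3\) tidal tensor of each particle.

```python
import numpy as np

def local_timesteps(vel, acc, tidal, dx, softening, C=0.25, eta=0.2):
    """Per-particle time steps from Eqs. (44)-(46): CFL-like velocity criterion,
    acceleration/softening criterion, and tidal criterion (Frobenius norm)."""
    dt_cfl = C * dx / np.linalg.norm(vel, axis=1)                    # Eq. (44)
    dt_acc = eta * np.sqrt(softening / np.linalg.norm(acc, axis=1))  # Eq. (45)
    dt_tid = eta / np.sqrt(np.linalg.norm(tidal, axis=(1, 2)))       # Eq. (46)
    return dt_cfl, dt_acc, dt_tid
```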

An additional global timestep criterion, independent of the dynamics of the N-body particles, is sometimes used in large-scale cosmological simulations when high resolution is not required. Such criteria are usually tied to the scale-factor evolution and are e.g. of the form

$$\begin{aligned} \varDelta \log (a) < B \,, \end{aligned}$$
(47)

where \(B \sim 0.01\). Criteria of this kind are also usually employed in PM codes and COLA (which we will discuss in Sect. 4.4), which typically adopt timesteps equally spaced in the expansion factor or in its logarithm, or more complicated functions (e.g., \(\varDelta a/a = (a_1^{-2} + a_2^{-2})^{-0.5}\) with \(a_1\) and \(a_2\) being free parameters in Fast-PM). Different authors have advocated different options and numbers of steps, justified simply by convergence rates of a set of desired summary statistics (White et al. 2014; Feng et al. 2016; Tassev et al. 2013; Izard et al. 2016). The criterion in Eq. (47) is also commonly used together with other conditions even in high-resolution simulations, since it appears to be necessary for precise agreement with linear theory predictions at high redshift.

We note that different timestep criteria (and combinations thereof) are adopted by different codes, usually heuristically motivated. The optimal choice seems to depend on details of the simulation (redshift, force accuracy, etc.), which suggests there could be better strategies to choose the timestep. This could be very important, as it has a significant impact on the overall computational cost of a simulation. For instance, by adjusting the timesteps, Sunayama et al. (2016) find a factor of 4 reduction in the CPU time of N-body simulations while still accurately recovering the power spectrum and halo mass function (after a correction of masses). As far as we know, no systematic study of the optimal general time-stepping strategy has been published for large-scale cosmological simulations taking into account the target accuracy needed for upcoming observations.

For some applications, a global timestep is sufficient to obtain accurate results in the mildly nonlinear regime, and it has been adopted in some large-scale simulation codes. However, as the resolution of a simulation increases, the minimum value of \(\varDelta t_i\) quickly decreases, usually as the result of a small number of particles on short orbits inside dark matter haloes. To prevent the shortest time-scale from dictating an intractably small global time step, it is desirable to have individually adaptive time-steps. Some care needs to be taken to consistently allow for this in a time integrator, as we discuss next.

4.2.2 Hierarchical / block time stepping

For systems of particles with widely different time-scales, a division into ‘fast’ and ‘slow’ particles is advantageous. Given a second order accurate integrator and a splitting of the Hamiltonian into \(\mathscr {H} = T + V_{\mathrm{slow}} + V_{\mathrm{fast}}\), the following n-fold sub-cycling scheme is also second order accurate (Hairer et al. 2006)

$$\begin{aligned} \left( I+\frac{\epsilon }{2}\hat{K}_{\mathrm{slow}}\right) \left[ \left( I+\frac{\epsilon }{2n}\hat{K}_{\mathrm{fast}}\right) \left( I+\frac{\epsilon }{n}\hat{D}\right) \left( I+\frac{\epsilon }{2n}\hat{K}_{\mathrm{fast}}\right) \right] ^n \left( I+\frac{\epsilon }{2}\hat{K}_{\mathrm{slow}}\right) . \end{aligned}$$
(48)

The gain is that in one such KDK timestep, while the fast particles are kicked 2n times, the slow particles are kicked only twice. Since the force computation is the algorithmically slowest part of the update, this leads to a computational speed up.

This sub-cycling idea can be generalised to the block time step scheme (BTS, sometimes also called ‘hierarchical time stepping’) to update particles on their relevant time scales (Hayli 1967, 1974; McMillan 1986) using a time-quantised recursive version of the sub-cycling scheme above. By fixing \(n=2\), but applying the formula recursively, i.e., splitting the ‘fast’ part itself into a ‘slow’ and ‘fast’ part and so on, one achieves a hierarchical time integration scheme with time steps \(\epsilon _\ell = 2^{-\ell } \epsilon _0\) on recursion level \(\ell \). The power-two quantisation means that while a particle on level \(\ell \) makes one timestep, a particle on level \(\ell +1\) makes two, and one on level \(\ell +2\) makes four in the same time interval. Since the scheme is self-similar, after every sub step with Eq. (48) on level \(\ell \), all particles on levels \(\ell ^{\prime}>\ell \) have carried out complete time steps, and can be re-assigned a new time bin. In this way, each particle can be assigned to a close to optimal time step “bin” which is consistent with its local timestep criterion. This multi-stepping approach is adopted in PKDGRAV-3 and GADGET.

This scheme can be further optimized. As written above, the kick \(\hat{K}\) on level \(\ell \) involves the interaction with particles on all other levels, i.e. the ‘fast’ particles interact with the ‘slow’ particles on the ‘fast’ timescale, while it is likely sufficient that they do so on the ‘slow’ time scale. For this reason, a variation, e.g. implemented in GADGET4 (Springel et al. 2021), is that the kick of particles on level \(\ell \) is computed using the gravitational force of particles only on all levels \(\ell ^{\prime} \ge \ell \). Such hierarchical integration schemes have recently been extended to higher order integrators in the non-cosmological case (Rantala et al. 2021). In principle, secondary trees need to be built at every level of the hierarchy of timesteps; however, this would require a significant amount of computing time. Therefore, an optimization strategy, e.g. adopted in PKDGRAV-3, is to build a new secondary tree only if a timestep level contains a small fraction of particles compared to the previous level where the tree was built.
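The call pattern of the recursive, power-of-two sub-cycling described above can be illustrated with a minimal sketch; `kick(level, h)` and `drift(h)` are hypothetical callables (the former applying the force contribution assigned to a given level, the latter drifting all particles), and no combining of adjacent operators or re-binning of particles between steps is attempted here.

```python
def recursive_kdk(level, n_levels, dt, kick, drift):
    """Recursive two-fold sub-cycling in the spirit of Eq. (48) with n = 2:
    level l is kicked with step dt/2^l, and the drift of all particles
    happens at the finest step."""
    kick(level, 0.5 * dt)
    if level + 1 < n_levels:
        recursive_kdk(level + 1, n_levels, 0.5 * dt, kick, drift)
        recursive_kdk(level + 1, n_levels, 0.5 * dt, kick, drift)
    else:
        drift(dt)
    kick(level, 0.5 * dt)

# usage: print the operator sequence of one top-level step for a 3-level hierarchy
recursive_kdk(0, 3, 1.0,
              kick=lambda l, h: print(f"kick level {l} by {h}"),
              drift=lambda h: print(f"drift all by {h}"))
```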

4.3 Symplectic integration of quantum Hamiltonians

The integration of a classical Hamiltonian in operator notation, Eq. (37), is basically identical to that of the Schrödinger–Poisson system, which is an effective description of non-relativistic scalar field dark matter, Eq. (92). One can therefore use the entire machinery of operator splitting developed above, with differences only in the form of the kick and drift operators, as they now act on the wave function rather than on the set of conjugate coordinates and momenta. As above, the best choice is again superconformal time, so that one has a quantum Hamiltonian acting on wave functions \(\psi \) with an associated Poisson equation:

$$\begin{aligned} i\hbar \frac{\partial \psi }{\partial \tilde{t}}= & {} \hat{\mathscr {H}}\psi \qquad \text {with}\qquad \hat{\mathscr {H}} = \frac{\hat{p}^2}{2m}+a(\tilde{t})\,\hat{V}(\hat{q})\qquad \text {and}\qquad \nonumber \\ \nabla ^2 \hat{V}= & {} \frac{3}{2}H_0^2 m \varOmega _X ( \left| \psi \right| ^2-1). \end{aligned}$$
(49)

The formal solution is identical to that in Eq. (38), apart from a factor \(\mathrm{i}/\hbar \) in the exponent; note also that the mass m here is the actual microscopic particle mass and not a coarse-grained effective mass. The main difference and advantage compared to the classical case is that the drift and kick operators \(\hat{D}\) and \(\hat{K}\) need not be represented through their infinitesimal generators, but can be directly taken to be the operator exponentials

$$\begin{aligned} \hat{D}(\tilde{t},\tilde{t}+\epsilon ) = \exp \left( -\epsilon \,\frac{i}{\hbar } \frac{\hat{p}^2}{2m} \right) ,\qquad \text {and}\qquad \hat{K}(\tilde{t},\tilde{t}+\epsilon ) = \exp \left( -\epsilon a(\tilde{t}+\epsilon /2)\frac{i}{\hbar } \hat{V} \right) . \end{aligned}$$
(50)

The kick operator is purely algebraic, i.e. it is simply a scalar function multiplying the wave function. The same is true for the drift operator in Fourier space, where

$$\begin{aligned} \hat{\tilde{D}}(\tilde{t},\tilde{t}+\epsilon ) := \mathcal {F}\,\hat{D}(\tilde{t},\tilde{t}+\epsilon ) = \exp \left( -\epsilon \, \frac{\mathrm{i}\hbar }{2m}k^2 \right) \end{aligned}$$
(51)

is simply a scalar function; \(\mathcal {F}\) is the Fourier transform operator with inverse \(\mathcal {F}^{-1}\), \(\varvec{k}\) the Fourier-conjugate wave number (or momentum) to coordinate \(\varvec{x}\). One can thus formulate a split-step spectral integration scheme (e.g., Taha and Ablowitz 1984; Woo and Chiueh 2009), where drift operators are simply applied in Fourier space, i.e. before a drift, the wave function is transformed, and after it is transformed back. The time coefficients again have to be matched to cancel out commutators in the BCH relation, so that a second order DKD time step is e.g. given by

$$\begin{aligned} \psi (\varvec{x},t+\epsilon ) = \mathcal {F}^{-1} \hat{\tilde{D}}(\tilde{t}+\epsilon /2,\tilde{t}+\epsilon ) \mathcal {F} \hat{K}(\tilde{t},\tilde{t}+\epsilon ) \mathcal {F}^{-1} \hat{\tilde{D}}(\tilde{t},\tilde{t}+\epsilon /2) \mathcal {F}\,\psi (\varvec{x},t). \end{aligned}$$
(52)

The chaining of multiple steps eliminates two of the Fourier transforms for the steps that are not at the beginning or end of the time integration, leaving one forward and one backward transform per time step. The use of Fourier transforms and the spectrally accurate drift operator (51) which contains the exponentiated Laplacian \(\hat{p}^2\) to all orders significantly increases the convergence rate of the scheme even for large time steps.
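A minimal sketch of such a split-step spectral update, transcribing Eqs. (50)–(52), is given below; `potential` is a hypothetical user-supplied Poisson solve (e.g. by FFT) returning V on the same mesh from \(|\psi |^2\), all units are schematic, and the example usage evolves a free wave function only.

```python
import numpy as np

def sp_dkd_step(psi, a_half, dt, boxsize, hbar, m, potential):
    """One DKD split-step spectral update of the Schroedinger-Poisson wave
    function following Eq. (52): half drift in Fourier space (Eq. 51),
    algebraic kick in real space (Eq. 50), second half drift."""
    n = psi.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    k2 = k[:, None, None]**2 + k[None, :, None]**2 + k[None, None, :]**2
    drift_half = np.exp(-0.5j * dt * hbar * k2 / (2.0 * m))

    psi = np.fft.ifftn(drift_half * np.fft.fftn(psi))               # half drift
    psi = np.exp(-1j * dt * a_half * potential(psi) / hbar) * psi   # kick
    psi = np.fft.ifftn(drift_half * np.fft.fftn(psi))               # half drift
    return psi

# usage: free evolution (V = 0) of a random-phase wave function on a 32^3 mesh
rng = np.random.default_rng(0)
psi = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (32, 32, 32)))
psi = sp_dkd_step(psi, a_half=1.0, dt=1e-3, boxsize=1.0, hbar=1e-2, m=1.0,
                  potential=lambda p: np.zeros(p.shape))
print(np.mean(np.abs(psi)**2))   # ~1: the norm is preserved by the unitary step
```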

The limiting factor of a spectral approach is that the spatial discretisation is necessarily uniform for Fourier-based methods. Since gravity and the expanding Universe contract gravitationally bound structures to ever smaller scales, methods with higher dynamic range are needed to probe the interior of haloes, but the AMR techniques used to locally increase the resolution are (to our knowledge) not easily compatible with a simple spectral approach, so that local stencils and finite-difference expansions become necessary. To our knowledge, no spatially adaptive spectral method has been developed yet. Instead, methods with higher dynamic range resort to finite-difference methods with AMR, which have been successfully used to model the interior dynamics of scalar field dark matter haloes (Schive et al. 2014; Mina et al. 2020; Schwabe et al. 2020). When discretising the drift operator in configuration space, one has to resort to the generator formulation again, i.e. \(\hat{D}_\epsilon = 1+\epsilon \frac{\mathrm{i}\hbar }{2m}\nabla ^2 + \mathcal {O}(\epsilon ^2)\), where the Laplacian can then be approximated with finite differences. This imposes strong CFL-like time-stepping constraints.

4.4 Acceleration methods: COLA and FastPM

One of the main shortcomings of symplectic integration schemes is that they have to evolve a time-dependent Hamiltonian, which can require many time steps even during the quasi-linear early phase of structure formation. In contrast, perturbation theory is very accurate in this regime. For instance, first-order LPT (a.k.a. the Zel’dovich approximation) yields exact results for one-dimensional problems up to shell-crossing, so that the solution could be obtained in a single time-step. This implies that (mostly in low-resolution large-scale simulations) a considerable amount of computing time can be spent on accurately evolving the particles during the quasi-linear phase, since too large timesteps during this phase can lead to significant large-scale errors. This has motivated several methods aimed at incorporating results from perturbation theory into the time integration of an N-body scheme. These approaches have been widely adopted to more efficiently create large ensembles of simulations that achieve high accuracy on mildly non-linear scales. We review the main ideas and implementations next.

4.4.1 COLA: COmoving Lagrangian Acceleration

In the COLA approach (Tassev et al. 2013), the idea is to follow the motion relative to a pre-computed LPT solution (specifically 1LPT and 2LPT in all implementations we are aware of) by writing the equations of motion of CDM, Eq. (24), relative to the motion in nLPT, as quantified by the order-n truncated Lagrangian map \(\varvec{x}_j = \varvec{q}_j+\varvec{\varPsi }_{\mathrm{LPT},j}(t)\) for particle j. In all existing work on COLA that we are aware of, an ad-hoc modification of the leapfrog drift and kick operators is made to reflect this change in the equations of motion. One can, however, write such a transformation rigorously as a canonical transformation to new coordinates \((\varvec{X}_j,\varvec{P}_j)\) with a generating function \(\mathscr {F}_3(\varvec{p}_j,\varvec{X}_j,\tilde{t}) = ( M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j}-\varvec{p}_j)\cdot \varvec{X}_j\) so that \(\varvec{X}_j=\varvec{x}_j\) and \(\varvec{P}_j=\varvec{p}_j - M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j}\). The COLA Hamiltonian then becomes (where the dot now indicates a derivative w.r.t. superconformal time)

$$\begin{aligned} \mathscr {H}_{\mathrm{COLA}} = \sum _j\frac{(\varvec{P}_j+M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j})^2}{2 M} + a V(\varvec{X}_{1\dots N}) + M \sum _j \ddot{\varvec{\varPsi }}_{\mathrm{LPT},j}\cdot \varvec{X}_j. \end{aligned}$$
(53)

It is immediately obvious that this Hamiltonian has explicit time-dependence in both the kinetic and the potential part which will complicate the development of symplectic splitting schemes. The equations of motion now reflect the motion relative to the LPT solution of the form

$$\begin{aligned} \dot{\varvec{X}}_j = \frac{\varvec{P}_j}{M} + \dot{\varvec{\varPsi }}_{\mathrm{LPT},j}\qquad \text {and}\qquad \dot{\varvec{P}}_j = -a\,\varvec{\nabla }_{\varvec{X}_j} V - M\ddot{\varvec{\varPsi }}_{\mathrm{LPT},j}. \end{aligned}$$
(54)

Existing COLA implementations ignore the Dyson time-ordering and simply modify the drift and kick operators to become

$$\begin{aligned} \hat{D}_{\mathrm{COLA}}(\tilde{t},\tilde{t}+\epsilon ) \varvec{X}_j= & {} \hat{D}(\tilde{t},\tilde{t}+\epsilon ) \varvec{X}_j + \varvec{\varPsi }_{\mathrm{LPT},j}(\tilde{t}+\epsilon ) - \varvec{\varPsi }_{\mathrm{LPT},j}(\tilde{t}), \end{aligned}$$
(55a)
$$\begin{aligned} \hat{K}_{\mathrm{COLA}}(\tilde{t},\tilde{t}+\epsilon ) \varvec{P}_j= & {} \hat{K}(\tilde{t},\tilde{t}+\epsilon ) \varvec{P}_j + M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j}(\tilde{t}+\epsilon ) - M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j}(\tilde{t}), \end{aligned}$$
(55b)

which is accurate at first order, since none of the unequal-time commutators can be expected to vanish (Tassev et al. 2013 discuss an ad-hoc improvement to reduce errors in their Appendix A.3.2). Despite the low order, this method allows for a very rapid approximate evolution in the quasi-linear regime, at the expense of having to store the fields needed to compute the nLPT trajectory of each fluid element. The above modifications are widely used, for instance, in the generation of large numbers of mock catalogues for estimates of clustering covariance matrices. In such cases, ensembles of thousands of low mass-resolution simulations, each with typically 10 time steps, are performed. Publicly available MPI-parallel implementations of the COLA algorithm include that of Koda et al. (2016) and the L-PICOLA code (Howlett et al. 2015), and a modified sCOLA algorithm allowing a more efficient spatial decomposition and “zoom simulations” has recently been proposed (Tassev et al. 2015) and is implemented in the Simbelmyne code (Leclercq et al. 2020). Given the low order of the scheme, it might be worthwhile exploring non-symplectic integration schemes in COLA, which could be rigorously higher order (given that there is no obvious benefit from symplectic integration anyway) and improve the performance of the method.

4.4.2 FastPM

An alternative that does not require computing or storing the LPT displacement fields, thus saving computer memory, while still relying on PT input to speed up the evolution, was proposed as the FastPM method by Feng et al. (2016). In this approach, the prefactors of the drift and kick operators receive an ad-hoc modification such that they contain the expected contribution of non-constant accelerations and velocities computed in the Zel’dovich approximation. Note, however, that it has been argued that no such modifications are needed to obtain an accurate time integration, as long as the time stepping is chosen appropriately (Klypin and Prada 2018a; see also Sunayama et al. 2016). The performance is similar to that of COLA, allowing approximate simulations with very few time steps that can be accurate on large scales. As for COLA, the order of convergence has, to our knowledge, not been discussed in the literature. The FastPM approach has recently been extended to include the modelling of massive neutrinos (Bayer et al. 2021), and has also been ported to TensorFlow (Modi et al. 2020b).

5 Gravity calculation

After having discussed the discretization in time and space (i.e. mass, for Lagrangian schemes) of the evolution equations, we now turn to the problem of computing the gravitational interactions of the simulated mass distribution. This step is usually the most time-consuming aspect of a modern N-body simulation and thus also where most numerical approximations are made and where various parallelization strategies have the largest impact. Depending on the problem at hand, the targeted numerical accuracy, and the computer architecture employed, several different methods exist that are in different senses ‘optimal’. Modern state-of-the-art codes typically exploit all of these techniques. Optimal algorithmic complexity, \(\mathcal {O}(N)\) in the number N of particles, is achieved e.g. by the Fast Multipole Method (FMM), which is very promising for simulations with very large particle counts and is used e.g. in the PKDGRAV3 and Gadget-4 codes, and by the geometric multigrid method, used e.g. in the RAMSES code. The newest codes also readily utilise thousands of GPUs to generate simulated Universes for the upcoming generations of cosmological observations.

In the following, we provide a brief overview of the main methods and the main ideas behind them. Regarding the general topic of gravity calculations in N-body simulations, we also refer the reader to other reviews for further details on these methods (Dehnen and Read 2011).

5.1 Mesh-based methods

A robust and fast method to solve for the gravitational interactions of a periodic system is provided by the particle-mesh (PM) method (Doroshkevich et al. 1980; Hockney and Eastwood 1981). Derived from the particle-in-cell (PIC) technique developed in plasma physics, PM methods are among the oldest numerical methods employed to study cosmological structure formation. The techniques described here can be employed not only for N-body discretisations, but are readily applicable also to e.g. full phase-space or integer lattice methods (cf. Sects. 3.2 and 3.3; see also Miller and Prendergast 1968), and even to Schrödinger–Poisson systems (Woo and Chiueh 2009).

5.1.1 Force and potential determination—spectral calculation

Considering a periodic domain of side length L, we want to solve the cosmological Poisson equation (Eq. 32b). Assume that both density \(\rho \) and potential \(\phi \) are periodic in \([-L/2,L/2)\) and can be expanded in a Fourier series, i.e.

$$\begin{aligned} \rho (\varvec{x})=\sum _{\varvec{n}\in \mathbb {Z}^3} \tilde{\rho }_{\varvec{n}}\exp \left( \mathrm{i} k_0\, \varvec{x}\cdot \varvec{n}\right) ,\quad \text {with}\quad k_0:=\frac{2\pi }{L} \end{aligned}$$
(56)

and identically for \(\phi (\varvec{x})\) with coefficients \(\tilde{\phi }_{\varvec{n}}\). It then follows from Poisson’s equation (Eq. 17) that their Fourier coefficients obey the algebraic relation

$$\begin{aligned} -k_0^2\left\| \varvec{n}\right\| ^2\,\tilde{\phi }_{\varvec{n}} = 4\pi G a^{-1} \left( \tilde{\rho }_{\varvec{n}} - \overline{\rho }\,\delta _D(\varvec{n}) \right) \quad \text {for all}\quad \varvec{n}\in \mathbb {Z}^3. \end{aligned}$$
(57)

This equation imposes the consistency condition \(\tilde{\rho }_{\varvec{n}=\varvec{0}}=\overline{\rho }\), i.e. the mean Poisson source must vanish. In practice, this is achieved in PM codes by explicitly setting to zero the \(\varvec{n}=0\) mode (a.k.a. the “DC mode”, in analogy to AC/DC electric currents). For the acceleration field \(\varvec{g} = -\nabla \phi \), one finds \(\tilde{\varvec{g}}_{\varvec{n}} = -\mathrm{i}k_0 \varvec{n} \tilde{\phi }_{\varvec{n}}\). The solution for potential and acceleration can thus be conveniently computed using the Discrete Fourier transform (DFT) as

$$\begin{aligned} \tilde{\phi }_{\varvec{n}} = \left\{ \begin{array}{cl} -\frac{4\pi G }{a k_0^2} \frac{\tilde{\rho }_{\varvec{n}}}{\Vert \varvec{n}\Vert ^2}&{} \quad \text {if}\quad \varvec{n}\ne \varvec{0} \\ 0 &{} \quad \text {otherwise } \end{array} \quad , \right. \qquad \tilde{\varvec{g}}_{\varvec{n}} = \left\{ \begin{array}{cl} \frac{4\pi G}{a k_0} \frac{\mathrm{i}\,\varvec{n}\tilde{\rho }_{\varvec{n}}}{\Vert \varvec{n}\Vert ^2}&{} \quad \text {if}\quad \varvec{n}\ne \varvec{0} \\ 0 &{} \quad \text {otherwise } \end{array} \right. . \end{aligned}$$
(58)

If one considers a uniform spatial discretisation of both potential \(\phi _{\varvec{m}}:=\phi _{i,j,k} := \phi (\varvec{m} h)\) and density \(\rho _{\varvec{m}}\), with \(i,j,k\in [0\dots N_g-1]\), mesh index \(\varvec{m}:=(i,j,k)^T\), and grid spacing \(h:=L/N_g\), then the solution can be directly computed using the Fast Fourier Transform (FFT) algorithm at \(\mathcal {O}(M\log M)\) cost for \(M=N_g^3\) grid points. Many implementations exist; the FFTW library (Frigo and Johnson 2005) is one of the most commonly used, with support for multi-threading and MPI. In the case of the DFT, the Fourier sum is truncated at the Nyquist wave number, so that \(\varvec{n} \in (-N_g/2,N_g/2]^3\).
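To make this explicit, the following minimal sketch (in Python/NumPy, with all prefactors such as \(4\pi G/a\) collapsed into a single constant, an illustrative assumption) solves Eq. (58) for the acceleration components on a periodic mesh, given a density contrast already deposited on the grid.

```python
# Minimal sketch of a periodic PM Poisson solve via the FFT (Eq. 58). The
# density contrast is assumed to be already deposited on a regular N_g^3 mesh,
# and 'prefactor' stands in for 4*pi*G/a (an illustrative normalisation).
import numpy as np

def pm_accelerations(delta, boxsize, prefactor=1.0):
    ng = delta.shape[0]
    dx = boxsize / ng
    k1d = 2.0 * np.pi * np.fft.fftfreq(ng, d=dx)      # k_0 * n per dimension
    kz1d = 2.0 * np.pi * np.fft.rfftfreq(ng, d=dx)    # half-spectrum (real FFT)
    kx, ky, kz = np.meshgrid(k1d, k1d, kz1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                                 # avoid division by zero

    delta_k = np.fft.rfftn(delta)
    phi_k = -prefactor * delta_k / k2                 # Eq. (58), n != 0
    phi_k[0, 0, 0] = 0.0                              # zero the DC mode

    # g = -grad(phi)  =>  g_k = -i k phi_k, transformed back to real space
    return [np.fft.irfftn(-1j * k * phi_k, s=delta.shape) for k in (kx, ky, kz)]
```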

Note that instead of the exact Fourier-space Laplacian, \(-k_0^2 \Vert \varvec{n} \Vert ^2\), which is implicitly truncated at the Nyquist wave numbers, sometimes a finite difference version is used in PM codes such as FastPM (Feng et al. 2016) (cf. Sect. 4.4). Inverting the second-order accurate finite difference Laplacian in Fourier space yields

$$\begin{aligned} \tilde{\phi }_{\varvec{n}}^{\mathrm{FD2}} = \left\{ \begin{array}{cl} -\frac{\pi G \varDelta x^2 }{a} \;\tilde{\rho }_{\varvec{n}}\;\left( \sin ^2\left[ \frac{\pi n_x}{N_g} \right] + \sin ^2\left[ \frac{\pi n_y}{N_g} \right] + \sin ^2\left[ \frac{\pi n_z}{N_g} \right] \right) ^{-1}&{} \quad \text {if}\quad \varvec{n}\ne \varvec{0} \\ 0 &{} \quad \text {otherwise. } \end{array} \right. \end{aligned}$$
(59)

This kernel has substantially suppressed power on small scales compared to the Fourier space Laplacian, which reduces aliasing (see the discussion in the next section). It also reduces the effect of anisotropies due to the mesh on grid scales.

Solving Poisson’s equation in Fourier space with FFTs becomes less efficient if boundary conditions are not periodic, or if spatial adaptivity is necessary. For isolated boundary conditions, the domain has to be zero padded to twice its size per linear dimension, which corresponds to an increase in memory by a factor of eight in three dimensions. This is a problem on modern architectures, where memory is expensive and slow while floating-point operations are comparatively cheap. A further problem of FFT methods is their parallelization: a multidimensional FFT requires a global transpose of the array. This leads to a very non-local communication pattern and the need to transfer all of the data multiple times between computer nodes per force calculation.

Additionally, if high resolution is required, as is often the case in cosmology due to the nature of gravity as an attractive force, the size of the grid can quickly become the computational bottleneck. One possibility is to introduce additional higher resolution meshes (Jessop et al. 1994; Suisalu and Saar 1995; Pearce and Couchman 1997; Kravtsov et al. 1997; Teyssier 2002), deposit particles onto them, and then solve either using an adaptive “relaxation method” such as the adaptive multigrid method (see below), or by employing the periodic FFT solution as a boundary condition. Adaptive algorithms are typically more complex due to the more complicated data structures involved.

It is also possible to employ another (or many more) Fourier mesh extended over a particular region of interest in a so-called “zoom simulation”, cf. Sect. 6.3.4, if higher force resolution is required in a few isolated subregions of the simulation volume. A problem related to this method is that, for a finite grid resolution, Fourier modes with wave numbers above the Nyquist wave number will be incorrectly aliased to those supported by the Fourier grid (Hockney and Eastwood 1981), which biases the solution of the Poisson equation. The magnitude of this aliasing effect depends on the mass assignment scheme and can be reduced when PM codes are complemented with other force calculation methods, as discussed below in Sect. 5.3, since then the PM force is usually UV truncated.

Instead of adding a fine mesh on a single region of interest, it is possible to add it everywhere in space. This approach is known as two-level PM or PMPM, and has been used for carrying out Cosmo-\(\pi \), the largest N-body simulation to date (cf. Sect. 10). This approach has the advantage that, for a cubical domain decomposition, all the operations related to the fine grid can be performed locally, i.e. without communication among nodes in distributed-memory systems, which might result in significant advantages especially when employing hundreds of thousands of computer nodes.

For full phase-space techniques, the PM approach is also preferable since a regular mesh already exists in configuration space onto which the mass distribution can be easily projected. The Fourier-space spectral solution of the Poisson equation can also be readily employed in the case of Schrödinger–Poisson discretisations on a regular grid. In this case, the Poisson source is computed from the wave function which is known on the grid, so that \(\rho _{\varvec{m}} = \psi _{\varvec{m}} \psi _{\varvec{m}}^*\).

5.1.2 Mass assignment schemes

Grid-based methods always rely on a charge assignment scheme (Hockney and Eastwood 1981) that deposits the mass \(M_i\) associated with a particle i at location \(\varvec{X}_i\) by interpolating the particle masses in a conservative way to grid point locations \(\varvec{x}_{\varvec{n}}\) (where \(\varvec{n}\in \mathbb {N}^3\) is a discrete index, such that e.g. \(\varvec{x}_{\varvec{n}} = \varvec{n}\,\varDelta x\) in the simplest case of a regular (cubic) grid of spacing \(\varDelta x\)). This gives a charge assignment of the form

$$\begin{aligned} \rho _{\varvec{n}} = \int _{\mathbb {R}^3} \mathrm{d}^3x^{\prime}\,\hat{\rho }(\varvec{x}^{\prime}) \,W_{3D}(\varvec{n}\,\varDelta x-\varvec{x}^{\prime})\quad \text {with}\quad \hat{\rho }(\varvec{x}):=\sum _{i=1}^N M_i \delta _D(\varvec{x}-\varvec{X}_i), \end{aligned}$$
(60)

where the periodic copies in the density were dropped since periodic boundary conditions are assumed in the Poisson solver. Charge assignment to a regular mesh is equivalent to a single convolution if \(M_i=M\) is identical for all particles. The most common particle-grid interpolation functions (cf. Hockney and Eastwood 1981) of increasing order are given for each spatial dimension by

$$\begin{aligned} W_{\mathrm{NGP}}(x)= & {} \frac{1}{\varDelta x}\left\{ \begin{array}{ll} 1 &{} \quad {\text {for}}\,\left| x \right| \le \frac{\varDelta x}{2}\\ 0 &{} \quad {\text {otherwise}} \end{array}\right. \end{aligned}$$
(61a)
$$\begin{aligned} W_{\mathrm{CIC}}(x)= & {} \frac{1}{\varDelta x}\left\{ \begin{array}{ll} 1-\frac{\left| x\right| }{\varDelta x} &{}\quad {\text {for}}\,\left| x\right| < \varDelta x \\ 0 &{}\quad \text {otherwise} \end{array}\right. \end{aligned}$$
(61b)
$$\begin{aligned} W_{\mathrm{TSC}}(x)= & {} \frac{1}{\varDelta x}\left\{ \begin{array}{ll} \frac{3}{4} - \left( \frac{x}{\varDelta x}\right) ^2 &{} \quad {\text {for}}\,\left| x\right| \le \frac{\varDelta x}{2}\\ \frac{1}{2}\left( \frac{3}{2} - \frac{\left| x\right| }{\varDelta x}\right) ^2 &{} \quad \text {for }\frac{\varDelta x}{2}\le \left| x \right| < \frac{3\varDelta x}{2}\\ 0 &{} \quad {\text {otherwise}} \end{array}\right. \end{aligned}$$
(61c)
$$\begin{aligned} W_{\mathrm{PCS}}(x)= & {} \frac{1}{\varDelta x}\left\{ \begin{array}{ll} \frac{1}{6} \left[ 4 - 6\left( \frac{x}{\varDelta x}\right) ^2 + 3 \left( \frac{|x|}{\varDelta x}\right) ^3 \right] &{} \quad {\text {for}}\,\left| x\right| \le \varDelta x\\ \frac{1}{6}\left( 2 - \frac{\left| x\right| }{\varDelta x}\right) ^3 &{} \quad {\text {for}}\, \varDelta x \le |x| < 2 \varDelta x\\ 0 &{} \quad {\text {otherwise}} \end{array}\right. \end{aligned}$$
(61d)

The three-dimensional assignment function is then just the product \(W_{3D}(\varvec{x})=W(x)\,W(y)\,W(z)\), where \(\varvec{x}=(x,y,z)^T\). It can be easily shown that depositing to, and interpolating from, the mesh with these kernels increases the regularity of the particle density field, and thus also has a smoothing effect on the resulting effective gravitational force. This can also be seen directly from the Fourier transform of the assignment functions, which have the form (per dimension)

$$\begin{aligned} \tilde{W}_{n}(k) = \left[ \mathrm{sinc }\frac{\pi }{2}\frac{k}{k_\mathrm{Ny}}\right] ^n\quad \text {with}\quad \mathrm{sinc}\,x = \frac{\sin x}{x}. \end{aligned}$$
(62)

where \(n=1\) for NGP, \(n=2\) for CIC, \(n=3\) for TSC, and \(n=4\) for PCS interpolation, and \(k_{\mathrm{Ny}}:=\pi /\varDelta x\) is the Nyquist wave number. NGP leads to a piecewise constant, CIC to a piecewise linear, TSC to a piecewise quadratic (i.e., continuous value and first derivative), and PCS to a piecewise cubic variation of the acceleration as a particle moves between grid points. The real-space and Fourier-space shapes of the kernels are shown in Fig. 6. Note that the support is always \(n \varDelta x\), i.e. n cells, per dimension and thus increases with the order, and by the central limit theorem \(\tilde{W}_n\) converges to a normal distribution as \(n\rightarrow \infty \). Hence, going to higher order can negatively impact memory locality and the size of communication ghost zones. Since an a priori unknown number of particles might deposit to the same grid cell, special care needs to be taken to make the particle projection thread safe in shared-memory parallelism (Ferrell and Bertschinger 1994).
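As a concrete illustration, the following sketch implements the CIC deposit of Eq. (61b) onto a periodic mesh; unit particle masses and grid points at cell centres are assumptions made for brevity.

```python
# Minimal sketch of cloud-in-cell (CIC, Eq. 61b) mass deposit onto a periodic
# mesh, assuming unit particle masses and grid points at cell centres.
import numpy as np

def cic_deposit(positions, ng, boxsize):
    """Deposit particles onto an (ng, ng, ng) periodic density mesh."""
    rho = np.zeros((ng, ng, ng))
    dx = boxsize / ng
    s = positions / dx - 0.5          # position relative to cell centres
    i0 = np.floor(s).astype(int)      # index of the left neighbouring cell
    w1 = s - i0                       # linear weight of the right neighbour
    w0 = 1.0 - w1                     # linear weight of the left neighbour
    for p in range(positions.shape[0]):
        for ox, wx in ((0, w0[p, 0]), (1, w1[p, 0])):
            for oy, wy in ((0, w0[p, 1]), (1, w1[p, 1])):
                for oz, wz in ((0, w0[p, 2]), (1, w1[p, 2])):
                    ix, iy, iz = (i0[p] + np.array((ox, oy, oz))) % ng
                    rho[ix, iy, iz] += wx * wy * wz
    return rho / dx**3                # convert deposited mass to density
```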

Fig. 6 Common particle-mesh mass assignment kernels in real space (panels a) and Fourier space (panels b) of increasing order: \(n=1\) NGP, \(n=2\) CIC, \(n=3\) TSC, \(n=4\) PCS. Note that the NGP kernel is not continuous, CIC is continuous but not differentiable, TSC is continuously differentiable, and PCS is twice differentiable. The support of the assignment functions is \(n\varDelta x\) per dimension, and they converge to a normal distribution for \(n\rightarrow \infty \). Due to their increasing smoothness, they also act as increasingly stronger low-pass filters

As an alternative to these mass assignment kernels for particles, it is possible to project phase-space tessellated particle distributions (cf. Sect. 3.2) exactly onto the force grid (Powell and Abel 2015; Sousbie and Colombi 2016). In practice, when using such sheet tessellation methods, for a given set of flow tracers, the phase-space interpolation can be constructed and sampled with M “mass carrying” particles which can then be deposited onto the grid. Since the creation of mass carriers is a local operation, M can be arbitrarily large and thus the noise associated with N-body discreteness can be reduced systematically. This approach has been adopted by Hahn et al. (2013) and Angulo et al. (2013b) to simulate warm dark matter while suppressing artificial fragmentation, as we will discuss in greater detail in Sect. 7.3.

The same mass assignment schemes can be used to interpolate values of a discrete field back to the particle positions \(\left\{ \varvec{X}_i\right\} \). It has to be ensured that the same kernel and order are used for both the mass deposit and the interpolation of the force to the particle positions. This is an important consistency requirement since, otherwise, (1) exact momentum conservation is not guaranteed, and (2) self-forces can occur, allowing particles to accelerate themselves (cf. Hockney and Eastwood 1981). It is important to note that, due to the grid discretisation, particle separations that are unresolved by the discrete grid are aliased to the wrong wave numbers, which can e.g. cause certain Fourier modes to grow at the wrong rate. Aliasing can be ameliorated by filtering out scales close to the Nyquist frequency, or by using interlacing techniques, where, by combining multiple shifted deposits, the individual aliasing contributions can be cancelled at leading order (Chen et al. 1974; Hockney and Eastwood 1981). Such techniques are also important when estimating Fourier-space statistics (e.g., poly-spectra) from density fields obtained using the above deposit techniques (see Sect. 9 for a discussion).
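As an illustration of the interlacing idea, the following sketch (assuming the cic_deposit helper from above and NumPy FFT conventions) averages two deposits shifted by half a cell so that the leading-order aliased images, which acquire opposite phases, cancel.

```python
# Minimal sketch of interlacing: average two half-cell-shifted CIC deposits in
# Fourier space so that leading-order aliasing contributions cancel. Assumes
# the cic_deposit helper sketched above.
import numpy as np

def interlaced_density_k(positions, ng, boxsize):
    dx = boxsize / ng
    rho1 = cic_deposit(positions, ng, boxsize)
    rho2 = cic_deposit((positions + 0.5 * dx) % boxsize, ng, boxsize)

    k1d = 2.0 * np.pi * np.fft.fftfreq(ng, d=dx)
    kz1d = 2.0 * np.pi * np.fft.rfftfreq(ng, d=dx)
    kx, ky, kz = np.meshgrid(k1d, k1d, kz1d, indexing="ij")

    # undo the half-cell shift in Fourier space before averaging the deposits
    phase = np.exp(0.5j * dx * (kx + ky + kz))
    return 0.5 * (np.fft.rfftn(rho1) + np.fft.rfftn(rho2) * phase)
```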

5.1.3 Relaxation methods and multi-scale

In order to overcome the limitations of Fourier-space solvers (in particular, the large cost of the global transpose of all data, along with the lack of spatial adaptivity), a range of other methods has been developed. The requirement is that the Poisson source is known on a grid, which can also be an adaptively refined ‘AMR’ grid structure. On the grid, a finite difference version of the Poisson equation is then solved; e.g., for a second-order approximation in three dimensions, the discretised equation reads:

$$\begin{aligned} \phi _{i-1,j,k}+\phi _{i+1,j,k}+\phi _{i,j-1,k}+\phi _{i,j+1,k}+\phi _{i,j,k-1}+\phi _{i,j,k+1}-6\phi _{i,j,k} = \varDelta x^2\,f_{i,j,k} \,, \end{aligned}$$
(63)

where indices refer to grid point locations as above, \(\varDelta x\) is the grid spacing, and \(f_{i,j,k} := 4\pi G (\rho _{i,j,k}-\overline{\rho })/a\) is the Poisson source. This can effectively be written as a matrix inversion problem \(\mathsf{\varvec {A}} \phi = f\), where the finite difference stencil gives rise to a sparse matrix \(\mathsf{\varvec {A}}\) and the solution sought is \(\phi =\mathsf{\varvec {A}}^{-1}f\). Efficient methods exist to solve such equations. A particularly powerful one, which can operate directly even on an AMR structure, is the adaptive multigrid method (Brandt 1977; Trottenberg et al. 2001), used e.g., by the RAMSES code (Teyssier 2002). It combines simple point relaxation (e.g., Jacobi or Gauss–Seidel iterations) with a hierarchical coarsening procedure which spreads the residual correction exponentially fast across the domain. Some additional care is required at the boundaries of adaptively refined regions. Here the resolution of the mesh changes, typically by a linear factor of two, and interpolation from the coarser grid to the ghost zones of the fine grid is required. In the one-way interface type of solvers, the coarse solution is obtained independently of the finer grid and then interpolated to the finer grid ghost zones to serve as the boundary condition for the fine solution (Guillet and Teyssier 2011), but no update of the coarse solution is made based on the fine solution. This approach is particularly convenient for block-stepping schemes (cf. Sect. 4.2.2), where each level of the grid hierarchy has its own time step, e.g. solving twice on the fine level for every coarse solve. A limitation of AMR grids is, however, that the force resolution can only change discontinuously by the refinement factor, both in time (if one wants to achieve a resolution that is constant in physical coordinates) and in space (as a particle moves across coarse-fine boundaries). On the other hand, AMR grids self-consistently provide an adaptive force softening (see Sect. 8.2) if the refinement strategy is tied to the local density or other estimators (Hobbs et al. 2016).
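As a concrete illustration of the point relaxation used as a smoother in such schemes, the following sketch performs one damped Jacobi sweep of Eq. (63) on a periodic mesh; a multigrid solver would combine such sweeps with restriction of the residual to coarser grids and prolongation of the corrections back to the fine grid.

```python
# Minimal sketch of one damped (weighted) Jacobi relaxation sweep for the
# second-order finite-difference Poisson equation (Eq. 63) on a periodic mesh.
import numpy as np

def jacobi_sweep(phi, f, dx, omega=2.0 / 3.0):
    """Return phi after one damped Jacobi iteration of the discrete Poisson eq."""
    neighbour_sum = (
        np.roll(phi, +1, axis=0) + np.roll(phi, -1, axis=0)
        + np.roll(phi, +1, axis=1) + np.roll(phi, -1, axis=1)
        + np.roll(phi, +1, axis=2) + np.roll(phi, -1, axis=2)
    )
    phi_new = (neighbour_sum - dx**2 * f) / 6.0   # local solve of Eq. (63)
    return (1.0 - omega) * phi + omega * phi_new
```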

Depending on the fragmentation of the finer levels due to the dynamic adaptivity, other solvers can be more efficient than multigrid, such as direct relaxation solvers (Kravtsov et al. 1997) or conjugate gradient methods. Independently of the solver, it is in principle more accurate to account for a two-way interface and allow for a correction of the coarse potential from the fine grid as well, as discussed e.g. by Johansen and Colella (1998) and Miniati and Colella (2007). Note that, once a deep grid hierarchy has developed, global Poisson solves in each fine time step are usually prohibitively expensive. For this reason, optimizations are often employed in multi-stepping schemes to solve for the gravitational acceleration of only a subset of particles. In the case of AMR, some care is necessary to also interpolate boundary conditions in time to avoid possible spurious self-interactions of particles.

5.2 Direct P2P summation

As discussed above, mesh-based methods bring along an additional discretisation of space. This can be avoided by computing interactions directly at the particle level from Eqs. (32b, 33). In this case, the gravitational potential at particle i’s location, \(\varvec{X}_i\), is given by the sum over the contributions of all the other particles in the system, along with all periodic replicas of the finite box, i.e.

$$\begin{aligned} \phi (\varvec{X}_i) = - a^{-1} \sum _{\varvec{n}\in \mathbb {Z}^3} \left[ \sum _{\substack{j=1\\ j\ne i}}^N\frac{G M_j}{\Vert \varvec{X}_i-\varvec{X}_j-\varvec{n}L \Vert } + \varphi _{\mathrm{box},L}(\varvec{X}_i-\varvec{n}L)\right] . \end{aligned}$$
(64)

Note that we neglected force softening for the moment, i.e. we set \(W(\varvec{x})=\delta _D(\varvec{x})\). Here \(\varphi _{\mathrm{box},L}\) is the potential due to a box \([0,L)^3\) of uniform background density \(\overline{\rho }=\varOmega _m\rho _c\) that guarantees that the density \(\rho -\overline{\rho }\) sourcing \(\phi \) vanishes when integrated over the box.

This double sum is slowly convergent with respect to \(\varvec{n}\), and in general there can be spurious forces arising from a finite truncation [but note that the sum is unconditionally convergent if the box has no dipole, e.g., Ballenegger (2014)]. A fast and exact way to compute this expression is provided by means of an Ewald summation (Ewald 1921), in which the sum is replaced by two independent sums, one in Fourier space for the periodic long-range contribution, and one in real space for the non-periodic local contribution, which both converge rapidly. It is then possible to rewrite Eq. (64) employing the position of the nearest replica, which results in pairwise interactions with a modified gravitational potential. This potential needs to be computed numerically; in GADGET3, for instance, it is tabulated and then interpolated at runtime, whereas GADGET4 relies on a look-up table of a Taylor expansion with analytic derivatives of the Ewald potential. We summarise in more detail how this is achieved in Sect. 5.3, where we discuss in particular how the FFT can be efficiently used to execute the Fourier summation.

This direct summation of individual particle-particle forces is \(\mathcal {O}(N^2)\), i.e., quadratic in the number of particles, and thus quickly becomes computationally prohibitive. In addition, since it is a highly non-local operation, it would require a considerable amount of inter-process communication. In practice, this method is sometimes used to compute short-range interactions, where the operation is local and can exploit the large computational power provided by GPUs. This is, for instance, the approach followed by the HACC code (Habib et al. 2016) when running one of the largest simulations to date with 3.6 trillion particles, and also by the ABACUS code (Garrison et al. 2018). Direct summation enabled by GPUs has also been adopted by Rácz et al. (2019) for compactified simulations, where there is the additional advantage that only a small subset of the volume has to be followed down to \(z=0\) (cf. Sect. 6.3.5).
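For concreteness, the sketch below evaluates the direct sum restricted to the nearest periodic image (i.e. keeping only the \(\varvec{n}=0\) replica of Eq. 64 and neglecting Ewald corrections), with a Plummer-type softening; unit masses and the parameter names are illustrative assumptions.

```python
# Minimal sketch of direct particle-particle summation with the minimum-image
# convention and Plummer softening; Ewald corrections are neglected, and unit
# particle masses are assumed for brevity.
import numpy as np

def direct_accelerations(pos, boxsize, eps, G=1.0):
    n = pos.shape[0]
    acc = np.zeros_like(pos)
    for i in range(n):
        d = pos - pos[i]                          # vectors from i to all j
        d -= boxsize * np.round(d / boxsize)      # nearest periodic image
        r2 = np.sum(d * d, axis=1) + eps**2       # softened squared distance
        inv_r3 = r2**-1.5
        inv_r3[i] = 0.0                           # exclude the self-interaction
        acc[i] = G * np.sum(d * inv_r3[:, None], axis=0)
    return acc
```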

5.3 Particle mesh Ewald summation, force splitting and the P\(^3\)M method

Beyond the poor \(\mathcal {O}(N^2)\) scaling of the direct P2P summation (for which we discuss solutions below), another important limitation of the naïve direct summation is the infinite periodic contribution in Eq. (64). At the root of the solution is the Ewald summation (Ewald 1921), first used for cosmological simulations by Bouchet and Hernquist (1988), in which the total potential or acceleration is split into a short- and a long-range contribution: the short-range contribution is summed in real space, while the long-range contribution is summed in Fourier space, where, due to its periodic character, it converges much faster. One thus introduces a ‘splitting kernel’ S so that

$$\begin{aligned} \phi (\varvec{x}) = \phi _{\mathrm{lr}}(\varvec{x})+ \phi _{\mathrm{sr}}(\varvec{x}) := S*\phi + (1-S)*\phi . \end{aligned}$$
(65)

The long-range contribution \(\phi _{\mathrm{lr}}\) can be computed using the PM method on a relatively coarse mesh. On the other hand, the short-range contribution \(\phi _{\mathrm{sr}}\) can be computed from the direct force between particles only in their immediate vicinity, since the particles further away contribute through the PM part. Using the direct force gives rise to the P\(^3\)M method; modern codes, however, often use a tree method (see next section) for the short-range force [this is e.g., what is implemented in the GADGET2 code by Springel (2005), see also Wang (2021)].

The splitting kernel effectively spreads the mass over a finite scale \(r_s\) for the long range interaction, and corrects for the residual with the short range interaction on scales \(\lesssim r_s\). Many choices are a priori possible; Hockney and Eastwood (1981), e.g., propose a sphere of uniformly decreasing density, or a Gaussian cloud. The latter is, e.g., used in the GADGET codes.

In terms of the Green’s function of the Laplacian \(G(\varvec{r}) = -1/(4\pi \Vert \varvec{r}\Vert )\), the formal solution for the cosmological Poisson equation reads \(\phi = \frac{4\pi G}{a} \left( \rho -\overline{\rho }\right) *G\). For a Gaussian cloud of scale \(r_s\), one has in real and Fourier space

$$\begin{aligned} S(r; r_s) = (2\pi r_s^2)^{-3/2} \exp \left( -\frac{r^2}{2r_s^2} \right) ,\quad \tilde{S}(k; r_s) = \exp \left[ -\frac{1}{2}k^2 r_s^2\right] . \end{aligned}$$
(66)

The ‘dressed’ Green’s functions \(G_{\mathrm{lr}} = G*S\) and \(G_\mathrm{sr} = G*(1-S)\) then become explicitly in real and Fourier space

$$\begin{aligned} G_{\mathrm{lr}}(r; r_s)&= - \frac{1 }{4\pi \,r} \,\mathrm{erf\left[ \frac{r}{\sqrt{2}r_s} \right] }, \quad&\tilde{G}_{\mathrm{lr}}(k; r_s)&= -\frac{1}{k^2}\exp \left[ -\frac{1}{2}k^2r_s^2 \right] , \end{aligned}$$
(67a)
$$\begin{aligned} G_{\mathrm{sr}}(r; r_s)&= - \frac{1 }{4\pi \,r} \,\mathrm{erfc\left[ \frac{r}{\sqrt{2}r_s} \right] },&\tilde{G}_{\mathrm{sr}}(k; r_s)&= -\frac{1}{k^2}\left( 1-\exp \left[ -\frac{1}{2}k^2r_s^2 \right] \right) . \end{aligned}$$
(67b)

Instead of the normal Green’s functions, one thus simply uses these truncated functions and obtains a hybrid solver. In order to use this approach, one chooses a transition scale of order the grid scale, \(r_s\sim \varDelta x\), and then replaces the PM Green’s function with \(G_{\mathrm{lr}}\). In the direct summation or tree force (see below), one then uses \(G_\mathrm{sr}\) for the pairwise potential and \(\varvec{\nabla } G_{\mathrm{sr}}\) for the force instead of the unmodified Newtonian kernels.

The long-range interaction already includes the periodic Ewald summation component if solved with Fourier-space methods; for the short-range interaction, the periodic replica summation can in practice be restricted to the nearest replica, owing to the rapid convergence of the regulated interaction. In addition, since PM forces are exponentially suppressed on scales comparable to \(r_s\), which is chosen to be close to the grid spacing \(\varDelta x\), aliasing of Fourier modes is suppressed.
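The following sketch writes out the dressed kernels of Eqs. (66)–(67) as they would enter such a hybrid solver: the Fourier-space filter applied to the PM Green’s function, and the real-space short-range pair potential and force (with \(GM=1\) and no softening, purely illustrative assumptions).

```python
# Minimal sketch of the Gaussian force-split kernels of Eqs. (66)-(67); GM is
# set to unity and force softening is omitted for clarity.
import numpy as np
from scipy.special import erfc

def long_range_filter(k, r_s):
    """Factor multiplying the -1/k^2 PM Green's function (Eq. 67a)."""
    return np.exp(-0.5 * (k * r_s) ** 2)

def short_range_potential(r, r_s, GM=1.0):
    """Pairwise short-range potential, GM times the kernel of Eq. (67b)."""
    return -GM / r * erfc(r / (np.sqrt(2.0) * r_s))

def short_range_force(r, r_s, GM=1.0):
    """Magnitude of the attractive short-range pair force, |d(phi_sr)/dr|."""
    x = r / (np.sqrt(2.0) * r_s)
    return GM / r**2 * (erfc(x) + 2.0 * x / np.sqrt(np.pi) * np.exp(-x**2))
```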

Note that another more aggressive near-far field combination is adopted by the ABACUS code. In this approach, the computational domain is first split into a uniform grid with \(K^3\) cells. Interactions of particles separated by less than approximately 2L/K are computed using direct summation (neglecting Ewald corrections); otherwise, they are computed using a high-order multipole (\(p=8\)) representation of the force field on the \(K\)-grid. Since two particles only interact via either the near- or far-field forces, and the tree structure is fixed to the K-grid, this allows for several optimizations and out-of-core computations. The price is discontinuous force errors with a non-trivial spatial dependence, as well as reduced accuracy due to the lack of Ewald corrections. This, however, might be acceptable for some applications and, as we will see in Sect. 8.5, ABACUS performs well when compared to other state-of-the-art codes.

5.4 Hierarchical tree methods

Assuming that it is acceptable to compute gravitational forces with a given specified accuracy, there are ways to circumvent the \(\mathcal {O}(N^2)\) and non-locality problem of direct summation. A common approach is to employ a hierarchical tree structure to partition the mass distribution in space and compute the gravitational potential jointly exerted by groups of particles, whose potential is expanded to a given multipole order (Barnes and Hut 1986). Thus, instead of particle-particle interactions, particle-node interactions are evaluated. Since the depth of such a tree is typically \(\mathcal {O}(\log N)\), the complexity of the evaluation of all interactions can be reduced to \(\mathcal {O}(N\log N)\). This can be further reduced to an ideal \(\mathcal {O}(N)\) complexity with the fast multipole method (FMM, see below).

There are several alternatives for constructing tree structures. The most common choice is a regular octree in which each tree level is subdivided into 8 sub-cells of equal volume; this is, for instance, used by GADGET. Another alternative, used for instance in older versions of PKDGRAV, are binary trees in which a node is split into only two daughter cells. These in principle have the advantage of adapting more easily to anisotropic domains and of a smoother transition among levels, at the expense of a higher cost in walking the tree or the need to go to higher order multipole expansions at fixed force error. The tree subdivision continues until nodes contain at most a given number M of particles (\(M=1\) in GADGET2-3, but higher in GADGET4 and PKDGRAV).

The main advantage brought by tree methods is that the pairwise interaction can be expanded perturbatively and grouped among particles at similar locations, thus reducing dramatically the number of calculations that needs to be carried out. The key philosophical difference with respect to direct summation is that one seeks to obtain the result at a desired accuracy, rather than the exact result to machine precision. This difference allows a dramatic improvement in algorithmic complexity. Another key aspect is that hierarchical trees are well suited for hierarchical (adaptive) timesteps.

Tree methods have long been extraordinarily popular also for evaluating the short-range interactions in hybrid tree-PM methods, as pioneered by Bagla (2002) and Bagla and Ray (2003), or in more recent FMM-PM approaches (Gnedin 2019; Wang 2021; Springel et al. 2021), thus supplementing an efficient method for periodic long-range interactions with a short-range method that is not limited to the uniform coarse resolution of FFT-based approaches (or the discrete jumps in resolution of AMR approaches). We discuss some technical aspects of these methods next.

5.4.1 Hierarchical multipole expansion

In the ‘Barnes & Hut tree’ algorithm (Appel 1985; Barnes and Hut 1986), particle-node interactions are evaluated instead of particle-particle interactions. Let us consider a hierarchical octree decomposition of the simulation box volume \(\mathcal {V}:=[0,L_{\mathrm{box}}]^3\) at level \(\ell \) into cubical subvolumes, dubbed ‘nodes’, \(\mathcal {S}^\ell _{i=1\dots N_\ell }\) of side length \(L_{\mathrm{box}}/2^\ell \), where \(N_\ell =2^{3\ell }\), so that \(\bigcup _i \mathcal {S}^\ell _i = \mathcal {V}\) and \(\mathcal {S}^\ell _i\cap \mathcal {S}^\ell _{j\ne i} = \emptyset \) on each level gives a space partitioning. Let us consider the gravitational potential due to all particles contained in a node, \(\varvec{X}_j\in \mathcal {S}^\ell _i\). The partitioning is halted when only one (but typically a few) particle is left in a node. We shall assume isolated boundary conditions for clarity, i.e. we neglect the periodic sum in Eq. (64). Thanks to the partitioning, the gravitational interaction can be effectively localised with respect to the ‘tree node’ pivot at location \(\varvec{\lambda }\in \mathcal {S}^\ell _i\), so that the distance \(\Vert \varvec{X}_j - \varvec{\lambda } \Vert \le \sqrt{3} L_{\mathrm{box}}/2^\ell =: r_\ell \) is by definition bounded by the ‘node size’ \(r_\ell \) and can serve as an expansion parameter. To this end, one re-writes the potential due to the particles in the node subvolume \(\mathcal {S}^\ell _i\)

$$\begin{aligned} \phi ^\ell _i(\varvec{x}) \propto \sum _{\varvec{X}_j\in \mathcal {S}_i^\ell } \frac{M_j}{\Vert \varvec{x}-\varvec{X}_j\Vert } = \sum _{\varvec{X}_j\in \mathcal {S}_i^\ell } \frac{M_j}{\Vert (\varvec{x}-\varvec{\lambda })-(\varvec{X}_j-\varvec{\lambda })\Vert } = \sum _{\varvec{X}_j\in \mathcal {S}_i^\ell } \frac{M_j}{\Vert \varvec{d}+\varvec{\lambda }-\varvec{X}_j\Vert } \end{aligned}$$
(68)

where \(\varvec{d}:=\varvec{x}-\varvec{\lambda }\). This can be Taylor expanded to yield the ‘P2M’ (particle-to-multipole) kernels

$$\begin{aligned} \begin{aligned} \frac{1}{\Vert \varvec{d}+\varvec{\lambda }-\varvec{X}_j\Vert } =&\underbrace{\frac{1}{\Vert \varvec{d}\Vert }}_{\text {monopole}} + \underbrace{\frac{d_k}{\Vert \varvec{d}\Vert ^3} \left( X_{j,k}-\lambda _k\right) }_{\text {dipole}\; \mathcal {O}(r_\ell /d^2)} + \\&\quad + \underbrace{\frac{1}{2}\frac{d_kd_l}{\Vert \varvec{d} \Vert ^5} \left( 3(X_{j,k}-\lambda _k)(X_{j,l}-\lambda _l) -\delta _{kl} \Vert \varvec{X}_j-\varvec{\lambda } \Vert ^2 \right) }_{\text {quadrupole}\;\mathcal {O}(r_\ell ^2/d^3)} +\dots , \end{aligned} \end{aligned}$$
(69)

which converges quickly if \(\Vert \varvec{d}\Vert \gg r_\ell \). The multipole moments depend only on the vectors \((\varvec{X}_j-\varvec{\lambda })\) and can be pre-computed up to a desired maximum order p during the tree construction and stored with each node. In doing this, one can exploit that multipole moments are best constructed bottom-up, as they can be translated in an upward-sweep to the parent pivot and then co-added—this yields an ‘upwards M2M’ (multipole-to-multipole) sweep. Note that if one sets \(\varvec{\lambda }\) to be the centre of mass of each tree node, then the dipole moment vanishes. The complexity of such a tree construction is \(\mathcal {O}(N\log N)\) for N particles.
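A minimal sketch of this P2M step, computing the monopole and the quadrupole tensor of Eq. (69) about the centre of mass (so that the dipole vanishes), could look as follows; the function and variable names are illustrative.

```python
# Minimal sketch of the P2M step: monopole and quadrupole moments of a node's
# particles about its centre of mass (the dipole then vanishes by construction).
import numpy as np

def node_multipoles(masses, positions):
    M = masses.sum()
    com = (masses[:, None] * positions).sum(axis=0) / M      # pivot lambda
    d = positions - com
    r2 = np.sum(d * d, axis=1)
    # Q_kl = sum_j m_j [ 3 d_{j,k} d_{j,l} - delta_kl |d_j|^2 ], cf. Eq. (69)
    Q = 3.0 * np.einsum("j,jk,jl->kl", masses, d, d) - np.eye(3) * np.sum(masses * r2)
    return M, com, Q
```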

When evaluating the potential \(\phi (\varvec{x})\) one now proceeds top-down from the root node at \(\ell =0\) in a ‘tree walk’ and evaluates M2P (multipole-to-particle) interactions between the given particle and the node. Since one knows that the error in \(\phi ^\ell _i(\varvec{x})\) is \(\mathcal {O}\left( (r_\ell /d)^p \right) \), one defines a maximum ‘opening angle’ \(\theta _{\mathrm{c}}\) and requires in order for the multipole expansion \(\phi ^\ell _i(\varvec{x})\) to be an acceptable approximation for the potential due to the mass distribution in \(\mathcal {S}^\ell _i\) that the respective opening angle obeys

$$\begin{aligned} \frac{r_\ell }{\Vert \varvec{d}\Vert } <\theta _{\mathrm{c}}. \end{aligned}$$
(70)

Otherwise the procedure is recursively repeated with each of the eight child nodes. Since the depth of a (balanced) octree built from a distribution of N particles is typically \(\mathcal {O}(\log N)\), a full potential or force calculation has an algorithmic complexity of \(\mathcal {O}(N\log N)\) instead of the \(\mathcal {O}(N^2)\) of the direct summation. The resulting relative error in a node-particle interaction is (Dehnen 2002)

$$\begin{aligned} \delta \phi \le \frac{\theta _c^{p+1}}{1-\theta _c} \frac{M_\mathrm{node}}{\Vert \mathbf {d}\Vert }, \end{aligned}$$
(71)

where \(M_{\mathrm{node}}\) is the node mass (i.e. the sum of the masses of all particles in \(\mathcal {S}^\ell _i\)), and p is the order of the multipole expansion. The criterion of Eq. (70) is purely geometric, independent of the magnitude of \(M_\mathrm{node}\) and of the multipole moments, as well as of the actual value of the gravitational acceleration. It is also independent of the magnitude of the interaction, i.e. it neglects that far (and hence larger and more massive) nodes can contribute more than nearby ones to the total interaction.

An alternative method, proposed by Springel et al. (2001b), is to use a dynamical criterion by comparing the expected acceleration with the force error induced by a given node interaction. Specifically, when evaluating the particle-node interactions for particle j one sets

$$\begin{aligned} \theta _{\mathrm{c},j} = \left( \alpha \Vert \varvec{A}_j\Vert \frac{\Vert \varvec{d}\Vert ^2}{G M_{\mathrm{node}}}\right) ^{1/p}, \end{aligned}$$
(72)

where \(\Vert \varvec{A}_j\Vert \) is the modulus of the gravitational acceleration (which could be estimated from the force calculation performed in a previous step), and \(\alpha \) is a dimensionless parameter that controls the desired accuracy. Note, however, that for relatively isotropic mass distributions, the uncertainty of a given interaction might not be representative of the uncertainty in the total acceleration.
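The two node-acceptance tests can be summarised in a few lines; the sketch below (with illustrative parameter names and defaults) implements the purely geometric criterion of Eq. (70) and the acceleration-relative criterion of Eq. (72), the latter rearranged so that no fractional power needs to be evaluated.

```python
# Minimal sketch of the two node-opening (acceptance) criteria discussed above.

def accept_geometric(node_size, d, theta_c=0.5):
    """Eq. (70): accept the multipole if the node subtends less than theta_c."""
    return node_size / d < theta_c

def accept_relative(node_mass, node_size, d, a_old, alpha=0.005, p=2, G=1.0):
    """Eq. (72) rearranged: accept if the estimated force error of this node
    interaction is below a fraction alpha of the particle's previous total
    acceleration a_old = |A_j|."""
    return G * node_mass * node_size**p / d**(p + 2) < alpha * a_old
```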

We highlight that the expressions (68)–(69) are valid for the non-periodic particle-node interactions, but for periodic boundary conditions additional terms arise owing to the modified Green’s function, as seen in Eq. (64). The Green’s function is also modified in the case when tree interactions are combined with other methods such as PM in a tree-PM method (see Sect. 5.3). This implies in principle also modified error criteria (or opening angles); however, this is often neglected.

So far, performing the multipole expansion only to monopole order (with nodes centred at the centre of mass) has been a popular choice for N-body codes. The reason behind this is that a second-order accurate expression is obtained with very low memory requirements (one simply needs to store the centre of mass of tree nodes instead of the geometric centre), which is enough when moderate accuracy is sought. However, in search of higher accuracy, a larger number of codes have started to also consider quadrupole and octupole terms, which requires more memory and computation but allows less aggressive opening criteria. This has been advocated as the optimal combination that provides the most accurate estimate at a fixed computational cost (Dehnen and Read 2011; Potter and Stadel 2016), although the precise optimal order depends on the required accuracy (Springel et al. 2021). In the future, further gains from higher order terms can be obtained as computer architectures evolve towards higher FLOP/byte ratios.

A problem for tree codes used for cosmological simulations is that on large scales and/or at high redshift the mass distribution is very homogeneous. This is a problem since the net acceleration of a particle is then the sum of many terms of similar magnitude but opposite sign that mostly cancel. Thus, obtaining accurate forces requires a low error tolerance, which increases the computational cost of a simulation. For instance, the Euclid Flagship simulation (Potter and Stadel 2016), which employed a pure tree algorithm (cf. Sect. 10), spent a considerable amount of time on the gravitational evolution at high redshift. Naturally, this problem is exacerbated the larger the simulation and the higher the starting redshift.

A method to address this problem, that was proposed by Warren (2013) and implemented in the 2HOT code, is known as “background subtraction”. The main idea is to add the multipole expansion of a local uniform negative density to each interaction, which can be computed analytically for each cubic cell in a particle-node interaction. Although this adds computational cost to the force calculation, it results in an important overall reduction of the cost of a simulation since many more interactions can be represented by multipole approximations at high redshift. As far as we know, this has not been widely adopted by other codes.

A further optimization that is usually worth carrying out on modern architectures is to prevent tree refinement down to single particles (for which anyway all multipoles beyond the monopole vanish). Since the most local interactions end up being effectively direct summation anyway, one can get rid of the tree overhead and retain a ‘bucket’ of \(10^2\)–\(10^3\) particles in each leaf node rather than a single individual particle. All interactions within the node, as well as those which would open child nodes, are carried out in direct summation. While of higher algorithmic complexity, such a direct summation is memory-local and can be highly optimized and e.g. offloaded to GPUs, providing a significant speed-up over the tree.

5.4.2 Fast-multipole method

Despite the huge advantage with respect to direct summation, a single interaction of a particle with the tree is still computationally expensive, as it has \(\mathcal {O}(\log N)\) complexity for a well-balanced tree. Furthermore, trees as described above have other disadvantages; for instance, gravitational interactions are not strictly symmetric, which leads to a violation of momentum conservation. A solution to these limitations is provided by fast multipole methods (FMM), originally proposed by Greengard and Rokhlin (1987) and extended to Cartesian coordinates by Dehnen (2000, 2002). These algorithms take the idea of hierarchical expansions one step further by realising that significant parts of the particle-node interactions are redundantly executed for particles that are within the same node. In order to achieve \(\mathcal {O}(1)\) complexity per particle, the node-node interaction should be known and translated to the particle location. This is precisely what FMM achieves by symmetrising the interaction to node-node interactions between well-separated nodes, which are separately Taylor expanded inside of the two nodes. Until recently, FMM methods have not been widespread in cosmology, presumably due to a combination of higher algorithmic and parallelization complexity. The advantages of FMM are becoming evident in modern N-body codes, which simulate extremely large numbers of particles and seek high accuracy, and thus FMM has been adopted in PKDGRAV, GADGET-4, and SWIFT. We only briefly summarize the main steps of the FMM algorithm here, and refer the reader to the reviews by, e.g., Kurzak and Pettitt (2006), Dehnen and Read (2011) for details on the method.

The FMM method builds on the same hierarchical space decomposition as the Barnes&Hut tree above and shares some operators. Compared to the tree algorithm outlined in the previous section, the FMM algorithm adds three steps: a ‘downward M2L’ (multipole-to-local) sweep, which propagates the interactions back down the tree after the upward M2M sweep, thereby computing a local field expansion in each node; this expansion is then shifted in ‘downward L2L’ (local-to-local) steps to the centres of the child nodes, and finally to the particles in an ‘L2P’ (local-to-particle) translation. As one has to rely on the quality of the local expansion in each node, FMM requires significantly higher order multipole expansions compared to standard Barnes&Hut trees to achieve low errors. Note that for a Cartesian expansion in monomials \(x^ly^mz^n\) at a fixed order \(p=l+m+n\), one has \((p+1)(p+2)/2\) multipole moments, i.e. \((p+1)(p+2)(p+3)/6\) for all orders up to and including p, so that the memory needed for each node scales as \(\mathcal {O}(p^3)\), and a standard implementation evaluating multipole pair interactions scales as \(\mathcal {O}(p^6)\). For expansions in spherical harmonics, one can achieve \(\mathcal {O}(p^3)\) scaling (Dehnen 2014). Note that for higher order expansions one can rely on known recursion relations to obtain the kernel coefficients (Visscher and Apalkov 2010), allowing arbitrary order implementations. Recently, it was demonstrated that a trace-free reformulation of the Cartesian expansion has a slimmer memory footprint (Coles and Bieri 2020) (better than 50% for \(p\ge 8\)). The same authors provide convenient Python scripts to auto-generate code for optimized expressions of the FMM operators symbolically. It is important to note that the higher algorithmic complexity lends itself well to recent architectures, which favour high FLOP-to-byte ratio algorithms (Yokota and Barba 2012).

While the FMM force is symmetric, Springel et al. (2021) report that force errors can be much less uniform in FMM than in a standard tree approach, so that it might be required to randomise the relative position of the expansion tree w.r.t. the particles between time steps in order to suppress the effect of correlated force errors on sensitive statistics for cosmology. In principle, isotropy could be further improved with random rotations. Note, however, that errors might have a different spatial structure with different expansion bases.

The FMM method indeed has constant time force evaluation complexity for each N-body particle. This assumes that the tree has already been built, or that building the tree does not have \(\mathcal {O}(N\log N)\) complexity (which is only true if it is not fully refined but truncated at a fixed scale). Note however that for FMM solvers, it is preferable to limit the tree depth to a minimum node size or at least use a larger number of particles in a leaf cell for which local interactions are computed by direct ‘P2P’ (particle-to-particle) interactions. Also, tree construction has typically a much lower pre-factor than the ‘tree walk’. Note further that many codes use some degree of ‘tree updating’ in order to avoid rebuilding the tree in every timestep.

In order to avoid explicit Ewald summation, some recent methods employ hybrid FFT-FMM methods, where essentially a PM method is used to evaluate the periodic long range interactions as in tree-PM and the FMM method is used to increase the resolution beyond the PM mesh for short-range interactions (Gnedin 2019; Springel et al. 2021).

6 Initial conditions

In previous sections we have discussed how to discretise a cold collisionless fluid, compute its self-gravity, and evolve it in time. Closing our review of the main numerical techniques for cosmological simulations, in this section we present details on how to compute and set up their initial conditions.

The complicated non-linear structures that are produced at late times in cosmological simulations develop from minute fluctuations around homogeneity in the early Universe that are amplified by gravitational collapse. While the fluctuations remain small (i.e. at early times, or on large scales) they permit a perturbative treatment of the underlying non-linear coupled equations. Such perturbative techniques belong to the standard repertoire of analytic techniques for the study of the cosmic large-scale structure, see e.g., Bernardeau et al. (2002) for a review, and Sect. 2.5 for a concise summary. At late times and on smaller scales, shell-crossing and deeply non-linear dynamics limit the applicability of perturbative techniques. While some attempts have been made to extend PT beyond shell-crossing (Taruya and Colombi 2017; Pietroni 2018; Rampf et al. 2021a), or by controlling such effects in effective fluid approaches, e.g., Baumann et al. (2012), the evolved non-linear universe is still the domain of simulations. At the same time, PT of various flavours is used to set up the fields that provide the initial conditions for fully non-linear cosmological simulations.

6.1 Connecting simulations with perturbation theory

The physics governing the infant phase of the Universe, which is dominated by a hot plasma tightly coupled to radiation and linked through gravity to dark matter and neutrinos, is considerably more complex than the purely geodesic evolution of collisionless gravitationally interacting particles outlined in Sect. 2. Since density fluctuations are small, this phase can be treated accurately by perturbative techniques at leading order. State-of-the-art linear-order Einstein–Boltzmann (EB) codes that numerically integrate these coupled multi-physics systems are e.g., Camb (Lewis et al. 2000) and Class (Lesgourgues 2011; Blas et al. 2011). These codes usually evolve at least dark matter, baryons, photons and (massive) neutrinos and output Fourier-space transfer functions for density \(\delta _X\) and velocity divergence \(\theta _X\) for each of the species X, as well as the total matter density fluctuations, at specifiable output times. Typically, the equations are integrated in synchronous gauge, in a frame comoving with dust. The use of the output of these Einstein–Boltzmann solvers for non-linear simulations that (in the case of N-body simulations) model only Newtonian gravity and no relativistic species, let alone baryon-photon coupling, still requires some numerical considerations that we discuss next. The inclusion or non-inclusion of relativistic species makes a difference of several per cent in the background evolution, and therefore in the growth rate between redshifts \(z=100\) and \(z=0\) (Fidler et al. 2017b), implying that it is crucial to be aware of what physics is included in the calculations for the initial conditions and in the non-linear cosmological code. Usually, it is sufficiently accurate to combine output for density perturbations in synchronous gauge with Newtonian-gauge velocities when working with Newtonian equations of motion, but also self-consistent gauge choices exist which allow the correct inclusion of relativistic effects even within Newtonian simulations.

The main approaches adopted when using the output from an EB solver to set up initial conditions are illustrated in Fig. 7 and are:

Forward method In this approach, the output of the Einstein–Boltzmann code at some early time \(z_{\mathrm{start}} \gtrsim 100\) is used. To avoid errors of several per cent at low redshift on all scales, relativistic components must be included in the background evolution of the non-linear solution in this case. Also, the significant evolution of the horizon between \(z\gtrsim 100\) and \(z=0\) means that for very large-scale simulations relativistic corrections can become important and should be included as well for high precision. Since all corrections beyond the background are significant only on large scales, they remain perturbative. In the Cosira approach (Brandbyge et al. 2017), which is also used e.g., in the Euclid flagship simulations (Potter et al. 2017), they are added as corrections at the linear level (essentially subtracting the Newtonian linear gravity from the relativistic one), convolving the resulting correction with the random phases of the initial conditions, and adding this realisation-specific correction in the gravity solver step of the N-body code.

Fig. 7 Different setups used to initialise N-body simulations with the output from a linear Einstein–Boltzmann (EB) solver such as Camb or Class to bridge the gap between missing physics in N-body codes on the one hand, and missing non-linearities in the EB codes on the other hand. ‘EB linear’ represents the linear full-physics evolution through the EB code, ‘reduced linear’ a reduced linear model (including physics captured in the N-body code), ‘non-linear LPT’ the non-linear evolution using Lagrangian perturbation theory to some finite order valid prior to shell-crossing, and ‘N-body’ the full non-linear evolution. In the ‘forward’ approach additional fields (e.g., neutrinos, relativistic corrections) need to be added to match the EB solution at \(a_{\mathrm{target}}\)

Care has to be taken that these corrections never significantly contribute to non-linear terms as the corresponding back-reaction on the linear field is neglected. While requiring a direct integration of the Einstein–Boltzmann solver with the N-body code, this approach can readily include also a treatment of massive neutrinos at linear order (Tram et al. 2019, and see discussion in Sect. 7.8.2). If all relevant physics is included, this approach guarantees very good agreement between Einstein–Boltzmann and N-body on linear scales. Due to the very nature of this approach, it does not allow the use of high-order Lagrangian perturbation theory since only linear quantities are known at the starting time \(z_{\mathrm{start}}\), which therefore has to be pushed to early times in order to not itself affect the simulation results (see discussion in Sect. 8.3). If fields \(\delta _X\) and \(\theta _X\) are known for a species X at the starting time, then they can be converted into leading order consistent Lagrangian maps with respective velocities through the relations

$$\begin{aligned} \varvec{x}(\varvec{q};\,z_{\mathrm{start}}) = \varvec{q} - \varvec{\nabla }\nabla ^{-2} \delta ^{\mathrm{EB}}_X(\varvec{q};\,z_{\mathrm{start}})\qquad \text {and}\qquad \varvec{v}(\varvec{q};\,z_{\mathrm{start}}) = \varvec{\nabla }\nabla ^{-2} \theta ^\mathrm{EB}_X(\varvec{q};\,z_{\mathrm{start}}). \end{aligned}$$
(73)

Backward method An alternative approach to the forward method, which allows coupling a non-linear simulation with reduced physical models to the EB solutions, is given by the ‘backward method’ (not to be confused with the ‘backscaling’ method below). Here, the linearised set of equations solved by the non-linear code is used to integrate backwards in time, i.e. from \(z_{\mathrm{target}}\), where the output of the EB code is known, to the starting time of the non-linear simulation \(z_{\mathrm{start}}\). It is thus possible to reduce the multi-species universe e.g., to just two species, total matter and massive neutrinos (Zennaro et al. 2017), at \(z_{\mathrm{target}}\), evolve them backwards in time under the linearised physics of the N-body code, and then provide ICs using the prescription (73). The leading order evolution of the ‘active’ fluids takes into account scale-dependent growth and agrees reasonably well at high redshifts with the full EB solution. The limitation of this approach is that any decaying modes that are present at \(z_{\mathrm{target}}\) must still be small at \(z_{\mathrm{start}}\). This can be achieved well for neutrinos [with sub per cent errors for \(z_{\mathrm{start}}\lesssim 100\) with \(\sum m_\nu \lesssim 0.3\,\mathrm{eV}\), see Zennaro et al. (2017)] due to their small contribution to the total energy budget, but is more limited e.g., for baryons. Again, this approach is restricted to first-order accurate displacements and velocities, as only linear fields are known.

Backscaling method The arguably simplest approach, the back-scaling method, avoids the complications arising from differences in the physics modelled. At the same time, it is the one rigorously consistent with Lagrangian perturbation theory, and it is also the traditional approach used to set up N-body ICs. In this method, one uses the Einstein–Boltzmann code to evolve the linear multi-physics equations to a target redshift \(z_{\mathrm{target}}\) and then re-scales the total matter perturbation \(\delta ^{\mathrm{EB}}(z_{\mathrm{target}})\), as output by the EB code, to arbitrary times using the linear theory growth factor defined by Eq. (20) as

$$\begin{aligned} \tilde{\delta }_m(k;\, z_{\mathrm{start}}) = \frac{D_+(z_\mathrm{start})}{D_+(z_{\mathrm{target}})} \, \tilde{\delta }_m^\mathrm{EB}(k;\,z_{\mathrm{target}}). \end{aligned}$$
(74)

The main advantage of this approach is that by definition the correct linear theory is obtained in the vicinity of \(z=z_\mathrm{target}\), including e.g., relativistic and neutrino effects, without having to include this physics in the N-body code. This comes at the price that at early times it might not in general agree with the full EB solution, since the plethora of modes captured by the higher-dimensional EB system is reduced to scale-independent growth under the linear growing mode only. However, this is not a problem if non-linear coupling is unimportant, which is an excellent assumption precisely at early times. Backscaling can also be rigorously extended to simulations including multiple fluids coupled through gravity (Rampf et al. 2021b) by including additional isocurvature modes. It can in principle also account for scale-dependent evolution if it can be modelled at the \(D_+(k;\,a)\) level. With respect to the total matter field in \(\varLambda \)CDM, this method agrees exactly with the ‘backward method’ above if the decaying mode is neglected. Arguably the biggest advantage of the backscaling method is that it connects naturally with high order Lagrangian perturbation theory, as we will discuss next.

6.2 Initial conditions from Lagrangian perturbation theory

With the increasing precision requirements of N-body simulations over the last 20 years, it quickly became clear that first order accurate initial conditions are insufficient. Those are ICs that follow Eq. (73), which together with the back-scaled input spectrum from Eq. (74) amounts to the Zel’dovich approximation (ZA; Zel’dovich 1970). The reason is that linear ICs suffer from significant transients—decaying truncation errors between the ICs and the true non-linear evolution (Scoccimarro 1998; Crocce et al. 2006)—since higher order non-Gaussian moments of the density distribution are not accurately captured (see discussion in Sect. 8.3). Higher order is needed to follow correctly the higher order moments of the density distribution, i.e. first order captures only the variance, second in addition also the skewness and third even the kurtosis (Munshi et al. 1994). These ‘transients’ can only be suppressed by going to very early starting times, when the linear Gaussian approximation is accurate, at the cost of typically larger numerical errors (Michaux et al. 2021). The alternative to early starts is to go to higher order perturbation theory beyond the ZA, where the displacement field \(\varvec{\varPsi }\) of the Lagrangian map, \(\varvec{x}(\varvec{q},\,\tau )=\varvec{q}+\varvec{\varPsi }(\varvec{q},\,\tau )\), is expanded in a Taylor series to order n in the linear theory growth factor \(D_+\) yielding the nLPT approximation

$$\begin{aligned} \varvec{\varPsi }(\varvec{q},\tau ) = \sum _{j=1}^n D_+(\tau )^j \; \varvec{\varPsi }^{(j)}(\varvec{q}). \end{aligned}$$
(75)

This includes only growing modes, but remains regular as \(D_+\rightarrow 0\), i.e. \(a\rightarrow 0\), with the key property that \(\varvec{x}\rightarrow \varvec{q}\), i.e. the initial state is indeed a homogeneous unperturbed universe. Since the density is uniform in the \(a\rightarrow 0\) limit (where the limit is taken in the absence of radiation; otherwise one can use \(a\sim 10^{-2}\) during matter domination for simplicity), the growing mode perturbations are at leading order encapsulated in a single potential \(\phi ^{(1)}(\varvec{q})\), which can be connected via the back-scaling relation above to the EB code output. This yields the famous ‘Zel’dovich approximation’

$$\begin{aligned} \varvec{\varPsi }^{(1)} = -\varvec{\nabla }\phi ^{(1)} \quad \text {with}\quad \phi ^{(1)} := -\nabla ^{-2} \lim _{a\rightarrow 0} \frac{D_+(a)}{a}\,\delta _m(\varvec{q};\, a), \end{aligned}$$
(76)

that can be used to set up simulation initial conditions by displacing Lagrangian fluid elements (e.g., N-body particles) consistently with \(\varvec{\varPsi }\) and giving them velocities according to \(\dot{\varvec{\varPsi }}\). Inserting this ansatz order by order into Eq. (25) returns the well-known order-truncated n-th order LPT forms. Specifically, the 2LPT contribution to the displacement field has the form

$$\begin{aligned} \varvec{\varPsi }^{(2)} = \varvec{\nabla }\phi ^{(2)}\quad \text {with}\quad \phi ^{(2)} = -\frac{3}{14}\nabla ^{-2} \left[ {\phi }^{(1)}_{,ii} {\phi }^{(1)}_{,jj} - {\phi }^{(1)}_{,ij}{\phi }^{(1)}_{,ij}\right] , \end{aligned}$$
(77)

while at third order, i.e. 3LPT, the displacement field starts to have both longitudinal (i.e. irrotational) and transverse (i.e. solenoidal) components (Catelan 1995; Rampf and Buchert 2012)

$$\begin{aligned} \varvec{\varPsi }^{(3)} = \varvec{\nabla }\phi ^{(3)}+\varvec{\nabla }\times \varvec{A}^{(3)} \quad&\text {with}&\quad \phi ^{(3)} = \frac{1}{3}\nabla ^{-2} \left[ \det \phi ^{(1)}_{,ij} \right] -\frac{5}{21}\nabla ^{-2}\left[ \phi ^{(2)}_{,ii}\phi ^{(1)}_{,jj}-\phi ^{(2)}_{,ij} \phi ^{(1)}_{,ji}\right] \nonumber \\ \quad&\text {and}&\quad \varvec{A}^{(3)} = \frac{1}{7}\nabla ^{-2}\left[ \varvec{\nabla }\phi ^{(2)}_{,i}\times \varvec{\nabla }\phi ^{(1)}_{,i}\right] . \end{aligned}$$
(78)

The transverse part appears at 3LPT order and preserves the potential nature of the flow in Eulerian space: Newtonian gravity (i.e., Hamiltonian mechanics coupled to only a scalar potential) cannot produce vorticity; it exactly preserves any that might be present in the initial conditions, which in a cosmological context would however appear as a decaying mode that blows up as \(a\rightarrow 0\). For the truncated nLPT series this is only true at the respective order, i.e. \(\varvec{\nabla }_x \times \varvec{v} = \varvec{\nabla }_x \times \dot{\varvec{\varPsi }} =\mathcal {O}(D_+^n)\) (Uhlemann et al. 2019). For systems of more than one pressureless fluid, this approach can be readily generalised, currently taking into account isocurvature modes to all orders and decaying modes to first order (Rampf et al. 2021b; Hahn et al. 2021). Note that the relations given above for nLPT receive small corrections in \(\varLambda \)CDM (Bouchet et al. 1995; Rampf et al. 2021b), which are however not important when initial conditions are generated at \(z\gg 1\), when it is safe to assume an Einstein–de Sitter cosmology.
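
To make this recipe concrete, the following minimal sketch (in Python/NumPy; the function name, interface, and grid conventions are ours and not those of any particular IC code) evaluates the 1LPT and 2LPT displacement fields of Eqs. (75)–(77) with FFTs on a periodic grid, given a pre-computed back-scaled linear overdensity field. De-aliasing and the 3LPT terms of Eq. (78) are omitted for brevity.

```python
import numpy as np

def lpt_displacements(delta_lin, boxsize):
    """Sketch: 1LPT (Zel'dovich) and 2LPT displacement fields from a back-scaled
    linear overdensity delta_lin on a periodic cubic grid, following Eqs. (75)-(77).
    The inverse Laplacian nabla^{-2} is interpreted as multiplication by 1/k^2 in
    Fourier space; no de-aliasing is performed (cf. Fig. 8)."""
    n = delta_lin.shape[0]
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kz1 = 2.0 * np.pi * np.fft.rfftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k1, k1, kz1, indexing="ij")
    kvec = (kx, ky, kz)
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                                   # regularise the k=0 mode

    # Eq. (76): phi^(1) = -nabla^{-2} delta_lin
    phi1_k = -np.fft.rfftn(delta_lin) / k2
    phi1_k[0, 0, 0] = 0.0

    # second derivatives phi^(1)_{,ij} needed for the 2LPT source term
    d2 = {(i, j): np.fft.irfftn(-kvec[i] * kvec[j] * phi1_k, s=delta_lin.shape)
          for i in range(3) for j in range(i, 3)}

    # Eq. (77): phi^(2) = -(3/14) nabla^{-2} [phi_{,ii} phi_{,jj} - phi_{,ij} phi_{,ij}]
    src = np.zeros_like(delta_lin)
    for i in range(3):
        for j in range(i + 1, 3):
            src += 2.0 * (d2[i, i] * d2[j, j] - d2[i, j] ** 2)
    phi2_k = -(3.0 / 14.0) * np.fft.rfftn(src) / k2
    phi2_k[0, 0, 0] = 0.0

    # Psi^(1) = -grad phi^(1),   Psi^(2) = +grad phi^(2)
    psi1 = [np.fft.irfftn(-1j * kvec[i] * phi1_k, s=delta_lin.shape) for i in range(3)]
    psi2 = [np.fft.irfftn(+1j * kvec[i] * phi2_k, s=delta_lin.shape) for i in range(3)]
    return psi1, psi2
```

Particles of an unperturbed load at positions \(\varvec{q}\) would then be displaced to \(\varvec{x}=\varvec{q}+D_+\varvec{\varPsi }^{(1)}+D_+^2\varvec{\varPsi }^{(2)}\), with velocities obtained from the corresponding time derivatives (cf. Sect. 6.4).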

In Fig. 8 we show the power spectrum of the source terms, \(\nabla ^2 \phi ^{(n)}\), for 2LPT and 3LPT. As expected, we see that the higher-order fields have significantly smaller amplitude than the linear-order density power spectrum. However, these higher-order contributions are required to improve the faithfulness of the simulated field and are in fact important for correctly predicting certain cosmological statistics. In addition, note that when computing the high-order potentials \(\phi ^{(n)}\) and \(\varvec{A}^{(n)}\) (with \(n\ge 2\)) some care has to be taken to avoid aliasing (Orszag 1971; Michaux et al. 2021; Rampf and Hahn 2021), which can be important even on large scales, as shown in the bottom panels of Fig. 8.

Fig. 8

Image reproduced with permission from Michaux et al. (2021), copyright by the authors

The power spectrum of various fields contributing to displacements in Lagrangian perturbation theory up to 3rd order. The top panel shows the power spectrum of \(\nabla ^2 \phi ^{(2)}\) that contributes to the 2nd-order displacements (cf. Eq. 77), whereas the bottom left and right panels show the spectra of \(\nabla ^2 \phi ^{(3a)}\) and \(\nabla ^2 \phi ^{(3b)}\), which correspond to the first and second terms of the 3rd-order LPT contribution \(\phi ^{(3)}\) given in Eq. (78). In each case we display the results measured in \(250\,h^{-1}\mathrm{Mpc}\) and \(1000\,h^{-1}\mathrm{Mpc}\) boxes, with and without correct de-aliasing, as indicated in the figure. The ratio between the aliased and de-aliased solutions with respect to the analytic expectation is shown in the bottom panels

Note that recently Rampf and Hahn (2021) were the first to numerically implement the full nLPT recursion relations, so that fields of in principle arbitrary order, limited only by computer memory, can be used for ICs. This is included in the publicly available MonofonIC Music-2 code.

6.3 Generating Gaussian realisations of the perturbation fields

The previous results were generic; for numerical simulations, one has to work with a specific realisation of the Universe, which we discuss next. The specific case of realisations constrained to match our own Universe is discussed as well (Sect. 6.3.7).

6.3.1 Unconstrained realisations

Many inflationary cosmological models predict that scalar metric fluctuations are very close to Gaussian (Maldacena 2003; Acquaviva et al. 2003; Creminelli 2003). As we have shown above, Lagrangian perturbation theory, focusing on the fastest growing mode, is built up from a single initial scalar potential \(\phi ^{(1)}\), as defined in Eqs. (74) and (76), while the forward and backward approaches work with multiple fields \(\delta _m\), \(\theta _m\), and possibly others. These specify expectation values for a random realisation. Since these are all linear fields (i.e. we can assume that non-linear corrections are negligible at \(a=0\) for back-scaling, and at \(a_{\mathrm{start}}\) for the forward and backward methods), they will be statistically homogeneous and isotropic Gaussian random fields, fully characterised by their two-point function. A general real-valued Gaussian homogeneous and isotropic random field \(\phi (\varvec{x})\) can be written as the Fourier integral

$$\begin{aligned} \phi (\varvec{x}) = \frac{1}{(2\pi )^3}\int _{\mathbb {R}^3}\mathrm{d}^3k\, \mathrm{e}^{\text {i}\varvec{k}\cdot \varvec{x}} \tilde{\varphi }(k)\,\tilde{W}(\varvec{k}) , \end{aligned}$$
(79)

where \(\tilde{W}(\varvec{k})\) is a complex-valued three-dimensional random field (also known as “white noise” since its power spectrum is \(\varvec{k}\)-independent) with

$$\begin{aligned} \tilde{W}(\varvec{k}) = \overline{\tilde{W}(-\varvec{k})},\qquad \langle \tilde{W}(\varvec{k})\rangle =0,\qquad \text {and}\qquad \langle \tilde{W}(\varvec{k})\,\overline{\tilde{W}(\varvec{k}^\prime )}\rangle =\delta _D(\varvec{k}-\varvec{k}^\prime ), \end{aligned}$$
(80)

and \(\tilde{\varphi }(k)\) is the (isotropic) field amplitude in Fourier space as computed by the Einstein–Boltzmann code. This is often given in terms of a transfer function \(\tilde{T}(k)\) which is related to the field amplitude as \(\tilde{\varphi }(k)\propto k^{n_s/2}\tilde{T}(k)\), where \(n_s\) is the spectral index of the primordial perturbations from inflation. If the power spectrum P(k) is given, then setting \(\tilde{\varphi }(k) = (2\pi )^{3/2}\sqrt{P(k)}\) yields the desired outcome

$$\begin{aligned} \langle \tilde{\phi }(\varvec{k})\,\overline{\tilde{\phi }(\varvec{k}^\prime )}\rangle =(2\pi )^3\,P(k)\,\delta _D(\varvec{k}-\varvec{k}^\prime ). \end{aligned}$$
(81)

In order to implement these relations numerically, the usual approach when generating initial conditions is to replace the Fourier integral (79) with a discrete Fourier sum that is cut off in the IR by the ‘box mode’ \(k_0=2\pi /L\) and in the UV by the ‘Nyquist mode’ \(k_{\mathrm{Ny}}:=k_0 N/2\), so that the integral can be conveniently computed by a DFT of size \(N^3\). Naturally, fluctuations on scales larger than the box, as well as modes that are not harmonics of the box mode, cannot be represented (but see Sect. 6.3.5 below). This is usually not a problem as long as \(k_0\ll k_{\mathrm{NL}}\), where \(k_{\mathrm{NL}}\) is the scale at which non-linear effects become important, since then non-linearities can be assumed not to couple strongly to unresolved IR modes, and are also not sourced dominantly by the very sparsely populated modes at the box scale (which would otherwise break isotropy in the non-linear structure). For simulations evolved to \(z=0\) this typically implies box sizes of at least \(300\,h^{-1}\mathrm{Mpc}\) comoving.
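
As an illustration of Eqs. (79)–(81) in this discretised form, the sketch below (a hypothetical helper, assuming a callable `P_of_k` that returns the power spectrum in box units; normalisation conventions differ between actual IC codes) draws unit-variance white noise in real space and rescales its Fourier modes by \(\sqrt{P(k)}\):

```python
import numpy as np

def gaussian_realisation(N, L, P_of_k, seed=1234):
    """Sketch: Gaussian random field delta on an N^3 periodic grid of side L
    with power spectrum P(k).  Drawing the white noise in real space makes the
    Hermitian symmetry of Eq. (80) automatic."""
    rng = np.random.default_rng(seed)
    white_k = np.fft.rfftn(rng.standard_normal((N, N, N)))   # <|W_k|^2> = N^3

    k1 = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
    kz1 = 2.0 * np.pi * np.fft.rfftfreq(N, d=L / N)
    kx, ky, kz = np.meshgrid(k1, k1, kz1, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)

    amp = np.zeros_like(kmag)
    nz = kmag > 0
    # discrete analogue of Eq. (81): <|delta_k|^2> = P(k) N^6 / L^3
    amp[nz] = np.sqrt(P_of_k(kmag[nz]) * N**3 / L**3)
    delta_k = white_k * amp                                  # the k=0 mode stays zero
    return np.fft.irfftn(delta_k, s=(N, N, N))
```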

6.3.2 Reduced variance sampling

The noise field \(\tilde{W}\) naturally has a polar decomposition \(\tilde{W}(\varvec{k}) =: \tilde{A}(\varvec{k})\, \mathrm{e}^{\text {i}\theta (\varvec{k})}\), where A obeys a Rayleigh distribution (i.e. a \(\chi \)-distribution with two degrees of freedom) and \(\theta \) is uniform on \([0,2\pi )\). The power spectrum associated with W transforms as

$$\begin{aligned} \langle \tilde{W}(\varvec{k})\, \overline{\tilde{W}(\varvec{k}^\prime )} \rangle = \langle \tilde{A}(\varvec{k}) \,\tilde{A}(\varvec{k}^\prime )\rangle = \delta _D(\varvec{k}-\varvec{k}^\prime ), \end{aligned}$$
(82)

i.e., it is independent of the phase \(\theta \). In any one realization of \(\tilde{W}(\varvec{k})\), the estimate of \(\langle \tilde{W}(\varvec{k})\overline{\tilde{W}(\varvec{k}^\prime )}\rangle \) fluctuates (“cosmic variance”) around the ensemble average by an amount of order the inverse square root of the number of discrete modes available in the simulation volume within a finite interval \(k\dots k+\mathrm{d}k\).

The amplitude of this cosmic variance can be dramatically reduced by simply fixing the amplitude A such that \(A^2\) equals its expectation value (Angulo and Pontzen 2016), i.e., by drawing A from the degenerate distribution \(p(A)=\delta _D(A-1)\) (in units where \(\langle A^2\rangle =1\)). This technique is commonly referred to as “Fixing”. Clearly, the resulting field has far fewer degrees of freedom and a very specific non-Gaussian character. In principle, this introduces a bias in the nonlinear evolution of such a field, e.g., the ensemble-averaged power spectrum at \(z=0\) differs from that obtained from an ensemble of Gaussian realizations. However, using perturbation theory, Angulo and Pontzen (2016) showed that the magnitude of this bias in the power spectrum and bispectrum is always smaller (by a factor equal to the number of Fourier modes) than the mode-coupling terms of physical origin. In fact, it was found empirically that simulations initialised with such ICs produce highly accurate estimates of non-linear clustering (including power spectra, bispectra, the halo mass function, and others) and that the level of spurious non-Gaussianity introduced is almost undetectable for large enough simulation boxes (Angulo and Pontzen 2016; Villaescusa-Navarro et al. 2018; Klypin et al. 2020).

The main advantage of “Fixing” is that it allows one to avoid giant boxes or large ensembles of simulations in order to obtain measurements with low cosmic variance. A further reduction in cosmic variance can be achieved by considering pairs of simulations in which the second simulation is initialised with \(-\tilde{W}(\varvec{k})\) instead of \(\tilde{W}(\varvec{k})\) (Pontzen et al. 2016; Angulo and Pontzen 2016). This is equivalent to shifting the phase \(\theta \) by \(\pi \), and averaging over the pair cancels the leading-order non-linear term contributing to cosmic variance (Pontzen et al. 2016). This technique has been adopted in several state-of-the-art dark matter and hydrodynamical simulations (Angulo et al. 2021; Chuang et al. 2019; Euclid Collaboration 2019; Anderson et al. 2019).
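
In terms of the white-noise field, fixing and pairing amount to only a few lines on top of the previous listing; the following is our own schematic illustration rather than the implementation of any specific code (`white_k` is assumed to be the rfftn of \(N^3\) unit-variance real-space white noise):

```python
import numpy as np

def fix_and_pair(white_k, N):
    """Sketch: 'Fixing' sets every mode amplitude |W_k| to its rms value sqrt(N^3)
    while keeping the phase; 'Pairing' returns a partner field with all phases
    shifted by pi (W_k -> -W_k), to be evolved in a second simulation."""
    fixed = np.sqrt(float(N) ** 3) * np.exp(1j * np.angle(white_k))
    fixed.flat[0] = 0.0                     # keep the mean of the box at zero
    paired = -fixed                         # the 'inverted' partner realisation
    return fixed, paired
```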

The performance of this approach in practice is shown in Fig. 9 which compares the \(z=1\) power spectrum and multipoles of the redshift space correlation function. The mean of 300 \(L=3000 h^{-1}\mathrm{Mpc}\) simulations is displayed as a solid line whereas the prediction of a single pair of ‘fixed’ simulations is shown as red circles. In the bottom panels we can appreciate the superb agreement between both measurements, with relative differences being almost always below one per cent.

Fig. 9

Image reproduced with permission from Angulo and Pontzen (2016), copyright by the authors

The nonlinear dark matter clustering at \(z=1\) in simulations with different initial conditions. The left panel shows the real-space power spectrum, whereas the right panel shows the monopole and quadrupole of the redshift-space correlation function. Gray lines display the results for an ensemble of 300 simulations with random Gaussian initial conditions, whereas the symbols show the results from a single Paired-&-Fixed pair of simulations. In the bottom panels we show the difference between the ensemble mean and the Paired-&-Fixed results in units of the standard deviation of the ensemble measurements. Note the drastic noise reduction achieved by this method, especially on large scales

6.3.3 Numerical universes: spatially stable white noise fields

A key problem in generating numerical realisations of random fields is that certain properties of the field should not depend on the exact size or resolution of the field in the numerical realisation. This means that it is desirable to have a method to generate white Gaussian random fields which guarantees that the large-scale modes remain identical, no matter what the resolution of the specific realisation. A further advantage is gained if this method can be parallelised, i.e. if the drawing does not necessarily have to be sequential in order to be reproducible. This problem has found several solutions, a selection of which we discuss below.

N-GenIC (Springel et al. 2005) and its derivatives produce spatially stable white noise by drawing white noise in Fourier space in a reproducible manner. That is, for two simulations with \(k_{\mathrm{Ny}}\) and \(k_{\mathrm{Ny}}^\prime >k_{\mathrm{Ny}}\), all modes that are representable in both, i.e. \(-k_{\mathrm{Ny}}\le k_{x,y,z}\le k_\mathrm{Ny}\), are identical, and for the higher-resolution simulation new modes are added between \(-k_{\mathrm{Ny}}^\prime \le k_{x,y,z} < -k_\mathrm{Ny}\) and \(k_{\mathrm{Ny}}< k_{x,y,z} \le k^\prime _{\mathrm{Ny}}\). The random number generation is parallelised by a stable 2+1 decomposition of Fourier space. A shortcoming of this way of sampling modes is that drawing in Fourier space is inherently non-local, so that it cannot be generalised to zoom simulations, where high resolution is desired only in a small subvolume, without generating the full field first.

Panphasia (Jenkins 2013) has overcome this shortcoming by relying on a hierarchical decomposition of the white noise field in terms of cleverly chosen octree basis functions. Essentially, instead of drawing Fourier modes, one draws the coefficients of this hierarchical basis, thus allowing one to add as much small-scale information as desired at any location in the three-dimensional volume.

Music-1 (Hahn and Abel 2011) subdivides the cubical simulation volume into subcubes for which the random numbers can be drawn independently and in parallel in real space, since each subcube carries its own seed. Refined regions are combined with the white noise of the overlapping volume from the coarser levels, just as in the N-GenIC approach, thus enforcing that the modes represented at the coarser level are also present on the finer level.
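
To illustrate the idea of resolution-stable noise in Fourier space, the toy sketch below (our own construction, not the actual algorithm of any of the codes above) seeds the random numbers of each mode purely by its integer wavenumber, so that a higher-resolution grid reproduces every mode of a lower-resolution one:

```python
import numpy as np

def stable_white_noise_k(N, base_seed=2024):
    """Toy sketch of resolution-stable Fourier-space white noise: each mode draws
    its Rayleigh amplitude and uniform phase (cf. Sect. 6.3.2) from an RNG seeded
    only by its signed integer wavenumber.  Hermitian symmetry on the kz=0 and
    Nyquist planes is ignored, and the per-mode RNG construction is illustrative
    rather than efficient."""
    Wk = np.zeros((N, N, N // 2 + 1), dtype=complex)
    for i in range(N):
        ni = i if i <= N // 2 else i - N          # wavenumber, independent of N
        for j in range(N):
            nj = j if j <= N // 2 else j - N
            for k in range(N // 2 + 1):
                ss = np.random.SeedSequence([base_seed, ni & 0xFFFFFFFF,
                                             nj & 0xFFFFFFFF, k])
                rng = np.random.default_rng(ss)
                Wk[i, j, k] = rng.rayleigh() * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))
    Wk[0, 0, 0] = 0.0
    return Wk
```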

6.3.4 Zoom simulations

In cases in which the focus of a simulation is the assembly of single objects or particular regions of the universe, it is neither desirable nor affordable to achieve the necessary high resolution throughout the entire simulation volume. In such situations, ‘zoom simulations’ are preferable, where the (mass) resolution is much higher in a small sub-volume of the entire box. This can be achieved in principle by generating high-resolution initial conditions and then degrading the resolution outside the region of interest [as followed e.g. by the ZInCo code (Garaldi et al. 2016)]. This approach is, however, limited by memory, and for deeper zooms refinement methods are necessary. The basic idea is that in the region of interest nested grids are inserted on which the Gaussian noise is refined. Some special care must be taken when applying the transfer function convolution and when solving the Poisson equation. Such approaches were pioneered by Katz et al. (1994), Bertschinger (2001), and then extended to higher accuracy and 2LPT perturbation theory using a tree-based approach by Jenkins (2010), implemented in a non-public code used by the Virgo consortium, and using multi-grid techniques by Hahn and Abel (2011) in the publicly available Music-1 code. A recent addition is the GENET-IC code (Stopyra et al. 2021), which focuses on the application of constraints (cf. Sect. 6.3.7) to such zoom simulations but currently supports only first-order LPT. An example of a particularly deep zoom simulation is the ‘voids in voids’ simulation shown in Fig. 13.

6.3.5 Super-sample covariance and ensembles

On the scale of the simulated box, realistic power spectra predict a non-zero variance of the mean overdensity, i.e. any given box-sized region of the Universe will generally have a mean density that differs from the global mean. This is a priori in contradiction with periodic boundary conditions (but see Sect. 6.3.6), and so in the vast majority of cosmological simulations the mean overdensity of the volume is enforced to be zero for consistency. Hence, when an ensemble of simulations is considered, the variance of modes \(k<k_0\) is zero, even though all of them have different initial white noise fields and provide fair ensemble averages for modes \(k_0\lesssim k\lesssim k_{\mathrm{Ny}}\). This implies that the component of the covariance that is due to large-scale overdensities and tides is underestimated (Akitsu et al. 2019), which is sometimes referred to as super-sample covariance—in analogy to a similar effect present in galaxy surveys (Li et al. 2018)—and can be an important source of error in covariance matrices derived from ensembles of simulations, especially if the simulated boxes are small (Klypin and Prada 2019). Such effects can be circumvented in “separate universe” simulations, which are discussed in Sect. 6.3.6.

Furthermore, it is important to note that for Fourier summed realisations of \(\phi \), the correspondence between power spectra and correlation functions is broken since the discrete (cyclic) convolution does not equal the continuous convolution, i.e.,

$$\begin{aligned} \phi (\varvec{x}) \ne \phi ^K(\varvec{x})&:= \mathrm{DFT}^{-1}\left[ \tilde{W}(\varvec{k})\, \tilde{\phi }(k) \right]&\text {and}&\end{aligned}$$
(83a)
$$\begin{aligned} \phi (\varvec{x}) \ne \phi ^R(\varvec{x})&:= \mathrm{DFT}^{-1}\left[ \tilde{W}\right] \circledast \phi (\Vert \varvec{x}\Vert )&{\text {where}} \;\phi (r)&:= \frac{1}{2\pi ^2}\int _0^\infty \frac{\sin kr}{kr} \tilde{\phi }(k) k^2 \mathrm{d}k, \end{aligned}$$
(83b)

where ‘\(\circledast \)’ symbolises a discrete cyclic convolution. This implies that real-space and Fourier-space statistics on discrete, finite numerical Universes coincide neither with one another nor with the continuous relation. It is always possible to consider such real-space realisations \(\phi ^R\) instead of Fourier-space realisations \(\phi ^K\) (Pen 1997; Sirko 2005). In the absence of super-sample covariance, the correct statistics are always recovered on scales \(k_0\ll k\ll k_{\mathrm{Ny}}\). Since the real-space kernel \( \phi (r)\) is effectively truncated at the box scale, it does not force the box overdensity to vanish and therefore also samples density fluctuations on the scale of the box, which must be absorbed into the background evolution (cf. Sect. 6.3.6).

The real-space sampling, by definition, reproduces the two-point correlation function also in smaller boxes if one allows the mean density of the box to vary. In fact, Sirko (2005) argued that this approach yields better convergence for statistics that depend more sensitively on real-space quantities (such as the halo mass function, which depends on the variance of the mass field on a given scale), as well as an accurate description of the correlation function on scales \(r \gtrsim L_{\mathrm{box}}/10\). However, Orban (2013) demonstrated that the correct correlation function can also be recovered from \(\phi ^K\)-sampled boxes by accounting for the integral constraint in correlation function measurements, i.e. by realising that the \(\phi ^K\) sampling implicitly imposes \(\int _0^{R}r^2 \xi (r)\mathrm{d}r=0\) already at a scale R related to the box size \(L_\mathrm{box}\), instead of only in the limit \(R\rightarrow \infty \). A better estimator can therefore be obtained by simply subtracting this expected bias.

Note that an alternative approach to account for non-periodic boundary conditions has recently been proposed (Rácz et al. 2018, 2019). In such ‘compactified’ simulations an infinite volume is mapped onto the surface of a four-dimensional hypersphere. This compactified space can then be partitioned into a regular grid with which it is possible to simulate a region of the universe without imposing periodic boundary conditions. For some applications this approach has the advantage that it naturally provides an adaptive mass resolution, which increases towards the centre of the simulation volume where a hypothetical observer is located.

6.3.6 Separate universe simulations

How small scales are affected by the presence of fluctuations on larger scales is not only important for understanding finite-volume effects and ensembles of simulations, as discussed in the previous section, but is also a central question for structure formation in general. For instance, this response is an essential ingredient in models for biased tracers and in perturbation theory. These interactions can be quantified in standard N-body simulations; however, a method referred to as “separate universe simulations” provides a more controlled way to carry out experiments in which perturbations larger than the simulated volume are systematically varied, yielding accurate results even with modest simulation volumes. A key advantage of the separate universe technique is that it allows one to quantify the dependence of small-scale structure formation on large-scale perturbations. For instance, by changing the effective mean density of two simulations to \(+\delta _0\) and \(-\delta _0\), one can compute the response of the power spectrum by taking a simple finite difference, \(\mathrm{d} P / \mathrm{d} \delta _0 \simeq (P(k;+\delta _0) - P(k;-\delta _0))/2\delta _0\) (e.g., Li et al. 2014), which can be extended also to higher orders.
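
Given power spectra measured from such a pair of runs, the response estimate is a simple finite difference; a schematic example (array names are ours):

```python
import numpy as np

def power_spectrum_response(P_plus, P_minus, delta0):
    """Sketch: first-order response of the nonlinear power spectrum to a
    long-wavelength overdensity, from two separate-universe runs with background
    overdensities +delta0 and -delta0 (cf. Li et al. 2014).  Returns dP/ddelta0
    and the logarithmic response dlnP/ddelta0 as arrays over the k bins."""
    dP = (np.asarray(P_plus) - np.asarray(P_minus)) / (2.0 * delta0)
    P_mean = 0.5 * (np.asarray(P_plus) + np.asarray(P_minus))
    return dP, dP / P_mean
```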

The main idea behind the separate universe formalism (Sirko 2005; Baldauf et al. 2011; Sherwin and Zaldarriaga 2012; Li et al. 2014; Wagner et al. 2015; Baldauf et al. 2016) is that a long wavelength density fluctuation can be absorbed in the background cosmology. In other words, larger-than-box fluctuations simply lead to a non-zero overdensity \(\delta _0=\mathrm{const.}\) of the box that must be absorbed into the background density in order to be consistent with the periodic boundary conditions of the box, i.e., one matches

$$\begin{aligned} \rho (a) [1 + D_+(a) \delta _0] =: \breve{\rho }(a) , \end{aligned}$$
(84)

and thus structure formation in a region embedded within a larger region of overdensity \(\delta _0\) (today, i.e. at \(a=1\)) is equivalent to that of an isolated region of the universe evolving with a modified set of cosmological parameters, indicated by ‘\(\breve{}\)’. Specifically, the modified Hubble parameter and the matter, curvature, and dark energy density parameters become

$$\begin{aligned} \breve{H_0} := H_0 \varDelta _H, \qquad \breve{\varOmega }_m := \varOmega _\mathrm{m} \varDelta _H^{-2}, \qquad \breve{\varOmega }_K := 1 - \varDelta _H^{-2}, \qquad \breve{\varOmega }_{\mathrm{\varLambda }} := \varOmega _{\mathrm{\varLambda }} \varDelta _H^{-2} , \end{aligned}$$
(85)

where for a simulation initialised at \(a_{\mathrm{ini}}\)

$$\begin{aligned} \varDelta _H := \sqrt{1 - \frac{5\varOmega _m}{3} \frac{D_+(a_\mathrm{ini})}{a_{\mathrm{ini}}} \delta _0}. \end{aligned}$$
(86)

Note that although these expressions are exact, solutions only exist for \(\delta _0 <\frac{3}{5\varOmega _m} \frac{a_{\mathrm{ini}}}{D_+(a_\mathrm{ini})}\); for larger background overdensities, the whole region is expected to collapse. An important aspect is that the age of the universe should match between the separate-universe box and the unperturbed universe; thus the scale factors are not identical but are related by

$$\begin{aligned} \breve{a} := a \left( 1 - \frac{1}{3} D_+(a) \delta _0 \right) . \end{aligned}$$
(87)

Also, when initialising a simulation, the perturbation spectrum from which the ICs are sampled should be rescaled with the growth function \(\breve{D}_+(\breve{a})\) based on the ‘separate universe’ cosmological parameters.
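
The parameter mapping of Eqs. (85)–(87) is straightforward to script; a minimal sketch (function and argument names are ours) that could feed a growth-factor integrator or an IC generator:

```python
import numpy as np

def separate_universe_params(delta0, Om, OL, H0, D_ratio_ini):
    """Sketch: modified ('breve') parameters of a separate-universe box with
    present-day overdensity delta0, following Eqs. (85)-(86).
    D_ratio_ini = D_+(a_ini)/a_ini in the fiducial cosmology."""
    DeltaH = np.sqrt(1.0 - 5.0 * Om / 3.0 * D_ratio_ini * delta0)   # Eq. (86)
    return {"H0": H0 * DeltaH,                                      # Eq. (85)
            "Omega_m": Om / DeltaH**2,
            "Omega_K": 1.0 - 1.0 / DeltaH**2,
            "Omega_L": OL / DeltaH**2}

def separate_universe_scalefactor(a, Dplus_a, delta0):
    """Scale factor of the separate universe that matches the age of the
    fiducial universe, Eq. (87)."""
    return a * (1.0 - Dplus_a * delta0 / 3.0)
```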

Separate universe simulations have been successfully applied to many problems, which can be roughly split into two groups. The first group includes measurements of the value and mass dependence of local and non-local bias parameters for voids and halos. The second group includes quantifications of the response of the nonlinear power spectrum and/or bispectrum to global processes. For instance, Barreira et al. (2019) studied the role of baryonic physics in the matter power spectrum. Other studies include measurements of the linear and quadratic bias parameters associated with the abundance of voids (Chan et al. 2020; Jamieson and Loverde 2019b); the correlation between large-scale quasar overdensities and the small-scale Ly-\(\alpha \) forest (Chiang et al. 2017); halo assembly bias (Baldauf et al. 2016; Lazeyras et al. 2017; Paranjape and Padmanabhan 2017); and cosmic web anisotropy (Ramakrishnan and Paranjape 2020).

Naturally, a simple change in the mean density only modifies the isotropic expansion. More realistically, a given volume will also be exposed to anisotropic deformation due to a large-scale tidal field. Schmidt et al. (2018) (see also Stücker et al. 2021c; Masaki et al. 2020; Akitsu et al. 2020) have demonstrated that such a global anisotropy can be accounted for by a modification of the force calculation in numerical simulations. These simulations have been used to study the role of large-scale tidal fields in the abundance and shape of dark matter halos and the response of the anisotropic power spectrum, and they will also be very useful for studies of coherent alignment effects of haloes and galaxies, which are important to understand intrinsic alignments in weak gravitational lensing. We note that the separate universe approach can also be generalised to study the impact of compensated isocurvature modes (where the relative fluctuations of baryons and dark matter change while the total matter fluctuations are kept fixed) or of modifications to the primordial gravitational potential, as illustrated in Fig. 10.

Fig. 10

Image reproduced with permission from Voivodic and Barreira (2021), copyright by IOP

Schematic illustration of various kinds of separate universe simulations. Structure formation embedded in a large-scale density fluctuation \(\delta _L\) is equivalent to that of a simulation with a modified background matter density, \(\rho _m\); structure formation inside a large-scale compensated isocurvature fluctuation, \(\sigma _L\), is equivalent to a modification of the background baryon and cold dark matter densities, \(\rho _b\) and \(\rho _c\); and finally, structure formation inside a large potential fluctuation, as originated by primordial non-Gaussianity, can be captured in the separate universe formalism by a change in the amplitude of fluctuations, \(A_s\)

Another limitation of the original separate universe formulation is that the long-wavelength mode is assumed to evolve only due to gravitational forces. This means that the scale on which \(\delta _0\) is defined has to be much larger than any Jeans or free-streaming scale. This condition might be violated if, e.g., neutrinos are considered, since their evolution cannot be represented as a cold matter component. This limitation was avoided in the approach of Hu et al. (2016), who introduced additional degrees of freedom in terms of “fake” energy densities tuned to mimic the correct expansion history and thus the growth of the large-scale overdensity. This approach has been applied to study inhomogeneous dark energy (Chiang et al. 2017; Jamieson and Loverde 2019a) and massive neutrinos (Chiang et al. 2018).

6.3.7 Real universes: constrained realisations

So far we have discussed how to generate random fields which produce numerical universes in which we have no control over where structures, such as galaxy clusters or voids, form. In order to carry out cosmological simulations of a realisation that is consistent with the observed cosmic structure surrounding us, it is therefore desirable to impose additional constraints. This could, for instance, shed light on the specific formation history of the Milky Way and its satellites, inform about regions where a large hypothetical DM annihilation flux is expected, or quantify the role of cosmic variance in the observed galaxy properties [see Yepes et al. (2014) for a review].

The simplest way to obtain such constrained initial conditions is to employ the so-called Hoffman–Ribak (HR) algorithm (Hoffman and Ribak 1991). Given an unconstrained realisation of a field \(\phi \), we seek a new constrained field \(\check{\phi }\) that fulfils M (linear) constraints. In general, these can be expressed in terms of kernels \(H_j\) by requiring \((H_j\star \check{\phi })(\varvec{x}_j) = \check{c}_j\), where \(\check{c}_j\) is the desired value of the j-th constraint centred on \(\varvec{x}_j\). The Gaussian field \(\check{\phi }\) obeying these constraints is obtained by computing

$$\begin{aligned} \tilde{\check{\phi }}(\varvec{k}) = \tilde{\phi }(\varvec{k}) + P(k) \, \tilde{H}_i(\varvec{k}) \,\xi _{ij}^{-1}(\check{c}_j-c_j), \end{aligned}$$
(88)

with

$$\begin{aligned} c_j = \frac{1}{(2\pi )^3}\int _{\mathbb {R}^3}\mathrm{d}^3k\, \overline{\tilde{H}_j(\varvec{k})}\,\tilde{\phi }(\varvec{k})\qquad \text {and} \qquad \xi _{ij} = \frac{1}{(2\pi )^3} \int _{\mathbb {R}^3}\mathrm{d}^3k\, \overline{\tilde{H}_i(\varvec{k})}\,\tilde{H}_j(\varvec{k})\,P(k), \end{aligned}$$
(89)

where \(c_j\) is the value that the j-th constraint takes for the unconstrained field, and \(\xi _{ij}\) is the \(M\times M\) constraint covariance matrix (van de Weygaert and Bertschinger 1996; see also Kravtsov et al. 2002 and Klypin et al. 2003 for further implementation details). A possible constraint is, e.g., a Gaussian peak of scale R at position \(\varvec{x}_i\), for which \(\tilde{H}_i(\varvec{k})=\exp \left[ -k^2R^2/2+\text {i}\varvec{k}\cdot \varvec{x}_i\right] \); constraints on differentials and integrals of the field can also easily be taken into account through the Fourier-space kernel.
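
In a discrete implementation the HR step amounts to only a few lines; the sketch below (our own normalisation conventions, for a full complex FFT cube with \(\langle |\delta _k|^2\rangle = P(k)N^6/L^3\)) applies Eqs. (88)–(89) for a list of Fourier-space kernels:

```python
import numpy as np

def hoffman_ribak(delta_k, Hk, c_target, Pk_grid, N, L):
    """Sketch of the Hoffman-Ribak step, Eqs. (88)-(89), in a discrete convention.
    delta_k : fftn of an unconstrained realisation (complex N^3 array)
    Hk      : list of M constraint kernels H_j(k) on the same k grid, e.g. a
              Gaussian peak exp(-k^2 R^2/2 + i k.x_j)
    c_target: the M desired constraint values
    Pk_grid : P(k) evaluated on the k grid.
    The prefactors below are specific to these conventions."""
    M = len(Hk)
    # constraint values measured on the unconstrained realisation (c_j of Eq. 89)
    c = np.array([np.real(np.sum(np.conj(H) * delta_k)) / N**3 for H in Hk])

    # constraint covariance matrix xi_ij (Eq. 89)
    xi = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            xi[i, j] = np.real(np.sum(np.conj(Hk[i]) * Hk[j] * Pk_grid)) / L**3

    # Eq. (88): add the correction that enforces the constraints exactly
    w = np.linalg.solve(xi, np.asarray(c_target) - c)
    corr = sum(wi * H for wi, H in zip(w, Hk)) * Pk_grid * N**3 / L**3
    return delta_k + corr
```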

Using this algorithm, various simulations have been able to reproduce the local distribution of galaxies and famous features of the local Universe, such as the Local and Perseus–Pisces Superclusters, the Virgo and Coma clusters, and the Great Attractor (e.g., Sorce et al. 2014). This is illustrated in the top left panel of Fig. 11, which displays a simulation from the CLUES collaboration (Gottloeber et al. 2010; Carlesi et al. 2016). Traditionally, the observational constraints were mostly set by radial velocity data (which is assumed to be more linear than the density, and thus more directly applicable to constraining the primordial density field), for instance from datasets such as “Cosmic Flows 2” (Tully et al. 2016).

Fig. 11

Images reproduced with permission from [top left] CLUES, copyright by S. Gottlöber, G. Yepes, A. Klypin, A. Khalatyan; [top right] from Wang et al. (2016), copyright by IOP; and [bottom] from Jasche and Lavaux (2019), copyright by IOP

Various N-body simulations of initial fields constrained by local observations. The top left panel shows one of the simulations of the local Universe carried out by the CLUES collaboration, where the initial conditions were constrained using the CosmicFlows observations (Tully et al. 2016). In the top right panel we show a \(200\, h^{-1}\mathrm{Mpc}\)-wide slice of the density field in the ELUCID simulation (Wang et al. 2016). The simulated dark matter is shown in a black-blue color scale, whereas red/cyan symbols show the locations of red/blue galaxies in the SDSS DR7 observations. Similarly, in the bottom panel we show a Mollweide projection of \(\log (2+\delta )\) of a PM simulation whose initial conditions were set by requiring agreement with the observations of the 2M++ catalogue (Lavaux and Hudson 2011), shown as red dots

A further technique related to constrained realisations has been termed ‘genetic modification’ (Roth et al. 2016; Stopyra et al. 2021). Its main idea is to impose constraints not in order to be compatible with the Local Universe, but to perform controlled numerical experiments. For instance, the mass of a halo can be altered by imposing a Gaussian density peak at the halo location with the desired height and mass. In this way it is possible to, e.g., isolate the role of specific features (e.g., formation time, major mergers) in the formation of a halo or of the putative galaxy it might host, seeking a better understanding of the underlying physics.

The HR approach has several limitations; for instance, it does not account for the Lagrangian–Eulerian displacements. A general problem of all reconstruction methods is that small scales are difficult to constrain, since those scales have shell-crossed in the \(z=0\) Universe, so that information from distinct locations has been mixed or even lost. Constrained simulations therefore resort to trial and error, running many realisations of the unconstrained scales until a desired outcome is achieved (e.g., a Milky Way–Andromeda pair). To accelerate this process, Sorce (2020) recently proposed an important speed-up using pairs of simulations carried out with the paired-and-fixed method (cf. Sect. 6.3.2), and Sawala et al. (2021a) quantified the influence of the unconstrained initial phases on the final Eulerian field. Another improvement, in terms of a “Reverse Zel’dovich Approximation”, has been proposed to estimate the Lagrangian positions of local structures (Doumler et al. 2013a, b).

An alternative route to numerical universes in agreement with observations is followed by Bayesian inference frameworks (Kitaura and Enßlin 2008; Jasche et al. 2010; Ata et al. 2015; Lavaux and Jasche 2016). In this approach, LPT models or relatively low-resolution N-body simulations (Wang et al. 2014) are used as a forward model mapping a random field to observables. The associated parameter space of the IC field, typically with millions of dimensions, is then explored using Hamiltonian Monte Carlo until a realisation is found that compares favourably with the observations. The white noise field can then be stored and used for high-resolution simulations that give insights into the formation history of the local Universe. For instance, Heß et al. (2013), Libeskind et al. (2018), Sawala et al. (2021b) simulated the initial density field derived from observations of the local Universe, and Wang et al. (2016), Tweed et al. (2017) created a constrained simulation compatible with the whole SDSS DR7 galaxy distribution, shown in the top right panel of Fig. 11.

6.4 Initial particle loads and discreteness

Given displacement and velocity fields from Lagrangian perturbation theory and a random realisation of the underlying phases, one is left with imposing these displacement and velocity perturbations onto a set of particles, i.e., a finite set of initially unperturbed Lagrangian coordinates \(\varvec{q}_{1\dots N}\). These correspond to N-body particles representing the homogeneous and isotropic initial state of the numerical universe. Drawing the \(\varvec{q}_j\) from a Poisson process would be the naive choice; however, its intrinsic power spectrum is usually well above that of the matter fluctuations at the starting time, so that it introduces a large amount of stochastic noise and is gravitationally unstable even in the absence of perturbations. Other choices are regular Bravais lattices (a regular simple cubic grid being the most obvious choice), which are gravitationally stable and have no stochasticity, but are globally anisotropic. Higher-order Bravais lattices, such as body-centred or face-centred lattices, are more isotropic than the simple cubic lattice. A gravitationally stable arrangement with broken global symmetry can be obtained by evolving a Poisson field under a repulsive interaction (White 1994) until it freezes into a glass-like distribution. The resulting particle distribution is more isotropic than a regular lattice and has a white-noise power spectrum on scales smaller than the mean inter-particle separation, which decreases as \(k^{4}\) on larger scales and therefore carries more noise than a Bravais lattice. Other alternative initial particle loads have also been proposed, among them quaquaversal tilings (Hansen et al. 2007) and capacity-constrained Voronoi tessellations (CCVT; Liao 2018), both of which have a \(k^4\) power spectrum. Example particle distributions for various cases are shown in Fig. 12.

Fig. 12

Image reproduced with permission from Liao (2018), copyright by the authors

Illustration in two dimensions of various particle distributions employed in the generation of initial conditions for cosmological numerical simulations. From left to right: a regular simple cubic lattice, a glass, and particle loads from a quaquaversal tiling and a capacity-constrained Voronoi tessellation

After the creation of the initial particle load, the displacement field \(\varvec{\varPsi }(\varvec{q}_{1\dots N})\) and velocity field \(\dot{\varvec{\varPsi }}(\varvec{q}_{1\dots N})\) are interpolated to the particle locations \(\varvec{q}_{1\dots N}\), thereby defining the initial perturbed particle distribution with growing-mode velocities at the simulation starting time. In the case of a simple cubic lattice, the particle locations coincide directly with the nodes of the DFT, so that no interpolation is necessary; for other Bravais lattices, the fields can be obtained by Fourier interpolation. For the other pre-initial conditions, typically CIC interpolation is used, cf. Sect. 5.1.2. Since CIC interpolation acts as a low-pass filter, the resulting suppression of power is usually corrected by de-convolving the displacement and velocity fields with the CIC interpolation kernel (Eq. 61b).
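
Schematically, the de-convolution divides each Fourier mode of the gridded fields by the CIC window, a product of squared sinc factors (cf. Eq. 61b); a minimal sketch (our own helper, for an rfftn array):

```python
import numpy as np

def deconvolve_cic(field_k, N, L):
    """Sketch: undo the low-pass filtering of CIC interpolation by dividing the
    Fourier modes of a gridded field (e.g. a displacement component) by the CIC
    window W(k) = prod_i sinc^2(k_i dx / 2)."""
    dx = L / N
    k1 = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)
    kz1 = 2.0 * np.pi * np.fft.rfftfreq(N, d=dx)
    kx, ky, kz = np.meshgrid(k1, k1, kz1, indexing="ij")
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(k dx/2) = np.sinc(k dx / (2 pi))
    w = (np.sinc(kx * dx / (2.0 * np.pi)) *
         np.sinc(ky * dx / (2.0 * np.pi)) *
         np.sinc(kz * dx / (2.0 * np.pi))) ** 2
    return field_k / w
```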

Since the symmetry of the fluid is always broken at the particle discretisation scale, the specific choice of pre-initial conditions impacts the growth rate and the isotropy of growth on the scale of a few particles. While in the Bravais cases this deviation from the fluid limit is well understood (Joyce et al. 2005; Marcos et al. 2006; Marcos 2008), for glass, CCVT and quaquaversal tilings such an analysis has not been performed; one expects them to be affected by a stochastic component in the growth rate with an amplitude comparable to that of the regular lattices. Such deviations of the discrete fluid from the continuous fluid accumulate during the pre-shell-crossing phase, when the flow is perfectly cold, so that over time the simulations, which are initialised with the fluid modes, relax to the growing modes of the discrete fluid.

7 Beyond the cold collisionless limit: physical properties and models for dark matter candidates

So far, in this review article we have focused on the case where all the mass in the universe corresponds to a single cold and collisionless fluid. This is also the assumption of the vast majority of “dark matter only” or gravity-only simulations, and it is justified by the fact that gravity dominates long-range interactions, and that dark matter is the most abundant form of matter in the Universe (\(\varOmega _{\mathrm{c}}/\varOmega _{\mathrm{m}}\approx 84\%\)).

In this section we discuss simulations where these assumptions are relaxed in several ways, either to improve the realism of the simulated system, or to explore the detectability and physics associated to the nature of the dark matter particle. In each case, we briefly discuss the physical motivation, its numerical implementation along with potential modifications necessary to the initial conditions, and summarise the main results.

In the first part of this section we discuss simulations where dark matter is not assumed to be perfectly cold but instead has a non-zero temperature, as is the case when it is made of WIMPs, QCD axions, or, generically, warm dark matter. We also discuss cases where dark matter is not a classical particle but is instead made of ultra-light axion-like particles or of primordial black holes, as well as cases where dark matter is not assumed to be perfectly collisionless but instead displays microscopic interactions, as for self-interacting and decaying dark matter.

In the second part of this section we consider simulations that seek a more accurate representation of the Universe. Specifically, we discuss simulations where the mass density field is composed of two distinct fluids representing dark matter and baryons, and simulations that include massive neutrinos. For completeness, we also discuss simulations with non-Gaussian initial conditions and modified gravity.

7.1 Weakly-interacting massive particles

Historically, the favoured candidate for dark matter has been a weakly interacting massive particle (WIMP). WIMP is a generic name for a hypothetical particle with a very cold distribution function owing to its early decoupling and high mass (at the GeV scale). For many decades, WIMPs were a strong contender to be the cosmic DM, motivated by the observation that if their cross-section were set by the electroweak scale, the resulting relic abundance would match the measured density of dark matter in the Universe. This coincidence has been termed the “WIMP miracle” [see Bertone et al. (2004) for a classic review]. However, the non-detection of supersymmetric particles at the LHC to date (Bertone 2010), together with strong constraints from direct detection experiments, has begun to challenge the explanation of dark matter in terms of thermally produced WIMPs (Roszkowski et al. 2018). Nevertheless, massive WIMPs remain compatible with all observational constraints and are among the best motivated dark matter candidates. See Bertone and Tait (2018) for a recent review of the various experimental searches.

A concrete example of a WIMP in supersymmetric extensions of the Standard Model is the lightest neutralino. These particles are stable and weakly interacting and should have masses \(\gtrsim 100\,\mathrm{GeV}\). On astrophysical scales, neutralinos can be described as a perfectly cold collisionless fluid. However, the finite temperature at which these particles decouple implies that they have small but non-zero random microscopic velocities. As a consequence, neutralinos can stream freely out of perturbations of sizes \(\sim 0.7\,\mathrm{pc}\), which means that the formation of halos of masses \(\lesssim 10^{-8}\,M_{\odot }\) is strongly suppressed, and the typical mass of the first halos to collapse is about one Earth mass (Hofmann et al. 2001; Green et al. 2004; Loeb and Zaldarriaga 2005).

For perfectly cold fluids, the distribution function reduces to a Dirac \(\delta \)-distribution in momentum space, i.e., at any point in space there is a unique particle momentum. This corresponds to the “single-stream” or “monokinetic” regime, as it is usually referred to in fluid mechanics and plasma physics (see also the discussion in Sect. 2.5). For a warmer fluid, such as that describing a neutralino field, the cold limit is still an excellent approximation for the distribution function. This is because thermal random velocities are small compared to the mean velocities arising from gravitational instability, especially at late times, when the former are adiabatically cooled by the expansion of the Universe whereas the latter keep increasing due to structure formation.

Consequently, numerical simulations of structure formation with neutralinos assume them to be perfectly cold, follow them with traditional N-body integration methods, and incorporate their free-streaming effects only in the initial power spectrum by suppressing the amplitude of small-scale modes. Note, however, that the very central parts of halos and caustics could be affected by the intrinsic neutralino velocity dispersion, which provides, e.g., upper bounds on the density.

One important challenge associated with these types of simulations is the huge dynamic range of the scales involved. For instance, resolving the full hierarchy of possible structures would require about \(10^{23}\) N-body particles. For this reason, neutralino simulations have focused on the formation of the first structures at high redshifts and over small volumes. Usually, these simulations involve zooming (cf. Sect. 6.3.4) into the formation of a small number of halos (Diemand et al. 2005; Ishiyama et al. 2009; Anderhalden and Diemand 2013; Ishiyama 2014; Angulo et al. 2014). Another alternative is to carry out a suite of nested zoom simulations (Gao et al. 2005), an approach recently extended down to the free-streaming mass by Wang et al. (2020) by re-simulating low-density regions embedded into larger underdensities, and so on. A selection of projected density fields from this simulation suite is shown in Fig. 13, which displays progressive zooms by factors of \(\sim 50\) in scale, from 10 Mpc to 50 pc.

Fig. 13

Progressive zooms onto smaller regions of a simulated nonlinear dark matter field at \(z=0\). From left to right, each image shows a smaller region, by factors of 5, 40, and 100. Note that the rightmost panel shows a region approximately 150 pc wide, where the smallest visible clumps correspond to the smallest dark matter halos expected to form in a scenario where the dark matter particle is made of \(\gtrsim 100\) GeV neutralinos. Image adapted from Wang et al. (2020)

There is a consensus among all these simulations that the first microhalos have masses of about \(10^{-6}\,M_{\odot }\) and start collapsing at redshifts of \(z\sim 300\). At those mass scales and redshifts, structure formation proceeds differently than on galactic scales: the spectrum of CDM density fluctuations has a slope close to \({-3}\), i.e. \(P(k) \propto k^{-3}\), which causes a large range of scales to collapse almost simultaneously. This also implies that the formation of the first microhalos is immediately followed by rapid mass growth and many major mergers. Whether these microhalos can survive tidal forces inside Milky Way-like halos is still an open question, which could have implications for the detectability of a potential dark matter self-annihilation signal.

Another, perhaps unexpected, outcome of these simulations is that the density profile of these microhalos differs significantly from that of halos on larger mass scales. Ever since standard CDM simulations (i.e., those without any free-streaming effects) reached adequate force and mass resolution, they have revealed that the internal density profiles of collapsed structures are well described by a simple functional form (Navarro et al. 1997), referred to as the NFW profile, in terms of a dimensionless radial coordinate \(x := r/r_s\) and density \(\varrho := \rho (r) / \rho _0\) (where \(r_s\) and \(\rho _0\) are parameters that vary from halo to halo)

$$\begin{aligned} \varrho (x) = \frac{1}{x (1 + x)^2} \end{aligned}$$
(90)

regardless of the mass of the halo, cosmic epoch, cosmological parameters, etc. (Ascasibar and Gottlöber 2008; Neto et al. 2007; Brown et al. 2020). Despite its importance and several proposed explanations that involve, for instance, a fundamental connection with the mass accretion history (Ludlow et al. 2014, 2016), maximum entropy or adiabatic invariance arguments (Taylor and Navarro 2001; Dalal et al. 2010a; Pontzen and Governato 2013; El Zant 2013; Juan et al. 2014), or even the role of numerical noise as the main driver (Baushev 2015), there is not yet a consensus on the physical explanation behind this result. This is even more puzzling when contrasted with analytic predictions, which suggest single power laws as the result of gravitational collapse [see e.g., the self-similar solution of secondary infall by Bertschinger (1985)].

In contrast, most neutralino simulations find that the initial internal structure of microhalos is better described by a single power-law profile, \(\sim r^{-1.5}\), as first pointed out by Diemand et al. (2005) (see also Ishiyama et al. 2009; Anderhalden and Diemand 2013; Ishiyama 2014; Ogiya et al. 2016; Angulo et al. 2017; Delos et al. 2019). This very steep profile would make microhalos very resilient to tidal disruption by the Milky Way or by binary stars, and would also enhance their emission from dark matter self-annihilation, making them potentially detectable by future experiments (Diemand et al. 2006; Ishiyama et al. 2009; Ishiyama 2014; Delos 2019), although their abundance is affected by free streaming (Ishiyama and Ando 2020). This power-law profile, however, quickly evolves into an NFW-like profile for higher-mass halos. Several authors have argued that subsequent mergers drive this transformation and determine the final density profile (Ogiya et al. 2016; Angulo et al. 2017). These results have been further supported by Colombi (2021) using the idealised collapse of three sine waves simulated with a Lagrangian tessellation algorithm. In contrast, Wang et al. (2020), the only neutralino-like simulation that has reached \(z\sim 0\), find density profiles consistent with an NFW shape.

These inconsistencies among simulations leave the question of the ‘true’ density profiles of the first halos forming in a WIMP dark matter model currently unanswered. The role of numerical artefacts versus physical processes (such as the dynamical state of halos, their mass accretion histories, or environmental effects) still needs to be better understood. The alternative discretisation methods discussed in Sect. 3 might play an important role in this.

7.2 Axions

The QCD (Peccei–Quinn) axion is another hypothetical particle, originally proposed to solve the strong charge-parity problem (see Duffy and van Bibber 2009; Marsh 2016, for reviews), that has been considered as a possible dark matter candidate. Axions, despite being very light (\(m \sim 10^{-5}\)–\(10^{-3}\,\mathrm{eV}\)), are extremely cold owing to their non-thermal production. They therefore cluster gravitationally on small scales and behave as cold dark matter on cosmological scales.

As for WIMPs, primordial fluctuations in the axion field exist down to very small scales, up to an eventual smoothing produced by free-streaming velocities. In the case of axions, this occurs on sub-parsec scales. The smallest structures are expected to be “minihalos” of masses \(\sim 10^{-12}\, h^{-1}{\mathrm{M}}_{ \odot }\) (set roughly by the mass inside the horizon when the axions become non-relativistic) and radii of \(\sim 10^{12}\,\mathrm{cm}\). These minihalos are even smaller than those in neutralino cosmologies and would form even earlier: their typical collapse is expected to occur during the radiation-dominated epoch of the Universe at \(z\sim 1000\). If a large fraction of axion miniclusters survives tidal disruption during structure growth, they could potentially be detected in femto- and pico-lensing experiments (Kolb and Tkachev 1996; Fairbairn et al. 2018); alternatively, the resulting tidal streams could impact indirect and direct detection, e.g. in cavity experiments seeking to detect the conversion of axions into photons (Tinyakov et al. 2016; O’Hare and Green 2017; Knirck et al. 2018). Numerical simulations are therefore required to explore the formation and evolution of axion structures.

Analogously to WIMPs and neutralinos, cosmological simulations including axions assume them to be in the cold limit, i.e., to have zero velocity dispersion at the starting redshift. The subsequent gravitational evolution can be followed by standard N-body codes and algorithms, but because the first halo collapse is expected to occur on extremely small mass scales, and even before matter–radiation equality, simulations typically have sizes of \(\sim 1\)–\(10\, h^{-1}\mathrm{kpc}\) and evolve from \(z_{\mathrm{start}} \sim 10^6\) down to \(z \sim 100\), where the fluctuations on the scale of the box start to become nonlinear.

The initial conditions of axion simulations are very different from those of standard dark matter simulations. Whereas most dark matter fluctuations are given by adiabatic fluctuations on all scales, the initial distribution of axions (on the small scales targeted by axion simulations) is set by the formation and decay of topological defects and axion self-interactions after the QCD phase transition at the end of inflation. Thus, the initial distribution is not predicted by, e.g., Boltzmann codes but has to be followed by QCD lattice simulations (Vaquero et al. 2019; Buschmann et al. 2019). Because of these complications, the first self-consistent cosmological simulation of structure formation in an axion dark matter field was carried out only recently (Eggemeier et al. 2020; see Fig. 14). On large scales, however, different patches of the universe would have been uncorrelated, and thus it is common to assume the statistics to be given by a white-noise power spectrum whose amplitude is set to match QCD axion simulations, with a cut-off at modes that entered the horizon while the axions were relativistic (Xiao et al. 2021).

Fig. 14

Image reproduced with permission from Eggemeier et al. (2020), copyright by APS

The projected mass field at \(z=99\) in a cosmological simulation assuming that the dark matter is composed of QCD axions. The left panel shows the whole simulation box of 0.86 pc, whereas the right panel zooms into the largest axion minicluster, of mass \(\sim 10^{-9}\, h^{-1}{\mathrm{M}}_{ \odot }\), where the white circle denotes the radius enclosing a mean density 200 times the background value. Note that, unlike in most cosmological simulations, there is a lack of filamentary structure, owing to the scale-independent power spectrum of fluctuations expected in the primordial axion field on the scales displayed in the left panel

These simulations find that axion miniclusters indeed form at very high redshifts (\(z > 150{,}000\)), with an abundance that grows continuously, covering a broad range of masses, \(10^{-15}\)–\(10^{-8}\, h^{-1}{\mathrm{M}}_{ \odot }\), at \(z\sim 100\). The abundance of axion miniclusters can be reasonably well approximated by analytic arguments based on Press–Schechter and excursion-set theory with the appropriate spectrum of fluctuations. Such calculations allow an estimate of the abundance of these axion miniclusters down to redshifts below those reached by the simulations. The axion miniclusters have density profiles in reasonably good agreement with an NFW profile and extremely high concentrations, \(c \sim 100\)–400, consistent with the classical picture in which concentration reflects the density of the universe at collapse.

These simulations make interesting predictions for the abundance and internal properties of miniclusters at high redshift. However, uncertainties remain regarding the survival of these halos to redshift zero in an environment such as that of the Milky Way, in terms of the host halo tidal field and stellar encounters. While the recursive zooming techniques discussed in the previous section could perhaps be extended to axion scales, this would be a formidable computational challenge. Another alternative could be to explore the impact of the tidal field of the Milky Way halo using, for instance, the idealised simulation techniques that have been employed to study the survival of CDM halos. Combining these tools, it will be possible to make robust predictions for upcoming observations sensitive to axion structure.

7.3 Warm Dark Matter (WDM)

For both WIMPs and QCD axions, the microscopic nature of the dark matter particle has essentially no implications for the properties of galaxies or for the observable large-scale structure of the Universe. However, dark matter could instead be a particle whose mass implies a free-streaming scale comparable to the scale of galaxies. Such particles are generically referred to as warm dark matter (WDM); concrete examples are sterile neutrinos or gravitinos.

Additional motivation for considering dark matter alternatives such as WDM is that the physical properties of DM could offer a solution to the tensions between the predictions of \(\varLambda \)CDM and observations of the abundance, spatial distribution, and internal properties of dwarf galaxies (Klypin et al. 1999b; Moore et al. 1999; Zavala et al. 2009; Power 2013; Papastergis et al. 2011, 2015; Klypin et al. 2015; Boylan-Kolchin et al. 2011). Although baryonic physics and the effects of galaxy formation have been found to be able to resolve (or at least ameliorate) these tensions even within CDM (Navarro et al. 1996; Romano-Díaz et al. 2008; Pontzen and Governato 2012; Brooks et al. 2013; Sawala et al. 2016; Brook and Di Cintio 2015; Chan et al. 2015), there is no consensus on the inevitability nor the magnitude of these effects (Oman et al. 2015).

Constraining the mass of a hypothetical warm dark matter particle is an active area of research. Currently, the most competitive methods are: i) the abundance of Milky Way satellites, as measured by extragalactic surveys; ii) the small-scale properties of intergalactic gas, as measured by the Ly-alpha forest; and iii) the abundance and properties of small halos and subhalos, as inferred from perturbations to strong lensing systems. All these methods have different statistical power and systematics, yet they agree on an upper limit for the dark matter free-streaming scale, which currently translates into a lower limit on the mass of a thermal relic of \(m_{\chi } \gtrsim 6~\mathrm{keV}\) (Enzi et al. 2021).

Since all these observations involve nonlinear structures, predictions from numerical simulations are crucial for the robustness of these constraints. However, as we will discuss below, simulations of WDM structure formation have encountered serious challenges which are only now starting to be overcome.

As in the case of WIMPs, numerical simulations describe warm DM as a perfectly cold fluid, but with an initial power spectrum modified to account for free streaming, which erases fluctuations below a characteristic free-streaming scale. The thermal velocities themselves are usually neglected: they decay adiabatically and become irrelevant compared to gravitationally induced velocities during structure formation. For instance, even for a relatively warm particle of \(m_{\chi } \sim 250\,\mathrm{eV}\), the velocity dispersion is expected to be 0.28 km/s, to be compared with the 100 to 1000 km/s typical of galaxy-sized halos. (Nevertheless, there have been attempts to directly sample the WDM velocity distribution in N-body simulations; this, however, appears to introduce significant numerical noise, see e.g., Colín et al. 2008; Macciò et al. 2012; Leo et al. 2017 and our discussion regarding neutrinos in Sect. 7.8.2.) Although small at the time structures collapse, the free-streaming velocities impose phase-space constraints (Tremaine and Gunn 1979), preventing the formation of arbitrarily high densities in caustics and halo centers, in contrast to perfectly cold CDM.

The most important signature of free streaming in structure formation is a suppression of the transfer function relative to CDM. There have been several parameterisations of this effect. One of the most widely used is that of Bode et al. (2001), for a WDM particle of mass \(m_\chi \) and number of degrees of freedom \(g_\chi \):

$$\begin{aligned} T_{\mathrm{WDM}}(k) &= T_{\mathrm{CDM}}(k) \times \left[ 1 + (\alpha \,k)^{2\nu }\right] ^{-5/\nu } \\ \text {where}\quad \alpha &= 0.05 \left( \frac{\varOmega _{\mathrm{m}}}{0.4}\right) ^{0.15} \left( \frac{h}{0.65}\right) ^{1.3} \left( \frac{m_{\chi }}{1\,\mathrm{keV}} \right) ^{-1.15} \left( \frac{1.5}{g_\chi }\right) ^{0.29}\,h^{-1}\mathrm{Mpc}, \end{aligned}$$
(91)

which is based on a fit to a linear Einstein–Boltzmann calculation, and where \(\nu \simeq 1\)–1.2; see also Schneider et al. (2013) and Viel et al. (2005) for other popular alternatives.
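As an illustration of how this parameterisation is used in practice, the following minimal Python sketch evaluates the suppression factor of Eq. (91) together with the corresponding half-mode scale (the wavenumber where the transfer function drops to half the CDM value); the particle mass, cosmological parameters, the choice \(\nu =1.12\), and all function names are merely illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def wdm_transfer_ratio(k, m_chi_keV=3.0, omega_m=0.31, h=0.67, g_chi=1.5, nu=1.12):
    """Suppression T_WDM/T_CDM of Eq. (91); k in h/Mpc, alpha in Mpc/h."""
    alpha = (0.05 * (omega_m / 0.4)**0.15 * (h / 0.65)**1.3
             * (m_chi_keV / 1.0)**-1.15 * (1.5 / g_chi)**0.29)
    return (1.0 + (alpha * k)**(2.0 * nu))**(-5.0 / nu)

def half_mode_wavenumber(m_chi_keV=3.0, **kwargs):
    """Wavenumber k_hm (h/Mpc) where the transfer-function ratio drops to 1/2."""
    return brentq(lambda k: wdm_transfer_ratio(k, m_chi_keV, **kwargs) - 0.5, 1e-3, 1e4)
```

The half-mode mass discussed below is then obtained by assigning to \(k_{\mathrm{hm}}\) the Lagrangian mass of the corresponding length scale (conventions for this assignment vary in the literature).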

Warm DM simulations have shown that the abundance of halos decreases exponentially with respect to that in CDM for scales below the half-mode mass, defined as the scale where the transfer function is suppressed by a factor of 2 with respect to CDM. The abundance of subhalos is also strongly suppressed, though in a smoother fashion owing to the mixture of mass scales caused by tidal stripping, i.e., a broad range of halo masses gives rise to any given subhalo mass. The lack of small-scale structure also leads to a later collapse of halos, which simulations find is inherited as a reduction in halo concentrations. These effects can be appreciated in Fig. 15, which compares zoom simulations of a Milky-Way-sized halo assuming perfectly cold dark matter and a \(\sim 2\) keV WDM particle. The lack of small-scale structure and low-mass subhalos in the WDM case, as discussed above, is plainly visible. Finally, another interesting question concerns the internal structure of WDM halos near the free-streaming scale. As in the case of neutralino simulations, there are indications that WDM halos have inner density profiles that are steeper than their CDM counterparts (Colín et al. 2008; Polisensky and Ricotti 2015; Ogiya and Hahn 2018; Delos et al. 2019).

Fig. 15
figure 15

Image reproduced with permission from Lovell et al. (2012), copyright by the authors

The squared density field of a simulated halo with mass \(\sim 1.2\times 10^{12} h^{-1}{\mathrm{M}}_{ \odot }\) assuming a cold dark matter model (left panel) and a warm dark matter model (right panel). In this WDM model, fluctuations below the scale corresponding to \(k = 4.5 h\,\mathrm{Mpc}^{-1}\) are suppressed, consistent with dark matter being made of resonantly-produced sterile neutrinos of mass 2 keV. Note that the suppression of small-scale fluctuations induced by the free streaming of warm dark matter results in a lack of low-mass subhalos, while caustics in the halo outskirts become more visible

Warm dark matter simulations are also interesting from a theoretical point of view, since the full hierarchy of nonlinear objects is within reach of a single numerical simulation. Thus, they provide an ideal test-bed for numerical convergence and the accuracy of simulations in general. This is in stark contrast to standard N-body CDM simulations, where the smallest resolved scale is always set by numerical limitations rather than by physical processes, i.e., the simulated transfer function has an effective small-scale cut-off at the scale of the mean inter-particle separation.

Unfortunately, WDM simulations have revealed a serious shortcoming of the N-body method. Ever since the first WDM simulations, a population of low-mass halos of purely numerical origin has been apparent. These artificial fragments outnumber halos of physical origin by factors of 100–1000 (Klypin et al. 1993; Avila-Reese et al. 2001; Bode et al. 2001; Wang and White 2007; Angulo et al. 2012). The problem is ameliorated, but only slowly, with increasing mass resolution, roughly as \(N_{\mathrm{p}}^{1/3}\) (Wang and White 2007; Schneider et al. 2013). An illustration of these fragments is shown in Fig. 16, where in the bottom-left region of the image a vertical filament has been split into small halos, highlighted by red circles. These fragments likely originate from correlated and highly anisotropic force errors driven by particle discreteness in regions of anisotropic collapse (Hahn et al. 2013; Power et al. 2016).

Fig. 16
figure 16

The simulated density field of Warm Dark Matter simulations illustrating the “artificial fragmentation” problem. The left panel shows the result of a standard N-body simulation where dark matter halos are highlighted by red circles. Note that filaments are broken up into pieces, referred to as “artificial fragments”, whose abundance increases with resolution. These fragments originate from discreteness errors in N-body simulations and are considerably suppressed in calculations employing phase-space tessellations, as shown in the right-hand panel. Image adapted from Angulo et al. (2012)

There have been several attempts to solve the problem of artificial fragmentation. At a practical level, there are proposals based on identifying and removing artificial halos according to their Lagrangian properties or by comparing the results of simulations at different mass resolutions (Lovell et al. 2014; Schneider et al. 2013). At a more fundamental level, there are proposals to improve simulation techniques and directly prevent the formation of these spurious micro-halos. These proposals, which have met with limited success, include adaptive softening based on the local density (Power et al. 2016) and anisotropic softening based on the local moment of inertia (Hobbs et al. 2016). Another interesting idea was proposed by Myers et al. (2016), who showed that the key to accurate N-body simulations is a decorrelation in time of discretization errors, achieved by adding an artificial velocity dispersion to particles and periodically remapping the set of N-body particles. To our knowledge, however, this idea has only been investigated in up to two dimensions and in idealised test problems. Among all the proposed solutions, perhaps the most successful has come from the reformulation of the N-body method in terms of Lagrangian phase-space elements (see Sect. 3.2).

In the context of warm dark matter, Angulo et al. (2013b) showed that by using these Lagrangian methods it is possible to properly quantify the mass function below the cut-off scale. More recently, Stücker et al. (2021b) extended these results as a function of the properties of the cut-off in the initial transfer function by using a hybrid Lagrangian/N-body approach (Stücker et al. 2020), where dynamically simple regions, such as filaments, voids, and sheets, are evolved using a phase-space interpolation, whereas halos are simulated using an N-body discretization. The lack of artificial fragments in these simulations has also allowed more detailed investigations of the observability of warm dark matter in general and in lensing in particular (Richardson et al. 2021).

One might be concerned that artificial fragments could affect the properties of the larger halos that accrete them (e.g., by gravitationally heating their centers). This new generation of Lagrangian simulations is, however, demonstrating that these effects are small and that the N-body method is reliable for halos whose mass lies above the range dominated by artificial fragmentation. This is an important validation of current constraints on the WDM particle mass, which rely heavily on the correctness of N-body simulations.

7.4 Fuzzy Dark Matter: Quantum Hamiltonians and condensates

A particle can no longer be treated as classical in a cosmological context if it is so light that its de Broglie wavelength corresponds to astrophysical scales. If the particle has a high enough number density, occupation numbers will be so large that it can form a Bose–Einstein condensate. Specific examples are generic ultra-light bosons with a wavelength of \(\sim 10\) kpc (Press et al. 1990; Frieman et al. 1995) and ultralight axions with masses \(m\gtrsim 10^{-23}\,\mathrm{eV}\) arising from string theory, e.g., Svrcek and Witten (2006). All these candidates are generically referred to as ‘Fuzzy Dark Matter’ (FDM) and have the distinctive property that they display genuine quantum effects on astrophysical scales (see Marsh 2016; Hui 2021; Ferreira 2021 for recent reviews).

These quantum effects lead to a suppression of small-scale structure (due to an effective “quantum pressure”), and as a consequence it has been claimed that FDM can alleviate several of the small-scale problems of CDM mentioned in Sect. 7.3, qualitatively similar to WDM (cf. Hui et al. 2017 for an overview). On the other hand, constraints from comparing hydrodynamical simulations with observations of the Ly-\(\alpha \) forest (Iršič et al. 2017a; Armengaud et al. 2017) currently rule out masses smaller than \(10^{-21}\) eV, which is higher than what is needed to solve these problems (Kobayashi et al. 2017). Note, however, that these simulations only model the effects of FDM on the initial transfer function and neglect the effect of quantum pressure on the dynamics, so there is a debate about whether this could affect the constraints (Zhang et al. 2018a; Armengaud et al. 2017; Nori et al. 2019). Nonetheless, ultralight axions remain an interesting dark matter candidate with a distinct signature due to the associated large-scale quantum effects, and one that could co-exist with other forms of dark matter. As for other DM candidates, the observable signatures of FDM are located in the nonlinear regime, so accurate simulations are essential to constrain, or eventually rule out, the properties of FDM.

Fuzzy dark matter is described by the Schrödinger–Poisson (SP; also known as Gross–Pitaevskii–Poisson in the context of condensates) system for a single complex wave function \(\psi \). This is the non-relativistic limit of a Klein–Gordon field after averaging over rapid oscillations of the scalar field, leaving two scalar degrees of freedom (amplitude and phase). The resulting system is typically expressed through a Hamiltonian of the form

$$\begin{aligned} \hat{\mathscr {H}} = \frac{\hat{p}^2}{2m a^2} + \frac{m}{a}\hat{V} + \frac{\lambda m}{a^{3}} (1+\delta ) \quad \text {with}\quad \nabla ^2 V = \frac{3}{2}H_0^2 \varOmega _m \delta , \quad \text {and}\quad \delta =\left| \psi \right| ^2-1, \end{aligned}$$
(92)

where \(\lambda \) represents a possible self-coupling of the field, and the expansion of the Universe has already been subtracted out as for the co-moving Vlasov–Poisson system. Alternatively, it is possible to transform the SP equations into fluid equations which resemble those of a classical fluid with an additional ‘quantum pressure’ term so that:

$$\begin{aligned} \frac{\partial \varvec{u}}{\partial t} +\frac{1}{a}(\varvec{u}\cdot \varvec{\nabla }) \varvec{u}= -\frac{1}{a} \varvec{\nabla }V - \frac{1}{a^3} \varvec{\nabla }Q \quad \text {with}\quad Q := - \frac{\hbar ^2}{2m^2} \frac{\nabla ^2 \sqrt{\rho }}{\sqrt{\rho }} \end{aligned}$$
(93)

where m is the axion mass, \(\rho :=1+\delta =|\psi |^2\) the density, and \(\varvec{u}:=\frac{\hbar }{m}\varvec{\nabla }\mathrm{arg}(\psi )\) the velocity. Q can be regarded as a ‘quantum pressure’ term (prominent in Bohmian quantum mechanics) and is in general singular wherever quantum interference effects are present (since \(2\pi \) jumps in \(\mathrm{arg}(\psi )\) are accompanied by \(\rho \rightarrow 0\)). This equation, together with the continuity equation for \(\rho \), is also known as the Madelung formulation (Madelung 1927) of quantum mechanics.

The first aspect to consider in FDM numerical simulations is the effect on the initial conditions. At linear order, \(Q\simeq -\frac{\hbar ^2}{4m^2}\nabla ^2\delta \), which introduces an effective Jeans length below which structure growth is suppressed:

$$\begin{aligned} k_{\mathrm{J}} = \left( \frac{16 \pi G \rho _{\mathrm{b}} m^2}{\hbar ^2}\right) ^{1/4} \underset{a = a_{\mathrm{eq}}}{\simeq } 9\, \mathrm{Mpc^{-1}}\; \left( \frac{m}{10^{-22} eV} \right) ^{1/2}, \end{aligned}$$
(94)

where \(\rho _{\mathrm{b}}\) is the physical matter density, and \(a_{\mathrm{eq}}\) is the expansion factor at matter-radiation equality. The exact effect on the initial transfer function can be computed directly by including the relevant modifications in Boltzmann solvers (e.g., AxionCAMB, Hlozek et al. 2015), or approximated as

$$\begin{aligned} T_{\mathrm{FDM}}(k) \approx T_{\mathrm{CDM}}(k) \times \left( \frac{\cos x^3}{1+x^8} \right) \end{aligned}$$
(95)

where \(x \simeq 1.61\, (m/10^{-22}\,\mathrm{eV})^{1/18}\, k/k_{\mathrm{J}}(a_{\mathrm{eq}})\) (Hu et al. 2000).
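As a concrete illustration, the fitting formulae of Eqs. (94)–(95) can be evaluated as in the minimal sketch below; the function name and default mass are ours, and the approximation is only intended for the quasi-linear regime.

```python
import numpy as np

def fdm_transfer_ratio(k, m22=1.0):
    """Approximate suppression T_FDM/T_CDM of Eqs. (94)-(95).

    k   : wavenumber in Mpc^-1
    m22 : FDM particle mass in units of 1e-22 eV
    """
    k_jeans_eq = 9.0 * np.sqrt(m22)                 # Eq. (94) evaluated at a_eq, in Mpc^-1
    x = 1.61 * m22**(1.0 / 18.0) * k / k_jeans_eq   # dimensionless variable of Eq. (95)
    return np.cos(x**3) / (1.0 + x**8)
```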

There are essentially two approaches in the literature to follow the nonlinear evolution of the axion field. The first is to directly solve the associated Schrödinger equation for the complex field \(\psi \). Essentially, these codes adopt Eulerian discretisation schemes where the axion field is described on a (static or moving, fixed or adaptive) grid (Schive et al. 2014; Schwabe et al. 2016; Mocz et al. 2017; Edwards et al. 2018; Veltmaat et al. 2018; Mina et al. 2020; May and Springel 2021). We have discussed schemes for the numerical integration in Sect. 4.3; the integrator receives modifications if self-interactions of the field (\(\lambda \ne 0\)) are included. Spectral Eulerian approaches are limited by their lack of spatial adaptivity, so that all recent Eulerian schemes resort to finite-difference approximations with AMR or hybrid schemes.
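To make the structure of such solvers concrete, the following is a minimal, non-expanding (\(a=1\)), unigrid pseudo-spectral sketch of the standard kick–drift–kick split-step scheme for Eq. (92) with \(\lambda =0\); production codes add cosmic expansion, adaptive resolution, and careful time-step control, and, as noted above, typically rely on finite-difference/AMR discretisations rather than a global spectral method. All names and code units are illustrative.

```python
import numpy as np

def evolve_sp(psi, hbar_over_m, dt, nsteps, boxsize=1.0, poisson_norm=1.0):
    """Kick-drift-kick split-step integrator for the Schroedinger-Poisson system
    (Eq. 92 with lambda = 0) on a periodic grid, in a static box (a = 1), with
    units absorbed into hbar_over_m and poisson_norm = (3/2) H0^2 Omega_m.
    Illustrative sketch only; psi is normalised such that <|psi|^2> = 1."""
    psi = np.asarray(psi, dtype=complex)
    n = psi.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=boxsize / n)        # angular wavenumbers
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2

    def potential(psi):
        delta = np.abs(psi)**2 - 1.0                          # density contrast
        phi_k = -poisson_norm * np.fft.fftn(delta) / np.where(k2 > 0.0, k2, 1.0)
        phi_k[0, 0, 0] = 0.0                                  # zero-mean potential
        return np.real(np.fft.ifftn(phi_k))

    for _ in range(nsteps):
        psi *= np.exp(-0.5j * dt * potential(psi) / hbar_over_m)   # half kick (potential)
        psi_k = np.fft.fftn(psi)
        psi_k *= np.exp(-0.5j * dt * hbar_over_m * k2)             # full drift (kinetic)
        psi = np.fft.ifftn(psi_k)
        psi *= np.exp(-0.5j * dt * potential(psi) / hbar_over_m)   # half kick (updated V)
    return psi
```

Each kick and drift substep is unitary and exact for its sub-Hamiltonian, which is one reason split-step spectral schemes remain popular for test problems despite their lack of adaptivity.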

An arguably simpler approach has been followed by other authors, who resort to the Madelung formulation and include the quantum pressure term using an SPH-based estimate (Veltmaat and Niemeyer 2016; Mocz and Succi 2015; Nori and Baldi 2018; Zhang et al. 2018b; Hopkins 2019). Due to the singular nature of the quantum pressure term, this approach has difficulty resolving quantum interference effects, which has cast some doubt on the results (but see below). Mocz and Succi (2015) have shown that during phases of smooth evolution without topological changes, SPH methods can capture the evolution accurately. This technique has been successfully employed to obtain the first FDM constraints from the Ly-\(\alpha \) forest (Iršič et al. 2017b; Nori et al. 2019), arguably because these constraints are more sensitive to the suppression of perturbations already present in the initial conditions than to the details of the structure in collapsed regions.

The large-scale dynamics in an FDM universe is identical to CDM since the Schrödinger–Poisson system becomes Vlasov–Poisson in the \(\hbar /m\rightarrow 0\) limit (cf. Widrow and Kaiser 1993; Zhang et al. 2002), and thus FDM should reproduce the filamentary structure of CDM. Numerical simulations have confirmed this, and illustrations of the qualitative agreement can be found in, e.g., Uhlemann et al. (2014), Kopp et al. (2017), Mocz et al. (2018). On smaller scales, structure formation is suppressed in a way that resembles a WDM cosmology with a free-streaming length comparable to the FDM effective Jeans length. On even smaller scales, two effects appear. The first is the presence of a long-lived solitonic core in the centers of halos (Schive et al. 2014), whose properties are correlated with the mass of the host halo (Chavanis 2011; Chavanis and Delfini 2011; Chen et al. 2017; Bar et al. 2018). The existence of this central mass excess has been confirmed with both Eulerian and Lagrangian simulations, and its presence or absence in observations of galaxies has been used to argue in favor of or against FDM (Desjacques and Nusser 2019; De Martino et al. 2020; Pozo et al. 2020; Burkert 2020).

Another key signature of FDM on small scales is a distinctive granular structure associated with interference patterns and variations in density. This can be appreciated, for instance, in Fig. 17. This feature appears to be a prediction of all Eulerian simulation codes; however, it is absent in Lagrangian ones. The reason is that the Madelung formulation formally diverges when \(\rho \rightarrow 0\), which introduces a singularity that limits the ability of the corresponding numerical methods to correctly capture the behavior of the system. It is thus remarkable that solitonic cores can be reproduced also in Madelung simulations, likely because they arise from hydrostatic equilibrium in the thermodynamic \(\rho \gg 1\) regime, unaffected by errors accumulated in earlier, colder stages of collapse.

Fig. 17
figure 17

Images reproduced with permission from [left] Veltmaat et al. (2018), copyright by APS; and [right] from Mocz et al. (2020), copyright by the authors

The simulated cosmic density field when dark matter is assumed to be made of Fuzzy Dark Matter (FDM), for which quantum effects are important on astrophysical scales. The left panel shows a simulation at \(z\sim 1\) from Veltmaat et al. (2018), whose inset zooms into a collapsed structure where granular structure and a dense solitonic core are evident. The right panel shows a comparison of a filament at \(z \sim 7\) simulated assuming CDM, WDM, or FDM (also referred to as BECDM), from top to bottom. Note that the filament is broken into small low-mass halos in CDM, whereas in WDM and FDM such small-scale structure is absent due to the smoothing effects of free-streaming velocities and quantum effects, respectively. In addition, in the FDM case, an interference pattern appears as a result of multi-streaming in collapsed regions

Regardless of such numerical issues, there is consensus that interference patterns should appear in FDM and, in fact, they could provide clear evidence of its existence. For instance, the granularity and the rapid oscillations of the Klein–Gordon field could perturb strong lenses and/or affect the frequency of light from pulsars in a way that could be detected by Pulsar Timing Arrays (Khmelnitsky and Rubakov 2014; Porayko and Postnov 2014; De Martino et al. 2017), or from binary pulsars (Blas et al. 2017). Current constraints are still weak (\(m > 10^{-23}\) eV) (Porayko et al. 2018; Kato and Soda 2020), but the next generation of PTAs and multiple other proposed probes should significantly improve upon these limits, which, together with future advances in numerical simulations, could scrutinise this interesting DM candidate.

7.5 Primordial black holes

Even though large-scale primordial fluctuations are found to follow a nearly scale-invariant spectrum, \(k^{n_s}\) with \(n_s\simeq 0.96\), from the CMB and observations of the large-scale structure of the Universe, small scales are essentially unconstrained. It is therefore possible that large energy-density fluctuations on small scales were generated during inflation. These fluctuations could have collapsed to form a population of black holes (Carr et al. 2016; Carr and Kühnel 2020) as soon as they entered the horizon, as early as matter-radiation equality.

An interesting possibility is that these primordial black holes (PBHs) could make up a significant fraction (or possibly all) of the dark matter without recourse to particles beyond the standard model. This possibility has become even more interesting given the recent detection of many black hole mergers by LIGO, with measured black hole masses following an astrophysically unexpected distribution. Recent lattice QCD calculations (Borsanyi et al. 2016) now predict the equation of state during the QCD phase transition, which can lead to distinct features in the PBH mass spectrum (Byrnes et al. 2018; Carr et al. 2021).

Various astrophysical observations have put limits on the abundance and mass of these PBHs. The lower mass limit is determined by requiring the evaporation time due to Hawking radiation to be larger than the Hubble time, which puts the minimum mass needed to make up all of the dark matter at \(m_{\mathrm{PBH}}\gtrsim 10^{15}\,\mathrm{g}\). Up to stellar masses, PBH dark matter is severely constrained by astrophysical observations (microlensing, close binary disruption); at higher than stellar masses it is constrained by the allowed spectral distortions of the CMB. Current constraints rule out PBHs of virtually any mass as the totality of the dark matter for a monochromatic PBH mass function, except for a window around \(10^{-16}\)–\(10^{-10}\,\mathrm{M_{\odot }}\) (Carr et al. 2020a), which narrows to a small region around \(10^{-16}\,\mathrm{M_{\odot }}\) if constraints from white dwarf and neutron star disruptions are included. However, a broad spectrum of PBH masses would render them unconstrained by any single observation. Furthermore, since the physics behind PBH formation and that behind a hypothetical DM particle are in principle unrelated, it is possible that both types of dark matter coexist, although CDM clustering around massive PBHs does constrain that scenario (Adamek et al. 2019; Carr et al. 2020b).

The evolution of PBHs is determined by collisional dynamics, and thus it can be correctly captured by traditional N-body codes with small or zero softening lengths. Due to the strong accelerations possible in pair encounters and the need to resolve the formation and disruption of PBH binaries, integration techniques known from star-cluster simulations need to be employed (Aarseth 2009), but traditional integrators are sufficient when one is interested in the large-scale distribution of PBHs.

Concerning initial conditions, on sufficiently large scales PBH fluctuations follow those imprinted in the particle dark matter (CDM), while on small scales the PBH distribution can be assumed to be Poissonian. Recent numerical simulations of particle DM plus PBHs have employed standard N-body codes (Inman and Ali-Haïmoud 2019; Tkachev et al. 2020), usually augmented with models for BH-BH mergers. In Fig. 18, we show the density field at \(z=99\) of one such simulation, assuming that either 10% or 100% of the DM is in the form of PBHs. For low PBH fractions, Inman and Ali-Haïmoud (2019) found that DM halos form around single PBHs and have steep power-law density profiles. For higher PBH fractions, halos contain a larger number of PBHs and display broken power-law profiles. In addition, the formation of small halos occurs earlier the larger the PBH fraction, which could have observable consequences (e.g., on the formation of the first stars or on reionisation). Currently, these simulations are limited by computational resources. In the future, one can expect that they will be able to tackle questions such as the formation of PBH binaries already in the radiation-dominated epoch, potentially informing the fraction of PBH mergers detectable by future gravitational-wave experiments.

Fig. 18
figure 18

Image reproduced with permission from Inman and Ali-Haïmoud (2019), copyright by APS

The projected matter distribution at \(z=99\) as predicted by numerical simulations assuming primordial black holes (PBH) make up 10% or 100% of the dark matter (left and right panels, respectively). The simulations correspond to box sizes of \(30 h^{-1}\mathrm{kpc}\) and assume \(20 h^{-1}{\mathrm{M}}_{ \odot }\) for the mass of the PBHs. Note that for larger PBH fractions, structures collapse earlier and contain multiple PBHs, whereas for low fractions, PBHs are mostly found in isolation, surrounded by “standard” dark matter particles

Note that if the small-scale fluctuations produced by inflation are not large enough to collapse into PBHs, they could instead create a population of dense low-mass dark matter halos, usually referred to as ultra-compact minihalos (Berezinsky et al. 2003; Bringmann et al. 2012). These objects are found in simulations to have steep inner density profiles, \(\rho \sim r^{-9/4}\) or \(r^{-1.5}\) (Gosenca et al. 2017; Delos et al. 2018a), and they are expected to leave their own distinctive observational signature, which could be constrained by microlensing (Ricotti and Gould 2009; Li et al. 2012b), pulsar timing delays (Clark et al. 2016), or by (the lack of) a gamma-ray signal if DM self-annihilates (Bringmann et al. 2012; Gosenca et al. 2017; Delos et al. 2018b).

7.6 Self-interacting dark matter (SIDM), decaying and dissipative dark matter

For many purposes, dark matter can be considered to have no other interactions besides gravity. This ’collisionless’ assumption is justified since after freeze-out (in the case of thermal production of the dark matter particles), non-gravitational particle-particle interactions must be very weak given current observational constraints. Nevertheless, any physically motivated particle DM model must have a non-zero interaction cross-section in order for those particles to be produced in the early Universe in the first place.

There are various possible interaction mechanisms. For instance, dark matter could annihilate with itself (e.g., if it were a Majorana particle), producing standard-model particles and (gamma-ray) photons. This has fueled searches for annihilation products in galaxy clusters, Milky Way satellites, the Galactic centre, and the diffuse extragalactic background [see Leane (2020) for a recent review]. This ‘indirect detection’ of particle dark matter relies heavily on N-body simulations, which can pinpoint the most likely places for a putative detection, as well as the expected emission [see e.g., Springel et al. (2008b), Zavala et al. (2010) and Kuhlen et al. (2012), Fornasa and Sánchez-Conde (2015), Zavala and Frenk (2019) for reviews]. Although these searches have not been successful so far (Ackermann et al. 2012; Abdallah et al. 2016), future facilities such as the Cherenkov Telescope Array will offer new prospects for a detection (Doro et al. 2013).

Another possible interaction is the weakly collisional regime with elastic binary processes \(\chi \chi \rightarrow \chi \chi \). This case is generically known as self-interacting dark matter, and the cross-sections can be large enough to be relevant for structure formation. For instance, weak collisionality was found to affect the density profiles of isolated halos (Burkert 2000; Kochanek and White 2000), predominantly by isotropising the core velocity dispersion and reducing the central density. In principle, such systems can undergo a gravothermal catastrophe leading to core collapse, but current constraints on the cross-section put the time-scale for this at \(\gtrsim 100~\mathrm{Gyr}\) (Koda and Shapiro 2011). The resulting observational signatures enable constraints on such ‘richer’ dark-sector physics [see e.g., Tulin and Yu (2018) for a recent review of the topic], which, as for other DM candidates, require a detailed simulation counterpart.

Self-interacting DM requires modelling microscopic short-range particle scattering in N-body simulations. These interactions alter the dynamics, which is no longer purely geodesic, and thus require upgrading from the Vlasov–Poisson system (cf. Eq. 5) to the Boltzmann–Poisson system of equations by adding a scattering balance term

$$\begin{aligned} \frac{\partial f}{\partial t} + \frac{\varvec{p}}{ma^2} \cdot \varvec{\nabla }_x f - m \varvec{\nabla }_x\phi \cdot \varvec{\nabla }_p f = \varGamma _{\mathrm{in}} - \varGamma _{\mathrm{out}}, \end{aligned}$$
(96)

where \(\varGamma _{\mathrm{in}}\) is the instantaneous rate of change of \(f(\varvec{x},\varvec{p},t)\) due to all scattering events that lead to particles ending up in an infinitesimal phase-space volume \(\mathrm{d}^3x\,\mathrm{d}^3p\) around the point \((\varvec{x},\varvec{p})\), and \(\varGamma _{\mathrm{out}}\) the respective rate for scattering out of that volume.

Assuming a two-particle momentum ‘in’ state \((\varvec{p},\, \varvec{p}_1)\) is scattered into the ‘out’ state \((\varvec{p}^{\prime},\,\varvec{p}^{\prime}_1)\), the scattering rate balance, given a differential scattering cross section \(\mathrm{d}\sigma _\chi / \mathrm{d}\varOmega \), is

$$\begin{aligned} \varGamma _{\mathrm{in}}-\varGamma _{\mathrm{out}} = \,\int _{\mathbb {R}^3} \mathrm{d}^3p_1 \int _{0}^{4\pi } \mathrm{d}\varOmega \,\frac{\mathrm{d}\sigma _\chi }{\mathrm{d}\varOmega } \frac{\left\| \varvec{p}-\varvec{p}_1\right\| }{ma^2} \,\left[ f(\varvec{x},\varvec{p}^{\prime}; t) f(\varvec{x},\varvec{p}^{\prime}_1;t)- f(\varvec{x},\varvec{p}; t) f(\varvec{x},\varvec{p}_1;t) \right] , \end{aligned}$$
(97)

where the centre-of-mass scattering (solid) angle \(\varOmega \) contains the remaining two degrees of freedom in \((\varvec{p}^{\prime},\,\varvec{p}^{\prime}_1)\) allowed by symmetries for binary elastic collisions, i.e. after accounting for momentum and energy (and particle number) conservation. Unfortunately, if one inserts the N-body distribution function (30) into the scattering rate \(\varGamma :=\varGamma _{\mathrm{in}}-\varGamma _{\mathrm{out}}\), one does not arrive at a practicable discretisation, since \(\varGamma \) is zero unless two N-body particles are located at exactly the same location. Since each N-body particle in any case represents an element of phase space in a coarse-grained sense, practical approaches (Vogelsberger et al. 2012; Rocha et al. 2013) resort to estimating \(\varGamma \) with smeared-out particles, relying on an SPH-like approach where (30) is replaced with

$$\begin{aligned} f_N(\varvec{x}, \varvec{p}, t) = \sum _{\varvec{n}\in \mathbb {Z}^3} \sum _{i=1}^N \frac{M_i}{m} \,W\left( \left\| \varvec{x}-\varvec{X}_i(t)-\varvec{n} L \right\| ;\,h_i\right) \,\delta _D(\varvec{p}-\varvec{P}_i(t)), \end{aligned}$$
(98)

where we recall that \(M_i\) and \(\varvec{P}_i\) are the mass and momentum of N-body particles, and \(W(r;\,h)\) is a smooth kernel function with finite support and \(h_i\) a bandwidth (or smoothing scale) parameter chosen so that several other particles are found within a distance \(h_i\) from particle i (the summation over periodic copies is of course irrelevant for the self-interaction). This distribution function can then be used to express the scattering rate experienced by particle i as [see Rocha et al. (2013) for details of the calculation]

$$\begin{aligned} \begin{aligned} \varGamma _i&= \sum _j \varGamma _{i\mid j} = \sum _j \frac{\sigma _\chi }{ma^2} \left\| \varvec{P}_i -\varvec{P}_j \right\| V_{ij},\\&\quad \text {with}\quad V_{ij} = \int _{\mathbb {R}^3} \mathrm{d}^3x \;W(\Vert \varvec{x}-\varvec{X}_i\Vert ; h_i)\,W(\Vert \varvec{x}-\varvec{X}_j\Vert ;h_j) \end{aligned} \end{aligned}$$
(99)

representing the overlap between the particle kernels. Since \(\varGamma _{i\mid j} = \varGamma _{j\mid i}\) is not guaranteed numerically, Rocha et al. (2013) proposed to use a symmetrised scattering rate \(\varGamma _{ij}:=(\varGamma _{i\mid j}+ \varGamma _{j\mid i})/2\). Once the scattering rate \(\varGamma _{ij}\) is known, the scattering probability over a time step \(\varDelta t\) is \(\mathcal {P}_{ij} = \varGamma _{ij}\varDelta t\). One can then realise an elastic scattering event by sampling this probability over one timestep and randomising the direction of the relative velocity vector of particles i and j in the centre-of-mass frame. Such a Monte Carlo approach to the weakly collisional regime has been compared to, and validated against, a non-ideal fluid model, e.g., by Koda and Shapiro (2011).
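A minimal sketch of this pairwise Monte Carlo step is given below; the neighbour search and the kernel-overlap integral \(V_{ij}\) of Eq. (99) are assumed to be provided by the host N-body code (here passed as placeholder callables), equal particle masses are assumed, and scale-factor prefactors are omitted for clarity. All names are illustrative.

```python
import numpy as np

def sidm_scatter_step(vel, part_mass, sigma_over_m, dt, neighbours, overlap, rng=None):
    """One Monte Carlo self-interaction step following Eqs. (97)-(99), sketched
    for equal-mass simulation particles.

    vel           : (N, 3) array of particle velocities
    part_mass     : N-body particle mass M
    sigma_over_m  : microscopic cross-section per unit mass, sigma_chi / m
    neighbours(i) : indices j of particles whose kernels overlap particle i (assumed given)
    overlap(i, j) : kernel-overlap integral V_ij of Eq. (99)            (assumed given)
    """
    rng = rng or np.random.default_rng()
    for i in range(len(vel)):
        for j in neighbours(i):
            if j <= i:
                continue                                   # treat each pair only once
            vrel = np.linalg.norm(vel[i] - vel[j])
            # symmetrised rate Gamma_ij = (Gamma_i|j + Gamma_j|i) / 2, cf. Eq. (99)
            gamma_ij = 0.5 * sigma_over_m * part_mass * vrel * (overlap(i, j) + overlap(j, i))
            if rng.random() < gamma_ij * dt:               # scattering probability P_ij
                # isotropic elastic scattering in the centre-of-mass frame
                v_cm = 0.5 * (vel[i] + vel[j])
                n = rng.normal(size=3)
                n /= np.linalg.norm(n)                     # random unit direction
                vel[i] = v_cm + 0.5 * vrel * n
                vel[j] = v_cm - 0.5 * vrel * n
    return vel
```

For equal masses this construction conserves momentum and kinetic energy in each scattering event by design.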

Dark matter self-interactions have been included in many cosmological simulations (e.g., Yoshida et al. 2000; Vogelsberger et al. 2012; Rocha et al. 2013). An important focus of such simulations is on dwarf galaxies (e.g., Zavala et al. 2013) and on simulations of galaxy clusters (e.g., Brinckmann et al. 2018; Banerjee et al. 2020 for recent examples) since merging clusters provide some of the strongest constraints on the cross-section of dark matter self-interaction (Harvey et al. 2015; Kahlhoefer et al. 2015; Robertson et al. 2017). Note that self-interactions are also included in the ’ETHOS’ effective dark matter model (Cyr-Racine et al. 2016; Lovell et al. 2018), which we will discuss in the next subsection.

Another possible interaction is decaying dark matter, where the dark matter decays into massless species on cosmological time scales. Current constraints from the CMB require that, if all dark matter decays into photons (or dark radiation), the decay lifetime satisfies \(\varGamma ^{-1} \gtrsim 160\, \mathrm{Gyr}\) (Audren et al. 2014). However, a much smaller fraction of the DM can decay and still leave a cosmological signature. In this case, the corresponding increase in the energy density of relativistic species at late times needs to be accounted for in the background evolution of numerical simulations. Additional inhomogeneous relativistic corrections can, e.g., be absorbed in a similar way as for trans-relativistic massive neutrinos (see Sect. 7.8.2), using linear-theory corrections to the gravitational potential (Dakin et al. 2019b) or a gauge approach (Fidler et al. 2017b). Beyond the background contribution, dark matter decay and annihilation could also heat the gas in low-mass halos (e.g. Schön et al. 2015) and the inter-galactic medium (IGM), which can be modelled e.g., in hydrodynamic cosmological simulations (Iwanus et al. 2017; List et al. 2019b), or using machine-learned modifications to matter-only simulations (List et al. 2019a), similar to those employed to include baryonic effects or modify cosmological parameters (see Sect. 9.8).

While Ockham’s razor might push us to consider simple DM models, there is a priori no reason that the physics of the dark sector could not be significantly richer, with multiple dark species and internal degrees of freedom. An intermediate space between self-interacting dark matter and decaying dark matter is occupied by dissipative dark matter, which allows for up-scattering to an excited state \(\chi \chi \rightarrow \chi ^{\prime}\chi ^{\prime}\) with a subsequent decay, e.g., \(\chi ^{\prime}\rightarrow \chi +X\) under emission of a light or massless particle ’X’. Such processes can efficiently remove energy from the centres of halos and lead to halo core collapse in less than a Hubble time (Essig et al. 2019; Huo et al. 2020) leaving testable signatures in dwarf galaxies. They could also contribute to the formation of supermassive black holes already at high redshift (Choquette et al. 2019). Note, however, that dissipative dark matter interactions are constrained by the non-detection of DM acoustic oscillations (Cyr-Racine et al. 2014), and absence of a significant thin ‘dark disk’ in the Milky Way (Schutz et al. 2018).

7.7 Effective descriptions

In the previous subsections we considered specific DM candidates in detail. However, many more alternatives exist, spanning a broad range of masses, interactions, production mechanisms, etc. Examples include sterile neutrinos produced by resonant or non-resonant transitions, production through the decay of a parent particle, which might result in thermal or non-thermal distributions, or mixed models in which DM is made out of multiple particles with different properties. Furthermore, new candidates are constantly being proposed.

Since it is impractical to carry out numerical simulations for every possible particle and to cover their respective degrees of freedom, effective descriptions have been proposed. The basic idea is that a large fraction of physically viable candidates can be mapped onto a generic model with a few free parameters. For instance, since the nonlinear large-scale structure expected in a given DM model depends mostly on the initial transfer function, deviations from the CDM transfer function can be parameterised, and a given DM candidate can be mapped onto a particular point in this parameter space. Specifically, Murgia et al. (2017) have argued that a large class of DM models can be described using a generalisation of the WDM modification given in Eq. (91)

$$\begin{aligned} T_\chi (k) = T_{\mathrm{CDM}}(k)\times \left[ 1 + \left( \alpha k\right) ^{\beta } \right] ^{\gamma } \end{aligned}$$
(100)

where \(\alpha \), \(\beta \), and \(\gamma \) are the three free parameters of the model (note, however, a strong degeneracy between \(\alpha \) and \(\gamma \)). This form could be extended with additional free parameters to describe a broader range of models, especially at high wavenumbers (e.g., ‘dark oscillations’); such scales, however, are expected to have only a very minor impact on structure formation.
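For reference, the thermal-relic WDM case of Eq. (91) is recovered from Eq. (100) with \(\beta = 2\nu \) and \(\gamma = -5/\nu \); a minimal sketch of the generalised form (names are ours) is:

```python
def generalized_transfer_ratio(k, alpha, beta, gamma):
    """Generalised suppression T_chi/T_CDM of Eq. (100); k and alpha in
    mutually inverse units (e.g. h/Mpc and Mpc/h)."""
    return (1.0 + (alpha * k)**beta)**gamma
```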

A different effective description has been proposed in terms of the particle physics Lagrangian in the so-called Effective Theory of Structure Formation (ETHOS) (Cyr-Racine et al. 2016). The physical parameters of a given DM model (DM particle mass, coupling constants, number of degrees of freedom, mediator mass, etc.) map into effective parameters that determine the initial transfer function (e.g., \(\alpha \), \(\beta \), and \(\gamma \) in Eq. 100) and an effective (velocity-dependent) cross section.

An advantage of these effective approaches is that only a relatively small number of simulations, covering the free parameters of the effective description, need to be carried out (Vogelsberger et al. 2016; Stücker et al. 2021b). With these, predictions for the nonlinear structure, its observable signatures, and comparisons with observations can be obtained for any DM model. This has been done for the first galaxies and reionization, the Ly-\(\alpha \) forest, the abundance of dwarf galaxies, and gravitational lensing (Murgia et al. 2017; Lovell et al. 2018; Díaz Rivero et al. 2018; Lovell et al. 2019; Bose et al. 2019).

This represents an example where N-body simulations are directly employed in constraining cosmological parameters and fundamental physics—the properties of dark matter in this case. As we will argue later, this approach can also be applied to observations of the large-scale structure but it is of paramount importance to demonstrate the accuracy and robustness of the corresponding numerical predictions.

7.8 Multiple species with distinct initial perturbation amplitudes

Due to the dominance of collisionless dark matter and the coldness of both dark matter and baryons during the structure-formation epoch, large-scale simulations usually represent the total matter component with a single collisionless fluid. This fluid is commonly referred to as dark matter, but in reality it represents dark matter plus any other massive, non-relativistic component in the Universe. While this is a good approximation in many cases, the advent of more precise observations and, as a consequence, stricter accuracy requirements for numerical simulations demand the simulation of multiple fluids, since the other known massive components in the Universe besides dark matter (baryons and neutrinos) have different initial distributions and relative velocities, which affect late-time structure.

In this section we review several approaches to simulating the gravitational co-evolution of multiple fluids, specifically baryons and neutrinos. This is a veritable challenge that has only recently become possible to tackle, thanks to several novel algorithms and developments in numerical techniques.

7.8.1 Baryons

If we consider only primordial adiabatic modes, then at very early times and on super-horizon scales baryons and dark matter have an identical spatial distribution, as imprinted by the quantum fluctuations processed by inflation. However, the tight coupling of baryons to the radiation field via Compton scattering prior to recombination renders their subsequent evolution distinctly different from that of the dark matter. Radiation pressure opposes the growth of baryonic overdensities, creating oscillations in density and temperature that are damped on small scales by imperfect coupling and Jeans damping. Dark matter, on the other hand, is not expected to couple to radiation and mostly grows unimpeded. After recombination, baryons and photons effectively decouple, and the same gravitational interactions dominate the growth of baryon and dark matter overdensities; however, the starting point for baryons and dark matter is already different. In addition to this effect caused by recombination on purely adiabatic perturbations, primordial isocurvature perturbations could introduce further differences between baryons and dark matter.

The linear evolution of a system of multiple fluids coupled by gravity can be computed accurately by Einstein–Boltzmann solvers, which show that the power spectra of density fluctuations at \(z \sim 100\) (the typical starting redshift of numerical simulations) are still significantly different. For instance, the baryonic acoustic oscillations are barely present in the dark matter, there is a relative velocity between baryons and dark matter, and the fluctuation amplitudes in baryons are approximately half of those in dark matter, even on gigaparsec scales. This can be seen in Fig. 19, which shows the time evolution of the power spectra of total mass, baryons, and dark matter as predicted by linear perturbation theory and by an N-body simulation.

Fig. 19
figure 19

Image reproduced with permission from Angulo et al. (2013a), copyright by the authors

The time evolution of cold dark matter and baryon fluctuations from \(z=130\) to \(z=0\). The left panels show the linear-theory power spectrum for the total mass, cold dark matter, and baryons as solid, dotted, and dashed lines, respectively. Coloured lines show the mass power spectrum predicted by an N-body simulation. The right panels show the ratio of the power spectrum of baryons to that of cold dark matter. The amplitude of fluctuations at high redshift is typically a factor of 2 smaller for baryons than for dark matter. These differences are progressively reduced at lower redshifts, to about a few percent at late times

Although one might think that simulating baryons and dark matter as two gravitationally coupled fluids is a straightforward extension of standard N-body codes, achieving accurate evolution as well as accurate initial conditions has proven rather challenging; it is an example of a situation where certain discretizations of the underlying equations can result in considerable numerical error.

A range of studies (Yoshida et al. 2003; O’Leary and McQuinn 2012; Angulo et al. 2013a) have shown that a naïve simulation of such a baryon-DM two-fluid system leads to an incorrect evolution of the relative baryon-dark matter perturbations. Spurious particle coupling dominates over the real differences on small scales, and the error then quickly propagates to large scales. In early studies, these errors could only be suppressed if forces were smoothed on scales larger than the mean inter-particle separation, either with a large fixed softening or with one adapting to the local density (O’Leary and McQuinn 2012; Angulo et al. 2013a).

Other proposed solutions (Yoshida et al. 2003; Bird et al. 2020) relate to the initial particle load (cf. Sect. 6.4) of the two-fluid simulations: adopting a glass distribution for at least one component reduces the spurious coupling even at high force resolution. However, Hahn et al. (2021) demonstrated that the dominant error contribution comes from the constant mode (the density difference \(\delta _b-\delta _c\) approaches a constant at late times), which can be absorbed at all orders of LPT into a simple variation of the relative masses of CDM and baryon particles. This enables an extension of Lagrangian perturbation theory for multiple fluids (Rampf et al. 2021b), which allows more accurate initial conditions and also lower starting redshifts. The inclusion of relative velocities beyond linear order in the ICs is, however, still an unsolved problem. In contrast to this back-scaling approach, missing physics in the nonlinear solvers (such as the small residual coupling to radiation at \(z\gtrsim 100\)) makes the ‘forward’ approach to initial conditions for CDM+baryon simulations inaccurate at low z, so that in general ‘back-scaling’ should be preferred (cf. Sect. 6.1).

Simulations by Angulo et al. (2013a) showed that the nonlinear total matter power spectrum is largely unaffected by the single-/two-fluid distinction, with the \(z=0\) results differing by less than 0.1% at \(k \sim 1 h\,\mathrm{Mpc}^{-1}\). However, baryon-dark matter differences could be imprinted in halo formation, as expected on theoretical grounds. This question was indeed investigated using simulations with adaptive softening (Khoraminezhad et al. 2021) and using an extension of the separate-universe approach (Barreira et al. 2020a), which does not suffer from the numerical inaccuracies described above. These authors showed that halo formation is in fact sensitive to baryon-dark matter fluctuations, in agreement with analytic arguments (Chen et al. 2019; Schmidt 2016). At fixed matter overdensity, halos tend to form more efficiently in regions with smaller baryon-dark matter differences. While small, this additional dependence creates coherent fluctuations even on large (BAO) scales, which could affect observational constraints on neutrino masses and from baryonic acoustic oscillations (Chen et al. 2019). On much smaller scales, baryons stream past collapsed dark matter structures (Tseliakhovich and Hirata 2010), which delays their eventual accretion and is thus expected to affect structure formation in a correlated way. Numerical simulations of this effect have shown that neglecting this streaming leads to an overestimate of the abundance of low-mass halos at high redshift (Tseliakhovich and Hirata 2010; Dalal et al. 2010b; Park et al. 2020).

We note that in addition to the above, another (in principle unrelated) problem is that caused by baryonic physics itself (finite Jeans scale, gas cooling, UV heating, feedback, etc), which will be discussed below (cf. Sect. 9.9.5).

7.8.2 Massive neutrinos

Neutrinos are one of the fundamental particle families in the Standard Model of particle physics, where they are expected to be massless given the allowed symmetries. However, observations of flavor oscillations in solar and atmospheric neutrinos indicate that they do have mass, which could be a signature of physics beyond the standard model. Measuring the absolute mass scale of neutrinos is very important since, when combined with the neutrino mass splittings measured by oscillation experiments, it could help distinguish whether neutrinos are Majorana or Dirac particles. This in turn could indicate the kind of extension required to the standard model and thus answer one of the fundamental questions in physics.

Recent results from the KATRIN experiment find an electron-neutrino mass upper bound of 0.8 eV (at 90% CL) from the study of the electron endpoint energy in tritium decay (Aker et al. 2019). However, the large-scale structure of the Universe currently provides the most accurate method to constrain the total mass of neutrinos, though future experiments such as PTOLEMY (Betti et al. 2019) could become competitive, with forecasted errors of \(10^{-3}\) eV. Current constraints on neutrino masses from large-scale structure and the CMB are \(\sum m_{\nu } < 0.12\,\mathrm{eV}\) (Palanque-Delabrouille et al. 2020; Planck Collaboration 2020), and the upcoming generation of large-scale surveys has the potential to achieve an accuracy of \(\sigma [\sum m_{\nu }] = 0.01 - 0.03\,\mathrm{eV}\) when combined with CMB lensing (Boyle and Schmidt 2020; Chen et al. 2021). Therefore, precise numerical simulations of cosmic structure formation in the presence of neutrinos, and particularly of their interplay with cold dark matter, have become increasingly important.

Neutrinos affect both the expansion history and the clustering of matter in the Universe. The neutrino temperature is intimately connected to the photon temperature as \(T_\nu = (4/11)^{1/3}T_\mathrm{CMB}\simeq 1.95~\mathrm{K}\). As the Universe expands, neutrinos cool and become non-relativistic at \(z \simeq 189 (\sum m_{\nu } / 0.1\,\mathrm{eV})\). While they are relativistic, they contribute to the total energy density of the universe as \(\varOmega _\nu (a) = N_\mathrm{eff}(7/8)(4/11)^{4/3}\varOmega _\gamma a^{-4}\) (where for massive neutrinos we always explicitly indicate the time-dependence in the density parameter), while as a non-relativistic species they contribute as

$$\begin{aligned} \varOmega _{\nu }(a) = \frac{\sum m_{\nu }}{93.14\,\mathrm{eV}} \,h^{-2}\,a^{-3} . \end{aligned}$$
(101)

For intermediate cases, the neutrino density parameter needs to be obtained numerically by integrating over the Fermi-Dirac distribution function

$$\begin{aligned} \varOmega _{\nu }(a) = a^{-3}\,h^{-2} \sum _j \left( \frac{m_{\nu ,j}}{5.32~\mathrm{meV}}\right) ^4 \int _0^\infty \mathrm{d}y\,\frac{y^2\sqrt{1+y^2/a^2}}{\exp \left[ \beta _j y\right] +1}, \end{aligned}$$
(102)

where \(\beta _j:= m_{\nu ,j}c^2/(k_BT_\nu )\).
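A minimal numerical sketch of Eq. (102) is given below; the Hubble parameter and the example mass are illustrative, and the quoted constants are used as given in the text. The resulting \(\varOmega _\nu (a)\) also provides the time-dependent neutrino fraction entering the Poisson source term discussed below.

```python
import numpy as np
from scipy.integrate import quad

K_B_T_NU = (4.0 / 11.0)**(1.0 / 3.0) * 2.7255 * 8.617333e-5   # k_B T_nu today, in eV

def omega_nu(a, masses_eV, h=0.67):
    """Neutrino density parameter Omega_nu(a) of Eq. (102) for a list of masses (eV).

    The Fermi-Dirac integral is evaluated numerically; for a -> 1 and eV-scale
    masses the result approaches the non-relativistic limit of Eq. (101)."""
    total = 0.0
    for m in masses_eV:
        beta = m / K_B_T_NU                                    # beta_j = m c^2 / (k_B T_nu)
        integrand = lambda y: y**2 * np.sqrt(1.0 + (y / a)**2) / (np.exp(beta * y) + 1.0)
        val, _ = quad(integrand, 0.0, 100.0 / beta)            # integrand negligible beyond
        total += (m / 5.32e-3)**4 * val                        # 5.32 meV = 5.32e-3 eV
    return total / (a**3 * h**2)

# Example: omega_nu(1.0, [0.06]) gives ~1.4e-3 for h = 0.67
```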

Since neutrinos are still relativistic when they decouple at \(z\sim 10^9\), they have typical velocities of a few hundred km/s today. Therefore, at late times they contribute mostly as perturbations through Newtonian gravity rather than as a relativistic species through the background, so that for a CDM+baryon+neutrino simulation one has to solve the Poisson equation

$$\begin{aligned} \nabla ^2 \phi&= \frac{3 H_0^2}{2 a} \varOmega _{\mathrm{m}} \bigl (f_{\mathrm{c}} \delta _{\mathrm{c}} + f_{\mathrm{b}} \delta _{\mathrm{b}} + f_\nu (a)\,\delta _{\nu }\bigr )\nonumber \\&\text {where} \quad f_{\mathrm{c}}:=\frac{\varOmega _{\mathrm{c}}}{\varOmega _\mathrm{m}},\quad f_{\mathrm{b}}:=\frac{\varOmega _{\mathrm{b}}}{\varOmega _{\mathrm{m}}},\quad f_\nu (a):= \frac{\varOmega _\nu (a)}{\varOmega _{\mathrm{m}}a^{-3}}. \end{aligned}$$
(103)

Massive neutrinos therefore, depending on their relative importance through their total mass fraction, need to be included in simulations of structure formation.

Many different approximations and discretisations have been proposed to include neutrinos in nonlinear simulations of structure formation. These approaches adopt different degrees of simplification but, as we will see below, they all agree to a large extent for the small neutrino masses compatible with observations.

In the simplest approach, massive neutrinos are simply included in the Einstein–Boltzmann solver, and total-matter initial conditions for the N-body simulation are generated in the usual ‘back-scaling’ approach (Agarwal and Feldman 2011; Upadhye et al. 2014). More recent approaches, however, try to capture also the time evolution of the massive neutrino component. Since they are hot dark matter, the evolution of neutrinos obeys the Vlasov–Poisson equations with a hot distribution function once they become non-relativistic. In contrast to cold matter, neutrinos remain fairly linear, so it is in principle possible to solve the VP equations directly in phase space. This has recently been achieved by Yoshikawa et al. (2020, 2021), who followed CDM using traditional N-body techniques and solved the VP equations describing neutrinos as an incompressible phase-space fluid on a 6D Cartesian grid. This solution is, however, computationally very expensive. Another approach is to use the N-body method to sample the six-dimensional distribution function of neutrinos, i.e., at each location multiple particles sample the possible momentum magnitudes and directions given by a Fermi-Dirac distribution. This approach can capture the nonlinear evolution of the neutrino fluid and has traditionally been regarded as the gold standard in the field. Consequently, it has been adopted in some very large simulations from which most of the current knowledge about the role of neutrinos in structure formation stems (Brandbyge et al. 2008; Viel et al. 2010; Castorina et al. 2015). This N-body approach has been extended to relativistic simulations by Adamek et al. (2017b), which accurately take into account the transition of neutrinos from the relativistic to the non-relativistic regime, along with the contribution of both their energy density and their anisotropic stress to metric perturbations in the weak-field regime.

Unfortunately, a large number of neutrino particles is needed to reduce discreteness noise, which rapidly increases the computational cost of these simulations (Emberson et al. 2017), since the error on the mean momentum (and thus essentially on the growing mode) decreases only as Poisson noise. Several alternatives have been proposed to minimise the impact of this noise. Ma and Bertschinger (1994) used pairs of neutrino particles sampling exactly opposite momenta to enforce local momentum conservation. This guarantees that the mean field is exactly recovered at the very start of the simulation, but the property is washed out shortly afterwards. To circumvent this problem, Banerjee et al. (2018) sampled the directional distribution in a more regular fashion (using a Healpix decomposition of the sphere rather than uncorrelated random directions). More recently, Elbers et al. (2021) proposed another method to reduce the Poisson noise by sampling only the deviations from the linear solution with particles, which shifts the sampling error from the expectation value of the mean to the expectation value of the deviation from the linear solution (referred to as the ‘\(\delta f\) method’).
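To make the particle-based approach and the pairing idea concrete, the sketch below draws thermal speeds from the (redshifted) relativistic Fermi-Dirac momentum distribution and assigns them to antithetic \((+\varvec{p},\,-\varvec{p})\) pairs, in the spirit of the strategy mentioned above; the sampling scheme, constants, and names are illustrative, and the final conversion to velocity assumes the non-relativistic limit.

```python
import numpy as np

def sample_neutrino_velocities(n_pairs, m_eV, a, rng=None):
    """Draw 2*n_pairs thermal neutrino velocities (km/s) at scale factor a, using
    rejection sampling of the Fermi-Dirac momentum distribution and antithetic
    (+p, -p) pairs.  Illustrative sketch."""
    rng = rng or np.random.default_rng()
    kTnu_eV = (4.0 / 11.0)**(1.0 / 3.0) * 2.7255 * 8.617333e-5 / a   # redshifted k_B T_nu
    # rejection-sample y = p c / (k_B T_nu) from f(y) ~ y^2 / (exp(y) + 1)
    ys = []
    fmax = 0.65                      # safe envelope: max of y^2/(e^y+1) is ~0.48 near y~2.2
    while len(ys) < n_pairs:
        y = rng.uniform(0.0, 20.0)   # the distribution is negligible beyond y ~ 20
        if rng.uniform(0.0, fmax) < y**2 / (np.exp(y) + 1.0):
            ys.append(y)
    p_over_mc = np.array(ys) * kTnu_eV / m_eV       # p c / (m c^2), i.e. v/c if << 1
    v = 299792.458 * p_over_mc                      # km/s, non-relativistic approximation
    # isotropic random directions, each used for one antithetic pair
    n = rng.normal(size=(n_pairs, 3))
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    return np.concatenate([v[:, None] * n, -v[:, None] * n])
```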

Alternatively, to avoid these numerical inaccuracies and reduce the computational cost, one can take advantage of the fact that neutrinos are never expected to cluster significantly, owing to their large peculiar velocities, and thus can be treated perturbatively. In the simplest approach, neutrinos are described on a grid which is evolved according to linear theory (i.e., the neutrino density obtained with a linear Einstein–Boltzmann solver is convolved with the random phases of the simulation) and then co-added to the total large-scale gravitational potential (Brandbyge and Hannestad 2009). An advantage of this approach is that the neutrino perturbations can be computed including general relativistic effects (Tram et al. 2019). A disadvantage is that it cannot correctly capture the nonlinear evolution of the neutrino perturbations, and that momentum and energy conservation are not guaranteed since there is no back-reaction of the nonlinear matter on the neutrinos (which is, however, negligible if the neutrinos are light).

A refinement can be obtained with so-called linear-response approaches where, although neutrinos are treated perturbatively, the full nonlinear dark matter field is used as a source in the perturbative solution for the neutrinos (Ali-Haïmoud and Bird 2013). This approach appears to work extremely well for all but the slowest-moving neutrinos, which can be captured in dark matter halos and develop significant nonlinearities. A further refinement was proposed by Bird et al. (2018), where only the initially coldest neutrinos are sampled with N-body particles, and an even more efficient implementation was recently proposed by Chen et al. (2021).

A somewhat different approach is to solve the neutrino evolution using a fluid approach through a Boltzmann hierarchy expansion. One formulation was proposed by Dakin et al. (2019a), who considered the first three moments of the Boltzmann equation: the continuity and Euler equations (i.e. the lowest two moments) are solved in Eulerian space, while the third moment, which captures the stress tensor, is described at linear order. These terms could also be estimated using N-body particles (Banerjee and Dalal 2016), or by decomposing the neutrino phase space into shells of equal speed and evolving them with hydrodynamic equations (Inman and Yu 2020).

Neutrinos have also been incorporated in approximate N-body methods (cf. Sect. 4.4 for a review of such approaches). Recently, Bayer et al. (2021) proposed an extension to the FastPM algorithm by modifying the kick and drift operator; and Wright et al. (2017) extended the COLA algorithm by incorporating the scale-dependent effect of neutrinos in the growth factors. On the other hand, an accurate prediction for the impact of neutrinos can also be achieved in post-processing by using cosmology-rescaling methods (Zennaro et al. 2019; Angulo and White 2010; Contreras et al. 2020c), or as a gauge transformation (Partmann et al. 2020).

Finally, when neutrinos are treated in a Newtonian framework at low redshift, simulation initial conditions cannot be accurately generated with a “forward” approach (cf. Sect. 6.1), which would result in significant inaccuracies (see e.g., Bastos de Senna Nascimento and Loverde 2021). Zennaro et al. (2017) proposed a back-scaling approach where, for a given target redshift, the linear power spectrum predicted by linear Boltzmann solvers is integrated backwards in time accounting only for gravitational interactions. This is in principle unstable over long time spans, since decaying modes are excited that blow up during the backwards evolution, but owing to the low mass of neutrinos this approach works well for starting redshifts \(z_{\mathrm{start}}\lesssim 100\). In this way, when setting up first-order accurate initial conditions as in Eq. (73), particles are set up at the starting redshift so that, when evolved with gravitational interactions, they recover the correct large-scale density fluctuations (see also the discussion on ‘backward’ initial conditions in Sect. 6.1).

Despite the huge differences among the various methodologies to include neutrinos in non-linear simulations, all these approaches agree extremely well in their predictions for the impact of (light) massive neutrinos on the large-scale structure, especially for the allowed neutrino masses, where differences are typically at the sub-percent level. Therefore, the cosmic evolution of neutrinos currently appears to be a robust prediction of numerical simulations. This is illustrated in Fig. 20, which displays the projected density field of neutrinos in a box of \(180 h^{-1}\mathrm{Mpc}\) computed with the various numerical approaches discussed in this section. Here we can clearly see how linear theory predictions correctly capture the large-scale distribution of neutrinos but underestimate the high-density tail. Additionally, discreteness noise is evident in the particle method. Nevertheless, the agreement of all methods on large scales is remarkable, especially for lighter neutrinos. In the figure insets we see, however, how these numerical methods differ in the accuracy with which they describe neutrinos in the nonlinear regime and in the associated degree of noise.

Fig. 20

Comparison of the simulated density field of massive neutrinos with total mass \(\sum m_{\nu }=0.5\,\mathrm{eV}\) and \(0.1\,\mathrm{eV}\) (top and bottom rows, respectively). Each panel shows the results of a different computational method: linear perturbation theory, linear response, the particle method, and the \(\delta f\) method. For comparison, the leftmost column shows the cold mass (cold dark matter and baryons) over the same volume. Note the inset in each panel showing a zoom into a massive dark matter halo. Image adapted from Elbers et al. (2021), copyright by the authors

In Fig. 20 one can also see that neutrinos follow the same large-scale patterns as the dark matter. In contrast, on smaller scales, the large velocity dispersion of the neutrino field stops its growth and the field remains smooth. This causes a well-known scale-dependent suppression of the total matter clustering. On intermediate scales below the neutrino free-streaming scale, the suppression has an amplitude of roughly \(\varDelta P/P \sim -10 f_{\nu }\), slightly larger than the linear-theory prediction \(\varDelta P/P \sim -8 f_{\nu }\) (Brandbyge et al. 2008). On even smaller scales, the differences decrease due to nonlinearities (see, e.g., Hannestad et al. 2020).

Usually, numerical simulations consider a degenerate mass state for neutrinos, i.e. assume that all neutrino states have the same mass. This is incompatible with oscillation experiments, but justified as cosmic structure is mostly sensitive to the total neutrino mass. However, the linear power spectrum is slightly different between the normal and inverted hierarchy (Lesgourgues et al. 2004; Jimenez et al. 2010). Although the effect is small—below 0.4–0.6% for weak lensing and galaxy clustering observables (Archidiacono et al. 2020)—numerical simulations have shown that this signature is imprinted and even enhanced during nonlinear clustering, which opens up the possibility of a marginal detection with future large-scale structure surveys (Wagner et al. 2012). However, this will require significant advances in the modelling of baryonic effects, redshift space distortions, and galaxy bias.

The role of neutrinos has also been explored via the ‘separate universe’ technique (cf. Sect. 6.3.6) (Chiang et al. 2018). These simulations have shown that the linear bias of dark matter halos is scale-dependent in the presence of neutrinos. These findings were later confirmed by the same authors but using standard N-body simulations (where the contribution of neutrinos was artificially increased to enhance their effect) (Chiang et al. 2019), who further showed that the scale dependence persists regardless of whether it is defined with respect to the total or cold mass (CDM plus baryons) power spectrum.

7.9 Primordial non-Gaussianity and small-scale features from inflation

So far, we have considered the case where the primordial fluctuations in the Universe were Gaussian. This is motivated by the fact that the original quantum fluctuations are known to be Gaussian; thus, if the subsequent physics and evolution were linear, the primordial seeds for structure formation would also be Gaussian. This, however, does not need to be the case, since general inflationary models predict various degrees of primordial non-Gaussianity (PNG). These originate from, for instance, non-linear dynamics due to self-interactions in single-field inflation, or from correlations of the inflaton field with additional (light or heavy) fields [see Bartolo et al. (2004) for a review].

The departures from Gaussianity are expected to be of the same order as the second-order corrections to linear perturbation theory. Since the observed amplitude of primordial perturbations is \(\mathcal {O}(10^{-5})\), the natural expectation for these corrections is thus \(\mathcal {O}(10^{-10})\). Therefore, the deviations from Gaussianity are expected to be very small, and they can thus be prescribed generically (agnostically of specific models) by expanding the true primordial Bardeen potential, \(\varPhi \), around a Gaussian field \(\phi ^{(1)}\) (enforcing \(\langle \phi ^{(1)} \rangle =0\)) (cf. Creminelli et al. 2007):

$$\begin{aligned} \begin{aligned} \varPhi (\varvec{x})&= \phi ^{(1)}(\varvec{x}) + f_{\mathrm{NL}} \int _{\mathbb {R}^3} \mathrm{d}^3y \left[ \phi ^{(1)}(\varvec{x}) \phi ^{(1)}(\varvec{y}) W(\varvec{x},\varvec{y}) - \langle \phi ^{(1)}(\varvec{x}) \phi ^{(1)}(\varvec{y}) W(\varvec{x},\varvec{y}) \rangle \right] + \\&\quad + g_{\mathrm{NL}} \times \left( \text {3-point combinations of }\phi ^{(1)}\right) + \tau _{\mathrm{NL}} \times \left( \text {4-point combinations of }\phi ^{(1)}\right) + \dots \end{aligned} \end{aligned}$$
(104)

where \(f_{\mathrm{NL}},g_{\mathrm{NL}},\tau _{\mathrm{NL}},\dots \) are referred to as the non-Gaussianity parameters, which quantify the level and structure of PNG, and W is a kernel defining the type of non-Gaussianity considered. The most common type in the literature is the so-called “local type” quadratic non-Gaussianity, which is defined by a non-zero \(f_{\mathrm{NL}}\) and \(W = \delta _D(\varvec{x}-\varvec{y})\), and is a probe of multi-field inflationary models. Other configurations, e.g., the equilateral and orthogonal types, are also (albeit less frequently) considered and are expected to be sensitive to different aspects of inflation.

From the above arguments, one expects \(f_{\mathrm{NL}} \sim 1\), with the simplest inflationary models of single-field slow-roll predicting \(f_{\mathrm{NL}} \lesssim 1\), but with several others (including non-inflationary cosmologies) predicting \(f_{\mathrm{NL}} \gtrsim 1\). We note that if inflation occurs at high energies, then primordial gravitational waves could distinguish alternative models, but if it occurs at low energies, measuring the value of \(f_{\mathrm{NL}}\) might be the only source of information about the early Universe.

Constraints from the analysis of the CMB fluctuations, as measured by the Planck satellite, are \(f_{\mathrm{NL}}^{\mathrm{local}} = -0.9 \pm 5.1\); \(f_{\mathrm{NL}}^{\mathrm{equil}} = -26 \pm 47\); and \(f_{\mathrm{NL}}^\mathrm{ortho} = -38 \pm 24\) at the 68% confidence level (Planck Collaboration 2016; Planck Collaboration 2020), in agreement with most inflationary models. Upcoming polarization and small-scale CMB measurements are expected to improve these constraints (Abazajian et al. 2016). An alternative route to stronger constraints on \(f_\mathrm{NL}\) relies on measurements of the late-time large-scale structure of the Universe (Alvarez et al. 2014). Although current LSS constraints are significantly weaker than those from the CMB (\(\sigma [f_{\mathrm{NL}}^{\mathrm{local}}] \sim 20-100\)) (Leistedt et al. 2014; Giannantonio and Percival 2014; Ho et al. 2015; Castorina et al. 2019), forecasts anticipate that future surveys could constrain \(f_\mathrm{NL}\) at the level of \(\sigma [f_{\mathrm{NL}}] \sim 0.5-3\) (Giannantonio et al. 2012; Yamauchi et al. 2014; Camera et al. 2015; Ferraro and Smith 2015). However, as the interpretation of upcoming surveys relies on non-linear physics and biased tracers, numerical simulations of primordial non-Gaussianity are essential.

Numerical simulations of primordial non-Gaussianity can be carried out in the same fashion as Gaussian simulations, the only difference being in the initial conditions. For local-type PNG, \(\varPhi \) can be trivially computed from the Gaussian field \(\phi ^{(1)}\) for any given value of \(f_\mathrm{NL}\). The only numerical consideration is that, as for higher-order LPT implementations, Orszag’s rule needs to be applied to avoid aliasing in the computation of field convolutions. For the orthogonal and equilateral types, the generation of initial conditions is more complicated, as it involves generating a random field subject to a constraint on its bispectrum, but Scoccimarro et al. (2012) and Regan et al. (2012) proposed general algorithms to achieve this efficiently. Another subtlety regards the exact definition of \(\phi ^{(1)}\) and its value at the simulation starting redshift. As for the case of multiple fluids, one can define \(\phi ^{(1)}\) directly at that redshift or apply back-scaling, a freedom that affects the comparison among simulations for different values of \(f_{\mathrm{NL}}\) (cf. Pillepich et al. 2010). Alternatively, the role of PNG in structure formation can be investigated using the separate universe approach (cf. Sect. 6.3.6): as explored by Barreira et al. (2020b) and Barreira (2020), a change in the initial power spectrum amplitude can mimic a primordial non-Gaussianity of the local type.
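For the local type, the quadratic transformation of the Gaussian potential is straightforward; a minimal sketch (which ignores the de-aliasing step) is:

import numpy as np

def local_png_potential(phi_gauss, f_nl):
    """Local-type PNG: Phi = phi + f_NL * (phi^2 - <phi^2>), i.e. Eq. (104)
    truncated at the quadratic term with W = delta_D(x - y).
    phi_gauss is a Gaussian realisation of the primordial potential on a grid."""
    phi2 = phi_gauss**2
    return phi_gauss + f_nl * (phi2 - phi2.mean())

In practice the quadratic term would be evaluated with appropriate Fourier-space padding to respect Orszag’s rule, and the resulting potential would then be converted into a density field with the matter transfer function before computing particle displacements.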

Simulations of primordial non-Gaussianity have found essentially three effects relevant for structure formation. The first one is that local-type PNG causes a small modification of the nonlinear matter power spectrum (Wagner et al. 2010), which can be understood in perturbation theory as arising from additional mode-coupling terms. The abundance of halos is also affected by the existence of primordial non-Gaussianity, with positive (negative) values of \(f_{\mathrm{NL}}\) increasing (decreasing) the number of very massive halos (Dalal et al. 2008; Wagner and Verde 2012), and with qualitatively similar results for \(g_{\mathrm{NL}}\) and \(\tau _{\mathrm{NL}}\) (LoVerde and Smith 2011). However, these effects can become very degenerate with astrophysical processes and nonlinear evolution, which calls for caution in their use as cosmological probes.

A clearer signature of local PNG arises in the very large-scale clustering of galaxies or quasars. PNG modifies the abundance of biased tracers on large scales, \(k<10^{-2}\, h\,\mathrm{Mpc}^{-1}\), in such a way that the clustering statistics receive an additional contribution proportional to the non-Gaussian term in Eq. (104) (Dalal et al. 2008; McDonald 2008; Baumann et al. 2013; Assassi et al. 2015). This has been confirmed by a number of N-body simulations (Dalal et al. 2008; Grossi et al. 2009; Pillepich et al. 2010; Scoccimarro et al. 2012; Wagner and Verde 2012). In practice, this contribution appears as a scale-dependent term proportional to \(b_1 f_{\mathrm{NL}} k^{-2}\) in, e.g., the power spectrum, which can dominate on large scales over the contribution of the standard linear bias parameter, \(b_1\), and thus can be used to place constraints on \(f_{\mathrm{NL}}\) (Slosar et al. 2008). Numerical simulations have shown that a similar scale-dependent effect exists for other kinds of non-Gaussianities (Desjacques and Seljak 2010; Scoccimarro et al. 2012; Shandera et al. 2011).

The amplitude of this ‘non-Gaussianity bias’, \(b_{\phi }\), can be related to the value of the linear bias parameter, \(b_1\), using analytic arguments: \(b_{\phi } \propto (b_1 - 1)\) (Slosar et al. 2008; Matarrese and Verde 2008), with which a much more predictive model can be constructed, with the obvious benefit of increased constraining power. However, results from N-body simulations have revealed that the precise relation depends on the kind of halo considered (or the property used to select them) and on the specific galaxy formation physics (Slosar et al. 2008; Scoccimarro et al. 2012; Desjacques et al. 2009; Reid et al. 2010; Barreira et al. 2020b), since it depends on the details of how the formation of a given nonlinear object responds to a change in the large-scale potential. This might hinder the ability to robustly place constraints on PNG in future surveys (Barreira 2020). However, as a potential detection of PNG is within reach of the upcoming generation of LSS surveys, there is certainly motivation to seek a better understanding of these effects in all LSS statistics.
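As a hedged illustration of how this effect enters a large-scale clustering model, the sketch below evaluates the scale-dependent bias correction using the universality relation \(b_{\phi } = 2\delta _c(b_1-1)\) from the analytic arguments cited above; as just discussed, simulations show that the precise relation (and hence the prefactor assumed here) depends on the tracer, so all parameter choices are illustrative.

import numpy as np

def delta_b_png(k, b1, f_nl, T_of_k, D_z, Om=0.3, H0=100.0, delta_c=1.686):
    """Scale-dependent bias correction Delta b(k) ~ b_phi f_NL / (k^2 T(k) D(z)),
    assuming the universality relation b_phi = 2 delta_c (b1 - 1).
    k in h/Mpc, H0 = 100 (i.e. in h km/s/Mpc), T_of_k the matter transfer
    function (-> 1 on large scales), D_z the growth factor normalised to the
    scale factor during matter domination; Om and delta_c are assumed values."""
    c = 299792.458                                        # speed of light [km/s]
    b_phi = 2.0 * delta_c * (b1 - 1.0)
    alpha = 3.0 * Om * H0**2 / (2.0 * c**2 * k**2 * T_of_k(k) * D_z)
    return b_phi * f_nl * alpha

def pk_galaxy(k, pk_matter, b1, f_nl, T_of_k, D_z):
    """Large-scale galaxy power spectrum including the PNG bias term."""
    return (b1 + delta_b_png(k, b1, f_nl, T_of_k, D_z))**2 * pk_matter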

7.10 Modified gravity

Throughout this review we have assumed General Relativity (GR) as the theory of gravity, and so far no experiment or astrophysical observation has been found to be inconsistent with GR. A particularly stringent test came from a recently detected gravitational-wave (GW) event with a coincident electromagnetic counterpart, which has put stringent constraints on the class of modifications to GR that remain allowed (Creminelli and Vernizzi 2017; Ezquiaga and Zumalacárregui 2017). GR is also consistent with a plethora of cosmological observations, and specifically with tests based on redshift-space distortions (Mueller et al. 2018; Barreira et al. 2016; Hernández-Monteagudo et al. 2020). However, departures from GR on cosmological scales are still possible and have received significant attention as a potentially important piece in understanding the accelerated expansion of the Universe. In fact, testing gravity on cosmological scales is one of the primary science goals of the upcoming generation of large-scale structure observations and gravitational wave detectors (Alam et al. 2021; Belgacem et al. 2019).

To properly interpret future observations it is thus crucial to understand structure formation in modified gravity (MG), for which numerical simulations are indispensable [see Llinares (2018), Baldi (2012) for specialised reviews]. A large number of modifications to gravity have been proposed (reviewed, e.g., in Clifton et al. 2012 and Koyama 2016). In the context of large-scale structure, the class that has received most attention comprises those with a ‘screening’ mechanism. These models become indistinguishable from GR in high-density regions, with screening controlled by, e.g., the gravitational potential (chameleon), its gradient (k-Mouflage), or its Laplacian, i.e. the density (Vainshtein). They are in agreement with many local gravity tests, but they may depart significantly from GR on large scales, so that they could be probed by cosmological measurements.

The two most explored modifications to GR are the f(R) and DGP [short for Dvali et al. (2000)] models, which are regarded as representative of the kinds of MG models currently available. These two gravity models feature equal propagation speeds for photons and gravitational waves, and naturally contain a screening mechanism. Note that although the initial motivation for such models was to explain the accelerated expansion of the Universe, nowadays they are predominantly explored to study deviations from GR.

For the case of f(R), there is an additional term in the Einstein–Hilbert action that is a function of the Ricci curvature R, which leads to equations of motion with a gravitational potential modified as

$$\begin{aligned} \nabla ^2 \varPhi = \nabla ^2 \varPhi _{\mathrm{GR}} - \frac{1}{2} \nabla ^2 f_R \quad \text {with}\quad \nabla ^2 f_R =-\frac{a^2}{3}[\Delta R + 8 \pi G \bar{\rho} \delta ], \end{aligned}$$
(105)

where \(\varPhi _{\mathrm{GR}}\) is the gravitational potential in GR and \(\Delta R\) is the perturbation to the Ricci curvature, which can be written in terms of the scalar field \(f_R \equiv \mathrm{d} f(R)/\mathrm{d}R\) (e.g., in the Hu & Sawicki model \(f(R) := -M^2 \frac{c_1 (-R/M^2)^n}{c_2(-R/M^2)^n+1}\) with \(M^2:=H_0^2\varOmega _m\), and n, \(c_1\) and \(c_2\) being model parameters with \(c_1/c_2^2 \propto f_{R0}\), the value of the scalar field today). Note that the theory is unscreened in low-density regions, where there is an additional ‘fifth force’ whose strength has an upper limit of 1/3 of the GR value.

In the case of (the normal branch of) DGP, which is screened by means of a Vainshtein mechanism in regions where the Laplacian of the potential is large, the modified Poisson equation reads

$$\begin{aligned} \nabla ^2 \varPhi = \nabla ^2 \varPhi _{GR} + \frac{1}{2} \nabla ^2\phi \quad \text {with}\quad \nabla ^2 \phi + \frac{r_c^2}{3\beta a^2} \left[ (\nabla ^2\phi )^2 - (\nabla _i \nabla _j \phi )^2 \right] = \frac{8\pi G a^2}{3\beta } \delta \bar{\rho } \end{aligned}$$
(106)

where \(\beta = 1 + 2 H r_c \left( 1+\frac{\dot{H}}{3 H^2} \right) \) and \(r_c\) is a free parameter, the crossover scale, below which gravity behaves four-dimensionally.

One can see that these gravity models generically modify the Poisson equation by adding an additional term, whose amplitude itself is dynamically determined by a non-linear equation. Numerical simulations thus need to solve for these fields in each timestep. This is usually achieved by representing these fields on an adaptive grid and solving for their values via relaxation or multi-grid methods (cf. Sect. 5.1.3). Note that because of this overhead, traditionally, MG simulations were significantly slower than \(\varLambda \)CDM simulations, although recent advances (and suitable approximations) have reduced their computational cost and nowadays they have similar execution times (e.g., Barreira et al. 2015; Winther and Ferreira 2015).
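As a toy illustration of the relaxation idea, the sketch below applies Jacobi iterations to a linearised scalar-field equation of the form \((\nabla ^2 - m^2)u = S\) on a periodic grid; actual modified-gravity solvers treat the full nonlinear field equations with Newton–Gauss–Seidel sweeps accelerated by multigrid, but the structure of the update is analogous. All names and the choice of a linearised equation are illustrative.

import numpy as np

def jacobi_relax(source, m2, h, n_iter=200):
    """Toy Jacobi relaxation for (nabla^2 - m^2) u = source on a periodic grid
    with spacing h (m2 > 0 assumed, e.g. an effective chameleon mass term;
    the pure Poisson case is better handled in Fourier space)."""
    u = np.zeros_like(source)
    for _ in range(n_iter):
        # sum of the six nearest neighbours with periodic boundary conditions
        neigh = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                 np.roll(u, 1, 1) + np.roll(u, -1, 1) +
                 np.roll(u, 1, 2) + np.roll(u, -1, 2))
        # discretised update: u = (sum_of_neighbours - h^2 S) / (6 + m^2 h^2)
        u = (neigh - h**2 * source) / (6.0 + m2 * h**2)
    return u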

Note that, in general, MG can produce differences with respect to \(\varLambda \)CDM already at high redshift. In such cases, the initial conditions of simulations need to be generated with power spectra computed by Boltzmann codes incorporating such effects, e.g., MG-CAMB (Hojjati et al. 2011) or hi-CLASS (Zumalacárregui et al. 2017). However, it is common to simulate models that depart from GR only at low redshift; in that case the initial conditions follow those of standard simulations.

Several codes exist that simulate various kinds of modified gravity (Oyaizu 2008; Schmidt 2009; Zhao et al. 2011; Li et al. 2012a; Brax et al. 2012; Llinares et al. 2014; Llinares and Mota 2014; Puchwein et al. 2013; Arnold et al. 2019), usually extending well-established GR codes such as RAMSES, Gadget, and AREPO. Winther et al. (2015) carried out a comparison of various N-body codes for f(R), DGP, and Symmetron models by simulating an \(L=250 h^{-1}\mathrm{Mpc}\) box with \(N=512^3\) particles. These authors found very good agreement among the different codes and for multiple statistics (for instance \(<1\%\) differences in the power spectrum up to \(k \sim 5 h\,\mathrm{Mpc}^{-1}\)), which has supported the validity of the MG observables predicted by numerical simulations. These MG simulations have revealed that the amplitude of the power spectrum is enhanced relative to \(\varLambda \)CDM due to the additional force. This enhancement is roughly scale-independent for DGP, with a larger amplitude at low redshifts, about 15 (3)% for models with \(r_c H_0/c=1\,(5)\) at \(z=0\). In contrast, departures with respect to \(\varLambda \)CDM are scale-dependent for f(R), with a fractional increase of \(\sim 25\,(5)\%\) at \(k\sim 10 h\,\mathrm{Mpc}^{-1}\) for \(f_{R0}=10^{-5}\,(10^{-6})\). The abundance of massive halos is also affected by MG, with an increase on all mass scales for DGP and a mass-dependent effect in f(R) as a result of large haloes being effectively screened (see Winther et al. 2015 and references therein for more details).

Modified gravity has also been incorporated in approximate methods. Specifically, Winther et al. (2017) and Valogiannis and Bean (2017) implemented modifications to COLA (cf. Sect. 4.4) by computing 2LPT displacements in generic MG models (up to second order) and included various screening mechanisms. These authors found that the changes relative to \(\varLambda \)CDM in the power spectrum and halo mass function were accurately captured by COLA (within a few per cent up to \(k \sim 3 h\,\mathrm{Mpc}^{-1}\)). A similar accuracy was reached by Mead et al. (2015b) who extended cosmology-rescaling algorithms, so that the effects of MG in an N-body simulation could be incorporated in post-processing.

As for dark matter candidates, a large number of possible modifications to the equations of motion is allowed, each with its own free parameters. This poses a difficulty when generic predictions are required, e.g., in data analyses or for a systematic scan of models with numerical simulations. For this reason, there has been significant work to formulate an effective parameterization whose parameters can then be constrained (Lombriser 2016; Thomas 2020).

Deviations from GR can be quantified in a general way by modifying two of the Einstein equations (in Fourier space) as

$$\begin{aligned} -k^2 \tilde{\varPhi }(\varvec{k}) &= 4\pi G a^2 \overline{\rho }(a)\, \mu (k,a) \, \tilde{\delta }(\varvec{k}) \end{aligned}$$
(107a)
$$\begin{aligned} \tilde{\varPsi }(\varvec{k}) &= \gamma (k,a)\, \tilde{\varPhi }(\varvec{k}) \end{aligned}$$
(107b)

where \(\varPsi \) and \(\varPhi \) are the two gravitational potentials (note that the lensing potential is given by \((\varPsi +\varPhi )/2\)), \(\gamma \) is the gravitational slip, and \(\mu \) is a generic function that captures modifications to the relation between the density and the gravitational potential (i.e. a scale- and time-dependence of the gravitational ‘constant’). Therefore, \(\mu = \gamma = 1\) yields the GR limit, but in general \(\mu \) and \(\gamma \) can be complicated functions of time and scale. In the linear regime, many theories (including f(R) and DGP) can be exactly mapped onto specific values and forms of \(\gamma \) and \(\mu \). However, in the nonlinear regime this is less straightforward, and parameterisations have been proposed based on spherical collapse (Lombriser 2016) or on a post-Friedmann formalism (Thomas 2020), which have been argued to be valid even down to very small scales.
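In a particle-mesh code, Eq. (107a) translates into a simple modification of the Fourier-space Poisson kernel; a minimal sketch (with illustrative function and variable names, and G, densities and lengths in consistent units) is:

import numpy as np

def potential_with_mu(delta, box_size, a, rho_bar, G, mu_of_ka):
    """Solve Eq. (107a) on a periodic PM grid:
    Phi(k) = -4 pi G a^2 rho_bar mu(k,a) delta(k) / k^2.
    mu_of_ka(k, a) = 1 recovers the standard (GR) kernel."""
    n = delta.shape[0]
    kfreq = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(kfreq, kfreq, kfreq, indexing='ij')
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # avoid division by zero; mode zeroed below
    delta_k = np.fft.fftn(delta)
    phi_k = -4.0 * np.pi * G * a**2 * rho_bar * mu_of_ka(np.sqrt(k2), a) * delta_k / k2
    phi_k[0, 0, 0] = 0.0                   # remove the k = 0 (mean) mode
    return np.real(np.fft.ifftn(phi_k))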

Following this philosophy, Cui et al. (2010) and Srinivasan et al. (2021) used parameterised deviations from GR and carried out suites of simulations with different parameter values. So far, both of these approaches have considered only a time dependence for \(\mu \) (e.g., Srinivasan et al. 2021 considered values of \(\mu \) that are piecewise constant in redshift), but it is likely that extensions to scale dependence will be possible in the future. Additionally, Hassani and Lombriser (2020) recently showed that simulations adopting the Lombriser (2016) parametrisation can reach percent-level agreement in the power spectrum with respect to direct f(R) and DGP simulations. These results are very promising, although a more exhaustive exploration of other summary statistics will be required in the future. Note that in this case there is no need to solve for the evolution of the scalar field, which makes this kind of simulation significantly easier to carry out.

Another advantage of such parameterisations is that they can be readily applied to observations (e.g., Blake et al. 2020). For instance, Mueller et al. (2018) used a similar parameterisation to constrain deviations from GR using redshift-space distortions in the BOSS survey. These constraints are expected to become significantly tighter with the newest generation of galaxy surveys and by combining different probes (higher-order N-point functions, marked statistics, redshift-space distortions), for which, we anticipate, the results of numerical simulations will be crucial. An illustration of this is given by the recent results of He et al. (2016, 2018), who, by comparing the observed small-scale galaxy clustering against N-body simulations, could rule out f(R) models with \(|f_{R0}| \gtrsim 10^{-6}\).

7.11 Closing remark

In this section we have reviewed several possible extensions of the physically simplest \(\varLambda \)CDM simulations. Some of these modifications are more speculative in nature, e.g., modified gravity, whereas others are strongly motivated by physical experiments, e.g., massive neutrinos. For an adequate interpretation of future datasets it will be important to explore the interplay of such modifications and the possible degeneracies that arise among them when interpreting cosmological data. For instance, the power spectrum suppression that baryons produce relative to CDM-only predictions could potentially be misinterpreted as a signature of massive neutrinos. In this direction, for instance, Kuo et al. (2018) carried out simulations with decaying warm dark matter, and Schwabe et al. (2020) simulations mixing fuzzy and particle dark matter. Additionally, Baldi et al. (2014), Baldi and Villaescusa-Navarro (2018), and Hashim et al. (2018) have performed simulations with this focus, considering cosmological scenarios that combine modified gravity and warm dark matter, modified gravity and massive neutrinos, and primordial non-Gaussianities and interacting dark energy. As the quest for new physics moves to ever smaller effects on the large-scale structure of the Universe, these kinds of simulations will become increasingly important, since they will be required to guarantee robustness when a given \(\varLambda \)CDM extension is ruled out or favoured.

8 Numerical considerations and the challenge of high-accuracy simulations

Numerical simulations are starting to play an increasingly central role in the interpretation of observational data and in the quantitative inference of the physical properties of the Universe. It thus becomes essential to ensure the high precision and accuracy of simulation results. In this section we discuss several key aspects in this regard.

8.1 Box size and mass resolution

Two basic properties of a cosmological simulation are the size of the simulated box and the mass of the particles employed, both of which affect the nonlinear structure formed. For a simulation of a given side length, L, Fourier modes below the fundamental mode, \(k < k_0 = 2 \pi /L\), are effectively set to zero. Thus, structure grows as if embedded in a region at the cosmic mean density and devoid of tidal forces, which is a biased representation of actual finite regions in the Universe. Power and Knebe (2006) carried out a systematic study of the impact of these missing large-scale modes, finding that, while the internal properties of dark matter halos are unaffected, their abundance is strongly suppressed in small boxes, especially so at high masses.

Such a reduction in the number of dark matter halos also affects the nonlinear power spectrum, suppressing its amplitude on quasi-linear scales. This has been studied recently by several authors who all find that a boxsize of \(\gtrsim 1000\, h^{-1}\mathrm{Mpc}\) is required at \(z=0\) to obtain a converged measurement—i.e. independent of further increases of box size—of the density power spectrum and covariance matrices at the per cent level (Schneider et al. 2016; Mohammed et al. 2014; Klypin and Prada 2019). Specifically, the nonlinear evolution of baryonic acoustic oscillations depends on the correct modelling of large-scale flows which also requires similarly large boxes (Crocce and Scoccimarro 2006). The result of a study investigating the effect of box size and mass resolution on the power spectrum by Schneider et al. (2016) is reproduced in Fig. 21, where in the left panel the box size is varied at fixed mass resolution and in the right panel the mass resolution is varied at fixed box size, each with significant impact on the matter density spectrum.

Fig. 21

Image reproduced with permission from Schneider et al. (2016), copyright by IOP/SISSA

Dependence of the nonlinear matter power spectrum at \(z=0\) on the volume and mass resolution of an N-body simulation. The left panel shows the fractional change in the power spectrum for various box sizes \(L \in [128, 256, 512, 1024] h^{-1}\mathrm{Mpc}\) at a fixed mass resolution. The right panel shows instead the change produced by increasing the mass resolution while keeping the size of the simulation box fixed. Note that to achieve percent-level convergence (marked by the shaded region) up to \(k \sim 3 h\,\mathrm{Mpc}^{-1}\), boxes of at least \(\sim 500 h^{-1}\mathrm{Mpc}\) with particles less massive than \(10^{9} h^{-1}{\mathrm{M}}_{ \odot }\) are required

The effects of the boxsize can be ameliorated by changing the equations of motion such that they account for the fundamental (or DC) mode (Sirko 2005; Gnedin et al. 2011), and sampling an ensemble of simulations representative of the variance at the box scale. Such an approach is particularly important when carrying out simulation ensembles to compute covariance matrices, as these are sensitive to overdensities on scales larger than the box (see Sect. 6.3.5). Additionally, by setting up initial conditions matching statistics in real space rather than Fourier space (Sirko 2005), it is possible to obtain e.g., halo mass functions that are less biased even for smaller boxes. Also, recent developments for modelling tidal fields larger than the box size (Schmidt et al. 2018) might prove useful in the future to achieve better convergence. Further, compactified simulations might offer an alternative path as they naturally incorporate very large-scale modes (Rácz et al. 2018).

While the finite simulation volume modifies the formation of halos mostly at the high mass end, finite mass resolution determines the smallest halo resolvable in a given simulation box. Thus, if halos that contribute significantly to the nonlinear power spectrum are not resolved, then the spectrum will be biased low. Schneider et al. (2016) argued that a mass resolution of at least \(10^{9}\, h^{-1}{\mathrm{M}}_{ \odot }\) is required to achieve per cent level convergence at \(k \sim 1\, h\,\mathrm{Mpc}^{-1}\) at \(z=0\), as smaller halos contribute negligibly to the power spectrum. Higher wavenumbers will be set by the inner regions of halos, which might be affected by two-body relaxation and other effects if not resolved with an adequate number of particles.
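A quick way to check what particle mass a given configuration corresponds to is the following helper, assuming \(\varOmega _m \approx 0.3\) and the standard critical density \(2.775\times 10^{11}\,h^2\,{\mathrm{M}}_{\odot }\,\mathrm{Mpc}^{-3}\); the example values reproduce, e.g., the \(\sim 1.2\times 10^{9}\,h^{-1}{\mathrm{M}}_{\odot }\) particles of the Euclid-comparison setup discussed in Sect. 8.5.

def particle_mass(box_Mpc_h, n_part_1d, omega_m=0.30):
    """Mass of an N-body particle in h^-1 Msun, for a periodic box of side
    box_Mpc_h (in h^-1 Mpc) sampled with n_part_1d^3 equal-mass particles.
    Uses rho_crit = 2.775e11 h^2 Msun / Mpc^3; omega_m = 0.30 is an assumed value."""
    rho_m = omega_m * 2.775e11                     # mean matter density [h^2 Msun / Mpc^3]
    return rho_m * box_Mpc_h**3 / n_part_1d**3     # [h^-1 Msun]

print(particle_mass(500.0, 2048))    # ~1.2e9, cf. the Euclid-comparison runs of Sect. 8.5
print(particle_mass(1000.0, 2048))   # ~9.7e9, coarser than the ~1e9 requirement quoted above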

Furthermore, initial fluctuations resolved with a small number of particles will suffer from large numerical errors. Specifically, the abundance of halos resolved with fewer than approximately 100 particles is overestimated, as halo finders identify statistical upward fluctuations as real objects (Warren et al. 2006). Note, however, that the actual rate of convergence depends on the halo-finder algorithm (see Sect. 9.4.1), as well as on other numerical parameters such as the force accuracy and the softening length.

8.2 Close encounters and regularization in N-body methods

Another aspect affecting the numerical accuracy of N-body solutions is related to the so-called softening length, below which the gravitational interaction ceases to be Newtonian. As discussed also at length in Sect. 5.3, the Green’s function of the Laplacian is \(G(\varvec{r}) = -1/(4\pi \Vert \varvec{r}\Vert )\), which translates into the Green’s function \(\varvec{G}_a:= -\varvec{\nabla }\nabla ^{-2} \delta _D(\varvec{r})\) of the acceleration operator, i.e.,

$$\begin{aligned} \varvec{G}_a(\varvec{r}) = -\frac{\varvec{r}}{4\pi \Vert \varvec{r} \Vert ^{3}} \end{aligned}$$
(108)

which are both divergent for \(r\rightarrow 0\). If the goal is to model dark matter in the continuum limit, then this is undesired behavior: it drives an effective collision term resulting in two-body relaxation, which leads to a deviation from collisionless Vlasov–Poisson dynamics. In practice, it also imposes arbitrarily small time steps if step criteria like Eq. (45) are used. The usual solution is to impose a small-scale cut-off. In the simplest case of ‘Plummer softening’ (e.g., Hernquist and Barnes 1990) one sets \(G_\epsilon (\varvec{r}) = -1/(4\pi \sqrt{\Vert \varvec{r}\Vert ^2+\epsilon ^2})\) with a ‘softening length’ \(\epsilon \), so that the tamed acceleration becomes

$$\begin{aligned} \varvec{G}_{a,\epsilon }(\varvec{r}) := -\varvec{\nabla }G_\epsilon = -\frac{\varvec{r}}{4\pi \left( \Vert \varvec{r} \Vert ^2+\epsilon ^2\right) ^{3/2}}, \end{aligned}$$
(109)

which in fact vanishes for \(r\rightarrow 0\), but is only asymptotically Newtonian (i.e. at \(r \rightarrow \infty \)). The maximum acceleration is reached at \(r=\epsilon /\sqrt{2}\) with a value of \(\max _r \Vert \varvec{G}_{a,\epsilon }\Vert = 1/(6\sqrt{3}\,\pi \epsilon ^2)\). Other popular ways of modifying Eq. (108), such as kernel softening, can guarantee a transition to the exact Newtonian force at finite r, typically a few times the softening scale (e.g., Springel et al. 2001b). One should also remark that if grid-based methods are used, unless extremely aggressive adaptive mesh refinement strategies are employed (e.g. refinement on single particles), the grid scale directly provides a regularisation scale and suppresses two-particle interactions.
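A direct-summation implementation of the Plummer-softened acceleration of Eq. (109), written here with the conventional \(G m/(r^2+\epsilon ^2)^{3/2}\) normalisation and unit masses, might look as follows; this is only an illustrative sketch, not an efficient production kernel.

import numpy as np

def plummer_accel(pos, eps, G=1.0, mass=1.0):
    """Direct-summation accelerations with Plummer softening: the pairwise
    force vanishes as r -> 0 and becomes Newtonian only asymptotically."""
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        d = pos - pos[i]                         # vectors from particle i to all j
        r2 = np.sum(d * d, axis=1) + eps**2      # softened squared distances
        # the i == j term contributes exactly zero because d[i] = 0
        acc[i] = G * mass * np.sum(d / r2[:, None]**1.5, axis=0)
    return acc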

With softening, the collisionless N-body system becomes well behaved at the price of smoothing out structure on scales comparable to the size of \(\epsilon \). Formally, the need for a softening length appears in cases where the N-body discretization is no longer valid—separations smaller than the typical size of the volume chosen for the coarse-graining process. Under these arguments, the softening should be chosen such that \(\epsilon \sim \ell := (V/N)^{1/3}\) (Melott et al. 1997; Splinter et al. 1998; Romeo et al. 2008). Unfortunately, with current computational power, this is typically much larger than the regions of interest—e.g. the virial radius of a halo resolved with \(\sim 100\) particles is \(\sim 0.3 \ell \)—and large-scale simulations adopt values 30–100 times smaller. For instance, and in units of the mean inter-particle separation, the Bolshoi simulation employs 0.016 (Klypin et al. 2011), the Multidark simulations 0.015–0.026 (Klypin et al. 2016), the set of Millennium simulations 0.022 (Springel 2005; Boylan-Kolchin et al. 2009; Angulo et al. 2012), and the EUCLID flagship 0.02 (Potter et al. 2017).

There are several possible criteria for setting the softening length in a simulation. The goal is to suppress discreteness effects and force errors while maximizing the range of scales and the amount of resolved nonlinear structure (Ludlow et al. 2019). One possible criterion, \(\epsilon _v\), is to require that the binding energy of a halo is larger than that of two particles at a distance \(\epsilon \); other criteria aim to suppress the effect of close two-body encounters by looking at the change in acceleration (\(\epsilon _\mathrm{acc}\)) or at large-angle deflections (\(\epsilon _{90}\)). However, these are qualitative estimates, and usually the softening is chosen empirically from the observed convergence in numerical experiments, e.g. the behaviour of halo profiles or of the small-scale power spectrum. Figure 22 shows a diagram with various possible estimates of the softening length, including \(\epsilon _v\), \(\epsilon _{\mathrm{acc}}\), \(\epsilon _{90}\), and the optimal values discussed below, along with the values adopted by some state-of-the-art simulations.

Fig. 22

Image reproduced with permission from Ludlow et al. (2019), copyright by the authors

Diagonal black lines show three estimates (\(\epsilon _{90}\), \(\epsilon _v\), \(\epsilon _{\mathrm{acc}}\)) for the minimum value of the gravitational softening required for the simulated dynamics to be collisionless. Thus, the beige region might be dominated by numerical errors. The dark blue region marks the minimum scale at which the two-body relaxation time equals a Hubble time, for halos of different concentrations. Thus, an optimal value of the softening should lie near the white region for halos resolved with at least 100 particles. The optimal value of van den Bosch and Ogiya (2018), \(\epsilon _{\mathrm{opt}}/r_{200} = 2.3\times N_{200}^{-1/3}\), which approximately translates to \(\epsilon _{\mathrm{opt}} \sim 0.017 L/N^{1/3}\), is denoted by the dashed line, whereas the values adopted in various simulations are shown by coloured lines

Power et al. (2003) carried out a systematic analysis of the impact of numerical parameters on the convergence of circular velocity profiles. They empirically showed that convergence to within \(\sim 10\%\) can be achieved on scales larger than a region enclosing a number of particles whose two-body relaxation time is larger than the Hubble time. This criterion has since been validated and employed in multiple simulations (Diemand et al. 2004b; Springel et al. 2008a; Navarro et al. 2010; Gao et al. 2012) and is usually adopted when setting the softening length of current simulations. In addition, the same authors (Power et al. 2003) proposed an empirical ‘optimal’ softening length \(\epsilon _\mathrm{opt}/r_{200} = 4 (N_{200})^{-1/2}\) (approximately translating to \(\epsilon = 0.03 \ell \)), which is widely used, especially in zoom simulations. Note, however, that Zhang et al. (2019a) argued that this criterion is overly conservative (see also Ludlow et al. 2019) and advocated a factor of 2 smaller softening, i.e. \(\epsilon _{\mathrm{opt}}/r_{200} = 2 (N_{200})^{-1/2}\), based on the convergence of the density and circular velocity profiles of \(\sim 10^{12} h^{-1}{\mathrm{M}}_{ \odot }\) halos in cosmological boxes. An even smaller value of the softening length was advocated by Mansfield and Avestruz (2021): by comparing a large number of publicly available simulations, they found that the shape and peak of the circular velocity curves of halos appear systematically biased for values larger than \(\epsilon /\ell \sim 0.008\).

Using controlled idealised simulations, van den Bosch et al. (2018) and van den Bosch and Ogiya (2018) have also argued that current choices of the softening length are inadequate for simulating the evolution of dark matter subhalos. Specifically, they argue that subhalos resolved with fewer than 1000 particles suffer from instabilities and are artificially disrupted. This could be a serious “over-merging” problem [a classic N-body problem, see already Moore et al. (1996)] for simulations, as their predictions for the abundance and spatial distribution of dark and luminous satellites could be unreliable. However, more recent results (Green et al. 2021) indicate that the effect on the subhalo mass function is at most 10–20 per cent. As a way to alleviate the over-merging problem, van den Bosch et al. (2018) argue for a softening length a few times smaller than the usual choices. Specifically, they argue that the optimal softening is \(\epsilon _{\mathrm{opt}}/r_{200} \simeq 2.3 (N_{200})^{-0.33}\) [which is very similar to that of Zhang et al. (2019a) discussed above].

On cosmological scales, Joyce et al. (2020) argued that self-similarity in statistics measured in scale-free simulations (i.e., simulations with a power-law initial power spectrum in an \(\varOmega _m=1\) cosmology) is a good indicator of the degree of convergence of a given simulation setup. Based on this, Garrison et al. (2021) advocate an optimal softening length of \(\epsilon = 0.033 \ell \), but fixed in physical (as opposed to comoving) coordinates. The choice of physical softening is common practice in hydrodynamic and zoom simulations, but note that this formally yields a time-dependent Hamiltonian (cf. Sect. 4), requiring some extra care in formulating the integrator since the gravitational force would receive an additional contribution (Price and Monaghan 2007); note, however, that in superconformal time the potential is already the only time-dependent piece of the Hamiltonian. Nevertheless, the magnitude of the error introduced by ignoring such details might be acceptable for large-scale simulations, with the advantage of requiring fewer timesteps at high redshifts.
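The various prescriptions quoted above can be collected in a small helper that, for a given box size and particle number (and, optionally, the \(r_{200}\) and \(N_{200}\) of the halos of interest), returns the corresponding softening values; the function is purely illustrative and simply transcribes the formulae cited in the text.

def softening_choices(box_Mpc_h, n_part_1d, r200_Mpc_h=None, N200=None):
    """Collect the softening prescriptions quoted in the text for a given setup
    (all lengths in h^-1 Mpc).  The per-halo criteria additionally need the
    radius r200 and particle number N200 of the halos of interest."""
    ell = box_Mpc_h / n_part_1d                       # mean inter-particle separation
    eps = {
        'mean separation ell': ell,
        'Garrison et al. 2021 (0.033 ell, physical)': 0.033 * ell,
        'Mansfield & Avestruz 2021 (upper bound ~0.008 ell)': 0.008 * ell,
    }
    if r200_Mpc_h is not None and N200 is not None:
        eps['Power et al. 2003 (4 N200^-1/2)'] = 4.0 * N200**-0.5 * r200_Mpc_h
        eps['Zhang et al. 2019 (2 N200^-1/2)'] = 2.0 * N200**-0.5 * r200_Mpc_h
        eps['van den Bosch & Ogiya 2018 (2.3 N200^-1/3)'] = 2.3 * N200**(-1.0 / 3.0) * r200_Mpc_h
    return eps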

In summary, there is consensus that a force softening slightly smaller than what has traditionally been adopted is preferred to improve convergence at a fixed computational cost. On the other hand, it is important to keep in mind that there are a few examples of problems originating from too small softening lengths. One is the case of artificial fragmentation in warm dark matter and first-halo simulations, where filaments (expected to be completely smooth) break up into pieces in a mass-resolution-dependent way. Another example arises in the simulation of two cold fluids with different initial power spectra (as is the case for baryons and dark matter in the early Universe). One has to keep in mind, however, that whenever \(\epsilon \ll \ell \), the system evolves as a discrete system rather than in the continuum limit. This leads to significant deviations during the initial phase of a simulation (see our discussion of initial conditions in the next section), and contributes an additional source of numerical error to cosmological predictions.

There have been some attempts in the literature to adapt the softening to the local evolution of the ‘mean’ particle separation scale \(\ell \), but with relatively limited success. One example is the case of a variable softening length determined by the local density (Price and Monaghan 2007; Bagla and Khandai 2009; Iannuzzi and Dolag 2011), or by the eigenvalues of the moment-of-inertia tensor (Hobbs et al. 2016). Unfortunately, anisotropic collapse is typical in cosmological simulations, and thus isotropic softening tends to under- or over-smooth forces. Isotropic adaptive softening is also already implicitly built into those codes that rely on adaptive mesh refinement and a particle-mesh based gravity solver, e.g. RAMSES or ART (see also Knebe et al. 2000), since the refinement criterion is usually tied to the local particle density. The comparatively conservative refinement criterion (relative to the very small softening typically employed with tree codes) is then also the reason why such AMR codes often display some suppression of the mass function of the smallest halos (see also Sect. 8.5 below).

It is clear from the above that a possible direction for progress could be an anisotropic softening determined by the local distortion tensor or by solving for the GDE. This, in fact, would be very similar to a low-order version of the cold sub-manifold method discussed earlier.

8.3 Accuracy of initial conditions

Initial conditions that are set up using Lagrangian perturbation theory (LPT), see Sect. 6.2, rely on a low-order truncation of LPT. The resulting truncation error appears as a ‘transient’ error in the non-linear evolution of the cosmic density field. This truncation error is largest if LPT is truncated at low order and the simulation is started at late times. As a consequence, second-order (2LPT) ICs have long been advocated as a more accurate replacement of first-order (Zel'dovich approximation) ICs (Scoccimarro 1998; Valageas 2002; Crocce et al. 2006). In principle, it is always possible to simply start the simulation at early times to reduce the truncation error and avoid having to use higher-order LPT, but numerical errors during the early stages of the simulation can be large for some schemes, since the density perturbations away from homogeneity become increasingly small and are easily overshadowed by numerical errors if an inadequate force calculation is used (this is particularly a problem for the tree method, cf. Sect. 5.4).
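For reference, a minimal sketch of first-order (Zel'dovich) initial conditions is given below: the displacement field \(\varvec{\psi } = -\varvec{\nabla }\nabla ^{-2}\delta \) is evaluated in Fourier space from the linear density field at the starting redshift; production IC codes add the 2LPT/3LPT corrections discussed here, so this is only an illustration of the first-order step.

import numpy as np

def zeldovich_displacement(delta_lin, box_size):
    """First-order LPT (Zel'dovich) displacement psi = -grad(inv_laplacian(delta)),
    computed in Fourier space on a periodic grid.  delta_lin is the linear
    overdensity at the starting redshift; particles are then moved to
    x = q + psi(q) and given peculiar velocities proportional to H f psi(q),
    with f = dlnD/dlna."""
    n = delta_lin.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing='ij')
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                         # protect the k = 0 mode
    delta_k = np.fft.fftn(delta_lin)
    phi_k = -delta_k / k2                     # inverse Laplacian of delta
    phi_k[0, 0, 0] = 0.0
    psi = [np.real(np.fft.ifftn(-1j * k_i * phi_k)) for k_i in (kx, ky, kz)]
    return np.stack(psi, axis=-1)             # shape (n, n, n, 3)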

However, a further limitation of N-body simulations is due to discreteness effects. While LPT initialises the N-body simulation with the growing modes of the continuous fluid system (with perturbations truncated in the UV at some wave number usually not too far from the particle Nyquist wave number), the N-body system itself follows the dynamics of a discrete system. The resulting growing and decaying modes are different from the continuous fluid, and in fact anisotropic at the discretisation scale, as demonstrated by Joyce et al. (2005), Joyce and Marcos (2007a, b), Marcos et al. (2006). As a consequence, there is a secondary transient resulting from the transition of the N-body simulation from the continuous dynamics to the discrete dynamics, which is accompanied by a distinct suppression of the power spectrum on scales close to the particle Nyquist wave number during weakly non-linear phases of the simulation. To compensate for this effect, Garrison et al. (2016) have proposed to explicitly correct for this error by projecting into the discrete eigenmodes and correcting the lack of growth with a boosted velocity that exactly compensates the leading error term at a specific target time. Arguably a downside of this approach is that, due to the boost, the evolution of the system prior to the target time is somewhat unphysical.

In order to circumvent both truncation and discreteness errors, Michaux et al. (2021) have argued for the use of particularly late starting times employing higher order (3LPT) initial conditions. Michaux et al. (2021) also showed that these errors are largest during the mildly non-linear evolution, but then decrease again once scales close to the particle Nyquist wave number are dominated by structures that are fully collapsed and virialised. The use of high order LPT and late starting times might therefore allow more economical simulations, i.e. fewer particles to reach a given accuracy up to a certain wave number. Even higher order initial conditions might not be necessary except in very special situations, since LPT has been shown to converge very quickly prior to shell-crossing (Rampf and Hahn 2021). The study by Rampf and Hahn (2021) also found that for higher order LPT it is preferable to exclude so-called ‘corner modes’ (i.e., modes between 1 and \(\sqrt{3}\) times the linear particle Nyquist wave number) in order to reduce artefacts in non-linear terms due to UV truncation of the perturbation spectrum and convolution integrals. The effect of these ‘corner modes’ for low order ICs (i.e. up to 2LPT) on non-linear simulation results has been found to be small (Falck et al. 2017).

All these aspects are summarised in Fig. 23, which compares the power spectrum and bispectrum as measured in simulations adopting various starting redshifts, \(z_{\mathrm{start}}\), and LPT orders. We can see, for instance, that using 1LPT (i.e. the “Zeldovich approximation”) leads to a systematic underestimation of clustering statistics on both intermediate and small scales. This effect is somewhat reduced for higher starting redshifts, but an increase in the LPT order yields a much more significant improvement. Employing 3LPT initial conditions, even a start as late as \(z_\mathrm{start} = 11.5\) agrees with the reference solution at the sub-percent level for \(k\lesssim 2 h\,\mathrm{Mpc}^{-1}\), at both redshifts and for both the power spectrum and the bispectrum.

Fig. 23

Image reproduced with permission from Michaux et al. (2021), copyright by the authors

Dependence of the statistical properties of a simulated nonlinear density field on the details of how the initial conditions are constructed. Left and right panels show the results for the matter power spectrum and for the bispectrum with equilateral configuration (\(\varvec{k}=\varvec{k_1}=\varvec{k_2}\)); whereas top and bottom panels display \(z=0\) and \(z=1\), respectively. In each panel, results are shown for various choices of starting redshift \(z_{\mathrm{start}} \in [11.5, 24, 49, 99, 199]\), and the order of the Lagrangian Perturbation Theory (LPT) used to compute the particle displacements at the respective redshift. The reference run is a simulation initialized with 3LPT at redshift 24 and using a face-centered-cubic lattice with four times as many particles as in the test simulations

8.4 Chaos and determinism in simulations

So far we have discussed how differences in the setup, discretization, or numerical parameters of a simulation affect its predictions. However, even for a given set of choices and deterministic equations of motion, stochasticity can arise in the predictions for the nonlinear density field as a result of chaotic behavior.

Chaos refers generally to a process in which exponentially divergent results appear from small differences in the initial state of a system. Chaos in a Hamiltonian system in a 3+3-dimensional phase space can be formally quantified in terms of three distinct Lyapunov exponents \(\pm \lambda _{1,2,3}\) (due to the symplectic nature they come in pairs). A notion of predictability of a system is then given by the maximal Lyapunov exponent, defined as the most rapid characteristic separation rate of two trajectories separated by a vector \(\delta \varvec{\xi }(t)\) in phase space, i.e. \(\left| \delta \varvec{\xi }(t) \right| \sim \exp (\lambda t)\,\left| \delta \varvec{\xi }_0 \right| \) in a linearised sense so that

$$\begin{aligned} \lambda \sim \lim _{t\rightarrow \infty } \frac{1}{t} \log \frac{ \left| \delta \varvec{\xi }(t) \right| }{ \left| \delta \varvec{\xi }_0 \right| }, \end{aligned}$$
(110)

where the initial separation \(\delta \varvec{\xi }_0\) should be thought of as infinitesimal.
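In practice, \(\lambda \) can be estimated with the standard two-trajectory (Benettin-type) procedure, sketched below for a generic user-supplied integrator step; the function is illustrative and not tied to any specific N-body code or dynamical system.

import numpy as np

def max_lyapunov(step, xi0, d0=1e-8, n_steps=20000, dt=1e-3, renorm_every=10):
    """Estimate the maximal Lyapunov exponent of Eq. (110): evolve a reference
    and a shadow trajectory initially separated by d0 in phase space, and
    periodically rescale the separation back to d0 while accumulating the
    logarithmic growth.  step(xi, dt) -> xi_next must advance one phase-space
    state by dt (e.g. a leapfrog step of the system under study)."""
    xi = np.array(xi0, dtype=float)
    xi_shadow = xi.copy()
    xi_shadow[0] += d0                         # small initial offset
    log_growth = 0.0
    for i in range(1, n_steps + 1):
        xi = step(xi, dt)
        xi_shadow = step(xi_shadow, dt)
        if i % renorm_every == 0:
            d = np.linalg.norm(xi_shadow - xi)
            log_growth += np.log(d / d0)
            # rescale the shadow trajectory back to distance d0
            xi_shadow = xi + (xi_shadow - xi) * (d0 / d)
    return log_growth / (dt * renorm_every * (n_steps // renorm_every))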

In the case of numerical simulations, seeds for chaos and stochastic behavior could arise from round-off errors and/or from small variations in the initial conditions of the system. Additionally, errors associated with the force calculation and time integration could be exponentially amplified in chaotic systems. In fact, examples of chaotic behavior have been reported in the literature for N-body simulations of star clusters, satellite galaxies, halo stars, and planetary systems, among others (Heggie 1991; Goodman et al. 1993; El-Zant et al. 2019; Maffione et al. 2015; Price-Whelan et al. 2016). In principle, Lyapunov exponents can be computed during the evolution of the N-body system to identify regions of chaos (Habib and Ryne 1995), and the ‘GDE’ approach can also give direct access to them (Vogelsberger et al. 2008). Understanding chaos in a cosmological context is important for determining the accuracy and robustness of numerical simulations. As discussed earlier, the evolution of an N-body system may not be guaranteed to converge to the continuum limit even in the limit of infinitely many particles.

Chaos in a cosmological simulation context has been explored by several authors. By comparing an ensemble of runs with identical initial power spectrum but with small differences in their white-noise field, Thiébaut et al. (2008) found that large scales and time-integrated halo properties such as position, mass and spin were robust predictions. On the other hand, the position of substructures, and the orientation of the spin and the velocity dispersion tensor showed larger variations.

Using pairs of simulations with identical initial conditions up to a small perturbation, Genel et al. (2019) quantified the role of chaos in the properties of well-resolved objects in numerical simulations. They found that differences in the mass and circular velocity of halos grow exponentially, but saturate at a level compatible with Poisson noise, \(\sim 1/\sqrt{N}\) for a system resolved with N particles. A similar conclusion was reached by El-Zant et al. (2019) studying the time-reversibility of N-body systems: initial errors grow rapidly during an initial phase but subsequently saturate.

The results above suggest that numerical simulations of self-gravitating systems do indeed converge, albeit slowly, to the collisionless limit as N tends to infinity. This is in contrast with cosmological hydrodynamical simulations, where initial differences, round-off errors and sometimes stochastic star formation prescriptions have been found to result in marked differences in the global properties of simulated galaxies, even contributing significantly to some scaling relations (Genel et al. 2019; Keller et al. 2019).

8.5 Convergence among codes

With few exceptions (see Sect. 3), all modern simulation codes essentially adopt the same approximations to predict the nonlinear state of large-scale structure in the Universe: the N-body discretization of a single fluid (describing both baryons and dark matter) that interacts only gravitationally. However, different codes make different algorithmic choices regarding the force calculation, force softening, and time integration, and they could also be subject to systematic errors (moreover, programming bugs become more likely with the increasing algorithmic complexity of the codes). Given the importance of the predictions and the role of numerical simulations, it is important to compare the performance and predictions of different codes.

An early comparison carried out by O’Shea et al. (2005) between the codes ENZO and GADGET revealed systematic differences originating from the algorithm employed to compute gravitational interactions. The particle mesh simulation with adaptive mesh refinement yielded a lower power spectrum amplitude on small scales compared to the tree-PM simulation. It was found that this discrepancy came from the conservative refinement criteria missing the formation of small halos at high redshift. A more aggressive mesh refinement led to much better agreement, but this served as an example that, although codes are solving the same equations, numerical simulations are complex systems and numerical errors can easily propagate to large levels. In the same year, Heitmann et al. (2005) compared several codes in the context of large-scale clustering predictions by carrying out both cosmological and idealised simulations.

A more systematic comparison among N-body codes was carried out in 2008 by the Cosmic Code Comparison Project (Heitmann et al. 2008), in which ten codes evolved the same initial conditions in comoving boxes of \(64\, h^{-1}\mathrm{Mpc}\) and \(256\, h^{-1}\mathrm{Mpc}\), as well as the “Santa Barbara cluster” (Frenk et al. 1999). The agreement among the codes had by then improved dramatically, with all of them agreeing on the amplitude of the \(z=0\) nonlinear power spectrum at the 10–20% level up to \(k \sim 10 h\,\mathrm{Mpc}^{-1}\), and at the 5–10% level at \(k \sim 5 h\,\mathrm{Mpc}^{-1}\). A comparison in 2014 of nine cosmological N-body codes simulating the same \(\sim 10^{11} h^{-1}{\mathrm{M}}_{ \odot }\) halo by Kim et al. (2014) demonstrated small differences in the global density profile but pronounced differences in the subhalo mass functions between AMR and tree-based codes, consistent with the suppression of small haloes found earlier when conservative refinement criteria are used.

Schneider et al. (2013) carried out the “Euclid comparison project” – a more demanding test simulating a box of \(500\, h^{-1}\mathrm{Mpc}\) per side with \(2048^3\) particles, which should have led to converged results with respect to box size and mass resolution. There were only three participant codes, GADGET-3, PKDGRAV-3, and RAMSES, which showed a remarkable agreement of \(< 5\%\) for \(k \sim 5 h\,\mathrm{Mpc}^{-1}\). An important feature of this code comparison was that the initial conditions were released publicly, so that any N-body code could carry out the same simulation and compare its predictions. In fact, the initial conditions have subsequently been run by several authors, extending the comparison to the codes ABACUS (Garrison et al. 2019), L-GADGET-3 (Angulo et al. 2021), and GADGET-4 (Springel et al. 2021).

In Fig. 24 we show a recent comparison between four state-of-the-art codes. Remarkably, the power spectra agree among all of them to better than \(0.4\%\) up to \(k \sim 2 h\,\mathrm{Mpc}^{-1}\), degrading somewhat to \(2\%\) at \(k \sim 10 h\,\mathrm{Mpc}^{-1}\). Note that in the original comparison paper, GADGET-3 was an outlier, displaying systematic differences with respect to the other codes. However, the results of Angulo et al. (2021) and Springel et al. (2021) regarding the GADGET-3 code suggest that this run was probably carried out with poorer time integration or force accuracy than the other codes.

Fig. 24

Image reproduced with permission from Springel et al. (2021), copyright by the authors

Comparison between the nonlinear structure at \(z=0\) as predicted by 4 different N-body codes evolving the same initial conditions of the “Euclid comparison project” (Schneider et al. 2016). This simulation corresponds to a cubical box of side \(L=500 h^{-1}\mathrm{Mpc}\) with \(2048^3\) particles of mass \(1.2\times 10^9 h^{-1}{\mathrm{M}}_{ \odot }\). The left panel shows the nonlinear matter power spectrum whereas the right panel shows the stacked density profile for the 25 most massive halos in the simulation. Displayed results are obtained from GADGET-4 (Springel et al. 2021), PKDGRAV-3 (Potter and Stadel 2016), ABACUS (Garrison et al. 2019), and RAMSES (Teyssier 2002). Note that all codes agree remarkably well even down to very small scales (<1% up to \(k \sim 10 h\,\mathrm{Mpc}^{-1}\)), and eventually disagree on smaller scales owing to different numerical parameters adopted in each run. Similarly, the density profiles agree to a percent level down to \(10 h^{-1}\)kpc

Overall, the current agreement at the sub-percent level among codes is a remarkable achievement for the field of computational cosmology, which can now claim that the nonlinear evolution of collisionless matter can be predicted to better than one per cent, including sources of systematic error, over the full range of scales relevant for upcoming weak lensing surveys. Naturally, this has only been the first necessary step in demonstrating the robustness of numerical predictions. In the future, similar comparisons extended to other statistics (e.g. the bispectrum and higher-order correlations, velocity fields, halo mass functions, etc.) and to a more realistic description of the universe (e.g. multiple fluids, including neutrinos and baryons) will be required to establish the predictions of numerical simulations robustly. In addition, a careful assessment of the impact of the ubiquitous N-body discretization itself will also be essential, which is only now becoming possible with the development of the alternatives that we have discussed previously. We discuss this in the next subsection.

8.6 The N-body approximation

Throughout this review we have argued that simulations are essential in modern cosmology and are the only way to obtain accurate results in the nonlinear regime. Specifically, in this section we have discussed how the results of cosmological simulations converge as a function of numerical parameters: box size, mass resolution, and initial conditions, as well as the criteria for choosing the softening length. Additionally, we discussed how different simulation codes agree in their predictions for the nonlinear matter clustering down to small scales.

It is important to highlight that these statements concern convergence within the N-body discretisation. All codes employed in large-scale simulations assume the same N-body discretisation of the underlying fluid equations, and they differ simply in algorithmic choices about how to compute gravitational forces and perform the time integration. Although they might differ in their computational efficiency and convergence rate, they are expected to converge to a common solution. However, the N-body dynamics and the final distribution function are not necessarily identical to those of the continuum limit. In fact, there is no formal proof of convergence, and any convergence observed in resolution studies could arise from, for instance, the inherent noisiness of the problem, or the convergence to the true solution could simply be extremely slow.

As we have discussed before, there are in fact a few examples where the N-body discretization clearly introduces significant errors, even on large scales. The spurious effects of discreteness noise and a finite softening length are very evident in warm dark matter cosmologies, where filaments fragment and collapse into halos seeded by discreteness effects. Another example is two-fluid simulations, e.g. of baryons and dark matter, where discretisation errors lead to incorrect evolution even on extremely large scales owing to small-scale coupling. Also, the differences between the continuous fluid solution and the discrete N-body system are noticeable as a pronounced starting-redshift dependence of the small-scale power spectrum during the mildly nonlinear evolution.

In fact, there are claims that N-body results might be strongly affected by discreteness noise and overly strong phase-space diffusion, which could act as an attractor, thus showing convergence to the wrong solution (Baushev 2015). In this case, for instance, the actual density profile of collapsed objects might differ from the standard NFW profile, which could ultimately imply biases in, for instance, the cosmological parameters inferred from weak gravitational lensing.

Given the importance of simulations in cosmology, it is crucial to be aware of these limitations and to explore whether the predictions relevant for large-scale structure and the interpretation of cosmological observations are actually correct. An important test is to compare the N-body results against those of simulations adopting other discretisation techniques. However, this has not yet been fully achieved in the highly nonlinear regime, since the alternative techniques are much more computationally expensive.

An important step in this direction was recently taken by Colombi (2021). In this study, the formation of earth-mass microhalos and an idealised case of three collapsing sine waves were simulated using an N-body PM code and the Lagrangian phase-space tessellation code ColDICE (Sousbie and Colombi 2016). Both approaches were shown to agree remarkably well, as long as there is more than one particle per force resolution element. This result is in agreement with well-known studies arguing that N-body simulations can be reliable only if the softening length is chosen to be of the order of the mean inter-particle separation (Melott et al. 1997; Splinter et al. 1998). This scale is also the one implicitly adopted by phase-space tessellations in Lagrangian space.

Interestingly, Colombi (2021) showed that N-body and phase-space tessellation methods both predict a steep power-law density profile, in agreement with several previous studies (cf. Sect. 7.1). After an initial period of collapse, discreteness noise and perturbations of physical origin, such as mergers, drive the system towards an NFW-like profile (Syer and White 1998; Ishiyama 2014; Ogiya et al. 2016; Angulo et al. 2017; Ogiya and Hahn 2018). This is an interesting result since, in some WDM models, the monolithic collapse of fluctuations occurs near the observational limits, so these distinctive internal properties might leave observable signatures.

In summary, there is strong evidence that the N-body solution is correct as long as the softening is appropriately chosen, i.e. of the order of the local mean inter-particle separation. Note, however, that as we have seen in Sect. 8.2, modern large-scale simulations typically use much smaller values. This is mandated by the requirement that the gravitational force be Newtonian down to the smallest possible scale in order to prevent a too slow growth of small-scale perturbations in CDM simulations. It will be interesting in the future to confirm that the overall predictions of large-scale simulations are quantitatively correct. This will be a crucial step towards robust inferences from cosmic observations, which will progressively rely more on numerical simulations.

9 Analysis and postprocessing

In this section, we will review the most common statistics that can be computed from cosmological N-body simulations, and discuss various analysis and post-processing strategies to connect simulated universes to observables in the real Universe.

9.1 The density field

The most basic predictions from simulations are the statistics of the nonlinear matter density and velocity fields across cosmic time. These statistics are closely related to the interpretation of several observational measurements such as gravitational lensing, the abundance and spatial distribution of biased tracers, redshift space distortions, and the kinetic Sunyaev–Zel’dovich effect, and are important for an understanding of structure formation in general.

Using a mass deposit scheme (most commonly CIC interpolation, see Sect. 5.1.2), the density field can be readily obtained on a three-dimensional grid from the N-body particle distribution. Such a procedure, however, has the problem that, if perturbations on scales \(k>k_{\mathrm{Ny}}\) exist (the Nyquist wave number is defined as \(k_{\mathrm{Ny}} := \pi /\varDelta x\), where \(\varDelta x\) is the grid spacing), they will be misidentified as modes supported by the grid (i.e. \(k<k_{\mathrm{Ny}}\)), which leads to errors in Fourier space. This is referred to as aliasing [see Hockney and Eastwood (1981) for a detailed discussion]. This problem can be cured by interlacing, where the particles are additionally shifted (in the simplest case once by a vector \((\varDelta x/2,\,\varDelta x/2,\, \varDelta x/2)\)) before deposit and then shifted back using a Fourier-space shift. If this deposit is averaged with the original one, it can be shown that the leading-order aliased contribution cancels out (Hockney and Eastwood 1981; Sefusatti et al. 2016).
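
The following is a minimal NumPy sketch of a CIC deposit on a periodic grid together with the interlacing trick just described. Function and variable names are our own, nothing is optimised, and particle positions are assumed to lie in \([0,L)\).

```python
import numpy as np

def cic_deposit(pos, boxsize, ngrid):
    """Cloud-in-cell deposit of particle positions (shape [N, 3], in [0, L))
    onto a periodic grid; returns the density contrast delta."""
    delta = np.zeros((ngrid,) * 3)
    cell = pos / boxsize * ngrid                 # positions in grid units
    i = np.floor(cell).astype(int)               # index of the lower cell
    f = cell - i                                 # fractional offsets in [0, 1)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.abs(1 - dx - f[:, 0]) *
                     np.abs(1 - dy - f[:, 1]) *
                     np.abs(1 - dz - f[:, 2]))
                np.add.at(delta, ((i[:, 0] + dx) % ngrid,
                                  (i[:, 1] + dy) % ngrid,
                                  (i[:, 2] + dz) % ngrid), w)
    return delta / delta.mean() - 1.0

def interlaced_density_k(pos, boxsize, ngrid):
    """Average the Fourier modes of the original and half-cell-shifted deposits,
    which cancels the leading-order aliased contribution."""
    dx = boxsize / ngrid
    d1 = np.fft.rfftn(cic_deposit(pos, boxsize, ngrid))
    d2 = np.fft.rfftn(cic_deposit((pos + 0.5 * dx) % boxsize, boxsize, ngrid))
    k = 2 * np.pi * np.fft.fftfreq(ngrid, d=dx)
    kz = 2 * np.pi * np.fft.rfftfreq(ngrid, d=dx)
    kx, ky, kz = np.meshgrid(k, k, kz, indexing='ij')
    phase = np.exp(0.5j * dx * (kx + ky + kz))   # undo the half-cell shift
    return 0.5 * (d1 + d2 * phase)
```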

Since the N-body scheme is Lagrangian, particles sample mass, and therefore the signal-to-noise ratio of the density field can be very low if grid cells contain few (or no) particles. This shortcoming can in principle be circumvented by employing adaptive kernel estimators (such as an SPH-like approach), a Voronoi tessellation based on particle positions (van de Weygaert 1994; Cautun and van de Weygaert 2011), or the phase-space sheet tessellation method (cf. Sect. 3.2.2, Kähler et al. 2012), all of which give a well-defined density estimate everywhere in space. A disadvantage of adaptive softening is that the filter properties are not easily known in Fourier space, so that a de-convolution with the assignment kernel, which is a common step in power spectrum estimation, is not possible. Information from the full three-dimensional density field is then usually further compressed into various ‘summary statistics’. Traditionally, statistics that can be predicted from perturbation theory have been favoured (e.g., n-point spectra and correlation functions), but new statistics are being considered to maximise cosmological information content or discriminative power, e.g., Cheng et al. (2020).

Since primordial cosmological perturbations are (close to) Gaussian, non-Gaussianity arises in the late Universe through non-linear evolution, and the density field approaches a Gaussian field only on large scales. A Gaussian field is fully described by its two-point statistics so that power spectra and two-point correlation function take a central role in cosmological analysis, but need to be supplemented with higher order statistics to be sensitive to additional cosmological information in phase correlations, see Sect. 9.2 below. By averaging over finite volumes (‘counts in cells’, Peebles 1980) and evaluating their one-point statistics (i.e., the variance, skewness, kurtosis of the density probability distribution, which is close to log-normal at late times, Coles and Jones 1991) one can probe similar information. Also the statistics of peaks and troughs (or clusters and voids) is a sensitive probe of the underlying cosmology since these occupy the tails of the density distribution, see Sect. 9.4. This analysis has more recently been extended to include the statistics of all critical points (i.e. also saddle points) and critical lines to characterise the density field (Sousbie 2011; Xu et al. 2019). Also excursion sets (i.e., isodensity surfaces) of the density field contain cosmological information, which is conveniently expressed in invariant form in the Minkowski functionals (scalars) (Schmalzing and Buchert 1997; Nakagami et al. 2004; Aragón-Calvo et al. 2010; Fang et al. 2017; Lippich and Sánchez 2021). These latter methods already quantify some aspects of the anisotropic cosmic web (see the separate discussion in Section 9.5 below).
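
As an illustration of the counts-in-cells idea, the sketch below (with names of our own choosing, and assuming a density-contrast grid such as the one produced by the deposit routine above) block-averages the field into larger cells and computes its low-order one-point moments.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def counts_in_cells_moments(delta, ncoarse):
    """One-point moments of the density contrast averaged in cubic cells.
    `delta` is a density-contrast grid whose side length (in cells) must be
    divisible by `ncoarse`, the number of coarse cells per dimension."""
    n = delta.shape[0]
    f = n // ncoarse
    # block-average the fine grid into (ncoarse)^3 cells
    coarse = delta.reshape(ncoarse, f, ncoarse, f, ncoarse, f).mean(axis=(1, 3, 5))
    x = coarse.ravel()
    return {"variance": x.var(),
            "skewness": skew(x),
            "kurtosis": kurtosis(x)}   # excess kurtosis
```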

9.2 Power-spectra and correlation functions, n-point and n-spectra

The properties of the density field, or the spatial distribution of a (biased or unbiased) set of points drawn from it, e.g. N-body particles or halos, can be characterized by a hierarchy of n-point functions. For a non-Gaussian field, all the connected n-point functions are needed to fully describe the field. In Fourier space, the power spectrum corresponds to the two-point function; the bispectrum to the three-point function; the trispectrum to the four-point function; and so on.

The algorithmic complexity of computing these statistics increases as the n-th power of the number of points. For computational efficiency, it is common to estimate these correlations in Fourier space, employing Fast Fourier Transforms (FFTs) applied to the density field estimated on a uniform grid. This computation then scales with the number of grid points \(N_g\) as \(\mathcal {O}(N_g \log N_g)\), rather than with the number of points in the sample. FFTs are particularly suited to analyzing cosmological simulations as they naturally incorporate the periodic boundary conditions assumed in simulations, and efficient numerical implementations exist. However, as mentioned earlier, some care is needed to avoid “aliased” results.
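
A minimal sketch of such an FFT-based power spectrum estimator is given below. It assumes a Fourier-space density contrast such as the interlaced field sketched earlier, deconvolves the CIC assignment window, and performs a simplified spherical-shell average over the stored half-space modes; shot noise is not subtracted, and all names are illustrative.

```python
import numpy as np

def power_spectrum(delta_k, boxsize, ngrid, nbins=32, p=2):
    """Spherically averaged P(k) from an rFFT'd density contrast.
    p is the mass-assignment order (2 for CIC), used to deconvolve the window."""
    dx = boxsize / ngrid
    k1 = 2 * np.pi * np.fft.fftfreq(ngrid, d=dx)
    kz1 = 2 * np.pi * np.fft.rfftfreq(ngrid, d=dx)
    kx, ky, kz = np.meshgrid(k1, k1, kz1, indexing='ij')
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)

    # deconvolve the assignment window, W(k) = prod_i sinc(k_i dx / 2)^p
    w = (np.sinc(kx * dx / (2 * np.pi)) *
         np.sinc(ky * dx / (2 * np.pi)) *
         np.sinc(kz * dx / (2 * np.pi)))**p
    pk3d = np.abs(delta_k / w)**2 * boxsize**3 / ngrid**6   # FFT normalisation

    edges = np.linspace(2 * np.pi / boxsize, np.pi / dx, nbins + 1)
    ibin = np.digitize(kmag.ravel(), edges)
    kcen, pk = [], []
    for b in range(1, nbins + 1):
        sel = ibin == b
        if sel.any():
            kcen.append(kmag.ravel()[sel].mean())
            pk.append(pk3d.ravel()[sel].mean())
    return np.array(kcen), np.array(pk)
```

For instance, one could call `power_spectrum(interlaced_density_k(pos, L, 256), L, 256)` using the deposit routine sketched earlier.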

The impact of aliasing on the power spectrum has been studied by several authors. Jing (2005) analytically estimated the power spectrum including aliasing for generic mass assignments, and proposed an algorithm to iteratively correct for it. Colombi et al. (2009) also proposed a correction based on a Taylor-Fourier expansion of the aliased spectrum. Aliasing can in principle be reduced if the mass assignment scheme is a low-pass filter, thus effectively zeroing all small scale Fourier modes not supported by the FFT (strictly all scales \(k>2k_{\mathrm{Ny}}/3\) should be zeroed). Cui et al. (2008), Yang et al. (2009) argue that Daubechies wavelets and B-splines can fulfil this purpose. More recently, Sefusatti et al. (2016) showed that interlacing (see above) is a very efficient way to reduce significantly the effect of aliasing for any mass assignment scheme.

Another issue related to uniform FFT grids is that the number of grid points increases as \(\mathcal {O}((L/\varDelta x)^{3})\), thus measuring small scales can quickly become prohibitively expensive. To circumvent this limitation, Jenkins et al. (2001) proposed a folding technique, where the particle distribution is periodically wrapped to a smaller box of size \(L^{\prime}\equiv L/\beta \) with \(\beta \in \mathbb {N}\), replacing \(x \rightarrow x \mod L^{\prime}\). The new Nyquist wave number becomes a factor of \(\beta \) higher for the same computational cost, at the price of an overall larger statistical error, which is however acceptable in many situations. More recently, the folding technique has been shown to also be applicable to the calculation of the bispectrum (Aricò et al. 2021a).
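
The folding operation itself is almost trivial, as the sketch below shows; the routine names referenced in the usage comment are those of the sketches above and are not part of any public code.

```python
import numpy as np

def fold_positions(pos, boxsize, beta):
    """Wrap particle coordinates into a sub-box of size L/beta (folding).
    Measuring the power spectrum on the folded box with the same grid raises
    the Nyquist wavenumber by a factor beta, at the price of a larger
    statistical error on the measured modes."""
    lsub = boxsize / beta
    return np.mod(pos, lsub), lsub

# e.g., using the (illustrative) routines sketched above:
# pos_f, lsub = fold_positions(pos, L, beta=8)
# k, pk = power_spectrum(interlaced_density_k(pos_f, lsub, 256), lsub, 256)
```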

Note that in various situations, especially when estimating a density field from halos/galaxies, it can be preferable to compute correlation functions directly in configuration space, rather than through two DFTs (since the smallest resolved scale is not limited by the grid resolution, and the computational cost scales with the number of objects rather than the number of grid points). These direct algorithms can be very efficient if search trees are used for the range search. Therefore, another alternative for estimating small-scale power spectra and bispectra is to compute these Fourier statistics in configuration space, by counting pairs or triplets of objects (Philcox and Eisenstein 2020; Philcox 2021). Currently, there exist several publicly available codes to efficiently compute correlation functions, power spectra and higher-order statistics from numerical simulations, some of these are POWMES (Colombi et al. 2009), NBODYKIT (Hand et al. 2018), BSKIT (Foreman et al. 2020), PYLIANS (Villaescusa-Navarro 2018), HIPSTER (Philcox 2021), CORRFUNC (Sinha and Garrison 2020), CUTE (Alonso 2012), and HALOTOOLS (Hearin et al. 2017).

An important property of a discrete sample (i.e. halos or galaxies) of the cosmic density field is its bias, b(k), which quantifies the difference of the overdensity field \(\delta _{\mathrm{g}}\) relative to that of the underlying (unbiased) dark matter \(\delta _{\mathrm{dm}}\) as a (scale-dependent) multiplicative factor \(b(k) := \delta _{\mathrm{g}}(k)/\delta _{\mathrm{dm}}(k)\). The bias can be readily estimated from two-point functions as \(b(k) = \sqrt{P_\mathrm{g}(k)/P_{\mathrm{dm}}(k)}\), or from the cross-correlation between the sample and the dark matter as \(b(k) := P_{\mathrm{g, dm}}(k)/P_{\mathrm{dm}}(k)\), which has the advantage of being less affected by stochastic noise. The bias can also be defined for individual objects as the overdensity of DM around them relative to that around random locations (e.g., Paranjape et al. 2018). Note that the average of the individual biases of objects in a sample is mathematically equivalent to the bias of the sample.
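
Assuming auto and cross power spectra measured on common k bins (e.g. with the estimator sketched above), the two bias estimators mentioned here take only a few lines; the shot-noise handling shown is illustrative rather than prescriptive.

```python
import numpy as np

def bias_from_auto(p_gg, p_mm, shot_noise=0.0):
    """b(k) from the tracer auto spectrum; the tracer shot noise should be
    subtracted from P_gg beforehand (illustrative handling only)."""
    return np.sqrt((p_gg - shot_noise) / p_mm)

def bias_from_cross(p_gm, p_mm):
    """b(k) from the tracer-matter cross spectrum, which is less affected by
    stochastic (shot) noise since it does not enter the cross term."""
    return p_gm / p_mm
```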

On large scales (where \(|\delta | < 1\)), it is possible to expand b(k) perturbatively in terms of powers and derivatives of \(\delta \) (see Desjacques et al. 2018 for a review). The coefficients of this expansion are usually referred to as bias parameters, and have been measured in N-body simulations using n-point functions, cumulants, and more recently also by employing the separate universe formalism (e.g., Lazeyras et al. 2016; Lazeyras and Schmidt 2018, 2019). The perturbative expansion has the appeal that it provides a physically motivated basis for describing the clustering of a generic distribution of objects. In other words, the observed clustering of galaxies can be expressed as a weighted sum of the auto and cross-spectra of powers and derivatives of \(\delta \). Traditionally, these density spectra have been computed using various flavours of perturbation theory (e.g., Eggemeier et al. 2021), but recently there has been increased interest in measuring them directly from N-body simulations (e.g., Modi et al. 2020a; Zennaro et al. 2021).
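
As an illustration of this “weighted sum” structure, the sketch below combines a set of pre-measured basis spectra into a galaxy auto spectrum; the operator names, dictionary layout and normalisation are our own conventions, not those of a specific code.

```python
import numpy as np

def galaxy_power_from_basis(bias, basis_pk):
    """Combine pre-measured basis spectra into the galaxy auto spectrum,
    P_gg(k) = sum_ij b_i b_j P_ij(k).
    `bias`     : dict mapping operator names (e.g. "1", "delta", "delta2", "s2")
                 to bias coefficients.
    `basis_pk` : dict mapping alphabetically sorted operator pairs to the
                 corresponding cross-spectra measured from a simulation."""
    ops = list(bias)
    p_gg = np.zeros_like(basis_pk[(ops[0], ops[0])], dtype=float)
    for oi in ops:
        for oj in ops:
            p_gg += bias[oi] * bias[oj] * basis_pk[tuple(sorted((oi, oj)))]
    return p_gg
```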

9.3 Velocity fields

The estimation of the velocity field poses more challenges than that of the density field. The main problem is that the velocity field is only sampled at the positions of N-body particles, which implies an implicit mass weighting (i.e. one obtains the momentum field if particle velocities are co-added on a grid). In other words, although the velocity field should be defined everywhere in space, without further sophistication it is only possible to compute it where particles exist. Thus, for instance, in low-density regions the signal-to-noise ratio is low, and there can be large empty regions where the velocity field is not estimated at all. This is the case even for high-resolution simulations, and the problem is even more serious when estimating velocity fields for biased tracers, owing to their higher sparsity compared to N-body particles (Jennings et al. 2015; Zhang et al. 2015).

This mass-weighted sampling of the velocity field can potentially lead to large uncertainties when estimating its volume-weighted statistical properties (which are the ingredient entering many frameworks for modelling redshift-space distortions) (Pueblas and Scoccimarro 2009; Zheng et al. 2015b; Zhang et al. 2015; Jennings et al. 2015). As discussed, for instance, by Jennings et al. (2011), the standard cloud-in-cell estimation of the velocity field causes its power spectrum to vary strongly with the number of grid points defining the velocity mesh. The power spectra only converge if the vast majority of the cells contain at least one tracer particle. In practice this means that very large cells are needed, of the order of tens of Mpc, which prevents investigation of the highly nonlinear regime.

There have been several approaches in the literature to address this problem. One of them is to apply a large smoothing to the particle distribution. It is also possible to use adaptive smoothing based on the local density (Colombi et al. 2009), or a Kriging interpolation (Yu et al. 2015, 2017b). A different approach is to theoretically model these artefacts and then correct the measurements (Zhang et al. 2015; Zheng et al. 2015b). In particular, Voronoi or Delaunay tessellations of the particle distribution have been shown to be good estimators of the volume-weighted velocity field. In this approach, space is partitioned into a set of disjoint regions whose volumes trace the local density without assuming isotropy. Specifically, several authors have shown that these tessellations allow the statistical properties of velocity fields to be measured with significantly less noise and bias (van de Weygaert and Bernardeau 1998; Jennings et al. 2019). These approaches, however, define a coarse-grained velocity field, i.e., the average of the velocity of different streams in a given region of space. Hahn et al. (2015) extended the phase-space tessellation method described in Sect. 3.2.2 to define the velocity field of each dark matter stream everywhere in space, with which the average and higher-order moments and derivatives of the velocity field can be computed (cf. Fig. 25); this approach can also be used to estimate the anisotropic stress tensor (Buehlmann and Hahn 2019).
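
As a crude stand-in for a full Voronoi/Delaunay construction, the sketch below assigns to every grid cell the velocity of its nearest particle, so that each cell receives a value irrespective of the local particle density (i.e. an approximately volume-weighted, zeroth-order estimate); names and choices are illustrative only.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_particle_velocity(pos, vel, boxsize, ngrid):
    """Assign to each grid cell the velocity of its nearest particle.
    Every cell receives a value regardless of the local particle density,
    giving an (approximately) volume-weighted rather than mass-weighted
    estimate; a crude stand-in for a full Voronoi/Delaunay construction."""
    tree = cKDTree(pos, boxsize=boxsize)             # periodic KD-tree
    centres_1d = (np.arange(ngrid) + 0.5) * boxsize / ngrid
    gx, gy, gz = np.meshgrid(centres_1d, centres_1d, centres_1d, indexing='ij')
    centres = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
    _, idx = tree.query(centres)                     # nearest particle per cell
    return vel[idx].reshape(ngrid, ngrid, ngrid, 3)
```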

Fig. 25

Image reproduced with permission from Hahn et al. (2015), copyright by the authors

Projections of various fields in a region of \(16\times 16 h^{-1}\mathrm{Mpc}\) around a massive halo of mass \(1.4\times 10^{14} h^{-1}{\mathrm{M}}_{ \odot }\) in a Warm Dark Matter simulation. The top left panel shows the density field, whereas the top right, bottom left, and bottom right panels show the vorticity, the divergence, and the y-component of the vorticity field, respectively. Note the positive divergence in low-density regions which have not yet shell-crossed. These regions can be clearly distinguished in the top-right panel as having zero vorticity

After these numerical artifacts are minimised or accounted for, it is possible to study the statistical properties of the volume-weighted divergence and vorticity of the velocity field, for which analytic fits have been proposed (Jennings 2012; Hahn et al. 2015; Bel et al. 2019). In addition, it was shown that the halo velocity field is unbiased with respect to the matter velocity field on large scales (\(k < 0.1 h\,\mathrm{Mpc}^{-1}\)). On smaller scales, signs of velocity bias appear, potentially originating from the fact that halos form at special locations of the density field (Zheng et al. 2015a).

Furthermore, both vorticity and (generally anisotropic) stress in the velocity field are produced only through shell-crossing in multi-stream regions (both are decaying modes at linear order), so that they are thought to carry some information about the locally collapsed regions (Pichon and Bernardeau 1999; Pueblas and Scoccimarro 2009; Laigle et al. 2015; Hahn et al. 2015; Jelic-Cizmek et al. 2018; Buehlmann and Hahn 2019), and can also potentially give insight into how perturbative approaches might be pushed beyond the shell-crossing frontier.

9.4 Structure finding

9.4.1 Halos

‘Dark matter halos’ is the generic name given to regions of the universe which have undergone three-dimensional gravitational collapse. These halos are the fundamental nonlinear unit of the CDM Universe and, hence, their abundance, internal properties, and growth histories are at the core of several models for describing structure formation. This includes descriptions of the nonlinear power spectrum, modelling of biased tracers, and multiple models for the formation and spatial distribution of galaxies. Furthermore, large dark matter halos are employed to interpret observations of galaxy clusters, which can directly put constraints on cosmological parameters.

The somewhat arbitrary definition of a halo and its boundary has led to a multitude of operational definitions in N-body simulations, and thus numerous algorithms exist (Davis et al. 1985; Eisenstein and Hut 1998; Stadel 2001; Bullock et al. 2001; Springel et al. 2001a; Aubert et al. 2004; Gill et al. 2004; Weller et al. 2005; Neyrinck et al. 2005; Kim and Park 2006; Diemand et al. 2006; Shaw et al. 2007; Maciejewski et al. 2009; Knollmann and Knebe 2009; Planelles and Quilis 2010; Habib et al. 2009; Behroozi et al. 2013b; Skory et al. 2010; Falck et al. 2012; Roy et al. 2014; Ivkin et al. 2018; Elahi et al. 2019); these methods can be roughly split into percolation algorithms in configuration or phase space, and identification of peaks in the density field. Knebe et al. (2011, 2013) performed a systematic comparison among 17 halo finders based on the same mock halos and N-body simulation. This study found a very good agreement in the recovered structures and their internal properties, although the halo mass returned by each algorithm varied greatly owing to the arbitrariness in the halo definition. Note that although the computational cost of performing group finding is small compared to carrying out a large simulation, storing all particle data imposes significant storage requirements, so that state-of-the-art N-body codes typically carry out on-the-fly halo finding during the simulation runtime, thus avoiding the need to store full particle data and allowing a finer time resolution in halo statistics, which is necessary for high-quality merger trees (see below).

The three most common halo finders currently in use are the classical configuration-space ‘friends-of-friends’ (FoF) (Press and Davis 1982; Davis et al. 1985), spherical overdensity (SO) (Lacey and Cole 1994), and phase-space FoF (Diemand et al. 2006; a widely used example being ROCKSTAR, Behroozi et al. 2013b). FoF is a percolation algorithm that links N-body particles whose pairwise separation is below a threshold, usually referred to as the “linking length” (\(\ell _l\)) and given in terms of the mean inter-particle separation (formally a partitioning of all particles into equivalence classes whose members fulfil the distance criterion with at least one other member). FoF effectively identifies regions in terms of an isodensity contour with a value \(\delta \sim 0.65\,\ell _l^{-3} - 1\) (More et al. 2011). For typical values of \(\ell _l = 0.2\), this corresponds to \(\sim 80\) times the mean density. In contrast, SO identifies spherical regions whose mean enclosed density is above a threshold, usually matching the value expected from spherical collapse (Bryan and Norman 1998), although it is also common to use the threshold value for an Einstein–de Sitter Universe: 200 times the background density, or even 200 times the critical density of the Universe. More recently, phase-space FoF (and ROCKSTAR in particular) has been proposed as a way to improve upon the FoF algorithm and to also include sub-structure. The basic idea is to split a standard FoF group into a hierarchy of structures recursively identified with a FoF algorithm but with a linking length and distance metric defined in phase space, thus grouping together particles that are nearby both in position and velocity. Additionally, ROCKSTAR discards those particles that are not gravitationally bound.
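
A minimal friends-of-friends sketch can be written with a periodic KD-tree and a connected-components pass, as below; this is illustrative only (production finders use far more memory- and time-efficient implementations), and it assumes the input contains all simulation particles so that the mean inter-particle separation can be computed from their number.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def fof_groups(pos, boxsize, linking_length=0.2, min_members=20):
    """Minimal FoF finder: link all particle pairs closer than b times the
    mean inter-particle separation and return a group label per particle
    (-1 for particles not in a group with at least `min_members` members)."""
    n = len(pos)
    b = linking_length * boxsize / n**(1.0 / 3.0)
    tree = cKDTree(pos, boxsize=boxsize)                    # periodic box
    pairs = np.array(list(tree.query_pairs(b)))
    if len(pairs) == 0:
        return np.full(n, -1)
    adj = coo_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
                     shape=(n, n))
    _, labels = connected_components(adj, directed=False)
    counts = np.bincount(labels)
    labels[counts[labels] < min_members] = -1
    return labels
```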

FoF and SO are popular methods but have shortcomings. In particular, FoF is known to suffer from numerical artefacts, where the mass of a given halo strongly depends on the mass resolution (More et al. 2011; Ludlow et al. 2019; Leroy et al. 2021). Although corrections have been proposed (Warren et al. 2006), they have not been shown to be universal, but depend on the numerical resolution in addition to the redshift and cosmology of the simulation (Ondaro-Mallea et al. 2022). SO, on the other hand, suffers less from numerical artefacts; however, it cannot readily adapt to the triaxial shape of DM halos (although the algorithm can be extended to identify ellipsoidal regions, Despali et al. 2013), there is an ambiguity in regions where the boundaries of two separate SO halos overlap (García and Rozo 2019), and the halo mass can evolve in time simply due to the time dependence of the overdensity definition used (Diemer et al. 2013). ROCKSTAR, in turn, has been shown to be relatively robust against numerical resolution (Leroy et al. 2021), arguably thanks to the unbinding procedure; however, there is still arbitrariness in the definition of the phase-space metric, and in the use of the gravitational binding energy in the presence of accelerated frames and larger-scale tidal fields (Stücker et al. 2021a).

Another important issue regarding group finders concerns the so-called universality of the halo abundance as a function of mass, the “halo mass function”. According to Press–Schechter theory and spherical collapse, the halo mass function should be “universal” in the sense that it is primarily given by the statistics of peaks in Gaussian random fields—thus a single set of N-body simulations could in principle be used to predict the halo mass function at any redshift and in any (reasonable) cosmology (Jenkins et al. 2001; Reed et al. 2003; Bhattacharya et al. 2011; Crocce et al. 2010; Angulo et al. 2012; Watson et al. 2013; Bocquet et al. 2016; Seppi et al. 2021). N-body simulations, however, have detected departures from this universality at the \(10-20\%\) level. While the precise amplitude of the effect varies with the halo mass definition and algorithm (Tinker et al. 2008; Despali et al. 2016; Diemer 2020; Ondaro-Mallea et al. 2022), it can be the leading systematic error in many theoretical models that rely on the halo mass function, and as a consequence in the parameter constraints from future galaxy cluster surveys (e.g., Salvati et al. 2020).
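
Once a halo catalogue is available (e.g. FoF group masses obtained as particle count times particle mass), measuring the mass function reduces to a histogram in log mass divided by the box volume and bin width; a sketch with assumed conventions follows.

```python
import numpy as np

def halo_mass_function(masses, boxsize, nbins=25):
    """dn/dlnM from a halo catalogue (masses e.g. in Msun/h) in a periodic
    box of side `boxsize` (same length units as used for the volume)."""
    lnm = np.log(masses)
    counts, edges = np.histogram(lnm, bins=nbins)
    mass_centres = np.exp(0.5 * (edges[1:] + edges[:-1]))
    dndlnm = counts / (boxsize**3 * np.diff(edges))
    return mass_centres, dndlnm
```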

The non-universality of the mass function and the ambiguity in the halo mass definition have recently sparked a new wave of halo finders. These rely on identifying caustic boundaries (Shandarin 2021), the splashback radius where the slope of the density profile is steepest (Diemer and Kravtsov 2014; Diemer 2020), or other characteristic features in the halo density profiles (Garcia et al. 2021; Fong and Han 2021).

Even if these approaches succeed in defining more “natural” halo boundaries in the dark matter field and/or result in universal mass functions, it is not clear that they will necessarily lead to an advantage in the modelling of structure in the universe. Quite likely, the same halo (or mass) definition is not the best for all applications. For instance, one specific halo definition and boundary might be an excellent predictor for the properties of the galaxies it hosts, whereas another mass definition could be more tightly correlated with the X-ray luminosity associated with that halo. In any case, since the standard halo definitions only approximately mark the boundary of a collapsed region, this new generation of halo finders might shed light on more fundamental relations between halos and their observable counterparts.

9.4.2 Subhalos

When a dark matter halo is accreted by a larger one, its outer parts are gradually tidally stripped, but the remnant can still be identified as a separate structure, which is referred to as a “subhalo”. These subhalos are expected to host satellite galaxies if they are massive enough, to produce detectable perturbations in strong gravitational lensing, and potentially even to emit electromagnetic radiation due to DM annihilation. An example of the subhalo population of a Milky Way-sized halo is shown in Fig. 26, which displays structure in radial phase-space coordinates, split into bound subhaloes, still-coherent tidally disrupted structures (tidal streams), and the smooth background halo, as identified by the HSF subhalo finder (Maciejewski et al. 2009).

Fig. 26

Bound and unbound substructures of a CDM halo as identified by the HSF phase-space subhalo finder. Shown is the density in radial phase space (i.e., radial distance from the halo center vs. radial velocity). The left panel shows the gravitationally bound subhalo structures identified, the central panel the coherent but not gravitationally bound structures (i.e. tidal streams), and the right panel the smooth main (host) halo component after removal of all identified substructure. A good substructure finder should leave no visible correlated small-scale structure in the main halo. Image adapted from Maciejewski et al. (2009), copyright by the authors

As in the case of halos, the definition of a subhalo is somewhat arbitrary, which has led to multiple algorithms being proposed (Onions et al. 2012; Han et al. 2012, 2018; Elahi et al. 2019). They can be roughly grouped into phase-space percolation, peaks in the density field, and temporal algorithms which track the descendant particles of a previously known dark matter halo. Examples of publicly available subhalo finders are ROCKSTAR (Behroozi et al. 2015), HBT+ (Han et al. 2018), VELOCIRAPTOR (Elahi et al. 2019), and SUBFIND (Springel et al. 2001a) (updated along the ideas of HBT+), which is now part of the public GADGET-4 release.

However, unlike for halos, much larger differences exist between different group finders. This is particularly true in the case of major mergers (Behroozi et al. 2015), where percolation algorithms in configuration space fail to distinguish two overlapping structures of similar mass. In the comparison performed by Knebe et al. (2011), it was found that subhalos are typically identified robustly by most algorithms down to \(\sim 40\) particles, which decreases to \(\sim 20\) when velocity information is included.

Beyond identification completeness (which can be a function of radius in the host), also numerical effects lead to an ‘overmerging’ of subhaloes (see the discussion in Sect. 8.2 on force softening). A robust characterisation of subhalo abundance in N-body simulations has to take into account such numerical selection effects. It is further common to replace completely disrupted subhalos with an ‘orphan’ model that tracks them beyond the resolution limit (see Sect. 9.9.2).

9.4.3 Voids

While most of the mass in the Universe is in halos and subhalos, most of the volume is found in low-density regions which have not undergone any kind of collapse. These are referred to as cosmic ‘voids’, and can span regions of up to hundreds of Mpc in diameter.

Voids are capturing an increasing amount of attention in the community since they have been shown to provide useful cosmological tests in galaxy surveys (e.g., Hamaus et al. 2020). In particular, they are expected to be sensitive to the non-Gaussian structure of the nonlinear universe, and thus are complementary to traditional cosmological probes based on two-point correlators. Specifically, they are particularly interesting in the context of screened versions of modified gravity, which are identical to GR in high-density regions but display an additional fifth force in low-density regions such as voids (cf. Sect. 7.10). Another motivation is that voids become dominated by dark energy earlier than average-density regions, and a potential departure of gravity from general relativity can be more evident in low-density regions (see e.g., Cai et al. 2015). They could also help in understanding dark energy through their imprint on the CMB via the Integrated Sachs–Wolfe (ISW) effect (Cai et al. 2016). Another appeal of voids is that, due to their comparatively simple dynamics and structure, they can be modelled accurately with simple analytic models (usually based on linear perturbation theory and excursion set theory) (see e.g., Cai et al. 2017). This allows an optimal use of the data; nevertheless, the models employed need to be carefully tested and calibrated against numerical simulations.

As for the other types of cosmic structures, the definition and boundary of a void is arbitrary and many algorithms have been proposed. Some algorithms rely on finding spherical underdense regions (Padilla et al. 2005), others employ watershed approaches (Platen et al. 2007; Neyrinck 2008), and others the eigenvalues of the tidal field (Hahn et al. 2007a). In addition, some algorithms rely on an estimate of the three-dimensional cosmic density field, while others directly operate on halo/galaxy catalogues, which can facilitate a later comparison between results from observations and simulations. The differences among “void finders” were highlighted by Colberg et al. (2008), who systematically compared 13 approaches applied to one large void from the Millennium simulation. A further specialised comparison of void finders, focused on their discriminatory power to distinguish modified gravity models, was carried out by Cautun et al. (2018) and Paillas et al. (2019).

Simulations are used to calibrate and develop theoretical models for use in data analysis: void size functions, density and velocity profiles, the geometric and dynamical distortions related to redshift-space distortions, as well as void clustering. Due to the low number density and sparsity of voids, very-large-volume simulations are required, and thus gravity-only simulations are commonly used (note that no strong baryonic effects are expected; Paillas et al. 2017). However, it is usually the dark matter field that is employed, which might show some dependence on the resolution of the simulation. The use of simulations has also helped in designing new probes, e.g., massive halos inside voids for constraining neutrino masses (Zhang et al. 2020).

Ultimately, perhaps the biggest problem for cosmological inferences is that theoretical models might depend on the exact operational definition of voids, and it is not obvious how robust these are. Perhaps in the future there will be emulators for voids (cf. Sect. 9.7), where the same definition can be used in both simulated and observed data. A more careful characterization of the covariances with other cosmological probes will also become necessary, as the use of multiple probes and cross-correlations becomes more common. New summary statistics could be able to extract non-Gaussian information more efficiently, for instance the recently proposed k-Nearest-Neighbour (kNN) statistics (Banerjee and Abel 2021b, a), or statistics conditional on the large-scale density (Neyrinck et al. 2018; Paillas et al. 2021).

9.5 Cosmic web classification

The filamentary cosmic web (Bond et al. 1996) has for decades been a focus of interest, since it might provide important insights into both environmental differences in the formation and evolution of dark matter haloes and galaxies, and also serve as a probe of cosmology. Its existence follows already from Zel’dovich’s (1970) famous formula for the Lagrangian density contrast

$$\begin{aligned} 1+\delta (\varvec{q}) = \left| \left( 1-D_+ \lambda _1 \right) \left( 1-D_+ \lambda _2 \right) \left( 1-D_+ \lambda _3 \right) \right| ^{-1}, \end{aligned}$$
(111)

where \(\lambda _i\) are the eigenvalues of the Hessian \(\phi ^{(1)}_{,ij}\) of the initial scalar perturbation potential (cf. Sect. 6.2). Evidently, positive eigenvalues lead to singularities, while negative eigenvalues correspond to expansion, so that their signature gives rise to four dynamically distinct structures: ‘nodes’ (or ‘clusters’; with eigenvalue signature ‘\(+++\)’), ‘filaments’ (‘\(++-\)’), ‘sheets’ (or ‘pancakes’; ‘\(+--\)’), and ‘voids’ (‘\(---\)’).

Due to the hierarchical nature of structure formation in CDM, one expects sheets to be made up of smaller-scale filaments, and filaments to be made up of smaller-scale halos, and the interesting question arises of how much mass has actually collapsed (Stücker et al. 2018). Since these signatures are defined w.r.t. the linear perturbation potential, they are in general non-trivially related to the late-time Eulerian density field, as measured in simulations or, more importantly, in the real Universe. For this reason, a large range of techniques has been developed that rely, e.g., on a smoothed version of the Eulerian tidal field (Hahn et al. 2007b), a multi-scale filtered version of the Hessian of the Eulerian density field (Aragón-Calvo et al. 2007a; Cautun et al. 2013), the gradient tensor of the mean velocity field (Hoffman et al. 2012), or the anisotropic velocity dispersion tensor (Buehlmann and Hahn 2019), as well as graph-based models (Bonnaire et al. 2020), or an analysis of critical points and lines in the smoothed density field (Sousbie 2011) based on Morse theory.
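
A minimal sketch of a tidal-field (‘T-web’) classification in this spirit is given below: it solves \(\nabla ^2 \phi = \delta \) in Fourier space on a Gaussian-smoothed density grid, builds the tidal tensor \(T_{ij}=\partial _i \partial _j \phi \), and counts eigenvalues above a threshold. The smoothing scale and threshold values are free choices here, not taken from a specific paper.

```python
import numpy as np

def tweb_classify(delta, boxsize, smooth=2.0, lam_th=0.0):
    """Classify cells of a density-contrast grid into voids/sheets/filaments/
    nodes (0/1/2/3) by counting eigenvalues of the tidal tensor
    T_ij = d_i d_j phi, with nabla^2 phi = delta, above a threshold lam_th.
    `smooth` is a Gaussian smoothing scale in the same units as `boxsize`."""
    n = delta.shape[0]
    k1 = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing='ij')
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                                 # avoid division by zero
    dk = np.fft.fftn(delta) * np.exp(-0.5 * k2 * smooth**2)
    phik = -dk / k2                                   # solve nabla^2 phi = delta
    phik[0, 0, 0] = 0.0
    kvec = (kx, ky, kz)
    tij = np.empty((n, n, n, 3, 3))
    for i in range(3):
        for j in range(3):
            tij[..., i, j] = np.fft.ifftn(-kvec[i] * kvec[j] * phik).real
    lam = np.linalg.eigvalsh(tij)                     # eigenvalues per cell
    return (lam > lam_th).sum(axis=-1)                # number above threshold
```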

Another class of techniques addresses the problem from a Lagrangian point of view, by noticing that the factors in Eq. (111) undergo a sign flip when a particle crosses a caustic. Beyond the Zel’dovich approximation, one can either use the GDE tensor \(\mathsf{\varvec {D}}_{\mathrm{xq},i}\) (see Eq. 35a) or an estimate of it from a sheet tessellation (cf. Sect. 3.2.2) to track volume inversions. Algorithms based on tessellations were developed by Falck et al. (2012), Ramachandra and Shandarin (2015), Shandarin and Medvedev (2017), and one based on a singular value decomposition of \(\mathsf{\varvec {D}}_{\mathrm{xq},i}\) by Stücker et al. (2020). A detailed discussion of the relative performance of most of these methods can be found in the comparison paper of Libeskind et al. (2018).

While the formation and evolution of the cosmic web is reasonably well understood, the impact it leaves on galaxies and dark matter haloes is still a matter of ongoing research. Many studies have focused on differences of halo properties in the different components of the cosmic web (and its possibly intimate connection with assembly bias, see also Sect. 9.9.2), e.g., Hahn et al. (2007b), Wang et al. (2011), Goh et al. (2019), or the alignment of halo shapes or angular momenta with the web, e.g., Aragón-Calvo et al. (2007b), Hahn et al. (2007a), Codis et al. (2012), Forero-Romero et al. (2014), Chen et al. (2016), which could leave a signature in weak lensing measurements as an ‘intrinsic alignment’ correlation, but note that the degree to which correlations found in dark matter carry over to galaxies is still unclear. An alternative approach has been followed by Paranjape et al. (2018) who quantify only the relative anisotropy of tides as a measure of external influence on the formation of haloes and galaxies.

While most of these web classification techniques have given interesting insights into environmental effects on the formation of haloes and galaxies, the dissected cosmic web, apart from voids and clusters, is not (yet) commonly used as a probe of fundamental physics. Other measures that are sensitive to the structure of the cosmic web, such as Minkowski functionals, are however quite commonly used to constrain cosmological parameters, e.g., Shirasaki et al. (2012).

9.6 Lightcones

Numerical simulations predict the phase-space coordinates of structure on constant-time slices, and this information is commonly the output of a simulation, usually referred to as snapshots. However, observations measure the universe along null geodesics from our location at \(z=0\), i.e. along the past lightcone of an observer. For a proper comparison, it is thus necessary to transform the simulation output to a lightcone in order to account for, among other things, selection effects and line-of-sight contamination, and thus increase the realism of the predictions from simulations (see e.g., Izquierdo-Villalba et al. 2019).

There are essentially two approaches to build lightcones from simulations. In the first, one builds lightcones in a post-processing step by employing adjacent snapshots taken at \(\{ a_n | n=1\dots N_s \} \). In the simplest approach, structure in the lightcone at a distance r is taken from the snapshot whose expansion factor is closest to \(a_r\). This is implicitly defined for light-like distances from the FRW metric as

$$\begin{aligned} \mathrm{d}s^2=0\quad \Rightarrow \quad \int _{a_r}^1 \mathrm{d} a \frac{c}{a^2 H(a)} = r(a_r). \end{aligned}$$
(112)
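
A sketch of how Eq. (112) is used in practice to assign snapshots to comoving-distance shells is given below; it assumes a flat \(\varLambda \)CDM background with \(H(a)=H_0\sqrt{\varOmega _m a^{-3} + \varOmega _\varLambda }\) and returns distances in \(h^{-1}\mathrm{Mpc}\) (function names and default parameter values are illustrative).

```python
import numpy as np
from scipy.integrate import quad

C_KMS = 299792.458                                   # speed of light [km/s]

def comoving_distance(a, h0=100.0, om=0.31):
    """r(a) for a flat LambdaCDM background, i.e. the integral in Eq. (112);
    with h0 = 100 km/s/Mpc the result is in Mpc/h."""
    hubble = lambda x: h0 * np.sqrt(om * x**-3 + (1.0 - om))
    return quad(lambda x: C_KMS / (x**2 * hubble(x)), a, 1.0)[0]

def lightcone_shells(scale_factors, **cosmo):
    """Comoving-distance shell boundaries half-way between adjacent snapshot
    times: structure at distance r is drawn from the snapshot whose shell
    contains r (snapshot-stacking approach)."""
    a = np.sort(np.asarray(scale_factors))[::-1]     # from late to early times
    r = np.array([comoving_distance(ai, **cosmo) for ai in a])
    edges = np.concatenate([[0.0], 0.5 * (r[1:] + r[:-1]), [r[-1]]])
    return a, edges
```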

Although simple, with a reasonable number of snapshots \(N_s\) this approach accurately recovers the large-scale distribution, the abundance of objects, etc. However, it creates small-scale discontinuities at the transition from one snapshot to another. To ameliorate this problem, it is possible to interpolate the phase-space coordinates of particles between snapshots to the precise intersection with the lightcone. This, however, can lead to an underestimation of central densities in halos due to non-trivial motions in collapsed structures (Merson et al. 2013), for which a low-order interpolation in time is insufficient. An alternative is to interpolate halos and subhalos directly. Specifically, Merson et al. (2013), Smith et al. (2017), and Izquierdo-Villalba et al. (2019) employed merger trees to identify the descendant of each structure and in this way build its space-time trajectory. Note that despite these improvements, this approach creates pathological cases where the same object/particle appears twice in the lightcone or never appears at all.

As can be easily seen from Eq. (112), a simulation of side length L can only represent structure up to a finite redshift, e.g. \(z\sim 1.4\) for \(L=3000\, h^{-1}\mathrm{Mpc}\). However, one can take advantage of the periodic boundary conditions and replicate the box to reach higher redshifts. Note that there can be optimal viewing angles and lightcone angular extensions that avoid line-of-sight structure repetition (Kitzbichler and White 2007), at least for surveys that cover a small angle. If the simulated box is large enough, structure can be considered uncorrelated, and it is common to replicate and rotate the simulation box, which has been adopted by several lightcone construction algorithms (see Fig. 27; Blaizot et al. 2005; Hollowed 2019; Rodríguez-Torres et al. 2016; Sgier et al. 2019, 2021).

Fig. 27

Images reproduced with permission from [left] Garaldi et al. (2020), copyright by the authors; and [right] from Sgier et al. (2021), copyright by IOP/SISSA

Schematic representation of procedures to build lightcones from numerical simulations. In the left panel we show a 1+1 space-time diagram indicating the region inside the past lightcone of an observer at the origin. The vertical lines show the minimum simulation box size required to simulate structure in a full octant of the sky up to \(z=2\), whereas the shaded area indicates the simulated regions which will never enter the lightcone. When building a lightcone, it is possible to identify and store the lightcone during runtime, or to concatenate simulation outputs at discrete times, such as those indicated by the horizontal grey lines, as a postprocessing step. As an alternative to simulating large volumes, a small simulation box can be replicated and rotated such that it effectively covers a much larger volume, as exemplified in the figure on the right

An implicit requirement of the method above is a dense sampling of simulation outputs, \(N_s\simeq 50\)–100, to obtain accurate results, thus posing significant storage requirements for large state-of-the-art simulations. A second option to build lightcones bypasses these requirements by constructing the lightcone directly during the runtime of the simulation. The idea is that particles are output only when they intersect the past lightcone, which is resolved with the time resolution of the N-body time step. This approach was adopted, for instance, in the Hubble Volume simulation (Evrard et al. 2002) and more recently in the Euclid flagship simulation (Potter et al. 2017), see also Fosalba et al. (2008, 2015).

Since each N-body particle intersects the observer's lightcone exactly once, in principle one is no longer interested in its evolution afterwards, as it is then causally disconnected from the rest of the still ‘observable’ particles. This idea was used by Llinares (2017) to propose a ‘shrinking domain’ framework, in which particles outside the lightcone are simply discarded from the simulation, reducing its computational cost. This assumption, however, is incompatible with the Newtonian approximation adopted in traditional simulations, where information travels instantaneously. Garaldi et al. (2020) thus proposed a ‘dynamic zoom simulation’ in which particles outside the lightcone are not discarded but instead represented with progressively lower resolution. Depending on the mass resolution and volume of the simulation, this approach could reduce the computational cost by factors of a few, at the expense of not being able to store any full snapshot. Another shortcoming of such optimisations is that only one lightcone can be produced per simulation, while with standard techniques multiple distant observers can in principle be used.

9.7 Emulators and interpolators

Obtaining fast predictions for nonlinear structure for arbitrary cosmologies is extremely important in modern cosmology. This is an essential ingredient in the analysis of large extragalactic surveys, which could help to distinguish among competing explanations for the accelerated expansion of the Universe and play a crucial role in the hunt for new physics.

In this review, we have argued that numerical simulations are the most precise, and often the only, way to predict nonlinear structure in the Universe. However, simulations are expensive computationally, and thus it is only possible to carry them out for a small number of specific cosmological parameters. The general strategy to address this problem is to carry out an ensemble of simulations covering a targeted space of cosmological parameters and then either interpolate the results, or calibrate a physically-motivated model with which fast predictions are obtained. This essentially comes down to non-parametric vs. parametric fitting based on simulation data.

A classical example of the parameterised approach is HALOFIT (Smith et al. 2003) for the nonlinear power spectrum. This approach has proven to be very successful, and several revisions (recalibrations to improved modern simulations) exist (Takahashi et al. 2012; Mead et al. 2015a, 2021), including extensions to massive neutrinos (Bird et al. 2012; Mead et al. 2016). Recently, the same idea has been applied to the matter bispectrum (Takahashi et al. 2020). In the most recent parametric calibration carried out by Mead et al. (2021), 100 simulations of the Mira-Titan project (Heitmann et al. 2016) were used to calibrate the nonlinear matter power spectrum (based on the halo model and empirical corrections), reaching an accuracy of \(\sim \)5–10% up to very small scales, \(k\lesssim 5\, h\,\mathrm{Mpc}^{-1}\). The same authors have extended the approach to model the impact of baryons on these predictions.

The non-parametric approach is instead followed by ‘emulators’. These also correspond to interpolations of the simulation results, but do not make any specific assumption about the underlying physical model or even its functional form. Thus, in general, they require a larger amount of data than parametric approaches, which however is becoming less of a problem thanks to increasing computational resources. Emulators also have the advantage of being able to model complex trends in the data, so that they are progressively being constructed for many more kinds of large-scale structure statistics. For instance, emulators have been used to model the nonlinear matter power spectrum (Heitmann et al. 2009; Lawrence et al. 2017; DeRose et al. 2019; Euclid Collaboration 2019; Angulo et al. 2021), even beyond \(\varLambda \)CDM (Winther et al. 2019; Ramachandra et al. 2021; Arnold et al. 2021) and for baryonic effects (Aricò et al. 2021b; Giri and Schneider 2021), the galaxy power spectrum and correlation function (Kwan et al. 2015; Zhai et al. 2019), weak lensing peak counts and power spectra (Liu et al. 2015; Petri et al. 2015), the 21-cm power spectrum (Jennings et al. 2019), the halo mass function (McClintock et al. 2019; Bocquet et al. 2020), and halo clustering statistics (Nishimichi et al. 2019; Kobayashi et al. 2020). Traditionally, emulators have been built with Gaussian processes (GPs), but more recently feed-forward neural networks are gaining popularity as they can deal with larger datasets and a higher number of dimensions.
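
A minimal Gaussian-process emulator sketch using scikit-learn is shown below; the training points and the emulated quantity (the logarithm of the power spectrum at a single wavenumber) are hypothetical placeholders, whereas real emulators interpolate full statistics (e.g. per k-bin or via a principal-component decomposition) over many more simulations.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# hypothetical training set: (Omega_m, sigma_8) of a small simulation suite and
# a measured statistic, here a placeholder log10 P(k*) at a single wavenumber
params = np.array([[0.30, 0.80], [0.32, 0.78], [0.28, 0.84],
                   [0.34, 0.82], [0.30, 0.76], [0.26, 0.80]])
log_pk = np.array([3.10, 3.05, 3.18, 3.20, 2.98, 3.02])

kernel = ConstantKernel(1.0) * RBF(length_scale=[0.05, 0.05])
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                              n_restarts_optimizer=10)
gp.fit(params, log_pk)

# prediction (with its uncertainty) at a previously unseen cosmology
mean, std = gp.predict(np.array([[0.31, 0.81]]), return_std=True)
```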

A limitation of emulators is that they require a parameter space that is quite densely sampled with simulations, and that they perform poorly outside the parameter ranges covered by the training set. As a consequence, most emulators cover a relatively small range of parameter values. Although the sampled region might be sufficient to cover the posterior distribution of parameters in upcoming large-scale structure surveys, it will definitely not be broad enough to cover the parameter prior distributions in data analyses. Therefore, a possible path in the future will be hybrid approaches where approximate and less accurate methods are employed for a broad parameter estimation, and emulators are used in a predefined region where high accuracy is required.

Another limitation of emulators is that they have a non-trivial uncertainty structure which can vary strongly across the parameter space (i.e. higher accuracy near the parameters sampled by simulations and poorer in between them). This means that not only data has an uncertainty but so does the theoretical model, which propagates and could affect cosmological constraints. Although the uncertainty can be decreased in certain parts of the parameter space by iteratively adding new simulations (Rogers et al. 2019; Pellejero-Ibañez et al. 2020) (or by combining simulations of different quality (Ho et al. 2022)), so that e.g. a high-likelihood region is better sampled than regions ruled out by data, this process depends on the summary statistic and scale in question. An alternative is to incorporate the emulator uncertainty in the data analysis, for which it will be very important to accurately quantify the emulator uncertainty in the first place. This will be a challenge per se since these are typically empirically measured with a small number of simulations.

A further challenge will be to construct accurate emulators for biased objects and in redshift space, which could aid in the interpretation of large-scale galaxy redshift surveys. This, however, poses several challenges. The first is related to the increase in the dimensionality of the problem, as an emulator would need to consider the properties of the object, e.g. the mass of the halo, and potentially other properties in addition to cosmological parameters. The second challenge is that numerical simulations are intrinsically noisier for discrete objects than for field quantities, due to the shot noise associated with the former. Nevertheless, some progress towards emulators for dark matter haloes has recently been achieved (Valcin et al. 2019; Kobayashi et al. 2020). An interesting possibility will be to combine emulators with perturbative expansions of galaxy bias, such as those proposed by Modi et al. (2020a), Kokron et al. (2021), Zennaro et al. (2021), which has recently been applied to data from the Dark Energy Survey (Hadzhiyska et al. 2021b).

9.8 Machine learning

Machine learning is a rapidly progressing field with an increasing impact on cosmology. In the specific context of cosmological numerical simulations, there are roughly three areas in which machine learning algorithms have been applied to date: artificial data generation, enhancement of N-body simulations, and cosmological parameter estimation.

The first area refers to the creation of artificial data, usually new realizations of the cosmic density field in two or three dimensions. Creating nonlinear fields with N-body simulations is computationally expensive, as we have discussed throughout this review. Machine learning can instead use a small set of simulations to learn their statistical properties and then quickly create previously unseen realizations statistically consistent with these data. This is a common task in computer vision where, for instance, images of human faces can be automatically generated (Radford et al. 2015). In a cosmological context, He et al. (2019) employed U-net architectures (in which several convolutional neural networks operate on different, downsampled spatial scales, followed by an upsampling process and up-convolutions) to create nonlinear fields from a given initial linear field. Similar ideas can be applied to directly generate catalogues of dark matter halos, as proposed by Berger and Stein (2019), Bernardini et al. (2020), or of weak lensing convergence maps (Mustafa et al. 2019; Tamosiunas et al. 2021).

Another pathway to create new realizations of the nonlinear matter field is based on Generative Adversarial Networks (GANs). The main idea of GANs is to have two competing neural networks: one creates fake data from a random process, while the other seeks to distinguish them from the training sample. This architecture has been shown to produce 2D and 3D fields quantitatively very similar to those obtained by direct numerical simulation (Rodríguez et al. 2018; Perraudin et al. 2019; Tamosiunas et al. 2021; Ullmo et al. 2020; Feder et al. 2020), even in previously unseen cosmologies (see Fig. 28 and Perraudin et al. 2021). The architectures used to create fake data can also be combined with existing data from, for instance, low-resolution simulations to artificially increase their resolution. This task, usually referred to as “super resolution”, has been performed by, e.g., KodiRamanah et al. (2020), Li et al. (2020), where small-scale high-resolution features were ‘in-painted’ onto low-resolution N-body simulations, finding good agreement for a selection of density, velocity, and (sub)halo statistics (Ni et al. 2021).
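To make the adversarial setup concrete, the following is a minimal, schematic PyTorch sketch of the two competing networks. The architecture (simple fully connected layers), the map size, and the random placeholder "training maps" are illustrative choices and do not reproduce any of the published GANs cited above.

```python
# Schematic GAN: a generator maps random noise to fake 2D "density maps",
# while a discriminator tries to tell them apart from (placeholder) real maps.
import torch
import torch.nn as nn

NPIX, ZDIM = 32, 64                      # map size and latent dimension

generator = nn.Sequential(
    nn.Linear(ZDIM, 256), nn.ReLU(),
    nn.Linear(256, NPIX * NPIX), nn.Tanh(),   # fake map, flattened
)
discriminator = nn.Sequential(
    nn.Linear(NPIX * NPIX, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                        # real/fake logit
)

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

# Placeholder "real" maps; in practice these would come from N-body simulations
real_maps = torch.randn(512, NPIX * NPIX)

for step in range(100):
    real = real_maps[torch.randint(0, 512, (32,))]
    fake = generator(torch.randn(32, ZDIM))

    # 1) Update the discriminator: label real maps 1, generated maps 0
    loss_d = bce(discriminator(real), torch.ones(32, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Update the generator: try to make the discriminator output 1 on fakes
    loss_g = bce(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In practice, convolutional architectures and careful training schedules replace the toy fully connected networks used here, but the alternating discriminator/generator updates are the essential ingredient.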

Fig. 28

Comparison of the projected two dimensional mass fields for different combinations of cosmological parameters, as computed from N-body simulations in the bottom row, and by a machine learning algorithm in the top row. Specifically, these maps were generated using a Generative Adversarial Network where two neural networks compete against each other; one seeking to generate ever more realistic results and the other attempting to distinguish the generated from real data. In this case, the GAN was trained with 46 different combinations of cosmological parameters (\(\varOmega _m\) and \(\sigma _8\)), and we can see it is able to generate data that correctly captures the cosmological dependence of such maps. Image adapted from Perraudin et al. (2021), copyright by the authors

An alternative way to create new realisations was followed by He et al. (2019), who directly learned the displacement field connecting the Lagrangian and Eulerian coordinates of a given particle from FastPM simulations (relative to the Zel’dovich approximation). Interestingly, these authors obtained density statistics that closely resemble FastPM simulations, even when adopting cosmological parameters different from those used in the training (see also Kaushal et al. (2021) for a similar idea). Dai and Seljak (2020) took this idea further by predicting the displacement field (including modifications due to baryonic effects) over a small number of layers (or time steps), while enforcing rotational and translational symmetries, as expected from the underlying physical laws.

Neural networks can also be used to enhance the results of numerical simulations. For instance, Giusarma et al. (2019) used a U-Net architecture to modify the outputs of a standard \(\varLambda \)CDM N-body simulation in a way that mimics the presence of massive neutrinos. This approach appears accurate down to nonlinear scales, \(k\sim 0.7 h\,\mathrm{Mpc}^{-1}\), and for multiple statistics. Additionally, Tröster et al. (2019) used GANs and variational auto-encoders trained on the BAHAMAS hydrodynamical simulation to predict the distribution of gas and baryonic pressure in gravity-only simulations. These approaches can also be used to connect galaxies/halos to the initial density field. Specifically, Bernardini et al. (2020), Zhang et al. (2019b), Icaza-Lizaola et al. (2021) have used convolutional neural networks or sparse regressions to predict either the dark matter halos or the galaxy distribution in a hydrodynamical simulation based on the properties of the dark matter field. Other examples are Nadler et al. (2019) and Ogiya et al. (2019), who used random forests to predict which subhalos in a gravity-only simulation would have been disrupted due to baryonic or tidal effects, respectively.

At the heart of many of the above algorithms is the notion that nonlinear ‘features’ characterise the cosmic density field. Thus, it might be possible to use these features to also distinguish among cosmological models and constrain their parameters. The idea of using machine learning to obtain constraints on cosmological parameters has been explored in many recent works (Ravanbakhsh et al. 2017; Schmelzle et al. 2017; Gupta et al. 2018; Fluri et al. 2018; Ribli et al. 2019; Peel et al. 2019; Ntampaka et al. 2020; Pan et al. 2020; Zorrilla-Matilla et al. 2020) and even applied to weak-lensing data (Fluri et al. 2019) of the KiDS-450 survey (Hildebrandt et al. 2017). These works have used different flavours of deep convolutional neural networks trained on N-body simulations and were typically able to place stronger constraints than traditional clustering analyses based on two-point functions. Although they are usually carried out in idealised scenarios and only for the matter field (thus bypassing the problem of galaxy bias), they highlight the potential of the methods and, more generally, the existence of valuable cosmological information in LSS data that is currently unexploited.

One of the drawbacks of this approach is that it is difficult to interpret which features a particular neural network relies on when constraining cosmological parameters. In addition, as is known from the field of computer vision, it is possible to specifically design perturbations to images that can completely fool a well-trained neural network (adversarial examples). This serves as a caution against the use of neural networks for robust parameter inference. An alternative is simply to define new summary statistics motivated by typical machine-learning filters, such as wavelet phase harmonics (Allys et al. 2020) or scattering transforms (Cheng et al. 2020), or by analysing the feature maps in a trained network [such as the steepness of peaks in filtered convergence maps proposed by Ribli et al. (2019)]. These statistics appear able to capture most of the cosmological information in the fields, thus opening a promising analysis route for future large-scale structure surveys.

A key requirement for most machine learning algorithms is the existence of vast training data provided by numerical simulations. This has motivated the creation of large ensembles of simulations such as the CAMELS suite (Villaescusa-Navarro et al. 2021), or the MADLens framework (Böhm et al. 2020). Although these have so far comprised low-resolution or small simulated volumes, or have even been created with approximate methods such as COLA and Fast-PM (Sect. 4.4), continuous advances in numerical cosmology will certainly allow improvements in this direction, which will enhance machine learning applications even further.

9.9 Connecting dark matter to galaxies and baryons

Understanding the galaxy-halo connection is at the very heart of using the large-scale structure of the Universe as a cosmological probe. On small scales, modelling the galaxy-halo connection can provide constraints on fundamental physics; for instance, on the free-streaming scale of dark matter or on possible dark matter-baryon interactions (Nadler et al. 2019), on the allowed modifications of gravity (He et al. 2019) (cf. Sect. 7.10), or on the dark matter mass (Newton et al. 2021). We refer the reader to Baugh (2006), Wechsler and Tinker (2018), Somerville and Davé (2015), Vogelsberger et al. (2020) for specialised reviews of galaxy formation physics and here simply summarise those techniques most commonly used in large-scale dark matter simulations.

Arguably the most realistic way currently available to jointly model galaxies and the dark matter field is through cosmological hydrodynamical simulations. These simulations attempt to explicitly simulate astrophysical processes related to baryonic physics, such as gas cooling, star formation, and feedback energy injection from supernovae and supermassive black holes, among others. State-of-the-art examples are, in alphabetical order, the BAHAMAS (McCarthy et al. 2017), Cosmo-OWLS (Le Brun et al. 2014), EAGLE (Schaye et al. 2015; Crain et al. 2015), Horizon-AGN (Dubois et al. 2014), Illustris (Vogelsberger et al. 2014), Illustris-TNG (Springel et al. 2018), Magneticum (Hirschmann et al. 2014), MassiveBlack-II (Khandai et al. 2015), and SIMBA (Davé et al. 2019) simulations.

Unfortunately, hydrodynamical simulations are notoriously expensive computationally, and thus they can typically only be carried out for relatively small cosmic volumes. Additionally, since galaxy formation cannot be predicted ab initio, many specific choices need to be adopted regarding the physical processes to be modelled and their free parameters. Moreover, the modelling of most of these processes is fine-tuned to the resolution of the simulation at hand, which raises the question of how predictive these simulations are. This is perhaps the main limitation of hydrodynamical simulations in the context of large-scale structure. Given the current uncertainties in galaxy formation theory, it is therefore desirable to be able to explore a more general galaxy-halo connection, ideally incorporating the respective uncertainty when analysing and interpreting observational data.

For these reasons, there are currently various popular and useful models for the galaxy population in a given dark matter simulation which can be applied in post-processing. These models can be roughly split into two categories: empirical models, where the relationship between dark matter halos and galaxies is simply specified in parametric or non-parametric form without predicting galaxy properties from first principles; and physically-motivated models, which attempt to directly model galaxy formation through a set of simplified descriptions. In this subsection we review the most common approaches.

9.9.1 HOD and CLF

The simplest approach to connecting haloes with galaxies is the ‘halo occupation distribution’ (HOD) model, which assumes that the number of galaxies residing in a given halo is a function of the host halo mass only. A more sophisticated version of the HOD is the conditional luminosity/mass function (CLF), which attempts to describe not only the total number of galaxies in a halo, but also the full distribution of a secondary galaxy property (e.g. stellar mass or luminosity), given by its distribution function conditional on halo mass (van den Bosch et al. 2003, 2007).

In the HOD formalism, the expected average number of satellite and central galaxies in a given halo (also referred to as the “occupation distribution”) is given in a parametric form motivated by observations and galaxy formation models. To make a realisation of the galaxy population of a given halo in an N-body simulation, central galaxies are assumed to follow a Bernoulli distribution, whereas the number of satellite galaxies is assumed to be Poissonian.

Early HOD recipes contained only three free parameters and were able to describe the luminosity and colour dependence of the correlation function of galaxies in SDSS (Zehavi et al. 2005). Motivated by the measured occupation distribution in both hydrodynamical simulations and SAMs (cf. Sect. 9.9.4), the number of free parameters increased to five (Zheng et al. 2005), whereas the latest incarnations can include up to ten free parameters (Hearin et al. 2016).
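As a concrete illustration, the sketch below populates a toy halo catalogue with central and satellite galaxies, using occupation functions broadly following the five-parameter family of Zheng et al. (2005); centrals are drawn from a Bernoulli distribution and satellites from a Poisson distribution, as described above. The parameter values and the halo catalogue itself are illustrative placeholders.

```python
# Sketch of a basic five-parameter HOD (broadly following Zheng et al. 2005):
#   <N_cen|M> = 0.5 [1 + erf((log M - log M_min) / sigma_logM)]
#   <N_sat|M> = <N_cen|M> ((M - M_0) / M_1)^alpha    for M > M_0
import numpy as np
from scipy.special import erf

def mean_ncen(M, logMmin=12.0, sigma_logM=0.3):
    return 0.5 * (1.0 + erf((np.log10(M) - logMmin) / sigma_logM))

def mean_nsat(M, logM0=12.2, logM1=13.3, alpha=1.0, **cen_kwargs):
    nsat = np.zeros_like(M)
    above = M > 10**logM0
    nsat[above] = ((M[above] - 10**logM0) / 10**logM1) ** alpha
    return mean_ncen(M, **cen_kwargs) * nsat

rng = np.random.default_rng(1)
halo_mass = 10 ** rng.uniform(11.5, 15.0, size=100000)   # toy halo masses [Msun/h]

n_cen = rng.random(halo_mass.size) < mean_ncen(halo_mass)   # Bernoulli draw
n_sat = rng.poisson(mean_nsat(halo_mass))                    # Poisson draw

print("mean number of galaxies per halo:", (n_cen + n_sat).mean())
```

In a full implementation, each central would then be placed at the halo centre and each satellite distributed within the halo, as discussed next.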

In the most basic HOD models, a central galaxy sits at rest in the center of the halo, and the satellites are usually assumed to follow an NFW profile (Zheng et al. 2007) (or to directly trace the phase-space distribution of the dark matter particles). More sophisticated galaxy models have shown that many of these assumptions are not valid in general: the number and properties of galaxies also depend on halo properties other than mass, and satellite galaxies move and are located differently than DM particles (Diemand et al. 2004a; Wu et al. 2013; Orsi and Angulo 2018). All of these effects alter the expected spatial distribution of galaxies, and recent HOD implementations have therefore relaxed some of the assumptions of the basic model to enable a more realistic description of galaxies. In particular, attempts have been made to incorporate secondary correlations with halo properties beyond mass or with environmental properties (Hadzhiyska et al. 2020, 2021a), velocity bias (Guo et al. 2015), or flexibility in the spatial distribution to incorporate assembly bias (Hearin et al. 2016; Zehavi et al. 2019; Xu et al. 2021).

Despite its limitations, the HOD remains one of the most popular approaches to model galaxies in large-scale simulations, thanks to its flexibility to describe galaxies regardless of their selection function and without making strong assumptions about galaxy formation physics. A further advantage is that it can be applied to relatively low-resolution N-body simulations. In fact, the HOD and CLF have been used together with N-body simulations to provide the strongest cosmological constraints to date from galaxy clustering. For instance, Reid et al. (2014) obtained a 2.5% measurement of the growth rate using the multipoles of the redshift-space correlation function as measured in SDSS-III BOSS CMASS, and Lange et al. (2021) a 5% measurement with the LOWZ sample, whereas Cacciato et al. (2013) obtained constraints on \(\varOmega _{\mathrm{m}}\) and \(\sigma _8\) competitive with those derived from CMB analyses using galaxy clustering and galaxy-galaxy lensing in the SDSS survey. Although a careful assessment of the systematic sources of uncertainty in these constraints is required, they serve as an example of the statistical power obtained from the combination of N-body simulations and galaxy formation models.

9.9.2 Subhalo abundance matching—SHAM

Several of the limitations of the HOD can be ameliorated by taking its assumption one step further. In particular, one can assume that galaxies reside in dark matter substructures instead of simply residing in dark matter haloes. This seemingly small difference implies that there should be a one-to-one relation between galaxies and dark matter substructures, which makes DM simulations much more predictive about the phase-space coordinates of galaxies.

In the most general formulation, one describes the galaxy population via the probability that a galaxy resides in a subhalo with a given set of subhalo properties (Yang et al. 2012; Moster et al. 2013; Behroozi et al. 2013a). Different authors have chosen different parametrisations, subhalo properties, and redshift evolution, but obtain similar results when inferring halo masses from the stellar mass of observed galaxies. Not only stellar mass can be modelled in this way, but also star formation rate, metallicity, dust content, and so on.

An alternative avenue is followed in ‘subhalo abundance matching’ (SHAM), which simply assumes a monotonic relation between subhalo and galaxy properties (Kravtsov et al. 2004; Vale and Ostriker 2004; Conroy et al. 2006). In this ‘rank-order’ SHAM, the most massive subhalo hosts the galaxy with the largest corresponding property (e.g. the highest luminosity) and so on, thus providing a non-parametric galaxy-halo relation. In current implementations this relationship is not assumed to be perfect, but is modelled with an additional parameter that allows for some scatter in the mapping, which appears to be demanded by the data.

Exactly which global property of a DM subhalo is the optimal parameter for a SHAM mapping has been a matter of debate. Modern implementations use parameters based on the circular velocity of the subhalo, \(V_{\mathrm{circ}}(r) = \sqrt{G\,M(<r)/r}\), as it is less sensitive to a specific definition of halo mass and better captures the inner parts of a halo, which, arguably, are better connected to the inner regions a galaxy is expected to inhabit. In fact, \(V_{\mathrm{circ}}\) appears to reproduce the small-scale clustering of galaxies much better than mass-based SHAMs. Furthermore, since subhalo properties can evolve substantially after accretion due to tidal heating and stripping, properties measured before accretion are preferred. Explicitly, several authors have argued that the highest value of the maximum circular velocity, \(v_{\mathrm{max}} = \max(V_{\mathrm{circ}})\), over the history of a halo, referred to as \(v_{\mathrm{peak}}\), is the single property that correlates most strongly with the stellar mass of galaxies in hydrodynamical simulations (Chaves-Montero et al. 2016; Xu and Zheng 2020; He 2020). There have also been attempts to combine more than one property for SHAM, e.g., including secondary dependences on \(V_{\mathrm{circ}}(r=R_{200})\), the peak halo mass \(M_{\mathrm{peak}}\), concentration, or large-scale density (e.g., Mao et al. 2015; Lehmann et al. 2017; Contreras et al. 2020a; Tonnesen and Ostriker 2021), seeking to improve the accuracy of SHAM or to better capture environmental dependences (e.g., assembly bias).
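The rank-ordering with scatter described above can be sketched in a few lines. Below, a placeholder subhalo catalogue is matched to an assumed set of galaxy luminosities by ranking subhaloes on \(v_{\mathrm{peak}}\), with log-normal scatter added before the matching; the catalogue, the luminosity distribution, and the scatter value are all illustrative assumptions.

```python
# Sketch of rank-order subhalo abundance matching (SHAM) with scatter:
# rank subhaloes by v_peak (after adding scatter) and assign luminosities so
# that the highest-ranked subhalo hosts the most luminous galaxy, and so on.
import numpy as np

rng = np.random.default_rng(7)

# Placeholder subhalo catalogue: peak circular velocities in km/s
v_peak = 10 ** rng.normal(2.2, 0.25, size=50000)

# Placeholder target luminosities (one per subhalo), sorted from bright to faint
luminosity = np.sort(10 ** rng.normal(10.0, 0.5, size=v_peak.size))[::-1]

# Add log-normal scatter to the matching property before ranking
scatter_dex = 0.15
v_scattered = v_peak * 10 ** rng.normal(0.0, scatter_dex, size=v_peak.size)

# Monotonic (rank-order) assignment: highest scattered v_peak -> highest L
order = np.argsort(v_scattered)[::-1]
assigned_L = np.empty_like(luminosity)
assigned_L[order] = luminosity

# assigned_L[i] is now the luminosity of the galaxy hosted by subhalo i
print("median L of subhaloes with v_peak > 300 km/s:",
      np.median(assigned_L[v_peak > 300]))
```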

Subhalo abundance matching has been shown to be in good agreement with multiple observations, such as the two-point and three-point galaxy clustering and the Tully–Fisher relation (Vale and Ostriker 2004; Conroy et al. 2006; Marín et al. 2008; Trujillo-Gomez et al. 2011; Nuza et al. 2013; Reddick et al. 2013), to accurately describe the clustering of stellar-mass selected galaxies in hydrodynamical simulations (Chaves-Montero et al. 2016; Contreras et al. 2020c; Favole et al. 2022), to capture the so-called assembly bias (Contreras et al. 2020c), and to reduce the need for velocity bias (Ye et al. 2017) [but see Hearin et al. (2013) for difficulties regarding galaxy groups]. While accurate for describing stellar masses, SHAM is less so for star formation rates or cold gas fractions, which are expected to have non-monotonic relationships with dark matter halo properties. On the other hand, SHAM can be used as a starting point to describe these other properties by using the respective distribution functions conditional on stellar mass (Hearin and Watson 2013; Favole et al. 2017), or to account for, e.g., observational incompleteness (Favole et al. 2016). A similar idea has been implemented by Contreras et al. (2020b), Favole et al. (2022), who extended SHAM with recipes for the star formation rate based on the mass accretion rate, and showed remarkably good agreement with the clustering of SFR-selected galaxies in the Illustris-TNG simulation (Springel et al. 2018).

SHAM imposes stricter requirements than the HOD on the numerical accuracy and mass resolution of the parent simulation. Since the evolution of substructures inside halos needs to be followed for a considerable number of dynamical times, relatively high force and time-integration accuracy, together with adequate mass resolution, are necessary (Klypin et al. 2015; Guo and White 2014) (see also the discussion in Sect. 8.2).

Roughly, in every pericentric passage, DM subhalos lose approximately 90% of their mass. Thus, for any finite mass resolution, the subhalo mass will eventually fall below the resolution limit of a simulation and the subhalo will be lost (Moore et al. 1996; van den Bosch 2017), see also the discussion in Sect. 8.2. Naturally, this does not necessarily mean that the respective galaxy has merged, and thus one expects a population of subhalo-less galaxies, also known as orphan galaxies (see e.g., Gao et al. 2004; Wu et al. 2013; Delfino et al. 2021). The relative abundance of these objects has been debated (Klypin et al. 2015; Guo and White 2014); it might depend on the particular subhalo finder and merger-tree algorithm used, and it also depends on the target galaxy population that is modelled. Furthermore, even with infinite resolution, baryons and stars modify the gravitational potential inside DM halos and of the infalling structures, so SHAM applied to DM-only simulations cannot correctly capture the dynamics of satellite galaxies. It is therefore important to model orphan galaxies and their eventual disruption within SHAM to obtain precise predictions for the clustering of galaxies, especially on small scales (Campbell et al. 2018).

Similarly to HODs, the SHAM technique has also been used together with N-body simulations to place strong constraints on cosmological parameters from the observed clustering of galaxies. For instance, Simha and Cole (2013) obtained constraints on \(\varOmega _\mathrm{m}\) and \(\sigma _8\) from the projected correlation function in SDSS. Another example was provided by He et al. (2018), who demonstrated that a basic HOD applied to f(R) simulations was incompatible with the clustering in SDSS, which allowed them to place one of the strongest constraints on the amplitude of a hypothetical f(R) scalar field. As for the HOD, further work is required to establish the robustness of these results, in particular with respect to degeneracies with galaxy formation physics, but they suggest a very promising avenue for future cosmological data analysis.

9.9.3 Semi-empirical models

More recently, there have been attempts to build upon the success of SHAM and extend it into a more physical model that can self-consistently predict galaxy properties across time. Such a path is followed in the models dubbed ‘Universe Machine’ (Behroozi et al. 2019), ‘EMERGE’ (Moster et al. 2018), and the ‘Surrogate-Baryonic-Universe’ (Chaves-Montero and Hearin 2020). The basic idea of these methods is to adopt an empirical model for the relationship between the star formation rate (SFR) and the fraction of quenched galaxies on the one hand, and subhalo properties (e.g. maximum circular velocity and halo formation time) on the other, together with a specific redshift evolution. Then, using subhalo merger trees extracted from DM numerical simulations, it is possible to predict the star formation history of every modelled galaxy. This SFR history is then turned into a total stellar mass, from which other observable properties can be computed for a given stellar population synthesis model, initial mass function, and dust model.

The free parameters (15–40) of these relationships are constrained with Bayesian algorithms by requiring agreement with observational data across multiple redshifts. In general, these models are more physical than purely empirical models in that galaxy predictions are self-consistent (the stellar mass of a galaxy is the integral of its star formation history), but they do not make any specific assumptions about the physics underlying the inferred relations.

These models are quite successful in reproducing and predicting multiple properties of the observed galaxy population. For this reason, it would be very interesting to explore the kind of cosmological inferences they would enable when employed with N-body simulations, for instance, the possible degeneracies between cosmological parameters and the physical parameters describing galaxy formation and evolution. This would also inform which physical processes are the most important to understand and model for improved cosmological constraining power. To our knowledge, this has not yet been attempted in the literature, but it will certainly be an important step towards the optimal exploitation of upcoming large-scale structure data.

9.9.4 Semi-analytic models—SAMs

Arguably the most complete post-processing modelling of the galaxy population is provided by ‘semi-analytic’ models (SAMs) of galaxy formation. This approach seeks to self-consistently evolve the properties of galaxies through a coupled system of ordinary differential (or difference) equations describing the amounts of stars, cold and hot gas, black holes, and star formation rate in galaxies.

SAMs were in fact the first attempts to model galaxy formation and evolution, with the main ideas dating back to White and Rees (1978), White and Frenk (1991). Early SAMs were built on top of analytic halo merger trees and captured only the main physics expected to be necessary to understand galaxy formation (Kauffmann et al. 1993; Cole et al. 1994, 2000). Since then, SAMs have evolved significantly and have been improved in their assumptions, sophistication, and scope. Modern examples are ‘L-Galaxies’ (Henriques et al. 2015), ‘Galform’ (Cole et al. 2000), ‘Galacticus’ (Benson 2012), ‘SAG’ (Cora et al. 2018), ‘SAGE’ (Croton et al. 2016), ‘SHARKS’ (Lagos et al. 2018), ‘Gaea’ (Hirschmann et al. 2016), as well as ‘\(\nu ^2\)GC’ (Makiya et al. 2016); they typically include gas cooling, chemical evolution and enrichment for multiple elements, feedback from SNe type I and II, stellar winds, binary evolution, environmental and ram-pressure stripping, black hole formation, mergers, and feedback, dust formation, and radiative-transfer effects. For a comparison of models see De Lucia et al. (2010), Knebe et al. (2015, 2018).

Schematically, the fundamental assumption underlying SAMs is that every time a mass element is accreted onto a DM halo, a corresponding amount of gas (set by the universal baryon fraction) is also accreted. Baryons are shock heated to \(T_{\mathrm{vir}}\), the virial temperature of the host halo, and start to cool either via bremsstrahlung (\(T_{\mathrm{vir}} > 10^7\,\mathrm{K}\)), electron recombination (\(10^4\,\mathrm{K}< T_{\mathrm{vir}} < 10^7\,\mathrm{K}\)), or metal-line cooling (\(T_{\mathrm{vir}} < 10^4\,\mathrm{K}\)). If the cooling time of the halo is long compared to the age of the universe, the accretion is said to be in a hot mode and a fraction of the gas condenses into stars every dynamical time; otherwise the cooling is fast and the system develops cooling flows.

The initial mass function (IMF) of newly formed stars is a degree of freedom of these models; most modern implementations adopt a Chabrier IMF, but a Salpeter IMF, a Kroupa IMF, and modifications thereof have also been used in the past (e.g., Baugh et al. 2005). Stellar evolution and stellar population synthesis models then determine the typical lifetime and spectral energy distribution of stars of a given mass. During their life and death, stars return part of their mass to the intergalactic medium, which is typically assumed to be recycled instantaneously into the subsequent generation of stars. Stars also inject energy into their surroundings; this injection can be thermal, kinetic, or radiative, and acts to suppress star formation.
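To illustrate the structure of such a coupled system, the toy sketch below integrates a minimal set of difference equations for hot gas, cold gas, and stellar mass (cooling, star formation, and supernova reheating only). The functional forms, efficiencies, and timescales are deliberately simplistic placeholders and do not correspond to any of the published models listed above.

```python
# Toy "semi-analytic" update: hot gas cools onto a cold reservoir, cold gas
# forms stars over a dynamical time, and supernova feedback reheats cold gas.
# All efficiencies and timescales are illustrative placeholders (in Gyr, Msun).

def evolve_galaxy(m_hot, m_cold, m_star, dt_gyr, n_steps,
                  t_cool=2.0, t_dyn=0.2, eps_sf=0.02, eta_fb=1.0):
    """Integrate the toy difference equations for n_steps of size dt_gyr."""
    for _ in range(n_steps):
        cooling  = m_hot / t_cool * dt_gyr           # hot -> cold
        sfr      = eps_sf * m_cold / t_dyn           # star formation rate [Msun/Gyr]
        stars    = sfr * dt_gyr                      # cold -> stars
        reheated = eta_fb * stars                    # SN feedback: cold -> hot

        m_hot  += reheated - cooling
        m_cold += cooling - stars - reheated
        m_star += stars
    return m_hot, m_cold, m_star

# Example: a halo that has accreted 1e11 Msun of hot gas, evolved for 10 Gyr
print(evolve_galaxy(m_hot=1e11, m_cold=0.0, m_star=0.0,
                    dt_gyr=0.01, n_steps=1000))
```

Real SAMs couple many more reservoirs (metals, black holes, ejected gas) and solve these equations along every branch of the subhalo merger tree.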

In parallel to star formation, modern SAMs also follow the growth of supermassive black holes at the center of every galaxy. These black holes are assumed to grow from initial seeds by mergers and gas accretion. In most models, during most of their lifetime, black holes increase their mass through Bondi–Hoyle–Lyttleton accretion, and part of the accreted mass is re-injected as energy into the ISM, reducing star formation, in what is known as radio-mode feedback. At some points in their history, a ‘quasar-mode’ black hole feedback cycle can be triggered by mergers and disk instabilities, which leads to high accretion rates and the emergence of quasars and AGNs (Croton et al. 2016; Bower et al. 2006).

Finally, the halo and subhalo merger trees determine the mass accretion and mergers. As discussed above in the context of SHAM, subhalo mergers and disruption are not necessarily good predictors of galaxy mergers and disruptions. Thus, additional recipes are needed, together with the modelling of orphan galaxies.

A disadvantage of SAMs is the relatively large number of free parameters (typically 20–50). This, however, arises mainly from our current uncertainty about many aspects of galaxy formation and from the large number of physical processes that are described. Typically, these free parameters are fixed by requiring agreement with a set of observations. Early SAMs were calibrated ‘by hand’, where agreement with observations was subject to the criteria of a particular author. Nowadays, SAMs are calibrated in a more objective manner using Bayesian statistics (Markov chain Monte Carlo methods, particle swarms; e.g. Kampakoglou et al. 2008; Henriques et al. 2009; Bower et al. 2010; Ruiz et al. 2015); however, since different codes and authors choose different observables for their calibration, large discrepancies still exist among codes.

The strength of SAMs is that they provide self-consistent (physically motivated) predictions for galaxy properties over, in principle, the whole electromagnetic spectrum. Thus, SAMs have been employed to study and understand multiple aspects of galaxy formation: the mass-metallicity relation, mass-luminosity relations, galaxy clustering, galaxy colours, etc. SAMs have also been used to make predictions for upcoming large-scale galaxy surveys (Merson et al. 2018; Angulo et al. 2014), and to understand the physics of galaxy formation with dark energy beyond a cosmological constant (Fontanot et al. 2012, 2015a), f(R) gravity (Fontanot et al. 2013), and massive neutrinos (Fontanot et al. 2015b). An example of the spatial distribution of dark matter and SAM galaxies is shown in Fig. 29.

Fig. 29

The spatial distribution of dark matter and stellar-mass selected galaxies with \(M_* > 10^{10} h^{-1}{\mathrm{M}}_{ \odot }\) at \(z=0\) in the Millennium-XXL simulation (Angulo et al. 2012). Galaxies were simulated using the L-Galaxies semi-analytic galaxy formation model of Henriques et al. (2013), and are displayed as circles whose size is proportional to the expected half-mass radius. Image adapted from Angulo et al. (2014)

9.9.5 Modelling of baryonic and gas physics

The main assumption of the galaxy formation modelling discussed in the previous section is that baryonic physics determines the properties of galaxies, but that the gravitational potential, and thus the dynamics, is fully determined by gravity without any back-reaction from galaxy astrophysics. Although gravitational interactions dominate the nonlinear evolution of cosmic structure, the accuracy required by future cosmological weak lensing observations is such that baryonic effects on the matter distribution cannot be neglected.

Recently, the impact of effects such as feedback from supermassive black holes, star formation, and gas cooling has been studied extensively by comparing the predictions of gravity-only simulations and full hydrodynamical calculations. The consensus is that baryons systematically modify the shape of the nonlinear density power spectrum on small scales. Specifically, baryons suppress the amplitude on intermediate scales (\(k \sim 1 h\,\mathrm{Mpc}^{-1}\)), mostly as a result of the collisional nature of baryons and the injection of feedback energy, which prevents the accretion of gas onto dark matter halos and modifies the baryonic density profiles; and they enhance the amplitude on smaller scales (\(k \sim 10 h\,\mathrm{Mpc}^{-1}\)) as a result of gas cooling and star formation (van Daalen et al. 2020; Chisari et al. 2019). This effect is also shown in Fig. 30.

Fig. 30

The effect of baryonic physics quantified via \(S(k) \equiv P(k)/P_{\mathrm{GrO}}(k)\): the ratio of the nonlinear power spectrum as measured in various hydrodynamical simulations over that in gravity-only N-body simulations. The panels display measurements in various hydrodynamical suites (BAHAMAS, C-OWLS, OWLS), whereas the rightmost panel shows the same measurement in Illustris, EAGLE, Illustris-TNG, and Horizon-AGN. Additionally, in each case we display as solid lines the best-fitting description of a “baryonification model”, in which the outputs of a gravity-only simulation are perturbed according to physically motivated recipes (Schneider and Teyssier 2015; Schneider et al. 2018; Aricò et al. 2020, 2021a). Image adapted from Aricò et al. (2021b), copyright by the authors
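Given overdensity fields from a hydrodynamical run and its matched gravity-only counterpart interpolated onto the same mesh, the ratio shown in the figure can be measured with a few lines of FFT-based code. The sketch below assumes the two grids are already available as NumPy arrays (here random placeholders), and keeps the binning and normalisation deliberately minimal.

```python
# Sketch: measure S(k) = P_hydro(k) / P_gravity-only(k) from two overdensity
# grids defined on the same mesh (assumed to be given as NumPy arrays).
import numpy as np

def power_spectrum(delta, box_size, n_bins=30):
    """Spherically averaged P(k) of an overdensity grid (minimal version)."""
    n = delta.shape[0]
    delta_k = np.fft.rfftn(delta) * (box_size / n) ** 3   # discrete -> continuum FT
    pk3d = np.abs(delta_k) ** 2 / box_size ** 3           # P(k) = |delta_k|^2 / V

    kfreq = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky = np.meshgrid(kfreq, kfreq, indexing="ij")
    kz = 2.0 * np.pi * np.fft.rfftfreq(n, d=box_size / n)
    kmag = np.sqrt(kx[..., None] ** 2 + ky[..., None] ** 2 + kz ** 2)

    edges = np.linspace(2 * np.pi / box_size, np.pi * n / box_size, n_bins + 1)
    idx = np.digitize(kmag.ravel(), edges)                # 0 / n_bins+1 fall outside
    counts = np.bincount(idx, minlength=n_bins + 2)[1:-1]
    sums = np.bincount(idx, weights=pk3d.ravel(), minlength=n_bins + 2)[1:-1]
    return 0.5 * (edges[1:] + edges[:-1]), sums / np.maximum(counts, 1)

# delta_hydro and delta_gro stand in for matched density fields on the same mesh
box = 400.0                                    # Mpc/h, illustrative
delta_gro = np.random.standard_normal((128, 128, 128))
delta_hydro = 0.98 * delta_gro                 # placeholder for the hydro field
k, p_hydro = power_spectrum(delta_hydro, box)
_, p_gro = power_spectrum(delta_gro, box)
S_k = p_hydro / p_gro                          # the ratio plotted in Fig. 30
```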

Although different hydrodynamical simulations agree qualitatively in their predictions, they differ in the magnitude and redshift dependence of these effects (van Daalen et al. 2020). Specifically, some simulations predict a relatively minor effect (less than 5% at \(k > 5 h\,\mathrm{Mpc}^{-1}\)), e.g. Horizon-AGN (Dubois et al. 2014), EAGLE (Schaye et al. 2015; Hellwing et al. 2016), and Illustris-TNG300 (Springel et al. 2018). Other simulations predict stronger effects (10–15%) on the same scales, e.g. BAHAMAS (McCarthy et al. 2017), OWLS (Schaye et al. 2010; van Daalen et al. 2011), and Cosmo-OWLS (Le Brun et al. 2014). An extreme case is Illustris (Vogelsberger et al. 2014), which predicts significant effects even at \(k \sim 0.5 h\,\mathrm{Mpc}^{-1}\) but is known to suffer from unrealistically low gas fractions in massive haloes. Generally, it is striking that those simulations tuned to produce ‘realistic’ galaxy properties predict a weaker impact on the density power spectrum, whereas those tuned to cluster properties predict stronger baryonic effects. This discrepancy will have to be settled by the larger and more realistic next generation of hydrodynamical cosmological simulations. Perhaps more interesting will be the comparison with observational data. Although current constraints are only able to marginally rule out the most extreme feedback scenarios (Huang et al. 2021), new prospects for estimating the magnitude of baryonic effects in the real Universe will open up with the imminent improvement in weak lensing surveys and in the joint analysis of gas observables (kinetic and thermal Sunyaev–Zel’dovich effects, and X-ray gas fractions; Schneider et al. 2021; Tröster et al. 2021).

Due to this variety in predictions, the impact of baryonic physics should currently not be modelled on the basis of a single simulation; instead, the focus is on creating flexible and general ways to model these effects. This motivation, along with the need to leverage the larger statistical power of gravity-only simulations (at lower computational cost), has triggered the development of several algorithms that modify the mass field of gravity-only simulations to mimic the presence of baryons. Such approaches are found in several extensions of the halo model (Semboloni et al. 2011; Mohammed et al. 2014; Fedeli 2014; Mead et al. 2015a; Debackere et al. 2019); in terms of response functions calibrated using Separate Universe simulations (Barreira et al. 2019); in displacing particles according to the expected gas pressure (Dai et al. 2018); or even in using machine learning (Tröster et al. 2019; Villaescusa-Navarro et al. 2020).

In the same spirit as the galaxy formation models discussed above, another avenue is to model the impact of baryons directly from the dark matter field. Among such models is the baryonic correction model (Schneider and Teyssier 2015; Schneider et al. 2018; Aricò et al. 2020). Advantages of this approach are that it does not depend on a specific set of simulations and/or observables, that its free parameters can be marginalised over to place cosmological constraints, and that it offers the opportunity to predict multiple observables that depend on the gas distribution, such as the Sunyaev–Zel’dovich effect or the X-ray emission in galaxy clusters. Currently, these models appear flexible and accurate enough to be useful in the interpretation of future datasets (Aricò et al. 2021b). Nevertheless, with the advent of more realistic and precise hydrodynamical simulations, these models are expected to be refined and improved, especially concerning the way that baryonic effects are modelled and parameterised, and through improved physical priors on the baryonic parameters.

10 Overview of state-of-the-art simulations and challenges

In this section, we provide a brief overview of the largest gravity-only calculations to date (i.e., as of mid-2021), which unveil the most detailed views of nonlinear structure formation on large scales. We then discuss the challenges in carrying out these calculations, in obtaining predictions as a function of cosmology, and in the dissemination and sharing of their results and data products.

10.1 State-of-the-art simulations

The rapid progress of large-scale structure surveys has dramatically pushed the need for on-par advances in the accuracy of theoretical modelling, and particularly so in cosmological simulations. Roughly speaking, it is necessary to simulate volumes comparable to those observed, with a mass resolution that, at a minimum, resolves the halos expected to host the galaxies targeted by observations, and ideally resolves the full merger history of those halos. For instance, the EUCLID survey will map a volume of 60 Gpc\(^3\) and detect star-forming galaxies expected to reside in halos as small as \(10^{10}\)–\(10^{11}\, h^{-1}{\mathrm{M}}_{ \odot }\). This motivates ever larger and more accurate cosmological simulations.

The trend observed by Springel et al. (2005) that the number of particles in N-body simulations doubled every 16.5 months has flattened somewhat since, but cosmological simulations nevertheless remain at the forefront of what is possible with state-of-the-art infrastructure in national and international supercomputing centers. For instance, many cosmological codes have won or been finalists for the Gordon Bell Prize over the last 30 years, which is awarded for outstanding achievements in high-performance computing. Nowadays there are multiple simulations with more than a trillion particles, several hundred times more than the iconic Millennium simulation. In Table 1 we compile a summary of recent efforts, all of which notably adopt different parallelisation and optimisation strategies as well as algorithms for the gravity calculation.

Table 1 List of cosmological simulations with a particle number in excess of 1 trillion (\(10^{12}\))

The first calculation to reach the 1-trillion (\(10^{12}\)) particle milestone was the Dark Sky simulation (Skillman et al. 2014) in 2014. It evolved a \(10'240^{3}\) particle distribution from redshift 93 down to redshift zero in an \(8 h^{-1}\mathrm{Gpc}\) cubic volume, and was carried out with 20,000 CPUs of the Titan supercomputer at the Oak Ridge National Laboratory in the USA. Gravitational forces were computed with the 2HOT code (Warren 2013) using a tree algorithm with multipole expansion, with background subtraction to improve the performance at high redshift, and a dual tree traversal specifying cell-cell interactions in which multiple particles share the same interaction list. A key aspect enabling this simulation was the usage of Graphics Processing Units (GPUs) and many optimisations via SSE or AVX vector instructions. Additionally, catalogues of dark matter halos as well as a light-cone were constructed during the runtime of the simulation.

The trillion-particle mark was also reached by the TianNu simulation using a different N-body code, \(\mathrm{CUBEP^3M}\) (Harnois-Déraps et al. 2013), which combines a 2-level PM with direct summation below the grid scale and includes compression for the internal representation of coordinates (Yu et al. 2018). This simulation employed \(86\%\) (331,776 CPUs) of the second largest supercomputer in the world at that time (4th place in early 2021): Tianhe-2 in China. TianNu simulated the cosmic density field with \(2.97\times 10^{12}\) particles in a \(1.2 h^{-1}\mathrm{Gpc}\) box and included the effects of massive neutrinos. The base simulation contained \(6912^3\) CDM particles, and at \(z=5\) a new set of \(13824^3\) particles representing \(M_\nu = 0.05\,\mathrm{eV}\) neutrinos was added, with which subtle differences between CDM and neutrinos were studied (Inman et al. 2015; Yu et al. 2017a).

In the same year, 2017, the Euclid Flagship simulation (Potter et al. 2017) was completed: the first multi-trillion particle simulation with high force and mass resolution. The simulation comprised \(10'000^3\) particles in a \(3 h^{-1}\mathrm{Gpc}\) box, aimed at the preparation for the EUCLID mission. It employed the PKDGRAV-3 code (Potter and Stadel 2016), which uses a binary tree with FMM expansion for the force calculation. As for other large simulations, it was made possible by the use of GPUs to speed up the gravity calculation.

The Outer Rim was another simulation that reached the trillion-particle count. It simulated a region of \(3 h^{-1}\mathrm{Gpc}\) across using the HACC code (Habib et al. 2016) and half a million cores of the Mira BG/Q supercomputer at the Argonne Leadership Computing Facility in the USA. The \(10'240^3\) particles of \(10^9 h^{-1}{\mathrm{M}}_{ \odot }\) were initialized at \(z=200\) with 1LPT and then evolved, generating approximately 5 PB of data. The same group also used HACC and the Mira supercomputer in 2020 to create another 1.24-trillion particle simulation, the Last Journey (Heitmann et al. 2021), which shares many specifics with Outer Rim but covers a slightly larger \(3.4 h^{-1}\mathrm{Gpc}\) cubic volume. Complementing these calculations, in 2021 the same group announced the FarPoint simulation (Frontiere et al. 2021) with 1.86 trillion particles in a \(1 h^{-1}\mathrm{Gpc}\) box, run on Summit, the 2nd largest supercomputer in the world at the time.

Similar to the Flagship but with slightly better mass resolution is the Uchuu simulation (Ishiyama et al. 2021), a \(2000 h^{-1}\mathrm{Mpc}\) simulation with 2.1 trillion particles (\(12'800^3\)) of mass \(3.3\times 10^8\, h^{-1}{\mathrm{M}}_{ \odot }\), employing a \(4.3\,h^{-1}\mathrm{kpc}\) softening length. The simulation was carried out on the ATERUI II supercomputer in Japan using the GreeM tree-PM code (Ishiyama et al. 2012, 2021), which makes extensive use of SIMD instructions and other low-level optimizations.

The largest simulation to date is Cosmo-\(\pi \), which was performed in 2020 using the CUBE code on \(20'480\) cores of the Shanghai Jiao Tong University’s \(\pi \) 2.0 supercomputer in China. Cosmo-\(\pi \) simulated a region of \(3.2 h^{-1}\mathrm{Gpc}\) with 4.39 trillion particles at a relatively modest force resolution of 195 \(h^{-1}\mathrm{kpc}\). The force was computed with a 2-level particle-mesh algorithm (PMPM) (Merz et al. 2005), where a global PM mesh computes the long-range forces and is complemented by independent local high-resolution PM meshes with isolated boundary conditions that estimate the short-range forces. The fine mesh can be stored completely locally in the case of a cubical domain decomposition, which has the advantage of reducing inter-node communication.

Note that the Cosmo-\(\pi \) simulation was carried out on a comparatively modest supercomputer, thanks to the heavily optimised memory consumption of the CUBE code. This was realised mainly through a fixed-point precision optimization in which phase-space coordinates are stored relative to the position and velocity of a coarse grid (Yu et al. 2018). Although this incurs additional computing demands (due to the compression/decompression), it reduced the memory requirements from 24 to 6 bytes per particle, which is usually a limiting factor in such large-scale simulations.
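A minimal sketch of this kind of fixed-point storage is shown below: particle positions are stored as small integer offsets relative to their coarse-grid cell and reconstructed on the fly. The grid size, integer width, and resulting precision are illustrative choices and do not reproduce the exact CUBE format.

```python
# Sketch of fixed-point compression of particle positions relative to a coarse
# grid: store the cell index plus a 1-byte offset inside the cell per dimension,
# instead of a full floating-point coordinate.
import numpy as np

BOX = 1000.0        # box size in Mpc/h (illustrative)
NGRID = 256         # coarse cells per dimension (illustrative)
CELL = BOX / NGRID

def compress(pos):
    """pos: (N, 3) float positions in [0, BOX) -> cell indices + uint8 offsets."""
    cell = np.floor(pos / CELL).astype(np.int16)       # coarse cell index
    frac = pos / CELL - cell                            # position within cell, [0, 1)
    offset = np.floor(frac * 256).astype(np.uint8)      # 1 byte per dimension
    return cell, offset

def decompress(cell, offset):
    """Reconstruct positions at the centre of each fixed-point sub-cell."""
    return (cell + (offset.astype(np.float64) + 0.5) / 256.0) * CELL

rng = np.random.default_rng(3)
pos = rng.uniform(0.0, BOX, size=(10, 3))
cell, offset = compress(pos)
err = np.abs(decompress(cell, offset) - pos).max()
print("max reconstruction error [Mpc/h]:", err)   # at most half a sub-cell width
```

In codes that sort particles by cell, the cell index need not even be stored per particle, which is where most of the memory saving comes from; velocities can be compressed analogously relative to a coarse velocity field.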

10.2 Computational trends and challenges

It is interesting to note that the multi-trillion particle limit has been reached independently by several groups worldwide, employing different codes and different computational strategies. This is a remarkable achievement for the field of cosmological simulations. However, there are some trends and common features which we discuss next.

Firstly, these calculations are usually limited by the time spent in the force calculation. In this regard, it is becoming clear that hierarchical trees are sub-optimal compared to fast multipole methods, which display better scaling with particle number by decreasing the algorithmic complexity of the (short-range) gravity calculation. On the other hand, implementing FMM efficiently (including its parallelisation on distributed-memory machines) is significantly more complex, and thus execution times can vary significantly depending on the implementation. It also seems that highly optimized codes with “hand-tuned” parts using SIMD, AVX, and other low-level instructions are worthwhile in this case, although this appears to depend on the target architecture.

Along these lines, the use of co-processors, and GPUs in particular, is proving extremely beneficial for these large calculations. We can see, for instance, that multi-trillion particle simulations are carried out with up to half a million CPUs (e.g. the Last Journey) but with only about four thousand CPUs when enhanced with GPUs (e.g. the Euclid Flagship). Specifically, PKDGRAV reports that speed-ups of a factor of 10 can be achieved with the use of one GPU per node. GPUs can also give researchers without access to the largest supercomputers in the world the opportunity to carry out state-of-the-art simulations. This development is also in line with future supercomputer architectures, where exascale computing is likely to be reached first with GPUs, with additional motivation provided by the training of machine learning algorithms.

Some degree of data compression also seems to be useful for these large calculations. This will be even more true in the future, since the FLOP-to-byte ratio of processors is expected to increase faster than the available RAM. Likewise, GPUs usually have much less memory than traditional CPUs, which provides further incentive for adopting some degree of internal compression, as demonstrated by the Cosmo-\(\pi \) simulation. However, it is unclear what degree of data compression can be afforded when increasing the spatial resolution, and whether this meets the accuracy requirements of upcoming LSS observations.

Another lesson learned from recent simulations is that I/O bandwidth and disk space are important factors that could limit the usefulness of simulations. For instance, 200’000 CPU hours were needed just to read the initial conditions of the Outer Rim simulation. Perhaps more importantly, the large raw datasets generated (easily in the petabyte range) restrict the use of these simulations to a limited number of people, which unavoidably hinders their scientific exploitation. For this reason, more and more of the post-processing, including the construction of light-cones, structure identification, merger trees, and even the creation of maps for various observables (e.g. gravitational lensing or the Sunyaev–Zel’dovich effect), is carried out during the simulation runtime.

Despite the extraordinary combination of volume and mass resolution of the simulations listed in the previous section, none of these multi-trillion particle calculations is actually sufficient to adequately match the characteristics of, e.g., the EUCLID or Rubin-DESC surveys. However, the track record of continuous improvement and the planned advances in supercomputer infrastructure make it likely that in the near future at least one calculation will achieve the necessary combination of volume and resolution.

Finally, a related challenge for carrying out and storing simulations will be securing the necessary computational resources in the first place. As we have argued throughout this review, gravity-only simulations are essential for modern cosmology and for large extragalactic surveys, where they can significantly enhance the scientific output. However, carrying out these simulations is left to individual groups or scientists, who may or may not competitively obtain the required computational resources. This unavoidably creates a large degree of uncertainty and hinders the proper design and exploitation of simulations.

10.3 Challenges for data dissemination

State-of-the-art numerical simulations require huge computational efforts in terms of CPU/GPU power and storage facilities. For this reason, historically, simulations were only available to a small subset of the cosmological community. Over the last 10 years, this situation has changed thanks to the adoption of more open data policies, together with the reduction in hardware costs and advances in web technologies.

It is now customary for simulation projects to put considerable effort into making their data publicly available and providing easy access to raw and secondary data products. In fact, the success of a simulation is often judged not only by its direct scientific impact, but also by the breadth of projects it enables and its use by the astronomical community as a whole. Sharing simulation data is also important in the context of cross-validation of simulation codes and results, since it readily allows a comparison and assessment of the accuracy and potential sources of systematic errors. In principle, this extends also to the simulation codes themselves: a public code-release strategy can attract increased scrutiny of the methods employed in any single code, in terms of both correctness and economy.

One of the pioneering efforts in the direction of data dissemination was that of the Virgo Consortium with the Millennium Database (Lemson and Virgo Consortium 2006). In the context of the German Virtual Observatory, and following the example of the SkyServer of the Sloan Digital Sky Survey, the Millennium Database served halo and semi-analytic galaxy catalogues via a relational database accessed through the standard Structured Query Language (SQL). More recent examples are databases for the Bolshoi simulations, which have additionally served full particle data, the Australian Theoretical Virtual Observatory, which offers several post-processing and visualization tools (Bernyk et al. 2016), and the INDRA database, which provides access to a large ensemble of gravity-only simulations (Falck et al. 2021). An example of the significant value that open data policies provide to simulation projects is the fact that, even though the Millennium simulation was carried out more than 15 years ago, it is still used in many research projects today.

SQL databases are extremely efficient and useful for mining large datasets; however, for many applications it is desirable to serve large catalogues for which SQL is not optimal. In such cases, increases in worldwide network bandwidth and advances in web technologies have made it practical to provide data in excess of tens of terabytes. One example is http://skiesanduniverses.org/, where data from the Multidark (Klypin et al. 2016), Bolshoi (Klypin et al. 2011), GLAM (Klypin and Prada 2018b), and Uchuu (Ishiyama et al. 2021) simulations can be directly downloaded. Another interesting approach, adopted to serve the data of the 1-trillion-particle DarkSky simulation, is to directly map data at a given URL to a Python virtual memory buffer. In this way, local Python sessions can directly access data stored on a remote server. The Outer Rim and Last Journey simulation data, instead, were shared directly through the Globus technology. Globus is a non-profit research data-management service that allows users to trigger secure data transfers among different locations (e.g. personal computers, cloud storage, or supercomputing facilities) via GridFTP. In this way, a subset of the simulated data (halo catalogues, diluted snapshots, etc.) can be easily shared with the research community.

The challenges faced by the current generation of simulations will become more acute in the future, with the increase in data volumes and the complexity of post-processing pipelines. It will therefore be important that, together with the data, there is a public release of the data-analysis software and, more generally, a further emphasis on the reproducibility of the results. In this regard, the numerical simulation community has a good track record with open software and open comparison projects, all of which should enhance the robustness and quality of any potential cosmological inference derived from the analysis of numerical simulations.

10.4 Challenges for cosmological predictions

Carrying out a single state-of-the-art simulation is challenging; yet, if simulations are to be used for the exploitation of cosmological observations, we need a large number of them covering the full parameter space of plausible cosmological models. This means that the required computational resources will likely exceed those of single grand-challenge simulations.

Current campaigns aimed at providing predictions as a function of cosmology typically build emulators trained on \(\mathcal {O}(100)\) simulations. This number can be decreased by more than an order of magnitude using cosmology-rescaling algorithms (Angulo and White 2010; Angulo and Hilbert 2015), but such campaigns still represent an important computational challenge. With this (relatively) low number of simulations, various groups have shown that it is already possible to achieve \(\lesssim 1\)% accuracy in predicting the nonlinear power spectrum up to scales of \(k \sim 1 h\,\mathrm{Mpc}^{-1}\), for all the parameters of a minimal \(\varLambda \)CDM model (Angulo et al. 2021; Euclid Collaboration 2021). On smaller scales, the accuracy is about 3%, not too different from the accuracy of the simulations themselves. The main limitation is the rather restricted cosmological parameter space these emulators can cover, usually much smaller than the priors considered in data analysis. Covering large hyper-volumes quickly increases the number of required calculations and the associated computational cost; thus, hybrid emulators will probably be built in the future that display high accuracy in a given region of interest (e.g. near the best-fitting \(\varLambda \)CDM parameters) but are less accurate elsewhere. This could be achieved by using approximate simulations, physically-motivated extrapolations, or a less dense sampling of the parameter space.
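For reference, the experimental designs behind such campaigns are typically space-filling. The sketch below draws an \(\mathcal {O}(100)\)-point Latin-hypercube design over an illustrative parameter box using scipy.stats.qmc; the parameter names and ranges are placeholders rather than those of any published simulation suite.

```python
# Sketch: a space-filling Latin-hypercube design of ~100 cosmologies,
# as commonly used to decide which simulations to run for an emulator.
# Parameter names and ranges below are illustrative placeholders.
from scipy.stats import qmc

params = ["Omega_m", "sigma_8", "h", "n_s", "Omega_b"]
lower = [0.25, 0.70, 0.60, 0.92, 0.040]
upper = [0.40, 0.90, 0.74, 1.00, 0.055]

sampler = qmc.LatinHypercube(d=len(params), seed=11)
design = qmc.scale(sampler.random(n=100), lower, upper)   # shape (100, 5)

# Each row is the cosmology of one simulation to be run and fed to the emulator
for name, column in zip(params, design.T):
    print(f"{name}: min={column.min():.3f}, max={column.max():.3f}")
```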

Although improving the gravity-only predictions on small scales will be an important task, baryons have a much larger impact on these scales and thus on the overall accuracy of the predictions. To address this, emulators have also been built based on physically-motivated models of baryonic effects, which claim few-percent accuracy as well (Schneider et al. 2020; Aricò et al. 2021a). Although this is close to the target, it is likely only sufficient for the first round of data analysis of upcoming surveys, and continued improvements in precision will be needed. This is probably within reach as a better understanding is achieved from hydrodynamical simulations and observations.

Note that hydrodynamical simulations are sometimes regarded as the “ground truth”, and one might think that the best strategy for cosmological exploitation should be based on multiple hydrodynamical calculations with varying cosmological and baryonic parameters. However, it is important to be aware of the risks of this approach. First, most of the simulated physics relies on simplified “sub-grid” prescriptions describing, e.g., star formation and feedback from black holes and supernovae. Thus the accuracy and realism of the cosmological predictions would be limited by these underlying recipes, which frequently are simply ad-hoc parametrisations with significant degrees of freedom that vary considerably among simulation codes and their physics implementations. These prescriptions are often valid only for the physical resolution of a given simulation and require re-tuning when the resolution changes. Second, many of these recipes have been calibrated or validated against observational data; there is thus a risk of circularity, where observed data are used to calibrate simulations assuming a cosmology, and those simulations are then used to constrain cosmology. Finally, it is ultimately very difficult to estimate, let alone control, all possible sources of uncertainty and to propagate them consistently to cosmological constraints. For all these reasons, a more conservative approach is to employ physically-motivated models which are informed by, and tested against, a broad range of cosmologies and galaxy formation models, including those seemingly ruled out by data.

Reliably predicting galaxy clustering as a function of cosmology is another area where significant modelling uncertainty exists. One option would be to combine gravity-only simulations with one or more of the galaxy models described in Sect. 9.9, so that cosmological and astrophysical parameters could be jointly constrained. This, however, poses several problems. The first is computational in nature: the target parameter space would be extremely large, which is challenging for emulators. Additionally, even at fixed cosmology, the calibration of SAMs and empirical models can cost a few hundred thousand CPU hours. An alternative could be to formulate these models with automatic differentiation, so that gradient-based optimization algorithms can be employed, speeding up parameter sampling considerably (see the sketch below). Another option is to split the cosmological and galaxy formation parameters and to directly emulate the “Bayesian evidence” of a given cosmology (Lange et al. 2019, 2022).
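To illustrate the automatic-differentiation route, the sketch below (Python with the jax library) calibrates the parameters of a toy, differentiable occupation model by plain gradient descent. The functional form, mock data, and parameter values are all hypothetical; a real application would differentiate through the full clustering prediction.

```python
# Sketch: gradient-based calibration of a toy, differentiable galaxy model.
import jax
import jax.numpy as jnp

log_mass = jnp.linspace(11.0, 15.0, 64)          # log10(M_halo / Msun)

def mean_occupation(log_mass, params):
    """Toy HOD-like mean central occupation (illustrative form only)."""
    log_mmin, sigma = params
    return 0.5 * (1.0 + jax.scipy.special.erf((log_mass - log_mmin) / sigma))

# Mock "data" generated with hypothetical true parameters.
target = mean_occupation(log_mass, jnp.array([12.3, 0.4]))

def loss(params):
    return jnp.mean((mean_occupation(log_mass, params) - target) ** 2)

grad_loss = jax.jit(jax.grad(loss))              # exact gradients via autodiff

params = jnp.array([12.8, 0.6])                  # initial guess
for _ in range(1000):                            # plain gradient descent
    params = params - 0.5 * grad_loss(params)
```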

As in our discussion of baryonic physics and hydrodynamical simulations, there is a risk of cosmological biases due to inaccurate modelling of galaxy formation physics. A potentially more robust option has been proposed recently: combining parametric bias descriptions (motivated by the perturbative expansion of galaxy bias; see the schematic form below) with N-body dynamics. These hybrid models can be emulated efficiently and could be a robust way to extract information from the quasi-linear regime; the main future challenges will be the extension to peculiar velocities and a self-consistent treatment of higher-order correlations (Pellejero-Ibañez et al. 2021; Banerjee et al. 2021). It is commonly argued that measurements of the late-time LSS give access to a large number of modes in the Universe, which could potentially outperform CMB analyses. It is, however, unclear how many of these modes actually carry useful information about fundamental aspects of the Universe. Simulation-based modelling such as that described here will likely answer this question and determine the maximum amount of cosmological information available in the late-time Universe.
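Schematically, these hybrid approaches expand the galaxy overdensity in a set of operators built from the density field, whose coefficients are free parameters while the operator fields themselves are advected with (and measured from) the N-body dynamics. A common second-order form is (conventions and operator bases differ among authors)

\[
\delta_{\mathrm{g}}(\boldsymbol{x}) \simeq b_1\,\delta(\boldsymbol{x})
 + \frac{b_2}{2}\left[\delta^2(\boldsymbol{x}) - \langle\delta^2\rangle\right]
 + b_{s^2}\left[s^2(\boldsymbol{x}) - \langle s^2\rangle\right]
 + b_{\nabla^2}\,\nabla^2\delta(\boldsymbol{x}) + \epsilon(\boldsymbol{x}),
\]

where \(s^2\) denotes the squared tidal field and \(\epsilon\) a stochastic term. The bias coefficients \(b_i\) are marginalised over in the analysis, while the cosmology dependence of the operator fields is what is emulated from simulations.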

Another rich source of information lies in combining observations at different wavelengths and exploiting their cross-correlations, e.g. lensing-SZ, lensing-galaxy clustering, SZ-galaxy clustering, etc. For this, simulations would need to predict multiple observables simultaneously. In this regard, predictions for the halo mass function as a function of cosmology have also become possible, either through emulators or through refined fitting functions (an illustrative parametrisation is given below). Future challenges will involve incorporating baryonic physics as well as observational errors (e.g. projection effects, off-centering), since cluster cosmology depends sensitively on the observable-mass calibration (e.g. via optical richness, X-ray luminosity, or the Sunyaev–Zel’dovich effect).
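For reference, most of these fitting functions express the mass function through a nearly universal multiplicity function \(f(\sigma)\), whose coefficients are calibrated against simulations; a commonly used Tinker-like parametrisation, quoted here only as an illustration, reads

\[
\frac{\mathrm{d}n}{\mathrm{d}M} = f(\sigma)\,\frac{\bar{\rho}_{\mathrm{m}}}{M}\,\frac{\mathrm{d}\ln \sigma^{-1}}{\mathrm{d}M},
\qquad
f(\sigma) = A\left[\left(\frac{\sigma}{b}\right)^{-a}+1\right]e^{-c/\sigma^{2}},
\]

where \(\sigma(M,z)\) is the rms linear density fluctuation on the mass scale \(M\), and \(A\), \(a\), \(b\), \(c\) are fitted (and generally redshift-dependent) coefficients.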

In parallel, instead of building emulators, it might be possible to employ machine learning algorithms to design optimal summary statistics that go beyond classical N-point functions, or to train generative models that directly produce ensembles of fields with which a full forward modelling could be carried out. In this way, the cosmological exploitation could be performed not only at the level of summary statistics, but also at the field level, e.g. incorporating the information contained in the phases of the cosmic field in Fourier space. To realise this possibility, however, further advances will be required to demonstrate the robustness of the approach (against, e.g., observational systematic effects) and to quantify modelling errors.

Common to all the above is the need to carefully quantify the emulators’ accuracy and to incorporate this uncertainty into a Bayesian data-analysis workflow. The assessment should include all possible sources of systematic error: errors arising from the N-body discretisation, from the specific simulation setup, structure finding and interpolation method, from the modelling of the relevant baryonic physics, and from the projection to observable space (redshift space and/or redshift errors). Since these errors are likely to have a complex structure in parameter space (being smaller in some regions and larger in others), they will need to be taken into account in Bayesian parameter samplers and in the covariance matrices.
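As a minimal sketch of one way such errors could enter the analysis (assuming Gaussian errors, with all inputs hypothetical), an estimate of the emulator's error covariance can simply be added to the data covariance inside the likelihood:

```python
import numpy as np

def gaussian_loglike(data, model, cov_data, cov_emu):
    """Gaussian log-likelihood with the emulator-error covariance added
    to the data covariance (all arrays are hypothetical inputs)."""
    cov = cov_data + cov_emu
    resid = data - model
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, resid)          # solves L a = r
    chi2 = alpha @ alpha                          # r^T C^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))  # log|C|
    n = len(data)
    return -0.5 * (chi2 + logdet + n * np.log(2.0 * np.pi))
```

More refined schemes would let the emulator-error covariance vary across parameter space, consistent with the complex error structure discussed above.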

As the sophistication and accuracy of cosmological models improve, they will pose stricter requirements on the accuracy of the respective covariance matrices [see Monaco (2016) for a review]. Currently, there are indications that analytic or approximate methods for building covariances are sufficient for the precision of future LSS surveys and current data models (Barreira et al. 2018; Lippich et al. 2019; Blot et al. 2019; Colavincenzo et al. 2019). However, as these models become more precise and valid deep into the nonlinear regime (and for the cross-correlation of different observables), they will require equally accurate covariances, and a re-assessment of the accuracy with which these are currently built will become very important.

Ultimately, simulations will be able to model simultaneously a broad range of observables across different wavelengths, as a function of cosmology and over a wide range of scales and redshifts. This will become increasingly important as more surveys scan the sky in different observables: the power of cross-correlations will become more evident in breaking degeneracies, identifying potential sources of systematic error, and, perhaps most importantly, potentially detecting physics beyond \(\varLambda \)CDM with more than a single cosmic probe.

All this will likely be an iterative process between the numerical-simulation and observational communities: better predictions become available, data are interpreted, sources of noise are better characterised, and this in turn feeds back into more accurate and refined simulations.

11 Conclusions and outlook

After decades of development, modern large-scale cosmological simulations have become a mature field that provides the most precise method for predicting and understanding nonlinear structure in the cosmos. In consequence, numerical simulations are essential in modern cosmology and in the ongoing efforts to understand the nature of our Universe.

Reaching this point required progress in various directions. The first concerns rigorous tests of the basic assumptions behind these calculations: from the impact of the N-body discretisation on the results, to the validity of Newtonian limits of General Relativity. The second was a continuous improvement of the algorithms and of the accuracy of the calculations: from a better understanding of the sources of error in the creation of initial conditions and in the time integration, to the treatment of dark matter and baryons as a single fluid, and many others. The third concerned a continuous extension of the physical models beyond the simplest \(\varLambda \)CDM model: from modifications of gravity to different dark matter models, which also had synergistic effects on the development of new cosmic probes and experiments. All these advances have been supported by a continuous increase in computational power, which allowed simulations of ever bigger cosmological volumes at better and better resolution. In parallel, algorithms continued to be developed to make the best use of new generations of supercomputers, with cosmological codes remaining among the few in science that can efficiently exploit the largest supercomputers worldwide.

As a consequence, predictions from gravity-only N-body simulations have become very reliable. For instance, different simulation codes, despite using very different numerical approaches to solve the underlying equations, now agree at the sub-percent level on the nonlinear matter distribution, down to the internal structure of haloes. Systematic comparison and convergence studies have also allowed the field to identify sources of noise and error, to improve the relevant approaches, and to validate simplifications and optimisations. In parallel, an important validation of the N-body approach itself is starting to become possible. Initial results from Lagrangian tessellation discretisations (currently the only viable alternative to N-body) indicate the validity of the N-body approach for following the gravitational collapse of CDM fluctuations. While some problems still require further elucidation, these results provide important support for the predictions of standard cosmological codes.

In contrast, the connection of dark matter with visible objects such as galaxies and quasars remains considerably more uncertain. Nevertheless, tremendous progress has been made in the field of galaxy formation in recent years, and increasing agreement on that front is quickly being achieved. This has allowed the development of models for the impact of galaxy formation and baryonic physics on cosmological observables that can be applied to gravity-only simulations. Some caution is necessary, however, since hydrodynamical simulations usually rely on sub-resolution physics with a number of parameters adjusted to reproduce observations: reproducing observations does not necessarily imply that the correct effective physics is being modelled. Their predictive power can only be scrutinised by testing predictions far outside the calibration space. Nevertheless, these galaxy formation simulations have aided the development of physically-motivated models that could be applied to data in order to obtain constraints on cosmological parameters after marginalising over baryonic physics. In addition, these baryonic models will keep improving in the future, which will naturally feed back into more accurate and deterministic models to be combined with gravity-only simulations.

All the developments described throughout this review are preparing N-body simulations for their next big challenge: to be used directly in the data analysis pipelines of upcoming cosmological measurements. In this regard, we have seen fast progress on two fronts. The first is the development of approximate methods which, although not always formally correct, appear to reproduce large-scale statistics of the matter field at a fraction of the computational cost of a standard N-body simulation. The second is the development of reliable techniques to interpolate N-body results across cosmological parameter space, either by constructing emulators over the outputs of hundreds of N-body simulations or by training machine learning algorithms. With the steady growth in computing power, it is likely that these methods will keep improving to the point where predictions as a function of cosmology become indistinguishable from those of direct N-body simulations. This has the potential to significantly increase the scientific output of future extragalactic observations.

If the challenges described above can be successfully solved with high precision and convincing robustness over the next decade, large-scale N-body simulations could become a key piece in our effort to answer some of the most important questions in physics: the nature of dark matter, dark energy, and gravity.