1 Introduction

The current standard cosmological model describes a Universe accelerated by a cosmological constant (\(\varLambda \)) and dominated by cold dark matter (CDM), where structure arose from minute initial perturbations, seeded in the primordial quantum Universe, which collapsed on ever larger scales over cosmic time (Planck Collaboration 2020; Alam et al. 2017; Betoule et al. 2014). The nonlinear interplay of these ingredients has formed the cosmic web and the intricate network of haloes harbouring galaxies and quasars.

Over the last decades, numerical simulations have played a decisive role in establishing and testing this \(\varLambda \)CDM paradigm. Following pioneering work in the 1980s, numerical simulations steadily grew in realism and precision thanks to major advances in algorithms, computational power, and the work of hundreds of scientists. As a result, various competing hypotheses and theories could be compared with observations, guiding further development over the years. Ultimately, \(\varLambda \)CDM was shown to be quantitatively compatible with virtually all observations of the large-scale structure of the Universe, even those that involve nonlinear physics and that are inaccessible to any method other than computer simulations (see e.g., Springel et al. 2006; Vogelsberger et al. 2020).

Nowadays, simulations have become the go-to tool in cosmology for a number of tasks: (i) the interpretation of observations in terms of the underlying physics and cosmological parameters; (ii) the testing and aiding of the development of perturbative approaches and analytic models for structure formation; (iii) the production of reliable input (training) data for data-driven approaches and emulators; (iv) the creation of mock universes for current and future large-scale extragalactic cosmological surveys, from which we can quantify statistical and systematic errors; (v) the study of the importance of various aspects of the cosmological model and physical processes, and determining their observability.

Despite the remarkable agreement between simulations and the observed Universe, there are indications that \(\varLambda \)CDM might not be ultimately the correct theory (Planck Collaboration 2020; Riess et al. 2019; Asgari et al. 2021; DES Collaboration 2021). Future cosmological observations will provide enough data to test competing explanations by probing exceedingly large sub-volumes of the Universe in virtually all electromagnetic wavelengths, and including increasingly fainter objects and smaller structures (e.g. Laureijs et al. 2011; Bonoli et al. 2021; DESI Collaboration 2016; Ivezić et al. 2019; Ade et al. 2019; Merloni et al. 2012). These observations will be able to test the physics of our Universe beyond the standard model: from neutrino masses, over the nature of dark matter and dark energy, to the inflationary mechanism. Since these observations are intimately connected to the nonlinear regime of structure formation, any optimal exploitation of the cosmological information will increasingly rely on numerical simulations. This will put numerical simulations in the spotlight of modern cosmology: they can either confirm or break the standard \(\varLambda \)CDM paradigm, and therefore will play a key role in the potential discovery of new physics.

The required accuracy and precision to make predictions of the necessary quality poses many important challenges and requires a careful assessment of all the underlying assumptions. This is the main topic of this review; we cover the ample field of cosmological simulations, starting from the fundamental equations to their connection with astrophysical observables, highlighting places where current research is conducted.

1.1 Large-scale simulations

The spatial distribution and dynamics of the large-scale structure give us access to fundamental aspects of the Universe: its composition, the physical laws, and its initial conditions. For instance, the overall shape of the galaxy power spectrum is sensitive to the abundance of baryons and dark matter; the anisotropy induced by redshift-space distortions can constrain the cosmic density-velocity relation which in turn is set by the gravity law; and high-order cumulants of the galaxy field can encode non-Gaussian signatures inherited from an early inflationary period.

To extract this information from observations of the large-scale distribution of galaxies, quasars or other tracers, we must rely on a model that predicts their observed distribution as a function of cosmic time for a given cosmological model. That is, we require a prediction for the distribution of the mass density and the velocity field together with the properties and abundance of collapsed objects. Furthermore, we need to predict how galaxies or other astronomical objects will populate these fields. This is a well-posed but very challenging problem, due to the complexity and nonlinear nature of the underlying physics.

On large scales and/or at early times, density fluctuations are small and the problem can be tackled analytically by perturbatively expanding the relevant evolution equations. However, a rigorous perturbation theory can only be carried out on scales unaffected by shell-crossing effects. On intermediate scales, so far only effective models with several free parameters exist (which are themselves either fitted to simulations or tested with simulations). On smaller, strongly nonlinear scales, the only way to accurately solve the problem is numerically.

We illustrate the complicated nature of the nonlinear universe in Fig. 1, which shows the simulated matter field in a region \(40 h^{-1}\mathrm{Mpc}\) wide. In the top panel we can appreciate the distribution of dark matter halos and their variety in terms of sizes, masses, and shapes. In the middle and bottom panels we show the same region but on much thinner slices which emphasise the small-scale anisotropies and ubiquitous filamentary structure.

Fig. 1 The distribution of dark matter in thin slices as predicted by a cosmological N-body simulation. Each panel shows a region \(40 h^{-1}\mathrm{Mpc}\) wide with different levels of thickness (40, 2, and \(0.1 h^{-1}\mathrm{Mpc}\), from top to bottom), which highlight different aspects of the simulated density field, from the distribution of dark matter halos in the top panel, to the filamentary nature of the nonlinear structure in the bottom panel. Image adapted from Stücker et al. (2018)

When one is concerned with the large-scale structure of the Universe, the dynamics is dominated by gravity, and baryons and dark matter can be considered as a single pressureless (cold) fluid. This poses an ideal situation for computer simulations: the initial conditions as well as the relevant physical laws are known and can be solved numerically considering only gravitational interactions. We review in detail the foundations of such numerical simulations in Sects. 2, 3, 4, 5, and 6. Specifically, in Sect. 2 we describe the derivation of the relevant equations solved by numerical simulations, in Sect. 3 how they can be discretised by either an N-body approach or by alternative methods. As we will discuss later, considering different discretisations will be crucial to test the robustness of the predictions from N-body simulations. In Sect. 4 we discuss how to evolve the discretised equations in time, and pay attention to common approaches for computational optimization, whereas in Sect. 5 we discuss various numerical techniques to compute gravitational interactions. Finishing our recap of the main numerical techniques underlying large-scale simulations, in Sect. 6 we discuss several aspects of how to set their initial conditions.

The importance of numerical solutions for structure formation is that they provide currently the only way to predict the highly nonlinear universe, and potentially extract cosmological information from observations at all scales. In contrast, if one is restricted to scales where perturbative approaches are valid and shell-crossing effects are negligible (i.e., have only a sub-per cent impact on summary statistics), then a huge amount of cosmological information is potentially lost.

The primary role of numerical simulations for cosmological constraints has already been demonstrated for several probes focusing mostly on small scales, and in setting constraints on the properties of the dark matter particle, as exemplified by constraints from the Ly-\(\alpha \) forest, the abundance and properties of Milky-Way satellites, and strong lensing modelling. They all rely on a direct comparison of observations with numerical simulations, or with models calibrated and/or validated using numerical simulations. In Sect. 7 we discuss several ways in which the distinctive properties of various potential dark matter candidates can be modelled numerically, including neutralinos, warm dark matter, axions, wave dark matter, and decaying and self-interacting dark matter. In the future, this will also be the case for large-scale clustering, and we discuss the current and potential challenges to be addressed in Sect. 8.

1.2 Upcoming challenges

Given that simulations are increasingly used for the inference of cosmological parameters, a question of growing importance is how one can demonstrate the correctness of a given numerical solution. For instance, any simulation-based evidence (of massive neutrinos or a departure from GR, say) would necessarily need to prove its accuracy in modelling the relevant physics in the nonlinear regime. A proper understanding is of paramount importance: simulators need to demonstrate that a potential discovery relying on simulations is not simply a numerical inaccuracy and cannot be explained by uncertain or ill-understood model parameters (i.e. the uncertainties due to all “free” parameters must be quantified). We devote Sect. 8 to this topic.

Unfortunately, only a limited set of exact solutions is known for which strong convergence can be demonstrated. For useful predictions, however, the correctness of the solution has to be established in a very different, much more non-linear regime, far from these relatively simplistic known solutions. There has been significant progress in this direction over the last decade: from large code comparisons, a better understanding of the effect of numerical noise and discreteness errors, and the impact of code parameters that control the accuracy of the solution, to the quality of the initial conditions used to set up the simulations. These tests, however, presuppose that the N-body method itself converges to the right solution for a relatively small number of particles. Clear examples where the N-body approach fails have emerged: most notably near the free-streaming scale in warm dark matter simulations, and in the relative growth of multi-fluid simulations. Very recently, methods that do not rely on the N-body approach have become possible, which has allowed testing for such errors. Although so far no significant biases have been measured for the statistical properties of CDM simulations, many more careful and systematic comparisons will be needed in the future.

In parallel, there has been significant progress in the improved modelling of the multi-physics aspect of modern cosmological simulations. For instance, accurate two-fluid simulations are now possible capturing the distinct evolution of baryons and cold dark matter, as are simulations that quantify the non-linear impact of massive neutrinos with sophisticated methods. Further, the use of Newtonian dynamics has been tested against simulations solving general relativistic equations in the weak field limit, and schemes to include relativistic corrections have been devised. We discuss all these extensions, which seek to improve the realism of numerical simulations, in Sect. 7.

An important aspect for confirming or ruling out the \(\varLambda \)CDM model will be the design of new cosmic probes that are particularly sensitive to departures from \(\varLambda \)CDM. For this it is important to understand the role of non-standard ingredients on various observables and in structure formation in general. This is an area where rapid progress has been seen with advances in the variety and sophistication of models, for instance regarding the actual nature of dark matter and simulations assuming neutralinos, axions, primordial black holes, or wave dark matter. Likewise, modifications of general relativity as the gravity law, and simulations with primordial non-Gaussianity have also reached maturity.

To achieve the accuracy and precision necessary to make optimal use of upcoming observational data, it is clear that, ultimately, all systematic errors in simulations must be understood and quantified, and the impact of all the approximations made must be under control. One of these approximations concerns baryons: on nonlinear scales, baryonic effects start to become important since cold collisionless and collisional matter start to separate, and baryons begin to be affected by astrophysical processes. Hence, it becomes important to enhance the numerical simulation with models for the baryonic components, either for the formation of galaxies or for the effects of gas pressure and feedback from supernovae or supermassive black holes. We discuss different approaches to this problem in Sect. 9.

In parallel to increasing the accuracy of simulations, the community is focusing on improving their “precision”. Cosmological simulations typically push the limits of international supercomputer centers and are among the largest calculations. New technologies and algorithmic advances are an important part of the field of cosmological simulations, and we review this in Sect. 10. We have seen important advances in terms of adoption of GPUs and hardware accelerators and new algorithms for force calculations with improved precision, computational efficiency, and parallelism. Thanks to these, state-of-the-art simulations follow regions of dozens of Gigaparsecs with trillions of particles, constantly approaching the level required by upcoming extragalactic surveys.

The field of cosmological simulations has also been reviewed in the last decade by other authors, who have focused on different aspects of the field and from different perspectives. For more details, we refer the reader to the excellent reviews by Kuhlen et al. (2012), Frenk and White (2012), Dehnen and Read (2011), and by Vogelsberger et al. (2020).

1.3 Outline

In the following we briefly outline the contents of each section of this review.

Section 2: This section provides the basic set of equations solved by cosmological dark matter simulations. We emphasise the approximations usually adopted by most simulations regarding the weak-field limit of General Relativity and the properties of dark matter. This section sets the stage for various kinds of simulations we discuss afterwards.

Section 3: This section discusses the possible numerical approaches for solving the equations presented in Section 2. Explicitly, we discuss the N-body approach and alternative methods such as the Lagrangian submanifold, full phase-space techniques, and Schrödinger–Poisson.

Section 4: This section derives the time integration of the relevant equations of motion. We discuss the symplectic integration of the dynamics at second order. We also review individual and adaptive timestepping, and the integration of quantum Hamiltonians.

Section 5: We review various methods for computing the gravitational forces exerted by the simulated mass distribution. Explicitly, we discuss the Particle-Mesh method solved by Fourier and mesh-relaxation algorithms, Trees in combination with traditional multipole expansions and Fast multipole methods, and their combination.

Section 6: In this section we outline the method for setting the initial conditions for the various types of numerical simulations considered. Explicitly, we review the numerical algorithms employed (Zel’dovich approximation and higher-order formulations) as formulated in Fourier or configuration space.

Section 7: This section is focused on simulations relaxing the assumption that all mass in the Universe is made out of a single cold collisionless fluid. That is, we discuss simulations including both baryons and dark matter; including neutrinos; assuming dark matter is warm; self-interacting; made out of primordial black holes; and cases where its quantum pressure cannot be neglected on macroscopic scales. We also discuss simulations easing the restrictions that the gravitational law is given by the Newtonian limit of General Relativity, and that the primordial fluctuations were Gaussian.

Section 8: This section discusses the current challenge for high accuracy in cosmological simulations. We consider the role of the softening length, cosmic variance, and mass resolution, among other numerical parameters. We also review comparisons of N-body codes and discuss the validity of the N-body discretization itself.

Section 9: This section covers the connection between simulation predictions and cosmological observations. We discuss halo finder algorithms, the building of merger trees, and the construction of lightcones. We also briefly review halo-occupation distribution, subhalo-abundance-matching, and semi-analytic galaxy formation methods.

Section 10: This section provides a list of state-of-the-art large-scale numerical simulations. We put emphasis on the computational challenges they face in connection with current and future computational trends and observational efforts.

In the final section, we conclude with an outlook for the future of cosmological dark matter simulations.

2 Gravity and dynamics of matter in an expanding Universe

Large-scale dark matter simulations are (mostly) carried out in the weak-field, non-relativistic, and collisionless limit of a more fundamental Einstein–Boltzmann theory. Additionally, since these simulations neglect any microscopic interactions in the collisionless limit right from the start (we will not consider them until we discuss self-interacting dark matter), one operates in the Vlasov–Einstein limit. This is essentially a continuum description of the geodesic motion of microscopic particles since only (self-)gravity is allowed to affect their trajectories. Despite these simplifications, this approach keeps the essential non-linearities of the system, which give rise to its phenomenological complexity.

In this section, we first derive the relevant relativistic equations of motion in a Hamiltonian formalism. We then take the non-relativistic weak-field limit by considering perturbations around the homogeneous and isotropic FLRW (Friedmann-Lemaître–Robertson–Walker) metric. This Vlasov–Poisson limit yields ordinary non-relativistic equations of motion, with the twist of a non-standard time-dependence due to the expansion of space in a general FLRW space time. Due to the preservation of symplectic invariants, the expansion of space leads to an intimately related contraction of momentum space to preserve overall phase space volume.

With the general equations of motion at hand, we consider the cold limit (owed to a property of cold dark matter (CDM) that is well constrained by observations) which naturally arises since the expansion of space (or rather the compression of momentum space) reduces any intrinsic momentum dispersion of the particle distribution over time. In the cold limit, the distribution function of dark matter takes a particularly simple form of a low-dimensional submanifold of phase space. These discussions are aimed to provide a formal foundation for the equations of motion as well as a motivation for many of the techniques and approximations discussed and reviewed in later sections.

2.1 Equations of motion and the Vlasov equation

Since we are, due to the very weak interactions of dark matter, interested mostly in the collisionless limit, we are essentially looking at freely moving particles in a curved space-time. To describe the motion of these particles, it is much easier to work with Lagrangian coordinates in phase space, i.e., the positions and momenta of particles. In general relativity, we have in full generality an eight-dimensional relativistic extended phase space of positions \(x_\mu \) and their conjugate momenta \(p^\mu \) (footnote 1). Kinetic theory in curved space-time is discussed in great detail in many introductory texts on general relativity (e.g. Ehlers 1971; Misner et al. 1973; Straumann 2013; Choquet-Bruhat 2015 for the curious reader), but mostly without connection to a Hamiltonian structure. For our purposes, we can eliminate one degree of freedom by considering massive particles and neglecting processes that alter the mass of the particles. In that case the mass-shell condition \(p^\mu p_\mu = -m^2 = \mathrm{const}\) holds (footnote 2) and allows us to reduce the dynamics to a 3+3 dimensional phase space with one parameter (e.g. time) that can be used to parameterise the motion. Note that we employ throughout Einstein’s summation convention, where repeated indices are implicitly summed over, unless explicitly stated otherwise.

Geodesic motion of massive particles In the presence of only gravitational interactions, the motion of particles in general relativity is purely geodesic by definition. Let us therefore begin by considering the geodesic motion of a particle moving between two points A and B. The action for the motion along a trajectory (X(t), P(t)) (footnote 3), parametrised by coordinate time t, between the spacetime points A and B is

$$\begin{aligned} S(A,B) = \int _{A}^{B} P_\mu \mathrm{d}X^\mu = \int _A^B \left[ P_0 \mathrm{d}t + P_i \mathrm{d}X^i \right] = \int _A^B\left[ P_i \frac{\mathrm{d}X^i}{\mathrm{d}t} + P_0 \right] \mathrm{d}t. \end{aligned}$$

From Eq. (1), one can immediately read off that the Lagrangian \(\mathscr {L}\) and Hamiltonian \(\mathscr {H}\) of geodesic motion are given by the usual Legendre transform pair (e.g. Goldstein et al. 2002)

$$\begin{aligned} \mathscr {L}:=P_i \frac{\mathrm{d}X^i}{\mathrm{d}t} - \mathscr {H},\qquad \text {with}\qquad \mathscr {H}:=-P_0,\quad \mathrm{d}t := \mathrm{d}X^0 \end{aligned}$$

respectively, meaning that \(P_0\) represents the Hamiltonian itself [as one finds also generally in extended phase space, cf. Lanczos (1986)]. It is easy to show in a few lines of calculation that the coordinate-time canonical equations of motion in curved spacetime are then given by two dynamical equations (footnote 4; e.g. Choquet-Bruhat 2015)

$$\begin{aligned} \frac{\mathrm{d}X^\mu }{\mathrm{d}t} = \frac{P^\mu }{P^0} \qquad \text {and}\qquad \frac{\mathrm{d}P_\mu }{\mathrm{d}t} = F_\mu \qquad \text {with}\quad F_\mu := g_{\alpha \beta ,\mu }(X)\, \frac{P^\alpha P^\beta }{2P^0}. \end{aligned}$$

The Christoffel symbols of the metric simplify here to this simple partial derivative due to the mass-shell condition, but otherwise these two equations are equivalent to the geodesic equation. Note the formal similarity of these equations compared to the non-relativistic equations, with the ‘gravitational interaction’ absorbed into the derivative of the metric. Eqs. (3) determine the particle motion given the metric. The metric in turn is determined by the collection of all particles in the Universe through Einstein’s field equations, which we will address in the next section.
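As a quick sanity check of Eqs. (3), which is our own illustration rather than part of the derivation, consider the flat-space case of a constant metric. All derivatives of the metric then vanish, so that

$$\begin{aligned} g_{\alpha \beta ,\mu } = 0 \quad \Longrightarrow \quad F_\mu = 0,\qquad \frac{\mathrm{d}P_\mu }{\mathrm{d}t} = 0,\qquad \frac{\mathrm{d}X^i}{\mathrm{d}t} = \frac{P^i}{P^0} = \mathrm{const}. \end{aligned}$$

Particles then move on straight lines at constant coordinate velocity, as expected for free motion in special relativity. All gravitational effects enter only through the spatial and temporal variation of the metric.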

Statistical Mechanics When considering a large number of particles, it is necessary to transition to a statistical description and consider the phase-space distribution (or density) function of particles in phase-space over time, i.e. on \((\varvec{x},\varvec{p},t) \in \mathbb {R}^{3+3+1}\), rather than individual microscopic particle trajectories \((\varvec{X}(t),\varvec{P}(t))\). The on-shell phase space distribution function \(f_{\mathrm{m}}(\varvec{x},\varvec{p},t)\) can be defined e.g. through the particle number, which is a Lorentz scalar, per unit phase space volume. This phase-space density is then related to the energy-momentum tensor as

$$\begin{aligned} T^{\mu \nu } := \frac{1}{\sqrt{|g|}} \int _{\mathbb {R}^3} \mathrm{d}^3p\;f_{\mathrm{m}}(\varvec{x},\varvec{p},t) \,\frac{p^\mu p^\nu }{p^0} , \end{aligned}$$

where g is the determinant of the metric. For purely collisionless dynamics, the evolution of \(f_{\mathrm{m}}\) is therefore determined by the on-shell Einstein–Vlasov equation (e.g. Choquet-Bruhat 1971; Ehlers 1971)

$$\begin{aligned} \widehat{L}_{\mathrm{m}} \,f_{\mathrm{m}} = 0,\qquad \text {with}\qquad \widehat{L}_{\mathrm{m}} := \frac{\partial }{\partial t} + \frac{p^i}{p^0}\frac{\partial }{\partial x^i} + F_i \frac{\partial }{\partial p_i}, \end{aligned}$$

where \(\widehat{L}_{\mathrm{m}}\) is the on-shell Liouville operator in coordinate time. This equation relates Hamiltonian dynamics and incompressibility in phase space: the Vlasov equation is simply the continuum limit of Hamiltonian mechanics with only long-range gravitational interactions (i.e., geodesic motion). This can be seen by observing that particle trajectories \(\left( X^i(t),P_i(t)\right) \) following Eqs. (3) solve the Einstein–Vlasov equation as characteristic curves, i.e. \(\mathrm{d}f_\mathrm{m}\left( X^i(t),P_i(t),t\right) /\mathrm{d}t=0\).

2.2 Scalar metric fluctuations and the weak field limit

Metric perturbations The final step needed to close the equations is made through Einstein’s field equations \(G_{\mu \nu } = 8\pi G T_{\mu \nu }\). The field equations connect the evolution of the phase-space density \(f_{\mathrm{m}}\), which determines the stress-energy tensor \(T^{\mu \nu }\), with the force 1-form \(F_i\), which is determined by the metric. The results presented above are valid non-perturbatively for an arbitrary metric. Here, we shall make a first approximation by considering only scalar fluctuations using two scalar potentials \(\phi \) and \(\psi \) (footnote 5). This approximation is valid if velocities are non-relativistic, i.e. \(\left| P_i/P^0\right| \ll 1\). In this case, the only dynamically relevant component of \(T^{\mu \nu }\) is the time-time component. Let us thus consider the metric (which corresponds to the “Newtonian gauge” with conformal time), following largely the notation of Bartolo et al. (2007),

$$\begin{aligned} \mathrm{d}s^2=a^2(\tau )\left[ -\exp (2\psi )\,\mathrm{d}\tau ^2 + \exp (-2\phi )\,\mathrm{d}x^i \mathrm{d}x_i \right] \end{aligned}$$

where x are co-moving coordinates. The metric determinant is given by \(\sqrt{|g|}=a^4\exp \left( \psi -3\phi \right) \).

The kinetic equation in GR is simply a geodesic transport equation and will thus only depend on the gravitational “force” 1-form \(F_i\), which can be readily computed for this metric to be

$$\begin{aligned} F_i = -a^2 p^0 \exp \left( 2\psi \right) \left( \psi _{,i}+\phi _{,i}\right) + \frac{m^2}{p^0} \phi _{,i}. \end{aligned}$$

If the vector and tensor components are non-relativistic [see e.g. Kopp et al. (2014), Milillo et al. (2015) for a rigorous derivation of the Newtonian limit], we are left only with a constraint equation from the time-time component of the field equations. The time-time component of the Einstein tensor \(G_{\mu \nu }\) is found to be

$$\begin{aligned} G^0{}_{0} = \frac{\exp (2\phi )}{a^2} \left( (\nabla \phi )^2 -2\nabla ^2\phi \right) - 3 \frac{\exp (-2\psi )}{a^2} \left( \frac{a^{\prime}}{a}-\phi ^{\prime}\right) ^2, \end{aligned}$$

where a prime indicates a derivative w.r.t. \(\tau \). Inserting this in the respective field equation and performing the weak field limit (i.e. keeping only terms up to linear order in the potentials) one finds the following constraint equation

$$\begin{aligned} -3\mathcal {H}\left( \phi ^{\prime}+\mathcal {H}\psi \right) + \nabla ^2 \phi + \frac{3}{2}\mathcal {H}^2 = 4\pi G a^2 \rho , \end{aligned}$$

where \(\rho :=T^0{}_{0}\), \(\mathcal {H}:= a^{\prime}/a\), and G is Newton’s gravitational constant. Note that this equation alone does not close the system, since we have no evolution equation for \(a(\tau )\) yet.
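The linearisation step can be sketched as follows. This is our own expansion of the expression for \(G^0{}_{0}\) given above, keeping only terms linear in \(\phi \), \(\psi \) and their derivatives and dropping quadratic terms such as \((\nabla \phi )^2\):

$$\begin{aligned} \exp (2\phi )\left( (\nabla \phi )^2 -2\nabla ^2\phi \right)&\simeq -2\nabla ^2\phi , \\ \exp (-2\psi )\left( \mathcal {H}-\phi ^{\prime}\right) ^2&\simeq \mathcal {H}^2 - 2\mathcal {H}\left( \phi ^{\prime}+\mathcal {H}\psi \right) , \\ a^2\, G^0{}_{0}&\simeq -2\left[ -3\mathcal {H}\left( \phi ^{\prime}+\mathcal {H}\psi \right) + \nabla ^2 \phi + \frac{3}{2}\mathcal {H}^2 \right] . \end{aligned}$$

The bracket in the last line is the left-hand side of the constraint equation. Equating \(a^2 G^0{}_{0}\) with \(8\pi G a^2 T^0{}_{0}\) then yields the constraint, where the overall sign depends on the sign convention adopted for \(T^0{}_{0}\) in the chosen metric signature.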

Separation of background and perturbations The usual assumption is that backreaction can be neglected, i.e. the homogeneous and isotropic FLRW case is recovered with \(\phi ,\psi \rightarrow 0\) and density \(\rho \rightarrow \overline{\rho }(\tau )\). In this case, \(a(\tau )\) is given by the solution of this equation in the absence of perturbations which becomes the usual Friedmann equation

$$\begin{aligned} \mathcal {H}^2 = \frac{8\pi G}{3} a^2 \overline{\rho }\qquad \text {with }\qquad \overline{\rho } =: \left( \varOmega _{\mathrm{r}} a^{-4}+\varOmega _\nu (a)+\varOmega _{\mathrm{m}} a^{-3} + \varOmega _{\mathrm{k}} a^{-2} + \varOmega _\varLambda \right) \rho _{\mathrm{c},0}, \end{aligned}$$

where \(\rho _{\mathrm{c},0} := \frac{3H_0^2}{8 \pi G}\) is the critical density of the Universe today, \(H_0\) is the Hubble constant, and the \(\varOmega _{X\in \{\mathrm{r},\nu ,\mathrm{m},\mathrm{k},\varLambda \}}\) are the respective density parameters of the various species in units of this critical density (at \(a=1\)). Note that massive neutrinos \(\varOmega _\nu (a)\) have a non-trivial scaling with a (see Sect. 7.8.2 for details). In the inhomogeneous case one can subtract out this FLRW evolution—neglecting by doing so any non-linear coupling, or ‘backreaction’, between the evolution of a and the inhomogeneities—and finds finally

$$\begin{aligned} -3\mathcal {H}\left( \phi ^{\prime}+\mathcal {H}\psi \right) + \nabla ^2 \phi = 4\pi G a^2 (\rho -\overline{\rho }). \end{aligned}$$

This is an inhomogeneous diffusion equation (cf. e.g., Chisari and Zaldarriaga 2011; Hahn and Paranjape 2016) reflecting the fact that the gravitational potential does not propagate instantaneously in an expanding Universe so that super-horizon scales, where density evolution is gauge-dependent, are screened.
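To make the background evolution concrete, the Friedmann equation above can be evaluated numerically. The following is a minimal sketch under stated assumptions: massive neutrinos are neglected (so \(\varOmega _\nu \) is dropped), \(\varOmega _\varLambda \) is fixed by requiring the density parameters to sum to one at \(a=1\), and the function names and default parameter values are illustrative choices of ours, not values taken from this review.

```python
import math


def expansion_rate(a, omega_m=0.31, omega_r=9.0e-5, omega_k=0.0):
    """Dimensionless Hubble rate E(a) = H(a) / H0.

    Assumption: massive neutrinos are neglected, and omega_lambda follows
    from the closure condition that all density parameters sum to one.
    """
    omega_lambda = 1.0 - omega_m - omega_r - omega_k
    return math.sqrt(
        omega_r / a**4 + omega_m / a**3 + omega_k / a**2 + omega_lambda
    )


def conformal_hubble(a, h0=67.7, **omegas):
    """Conformal Hubble rate a'/a = a * H(a), in the same units as h0."""
    return a * h0 * expansion_rate(a, **omegas)


if __name__ == "__main__":
    # Illustrative scan from early to late times.
    for a in (0.01, 0.1, 0.5, 1.0):
        print(f"a = {a:5.2f}   E(a) = {expansion_rate(a):12.4f}")
```

By construction the sketch returns \(E(a=1)=1\). A production code would add the non-trivial neutrino scaling and usually tabulate \(a(\tau )\) by integrating this rate.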

2.3 Newtonian cosmological simulations

Newtonian gravity In the absence of anisotropic stress the two scalar potentials in the metric (6) coincide and one has \(\psi =\phi \). One can further show that on sub-horizon scales [where \(\rho \) must be gauge independent, see e.g. Appendix A of Hahn and Paranjape (2016)] one then recovers from Eq. (11) the non-relativistic Poisson equation,

$$\begin{aligned} \nabla ^2 \phi = 4\pi G a^2 (\rho -\overline{\rho }). \end{aligned}$$

Note that this Poisson equation is however a priori invalid on super-horizon scales. Formally, when carrying out the transformation that removed the extra terms from Eq. (11), the Poisson source \(\rho \) has been gauge transformed to the synchronous co-moving gauge. If a simulation is initialised with density perturbations in the synchronous gauge and other quantities are interpreted in the Newtonian gauge, then the Poisson equation consistently links the two. In addition, we have in the non-relativistic weak field limit that \(p^0=a^{-1} m\) so that we also recover the Newtonian force law

$$\begin{aligned} F_i \rightarrow -am\phi _{,i}\,. \end{aligned}$$

Note that such gauge mixing can be avoided and horizon-scale effects can be rigorously accounted for by choosing a more sophisticated gauge (Fidler et al. 2016, 2017b) in which the force law is required to take the form of Eq. (13) and coordinates and momenta are interpreted self-consistently in this ‘Newtonian motion’ gauge to account for leading order relativistic effects. A posteriori gauge transformations exist to relate gauge-dependent quantities, but remember that observables can never be gauge dependent.

Non-relativistic moments For completeness and reference, we also give the components of the energy-momentum tensor (4) as moments of the distribution function \(f_{\mathrm{m}}\) in the non-relativistic limit

$$\begin{aligned} \rho := T^0{}_{0}= & {} \frac{m}{a^{3}} \int _{\mathbb {R}^3} \mathrm{d}^3p\,f_{\mathrm{m}}(\varvec{x},\varvec{p},t)\, \end{aligned}$$
$$\begin{aligned} \pi _i := T^0{}_{i}= & {} \frac{1}{a^4}\int _{\mathbb {R}^3} \mathrm{d}^3p\,f_{\mathrm{m}}(\varvec{x},\varvec{p},t) \,p_i\, \end{aligned}$$
$$\begin{aligned} \varPi _{ij} := T_{ij}= & {} \frac{1}{ma^5} \int _{\mathbb {R}^3} \mathrm{d}^3p\,f_{\mathrm{m}}(\varvec{x},\varvec{p},t) \,p_i \,p_j\,, \end{aligned}$$

defining the mass density \(\rho \), momentum density \(\varvec{\pi }\) and second moment \(\varPi _{ij}\), which is related to the stress tensor as \(\varPi _{ij} - \pi _i \pi _j / \rho \).

The equations solved by standard N-body codes. Finally, the equations of motion in cosmic time \(\mathrm{d}t = a\, \mathrm{d}\tau \), assuming the weak-field non-relativistic limit, are

$$\begin{aligned} \frac{\mathrm{d}{X}^i}{\mathrm{d}t} = \frac{P^i}{m}=\frac{{P}_i}{ma^2}\qquad \text {and}\qquad \frac{\mathrm{d}{P}_i}{\mathrm{d}t} = -m\frac{\partial \phi }{\partial X^i} \end{aligned}$$

with the associated Vlasov–Poisson system

$$\begin{aligned}&\frac{\partial f_{\mathrm{m}}}{\partial t} + \frac{{p}_i}{ma^2} \,\frac{\partial f_{\mathrm{m}}}{\partial {x}^i} - m\,\frac{\partial \phi }{\partial x^i} \,\frac{\partial f_\mathrm{m}}{\partial p_i} =0 \end{aligned}$$
$$\begin{aligned}&\nabla ^2\phi = 4\pi G a^2 (\rho - \overline{\rho }) \end{aligned}$$
$$\begin{aligned}&\rho = \frac{m}{a^{3}} \int _{\mathbb {R}^3} \mathrm{d}^3p\,\,f_\mathrm{m}(\varvec{x},\varvec{p},t) \end{aligned}$$

where \(\overline{\rho }(t)\) is the spatial mean of \(\rho \) that is used also in the Friedmann equation \(\mathcal {H}^2 = \frac{8\pi G}{3} a^2 \overline{\rho }\) which determines the evolution of \(a(t)\). It is convenient to change to a co-moving matter density \(a^{3}\rho \), eliminating several factors of a in these equations. In particular, the Poisson equation can be written as

$$\begin{aligned} \nabla ^2 \phi = \frac{3}{2}H_0^2 \varOmega _{\mathrm{m}} \delta / a, \end{aligned}$$

if gravity is sourced by matter perturbations alone so that \(\rho (\varvec{x},t) = (1+\delta (\varvec{x},t))\,\varOmega _{\mathrm{m}} \rho _{\mathrm{c},0} a^{-3}\). Note that we have also introduced here the fractional overdensity \(\delta := \rho /\overline{\rho }-1\).
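As an illustration of how this form of the Poisson equation can be solved in a periodic box, the following minimal Python sketch inverts the Laplacian in Fourier space. The function name `solve_poisson_periodic`, the use of NumPy FFTs, and the unit choice \(H_0=1\) by default are assumptions made here for illustration only; the methods actually used in production codes are the subject of Sect. 5.

```python
import numpy as np

def solve_poisson_periodic(delta, box_size, a, omega_m, h0=1.0):
    """Solve nabla^2 phi = 1.5 * H0^2 * Omega_m * delta / a on a periodic cubic grid.

    delta    : 3D array (n, n, n) holding the fractional overdensity
    box_size : co-moving side length L of the box
    h0       : Hubble constant in the (assumed) unit system of the caller
    """
    n = delta.shape[0]
    # Angular wavenumbers k = 2*pi*m/L along one axis
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0  # placeholder to avoid 0/0; the k=0 mode is zeroed below

    source_k = 1.5 * h0**2 * omega_m / a * np.fft.fftn(delta)
    phi_k = -source_k / k2      # nabla^2 -> -k^2 in Fourier space
    phi_k[0, 0, 0] = 0.0        # the mean of phi is arbitrary; set it to zero
    return np.real(np.fft.ifftn(phi_k))
```

For a single plane wave \(\delta \propto \cos (kx)\), the returned potential is \(-\frac{3}{2}H_0^2\varOmega _{\mathrm{m}}\,\delta /(a k^2)\), which provides a simple consistency check of the sketch.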

2.4 Post-Newtonian simulations

While traditionally all cosmological simulations were carried out in the non-relativistic weak-field limit, neglecting any back-reaction effects on the metric, the validity and limits of this approach have been questioned (Ellis and Buchert 2005; Heinesen and Buchert 2020). In addition, with upcoming surveys reaching horizon scales, relativistic effects need to be quantified and accounted for correctly. Since such effects are only relevant on very large scales, where perturbations are assumed to be small, various frameworks have been devised to interpret the outcome of Newtonian simulations in a relativistic context (Chisari and Zaldarriaga 2011; Hahn and Paranjape 2016), which suggested in particular that some care is necessary in the choice of gauge when setting up initial conditions. Going even further, it turned out to be possible to define specific fine-tuned gauges, in which the gauge freedom is used to absorb relativistic corrections, so that the equations of motion are strictly of the Newtonian form (Fidler et al. 2015; Adamek et al. 2017a; Fidler et al. 2017a). This approach requires only a modification of the initial conditions and a re-interpretation of the simulation outcome. Alternatively, relativistic corrections can also be included at the linear level by adding large-scale contributions computed using linear perturbation theory to the gravitational force computed in non-linear simulations (Brandbyge et al. 2017).

Going beyond such linear corrections, the first Post-Newtonian cosmological simulations have recently been carried out (Adamek et al. 2013, 2016), which indicated, however, that back-reaction effects are small and likely irrelevant for the next generation of surveys. Most recently, full GR simulations have become possible (Giblin et al. 2016; East et al. 2018; Macpherson et al. 2019; Daverio et al. 2019) and seem to confirm the smallness of relativistic effects. The main advantage of relativistic simulations is that relativistic species, such as neutrinos, can be included self-consistently. In all cases investigated so far, non-linear relativistic effects have appeared to be negligibly small on cosmological scales. Nevertheless, such simulations will be very important in the future to verify the robustness of standard simulations regarding relativistic effects on LSS observables (e.g. gravitational lensing, gravitational redshifts, e.g. Cai et al. 2017, or the clustering of galaxies on the past lightcone, e.g. Breton et al. 2019; Guandalin et al. 2021; Lepori et al. 2021; Coates et al. 2021, all of which have been proposed as tests of gravitational physics on large scales).

2.5 Cold limit: the phase-space sheet and perturbation theory

The cold limit All observational evidence points to the colder flavours of dark matter (but see Sect. 7 for an overview of various dark matter models). A key limiting case for cosmological structure formation is therefore that of an initially perfectly cold scalar fluid (i.e. vanishing stress and vorticity). In this case, the dark matter fluid is at early enough times fully described by its density and (mean) velocity field, which is of potential nature. The higher order moments (14c) are then fully determined by the lower order moments (14a, 14b) and the momentum distribution function at any given spatial point is a Dirac distribution so that \(f_{\mathrm{m}}\) is fully specified by only two scalar degrees of freedom, a density \(n(\varvec{x})\), and a velocity potential, \(S(\varvec{x})\), at some initial time, i.e.

$$\begin{aligned} f_{\mathrm{m}}(\varvec{x},\varvec{p},t) = n(\varvec{x},t)\;\delta _D\left( \varvec{p} - m \nabla _x S(\varvec{x},t) \right) . \end{aligned}$$

Since S is differentiable, it endows phase space with a manifold structure and the three-dimensional hypersurface of six-dimensional phase space on which f is non-zero is called the ‘Lagrangian submanifold’. In fact, if at any time one can write \(\varvec{p}=m\varvec{\nabla }S\), then Hamiltonian mechanics guarantees that the Lagrangian submanifold preserves its manifold structure, i.e. it never tears or self-intersects. It can however fold up, i.e. lead to a multi-valued field \(S(\varvec{x},t)\), invalidating the functional form (18). Prior to such shell-crossing events (as is the case at the starting time of numerical simulations) this form is, however, perfectly meaningful and by taking moments of the Vlasov equation for this distribution function, one obtains a Bernoulli–Poisson system which truncates the infinite Boltzmann hierarchy already at the first moment, leaving only two equations (Peebles 1980) in terms of the density contrast \(\delta = n/\overline{n}-1\) and the velocity potential S,

$$\begin{aligned} \frac{\partial \delta }{\partial t} + a^{-2} \varvec{\nabla }\cdot \left( (1+\delta ) \varvec{\nabla }S\right) =0 \qquad \text {and}\qquad \frac{\partial S}{\partial t} + \frac{1}{2a^2} \left( \varvec{\nabla }S\right) ^2 + \phi = 0, \end{aligned}$$

supplemented with Poisson’s equation (Eq. 17). Note that this form brings out also the connection to Hamilton-Jacobi theory. After shell-crossing, this description breaks down, and all moments in the Boltzmann hierarchy become important.

Eulerian perturbation theory For small density perturbations \(|\delta |\ll 1\), it is possible to linearise the set of equations (19). One then obtains the ODE governing the linear instability of density fluctuations

$$\begin{aligned} \delta ^{\prime \prime } + \mathcal {H} \delta ^\prime - \frac{3}{2}H_0^2 \varOmega _{\mathrm{m}} a^{-1} \delta = 0. \end{aligned}$$

The solutions can be written as \(\delta (\varvec{x},\tau ) = D_+(\tau ) \delta _+(\varvec{x}) + D_-(\tau ) \delta _-(\varvec{x})\) and in \(\varLambda \)CDM cosmologies given in closed form (Chernin et al. 2003; Demianski et al. 2005) as

$$\begin{aligned} D_+(a) = a \, {}_{2}F_1\left( \frac{1}{3},\, 1,\, \frac{11}{6};\,-f_\varLambda (a)\right) \qquad \text {and}\qquad D_-(a)=\sqrt{1+f_\varLambda (a)} \,a^{-\frac{3}{2}}, \end{aligned}$$

where \( f_\varLambda := \varOmega _\varLambda / (\varOmega _{\mathrm{m}} a^{-3})\), and \({}_2F_1\) is Gauss’ hypergeometric function. In general cases, especially in the presence of trans-relativistic species such as neutrinos, Eq. (20) needs to be integrated numerically however. Moving beyond linear order, recursion relations to all orders in perturbations of Eqs. (19) have been obtained in the 1980s (Goroff et al. 1986) and provide the foundation of standard Eulerian cosmological perturbation theory [SPT; cf. Bernardeau et al. (2002) for a review].
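For the general case in which Eq. (20) has to be integrated numerically, the following Python sketch may serve as a starting point. It rewrites the growth equation in terms of \(x=\ln a\), which for a matter plus cosmological constant background gives \(D_{xx} + (2 + \mathrm{d}\ln E/\mathrm{d}\ln a)\,D_x - \frac{3}{2}\varOmega _{\mathrm{m}} a^{-3} E^{-2} D = 0\) with \(E^2 = \varOmega _{\mathrm{m}}a^{-3}+\varOmega _\varLambda \), and integrates it with a classical fourth-order Runge–Kutta scheme. The function name `growth_factor`, the starting scale factor, the step count and the restriction to matter and \(\varLambda \) only are assumptions of this sketch; additional species would modify \(E(a)\) and the source term.

```python
import math

def growth_factor(a_final, omega_m, omega_l, a_init=1e-3, n_steps=4000):
    """Growing-mode amplitude D(a_final), normalised so that D = a at early times.

    Assumes E^2(a) = omega_m * a^-3 + omega_l (no radiation, curvature or neutrinos)
    and starts on the pure growing mode D = a at a_init (hypothetical default).
    """
    def rhs(x, d, v):
        a = math.exp(x)
        e2 = omega_m * a**-3 + omega_l
        dlne_dlna = -1.5 * omega_m * a**-3 / e2
        accel = -(2.0 + dlne_dlna) * v + 1.5 * omega_m * a**-3 / e2 * d
        return v, accel

    h = (math.log(a_final) - math.log(a_init)) / n_steps
    x, d, v = math.log(a_init), a_init, a_init  # D = a implies dD/dln(a) = a
    for _ in range(n_steps):
        k1d, k1v = rhs(x, d, v)
        k2d, k2v = rhs(x + 0.5 * h, d + 0.5 * h * k1d, v + 0.5 * h * k1v)
        k3d, k3v = rhs(x + 0.5 * h, d + 0.5 * h * k2d, v + 0.5 * h * k2v)
        k4d, k4v = rhs(x + h, d + h * k3d, v + h * k3v)
        d += h * (k1d + 2.0 * k2d + 2.0 * k3d + k4d) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        x += h
    return d
```

In an Einstein–de Sitter background (`omega_m=1`, `omega_l=0`) the sketch reproduces \(D_+=a\), while for \(\varOmega _\varLambda >0\) the growth at late times falls below \(a\), as expected from the closed-form solution.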

Lagrangian perturbation theory Alternatively to considering the Eulerian fields, the dynamics can be described also through the Lagrangian map, i.e. by considering trajectories \(\varvec{x}(\varvec{q},t) = \varvec{q} + \varvec{\varPsi }(\varvec{q},t)\) starting from Lagrangian coordinates \(\varvec{q} = \varvec{x}(\varvec{q},t=0)\). It becomes then more convenient to write the distribution function (18) in terms of the Lagrangian map, i.e.

$$\begin{aligned} f_{\mathrm{m}}(\varvec{x},\varvec{p},\tau ) = \delta _D\left( \varvec{x} - \varvec{q}-\varvec{\varPsi }(\varvec{q},\tau )\right) \;\delta _D\left( \varvec{p} - m a \varvec{\varPsi }^\prime (\varvec{q},\tau ) \right) . \end{aligned}$$

Mass conservation then implies that the density is given by the Jacobian \(\mathrm{J} := \det J_{ij} := \det \partial x_i/\partial q_j\) as

$$\begin{aligned} 1+\delta (\varvec{q},\tau ) = \left| \mathrm{J}\right| ^{-1} := \left| \det \varvec{\nabla }_q \otimes \varvec{x} \right| ^{-1} = \left| \det \delta _{ij} + \partial \varPsi _i / \partial q_j \right| ^{-1}, \end{aligned}$$

which is singular if any eigenvalue of \(\varvec{\nabla }_q\otimes \varvec{x}\) vanishes. This is precisely the case when shell crossing occurs. The canonical equations of motion (15) can be combined into a single second order equation, which in conformal time reads

$$\begin{aligned} \varvec{x}^{\prime \prime } +\mathcal {H}\varvec{x}^\prime + \varvec{\nabla }_x \phi (\varvec{x}) = 0, \end{aligned}$$

where we now consider trajectories not for single particles but for the Lagrangian map, i.e. \(\varvec{x}=\varvec{x}(\varvec{q},t)\). By taking its divergence, this can be rewritten as an equation including only derivatives w.r.t. Lagrangian coordinates

$$\begin{aligned} \mathrm{J}\left( \delta _{ij}+\varPsi _{i,j}\right) ^{-1} \left( \varPsi _{i,j}^{\prime \prime }+\mathcal {H}\varPsi _{i,j}^\prime \right) = \frac{3}{2}\mathcal {H}^2 \varOmega _{\mathrm{m}}\left( \mathrm{J} - 1 \right) . \end{aligned}$$

In Lagrangian perturbation theory (LPT), this equation is then solved perturbatively using a truncated time-Taylor expansion of the form \(\varvec{\varPsi }(\varvec{q},\tau ) = \sum _{n=1}^\infty D(\tau )^n \varvec{\varPsi }^{(n)}(\varvec{q})\) (Buchert 1989, 1994; Bouchet et al. 1995; Catelan 1995). At first order \(n=1\), restricting to only the growing mode, one finds the famous Zel’dovich approximation (Zel’dovich 1970)

$$\begin{aligned} \varvec{x}(\varvec{q},\tau ) = \varvec{q} - D_+(\tau ) \varvec{\nabla }_q \nabla _q^{-2} \delta _+(\varvec{q}), \end{aligned}$$

where \(\delta _+(\varvec{q})\) is, as above, the growing mode spatial fluctuation part of SPT. All-order recursion relations have also been obtained for LPT (Rampf 2012; Zheligovsky and Frisch 2014; Matsubara 2015). LPT solutions are of particular importance for setting up initial conditions for simulations. Both SPT and LPT are valid only prior to the first shell-crossing since the (pressureless) Euler–Poisson limit of Vlasov–Poisson ceases to be valid after. This can be easily seen by considering the evolution of a single-mode perturbation in the cold self-gravitating case, as shown in Fig. 2. Prior to shell-crossing, the mean-field velocity \(\langle \varvec{v}\rangle (\varvec{x},\,t)\) coincides with the full phase-space description \(\varvec{p}(\varvec{x}(\varvec{q};\,t);\,t)/m\). Then the DF from Eq. (18) guarantees that the (Euler/Bernoulli-) truncated mean field fluid equations describe the full evolution of the system. This is no longer valid after shell-crossing, accompanied by infinite densities where \(\det \partial x_i/\partial q_j=0\), when the velocity becomes multi-valued. Nevertheless, LPT provides an accurate bridge between the early Universe and that at the starting redshift of cosmological simulations, which can then be evolved further deep into the nonlinear regime. This procedure will be discussed in detail in Sect. 6.
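The single-mode case can be made explicit with a few lines of Python. The sketch below evaluates the Zel’dovich map in one dimension for \(\delta _+(q) = A\cos (kq)\), using the sign convention \(\delta = -\varvec{\nabla }_q\cdot \varvec{\varPsi }\) that follows from linearising Eq. (23), so that \(\varPsi = -D_+ A \sin (kq)/k\) and \(\mathrm{J} = 1 - D_+ A\cos (kq)\). The function name `zeldovich_plane_wave` and the argument names are hypothetical choices for illustration.

```python
import math

def zeldovich_plane_wave(q, growth, amplitude, k):
    """1D Zel'dovich map for delta_+(q) = amplitude * cos(k * q).

    Returns the Eulerian position x, the Jacobian J = dx/dq and the
    density 1 + delta = 1/|J| (infinite where J vanishes).
    """
    psi = -growth * amplitude * math.sin(k * q) / k      # displacement field
    jac = 1.0 - growth * amplitude * math.cos(k * q)     # J = 1 + dPsi/dq
    density = math.inf if jac == 0.0 else 1.0 / abs(jac)
    return q + psi, jac, density
```

The first shell-crossing occurs at \(q=0\) once \(D_+ A = 1\), where \(\mathrm{J}\) vanishes and the density diverges; for \(D_+ A > 1\) the Jacobian is negative around \(q=0\), signalling the multi-stream region in which the fluid description no longer applies.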

Heuristic models based on PT Additionally, perturbation theory has been used as the backbone for approximate, but computationally extremely fast, descriptions of the nonlinear structure [see Monaco (2016) for a review and Chuang et al. (2015b) for a comparison of approaches]. These methods overcome the shell-crossing limitation of perturbation theory in different ways. The introduction of a viscous term in the adhesion model (Gurbatov et al. 1985; Kofman and Shandarin 1988; Kofman et al. 1992) prevents the crossing of particle trajectories. In an alternative approach, Kitaura and Hess (2013) replaced the LPT displacement field on small scales by values motivated from spherical collapse (Bernardeau 1994; Mohayaee et al. 2006). A similar idea is implemented in Muscle (Neyrinck 2016; Tosone et al. 2021). Numerous models have been developed based on an extension of the predictions of LPT via various small-scale models that aim to capture the collapse into halos or implement empirical corrections: Patchy (Kitaura et al. 2014), PTHalos (Scoccimarro and Sheth 2002; Manera et al. 2013), EZHalos (Chuang et al. 2015a), HaloGen (Avila et al. 2015). Finally, Pinocchio (Monaco et al. 2002, 2013) and WebSky (Stein et al. 2020) both combine LPT displacements with an ellipsoidal collapse model and excursion set theory to predict the abundance, mass accretion history, and spatial distribution of halos. The low computational cost of these approaches makes them useful for the creation of large ensembles of “simulations” aimed at constructing covariance matrices for large-scale structure observations, or for a direct modelling of the position and redshift of galaxies. However, due to their heuristic character, their predictions need to be constantly quantified and validated against full N-body simulations.

Fig. 2

Evolution of a single-mode perturbation from early times (\(a=0.1\), left panels), through shell-crossing (at \(a=1\), middle panels), to late times (\(a=10\), right panels). The top row shows the phase space, where the cold distribution function occupies only a one-dimensional line. Density is shown in the middle row, with singularities of formally infinite density appearing at and after the first shell-crossing. The bottom panels show the mean fluid velocity (\(\langle \varvec{v}\rangle = \varvec{\pi }/\rho \), Eq. 14b), which is identical to the phase-space diagram up to the first shell-crossing, but develops a complicated structure with discontinuities in the multi-stream region (indicated by green shading). Since the distribution function has a manifold structure, its tangent space (indicated in orange) can be evolved in a “geodesic deviation” equation, or it can be approximated by tessellations. Caustics appear where \(\partial x/\partial q = 0\). N-body simulations do not track this manifold structure, and sample only the distribution function

2.6 Deformation and evolution in phase space

The canonical equations of motion describe the motion of points in phase space over time. Moving further, it is also interesting to consider the evolution of an infinitesimal phase space volume, spanned by \((\mathrm{d}\varvec{x},\mathrm{d}\varvec{p})\), which is captured by the “geodesic deviation equation”.

As we have just discussed, in the cold case, the continuum limit leads to all mass occupying a thin phase sheet in phase space and one can think of the evolution of the system as the mapping \(\varvec{q}\mapsto (\varvec{x},\varvec{p})\) between Lagrangian and Eulerian space (cf. Fig. 2). One can take the analysis of deformations of pieces of phase space one level further beyond the cold case by considering a general mapping of phase space onto itself, i.e. \((\varvec{q},\varvec{w})\mapsto (\varvec{x},\varvec{p})\) (but note that this definition is formally not valid as \(a\rightarrow 0\) since in the canonical cosmological case the momentum space blows up in this limit). The associated phase-space Jacobian matrix \(\mathsf{\varvec {D}}\), which reflects the effect in Eulerian space of infinitesimal changes to the Lagrangian coordinates, is

$$\begin{aligned} \mathsf{\varvec {D}} := \frac{\partial (\varvec{x},\varvec{p})}{\partial (\varvec{q},\varvec{w})} = \begin{bmatrix} \frac{\partial \varvec{x}}{\partial \varvec{q}} &{} \frac{\partial \varvec{x}}{\partial \varvec{w}} \\ \frac{\partial \varvec{p}}{\partial \varvec{q}} &{} \frac{\partial \varvec{p}}{\partial \varvec{w}} \end{bmatrix} =: \begin{bmatrix} \mathsf{\varvec {D}}_{\mathrm{xq}} &{} \mathsf{\varvec {D}}_{\mathrm{xw}} \\ \mathsf{\varvec {D}}_{\mathrm{pq}} &{} \mathsf{\varvec {D}}_{\mathrm{pw}} \end{bmatrix}, \end{aligned}$$

where in the last equality, we have split the 6D tensor into four blocks. Its dynamics are fully determined by the canonical equations of motion, from which one obtains after a few steps (Habib and Ryne 1995; Vogelsberger et al. 2008)

$$\begin{aligned} \dot{\mathsf{\varvec {D}}} = \frac{\partial (\dot{\varvec{x}},\dot{\varvec{p}})}{\partial (\varvec{q},\varvec{w})} = \left( \begin{bmatrix} \varvec{\nabla }_x\otimes \varvec{\nabla }_p &{} \varvec{\nabla }_p\otimes \varvec{\nabla }_p \\ - \varvec{\nabla }_x\otimes \varvec{\nabla }_x &{} - \varvec{\nabla }_p\otimes \varvec{\nabla }_x \end{bmatrix} \mathscr{H}\right) \cdot \mathsf{\varvec {D}} =: \mathsf{\varvec {H}}\cdot \mathsf{\varvec {D}}. \end{aligned}$$

This equation is called the “geodesic deviation equation” (GDE) in the literature and it quantifies the relative motion in phase space along the Hamiltonian flow. For separable Hamiltonians \(\mathscr {H}=T(\varvec{p},t)+V(\varvec{x},t)\), the coupling matrix \(\mathsf{\varvec {H}}\) becomes

$$\begin{aligned} \mathsf{\varvec {H}} = \begin{bmatrix} \mathsf{\varvec {0}}&{} (\varvec{\nabla }_p\otimes \varvec{\nabla }_p)T \\ -(\varvec{\nabla }_x\otimes \varvec{\nabla }_x)V &{} \mathsf{\varvec {0}} \end{bmatrix}, \quad \text {and in cosmic time:}\quad \mathsf{\varvec {H}} = \begin{bmatrix} \mathsf{\varvec {0}} &{} \frac{1}{ma^{2}} \delta _{ij} \\ -m\phi _{,ij} &{} \mathsf{\varvec {0}} \end{bmatrix}, \end{aligned}$$

and shows a coupling to the gravitational tidal tensor (Vogelsberger et al. 2008; Vogelsberger and White 2011). The evolution of \(\mathsf{\varvec {D}}\) can be used to track the evolution of an infinitesimal environment in phase space around a trajectory \((\varvec{x}(\varvec{q},\varvec{w};\,t),\,\varvec{p}(\varvec{q},\varvec{w};\,t))\). In particular, it follows from Eq. (23) that zero-crossings of the determinant of the \(\mathsf{\varvec {D}}_{\mathrm{xq}}\) block correspond to infinite-density caustics, so that it can be used to estimate the local (single) stream density, and count the number of caustic crossings. Infinite-density caustics would cause singular behaviour in the evolution of \(\mathsf{\varvec {D}}\), so that its numerical evolution has to be carried out with sufficient softening (Vogelsberger and White 2011; Stücker et al. 2020). Since it is sensitive to caustic crossings, the GDE can be used to quantify the distinct components of the cosmic web (Stücker et al. 2020, see also Sect. 9.5). The GDE is also intimately connected to studies of the emergence of chaos in gravitationally collapsed structures, since it quantifies the divergence of orbits in phase space and is therefore directly related to Lyapunov exponents (Habib and Ryne 1995). An open problem is how rapidly a collapsed system achieves efficient phase space mixing, since discreteness noise in N-body simulations could be dominant in driving phase space diffusion if not properly controlled (Stücker et al. 2020; Colombi 2021).
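To make the structure of the GDE concrete, the following Python sketch evolves a single trajectory together with its \(2\times 2\) phase-space Jacobian in one spatial dimension for a separable Hamiltonian. The pendulum potential \(V(x)=-\cos x\), the static (non-expanding) setting, the kick-drift-kick update and the name `evolve_gde` are all assumptions chosen for illustration and are not taken from the cited works. Each sub-step applies the same linearised kick or drift to \(\mathsf{\varvec {D}}\) that the separable coupling matrix prescribes, so that \(\det \mathsf{\varvec {D}}=1\) is preserved up to round-off.

```python
import math

def evolve_gde(x, p, t_end, n_steps, m=1.0):
    """Evolve (x, p) and the Jacobian D = d(x, p)/d(q, w) for V(x) = -cos(x).

    Toy 1D example: force = -sin(x), tidal term -V''(x) = -cos(x).
    Returns x, p and the four entries (Dxq, Dxw, Dpq, Dpw) of D.
    """
    dt = t_end / n_steps
    dxq, dxw, dpq, dpw = 1.0, 0.0, 0.0, 1.0  # D starts as the identity
    for _ in range(n_steps):
        # Half kick: p and the momentum rows of D respond to force and tide
        tide = -math.cos(x)
        p += 0.5 * dt * (-math.sin(x))
        dpq += 0.5 * dt * tide * dxq
        dpw += 0.5 * dt * tide * dxw
        # Drift: x and the position rows of D respond to the momentum
        x += dt * p / m
        dxq += dt * dpq / m
        dxw += dt * dpw / m
        # Second half kick at the updated position
        tide = -math.cos(x)
        p += 0.5 * dt * (-math.sin(x))
        dpq += 0.5 * dt * tide * dxq
        dpw += 0.5 * dt * tide * dxw
    return x, p, (dxq, dxw, dpq, dpw)
```

In a cold three-dimensional setting one would monitor sign changes of \(\det \mathsf{\varvec {D}}_{\mathrm{xq}}\) along each trajectory to count caustic crossings; the toy above only demonstrates the coupled update of trajectory and Jacobian.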

3 Discretization techniques for Vlasov–Poisson systems

The macroscopic collisionless evolution of non-relativistic self-gravitating classical matter in an expanding universe is governed by the cosmological Vlasov–Poisson (VP) set of Eqs. (16a, 16b) derived above. VP describes the evolution of the density \(f(\varvec{x},\varvec{p},t)\) in six-dimensional phase space over time. Due to the non-linear character of the equations and the attractive (focusing) nature of the gravitational interaction, intricate small-scale structures (filamentation) emerge in phase space already in 1+1 dimensions as shown in Fig. 2, and chaotic dynamics can arise in higher dimensional phase space. Various numerical methods to solve VP dynamics have been devised, with intimate connections also to related techniques in plasma physics. The N-body approach is clearly the most prominent and important technique today; however, other techniques have been developed to overcome its shortcomings in certain regimes and to test the validity of results. A visual representation of the various approaches to discretise either phase space or the distribution function is shown in Fig. 3.

3.1 The N-body technique

The most commonly used discretisation technique for dark matter simulations is the N-body method, which has been used since the 1960s as a numerical tool to study the Hamiltonian dynamics of gravitationally bound systems such as star and galaxy clusters (von Hoerner 1960; Aarseth 1963; Hénon 1964) by Monte-Carlo sampling the phase space of the system. N-body simulations started being used to study cosmological structure formation in the early 1970s (Peebles 1971; Press and Schechter 1974; Miyoshi and Kihara 1975), followed by an explosion of the field in the first half of the 1980s (Doroshkevich et al. 1980; Klypin and Shandarin 1983; White et al. 1983; Centrella and Melott 1983; Shapiro et al. 1983; Miller 1983). These works demonstrated the web-like structure of the distribution of matter in the Universe and established that cold dark matter (rather than massive neutrinos) likely provides the “missing” (dark) matter. By the late 1990s, the resolution and dynamic range had increased sufficiently so that it became possible to study the inner structure of dark matter haloes, leading to the discovery of universal halo profiles (Navarro et al. 1997), the large abundance of substructure in CDM subhaloes (Moore et al. 1999; Klypin et al. 1999b), and predictions of the mass function of collapsed structures in the Universe over a large range of masses and cosmic time (Jenkins et al. 2001). The N-body method is now being used in virtually all large state-of-the-art cosmological simulations as the method of choice to simulate the gravitational collapse of cold collisionless matter (cf. Sect. 10 for a review of state-of-the-art simulations). In “total matter” (often somewhat falsely called “dark matter only”) simulations, the N-body mass distribution serves as a proxy of the potential landscape in which galaxy formation takes place. Also in multi-physics simulations that simulate the distinct evolution of dark matter and baryons (see also Sect. 7.8.1), collisionless dark matter is solved via the N-body method, while a larger diversity of methods is employed to evolve the collisional baryonic component (see e.g. Vogelsberger et al. 2020, for a review).

The N-body discretisation Underlying all these simulations is the fundamental N-body idea: the Vlasov equation is the continuum version of the Hamiltonian equations of motion, which implies that phase-space density is conserved along Hamiltonian trajectories. The non-linear coupling in Hamilton’s equations arises through the coupling of particles with gravity via Poisson’s equation, which is only sourced by the density field. Therefore, as long as a finite number of N particles is able to fairly sample the density field, the evolution of the system can be approximated by these discrete trajectories.

A practical complication is that the (formally) infinitely extended mass distribution has to be taken into account. Most commonly, this complication is solved by restricting the simulation to a finite cubical volume \(V=L^3\) of co-moving linear extent L with periodic boundary conditions. This can formally be written as considering infinite copies of this fundamental cubical box. The effective N-body distribution function is then given by a set of discrete macroscopic particle locations and momenta \(\left\{ \left( \varvec{X}_i(t),\varvec{P}_i(t)\right) ,\,i=1\dots N\right\} \) along with the infinite set of periodic copies, so that

$$\begin{aligned} f_N(\varvec{x}, \varvec{p}, t) = \sum _{\varvec{n}\in \mathbb {Z}^3} \sum _{i=1}^N \frac{M_i}{m}\,\delta _D(\varvec{x}-\varvec{X}_i(t)-\varvec{n} L )\,\delta _D(\varvec{p}-\varvec{P}_i(t)), \end{aligned}$$

is an unbiased sampling of the true distribution function. Here \(M_i\) is the effective particle mass assigned to an N-body particle, m the actual microscopic particle mass, and \(\varvec{X}_i(t)\) and \(\varvec{P}_i(t)\) are the position and momentum of particle i at time t. The most widespread choice of discretisation is one in which all particles are assumed to have equal mass, \(M_i = \overline{M} = \varOmega _m \rho _{\mathrm{c,0}} V / N\). Note, however, that using different masses is also possible and sometimes desirable (e.g., for multi-resolution simulations, see Sect. 6.3.4).
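As a quick worked example of the equal-mass choice \(\overline{M} = \varOmega _{\mathrm{m}} \rho _{\mathrm{c,0}} V / N\), the following Python snippet evaluates the particle mass for a given box and particle number. The helper name `particle_mass` and the unit convention (box size in \(h^{-1}\mathrm{Mpc}\), mass in \(h^{-1}M_\odot \), with \(\rho _{\mathrm{c,0}} \simeq 2.775\times 10^{11}\,h^2 M_\odot \,\mathrm{Mpc}^{-3}\)) are assumptions made here for illustration.

```python
# Critical density today in h^2 Msun / Mpc^3 (assumed unit convention)
RHO_CRIT_0 = 2.77536627e11

def particle_mass(omega_m, box_size, n_particles):
    """Equal particle mass M = Omega_m * rho_c0 * V / N.

    box_size is the co-moving side length in Mpc/h; the result is in Msun/h.
    """
    return omega_m * RHO_CRIT_0 * box_size**3 / n_particles

# Example: a 1 Gpc/h box sampled with 1024^3 particles and Omega_m = 0.3
m_part = particle_mass(0.3, 1000.0, 1024**3)
```

For these example numbers the particle mass comes out at several times \(10^{10}\,h^{-1}M_\odot \), which illustrates why haloes of Milky-Way mass and below are poorly resolved in boxes of this size unless \(N\) is increased substantially.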

Initial conditions Since the particles are to sample the full six-dimensional distribution function, a key question is how the initial positions \(\varvec{X}_i(t_0)\) and momenta \(\varvec{P}_i(t_0)\) should be chosen. For cold systems, we have derived a consistent approach above and it is given by Eq. (18) which can be readily evaluated from a discrete sampling of the Lagrangian manifold alone, i.e. by choosing an (ideally homogeneous and isotropic) uniform sampling in terms of Lagrangian coordinates \(\varvec{Q}_i\) for each N-body particle (discussed in more detail in Sect. 6.4) and then obtaining the Eulerian position \(\varvec{X}_i\) and momentum \(\varvec{P}_i\) at some initial time \(t_0\) from the Lagrangian map \(\varvec{\varPsi }(\varvec{Q}_i,t_0)\) (see Sect. 6.2 for more details). This means that, in the case of a cold fluid, the particles sample the mean flow velocity exactly. The situation is considerably more involved if the system has a finite temperature, which requires sampling not only the three-dimensional Lagrangian submanifold but the full six-dimensional phase space density. This implies that in some sense each particle in the cold case needs to be sub-divided into many particles that sample the momentum part of the distribution function. Particularly for hot distribution functions, where the momentum spread is large compared to mean momenta arising from gravitational instability (such as in the case of neutrinos), this has posed a formidable challenge due to the large associated sampling noise. To circumvent these problems various solutions have been proposed, e.g. using a careful sampling of momentum space using shells in momentum modulus and an angle sampling based on the HEALPix sphere decomposition (Banerjee et al. 2018), or reduced variance sampling based on the control variates method (Elbers et al. 2021). Such avenues are discussed in more detail in the context of massive neutrino simulations in Sect. 7.8.2 below.

Equations of motion Once the initial particle sampling has been determined, the subsequent evolution is fully governed by VP dynamics. Moving along characteristics of the VP system, the canonical equations of motion for particle \(i=1\dots N\) in cosmic time are obtained from \(\mathrm{d}f(\varvec{X}_i(t),\,\varvec{P}_i(t),\,t)/\mathrm{d}t=0\) as

$$\begin{aligned} \dot{\varvec{X}}_i = \frac{{\varvec{P}}_i}{M_i a^2}\qquad \text {and}\qquad \dot{\varvec{P}}_i = -M_i \left. \varvec{\nabla }_{x} \phi \right| _{\varvec{X}_i}. \end{aligned}$$

These are consistent with a cosmic-time Hamiltonian system with a pair interaction potential \(I(\varvec{x},\varvec{x}^{\prime})\) of the form

$$\begin{aligned} \mathscr {H}&= \sum _{i=1}^N \left[ \frac{P_i^2}{2M_i a^2} + \frac{1}{2} M_i \sum _{j\ne i} I(\varvec{X}_i,\varvec{X}_j) \right] \end{aligned}$$
$$\begin{aligned} \text {where}&\quad \sum _{j\ne i} I(\varvec{X}_i,\varvec{X}_j) = \phi (\varvec{X}_i) \quad \text {with}\quad \nabla _x^2\phi = \frac{3H_0^2\varOmega _{\mathrm{m}}}{2a} \left( \frac{\rho }{\overline{\rho }}-1\right) . \end{aligned}$$

The resulting acceleration term is given by

$$\begin{aligned} \varvec{g}(\varvec{x}) := -\left. \varvec{\nabla }_x \phi \right| _{\varvec{x}} = \frac{G}{a} \int _{\mathbb {R}^3} \mathrm{d}^3x^{\prime} \, \rho (\varvec{x^{\prime}},t) \frac{\varvec{x}^{\prime}-\varvec{x}}{\left\| \varvec{x}^{\prime}-\varvec{x}\right\| ^3}, \end{aligned}$$

which has no contribution from the background \(\overline{\rho }\) for symmetry reasons (Peebles 1980). The co-moving configuration space density \(\rho \) that provides the Poisson source arises from Eq. (30) and is given by

$$\begin{aligned} \rho (\varvec{x},t)= & {} \int _{\mathbb {R}^3} \mathrm{d}^3p\,\int _{\mathbb {R}^3} \mathrm{d}^3x^{\prime}\,m\,f_N(\varvec{x}^{\prime},\varvec{p},t) \, W(\varvec{x}^{\prime}-\varvec{x}) \nonumber \\= & {} \sum _{\varvec{n}\in \mathbb {Z}^3} \sum _{i=1}^NM_i\,\,W\left( \varvec{x}-\varvec{X}_i(t)-\varvec{n} L \right) . \end{aligned}$$

Here, we additionally allowed for a regularisation kernel \(W(\varvec{r})\) that is convolved with the discrete N-body density in order to improve the regularity of the density field and speed up convergence to the collisionless limit (or so one hopes). It represents a softening kernel (also called ‘assignment function’ depending on context) that regularises gravity and accounts for the fact that each particle is not a point-mass (like a star or black hole), but corresponds to an extended piece of phase space so that two-body scattering between the effective particles is always artificial and must be suppressed. We discuss how the acceleration obtained from this infinite sum is solved in practice in Sect. 5.
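To illustrate how a kernel \(W\) turns the particle set into a density field on a periodic domain, the sketch below implements linear (cloud-in-cell) mass assignment on a one-dimensional grid. The cloud-in-cell kernel is only one possible choice of \(W\), picked here as an assumption for its simplicity, and the function name `cic_density_1d` is hypothetical. The modulo operations play the role of the sum over periodic copies \(\varvec{n} L\).

```python
import numpy as np

def cic_density_1d(positions, masses, box_size, n_cells):
    """Cloud-in-cell mass assignment on a periodic 1D grid.

    Grid points sit at i * dx; each particle shares its mass linearly
    between the two nearest grid points. Returns mass per unit length.
    """
    dx = box_size / n_cells
    masses = np.asarray(masses, dtype=float)
    # Position in units of the cell size, wrapped into the fundamental box
    s = (np.asarray(positions, dtype=float) % box_size) / dx
    base = np.floor(s)
    frac = s - base                          # distance to the left grid point
    left = base.astype(int) % n_cells        # wrap guards against s == n_cells
    right = (left + 1) % n_cells             # periodic neighbour
    rho = np.zeros(n_cells)
    np.add.at(rho, left, masses * (1.0 - frac))   # unbuffered accumulation
    np.add.at(rho, right, masses * frac)
    return rho / dx
```

By construction the assignment conserves mass, i.e. the sum of `rho * dx` over all cells equals the total particle mass, and a particle close to the upper box edge deposits part of its mass into the first cell through the periodic wrap.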

Discreteness effects The quality of the force calculation rests on how good an approximation the force associated with the density from Eq. (34) is. The hope is that by appropriate choice of W and as large a number N of particles as possible, the evolution remains close to the true collisionless dynamics and microscopic collisions remain subdominant. Due to the discrete nature of the particles, problems of the N-body approach are known to arise when force and mass resolution are not matched, in which case the evolution of the discrete system can deviate strongly from that of the continuous limit (Centrella and Melott 1983; Centrella et al. 1988; Peebles et al. 1989; Melott and Shandarin 1989; Diemand et al. 2004b; Melott et al. 1997; Splinter et al. 1998; Wang and White 2007; Melott 2007; Marcos 2008; Bagla and Khandai 2009). A slow convergence to the correct physical solution can, however, usually be achieved by keeping the softening so large that individual particles are never resolved. At the same time, if the force resolution in CDM simulations is not high enough at late times, then sub-haloes are comparatively loosely bound and prone to premature tidal disruption, leading to the ‘overmerging’ effect and the resulting orphaned galaxies (i.e., if the subhalo hosted a galaxy, it would still be a distinct system rather than having merged with the host), e.g. Klypin et al. (1999a), Diemand et al. (2004a), van den Bosch et al. (2018). In this case, one would want to choose the softening as small as possible. We discuss this in more detail in Sect. 8.2.

More sophisticated choices of W beyond a global (possibly time-dependent) softening scale are possible, for instance, the scale can depend on properties of particles, such as the local density (leading to what is called “adaptive softening”). We discuss the aspect of force regularisation by softening in more detail in Sect. 8.2. The gravitational acceleration that follows from Eq. (34) naturally has to take into account the cosmological Poisson equation, i.e., include the subtraction of the mean density and assume periodic boundary conditions. All aspects related to the time integration of cosmological Hamiltonians will be discussed in Sect. 4, those related to computing and evaluating gravitational interactions efficiently in Sect. 5 below.

Fig. 3

Discretisations used in the numerical solution of Vlasov–Poisson: a the N-body method which samples the fine grained distribution function (light gray line) at discrete locations, b the ‘GDE’ method that can evolve the local manifold structure along with the particles (the green eigenvectors of \(\mathsf {D}_{\mathsf {xq}}\) are tangential to the Lagrangian submanifold), c the sheet tessellation method, which uses interpolation (here linear) between particles to approximate the Lagrangian submanifold with a tessellation, d a finite volume discretisation of the full phase space with uniform resolution \(\varDelta x\) in configuration space and \(\varDelta p\) in momentum space

3.2 Phase space deformation tracking and Lagrangian submanifold approximations

For large enough N, the N-body method is expected to converge to the collisionless limit. Nonetheless, an obvious limitation of this approach is that the underlying manifold structure is entirely lost, as the particles retain only knowledge of positions and momenta, and all other quantities (e.g. density, as well as other mean field properties) can only be recovered by coarse-graining over a larger number of particles. Two different classes of methods, which we shall discuss next, have been developed over recent years that overcome this key limitation in various ways. The first class is based on promoting particles (which are essentially vectors) to tensors and rewriting the canonical equations of motion to evolve them accordingly, resulting in equations of motion reminiscent of the geodesic deviation equation (GDE) in general relativity. The second class retains the particles but promotes them to vertices of a tessellation whose cells provide a discretisation of the manifold.

3.2.1 Tracking deformation in phase space—the GDE approach

We already discussed in Sect. 2.6 how infinitesimal volume elements of phase space evolve under a Hamiltonian flow. In particular, Eq. (28) is the canonical equation of motion for the phase-space Jacobian matrix. In the ‘GDE’ approach, instead of evolving only the vector \((\varvec{X}_i,\varvec{P}_i)\) for each N-body particle, one evolves in addition the tensor \(\mathsf{\varvec {D}}_i\) for each particle [cf. Vogelsberger et al. (2008), White and Vogelsberger (2009), but see also Habib and Ryne (1995) who derive a method to compute Lyapunov exponents based on the same equations]. Of particular interest is the \(\mathsf{\varvec {D}}_{\mathrm{xq}}\) sub-block of the Jacobian matrix since it directly tracks the local (stream) density associated with each N-body particle through \(\delta _i+1 = \left( \det \mathsf{\varvec {D}}_{\mathrm{xq},i}\right) ^{-1}\). The equations of motion for the relevant tensors associated to particle i are

$$\begin{aligned} \dot{\mathsf{\varvec {D}}}_{\mathrm{xq},i}&= \frac{1}{m a^{2}}\,\mathsf{\varvec {D}}_{\mathrm{pq},i} \end{aligned}$$
$$\begin{aligned} \dot{\mathsf{\varvec {D}}}_{\mathrm{pq},i}&=- m\,\mathsf{\varvec {D}}_{\mathrm{xq},i} \cdot \left. \mathsf{\varvec {T}}\right| _{\varvec{X}_i}\quad \text {where}\quad \mathsf{\varvec {T}} := \nabla _x\otimes \nabla _x \phi , \end{aligned}$$

and are solved alongside the N-body equations of motion (31) by computing the tidal tensor \(\mathsf{\varvec {T}}\). One caveat with the GDE approach is that the evolution of \(\mathsf{\varvec {D}}_\mathrm{xq}\) is determined not by the force but by the tidal field—which contains one higher spatial derivative of the potential than the force—and therefore is significantly less regular than the force field (see the detailed discussion and analysis in Stücker et al. (2021c) who have also studied the stream density evolution in virialised halos, based on a novel low-noise force calculation). This approach thus requires larger softening to achieve converged answers than a usual N-body simulation, and possibly cannot be shown to converge in the limit of infinite density caustics.
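
To make the structure of these tensor equations concrete, the following is a minimal sketch rather than a production scheme. It assumes a fixed scale factor a = 1 and a constant, isotropic tidal tensor (a harmonic potential), so that the result can be compared with the analytic solution \(\mathsf{\varvec {D}}_{\mathrm{xq}}=\cos (\omega t)\,\mathsf{I}\). The function names `evolve_gde` and `stream_density`, the zero initial value of \(\mathsf{\varvec {D}}_{\mathrm{pq}}\), and the use of the absolute value of the determinant are assumptions made for illustration.

```python
import numpy as np

def evolve_gde(tidal, mass=1.0, a=1.0, dt=1e-3, n_steps=1000):
    """Integrate the GDE tensors of one particle in a fixed tidal field.

    Assumption for illustration: `tidal` (3x3) and the scale factor `a` are
    held constant; a real code re-evaluates the tidal tensor at X_i each step.
    """
    d_xq = np.eye(3)          # D_xq is the identity at the initial time
    d_pq = np.zeros((3, 3))   # assumed initial condition: no momentum gradient
    for _ in range(n_steps):
        # kick-drift-kick update, mirroring a particle leapfrog
        d_pq = d_pq - 0.5 * dt * mass * d_xq @ tidal
        d_xq = d_xq + dt * d_pq / (mass * a**2)
        d_pq = d_pq - 0.5 * dt * mass * d_xq @ tidal
    return d_xq, d_pq

def stream_density(d_xq):
    """Stream density 1 + delta from the inverse of |det D_xq|."""
    return 1.0 / abs(np.linalg.det(d_xq))

# Harmonic potential phi = omega^2 |x|^2 / 2, so T = omega^2 * identity.
omega = 2.0
d_xq, _ = evolve_gde(omega**2 * np.eye(3), dt=1e-3, n_steps=500)
rho_stream = stream_density(d_xq)   # analytic value: 1 / cos(omega * t)^3 at t = 0.5
```

In this toy setting the determinant passes through zero at \(\omega t=\pi /2\), which is where the stream density formally diverges (a caustic); the sketch stops well before that point.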

Evolving \(\mathsf{\varvec {D}}_{\mathrm{xq}}\) provides additional information about cosmic structure that is not accessible by standard N-body simulations. For instance, solving the GDE enabled Vogelsberger et al. (2008) and Vogelsberger and White (2011) to estimate the number of caustics in dark matter haloes, which might add a boost to the self-annihilation rate of CDM particles, as well as the amount of chaos and mixing in haloes. A key result of Vogelsberger and White (2011) was that despite the large over-densities reached in collapsed structures, each particle is nonetheless inside a stream with a density not too different from the cosmic mean density. This is possible since haloes are built like pâte feuilletée, as a layered structure of many stretched and folded streams, as can be seen in panel a of Fig. 5.

Fig. 4

Images reproduced with permission [a, b] from Abel et al. (2012) and [c, d] from Stücker et al. (2020), copyright by the authors

a, b The density field obtained from the same set of N-body particles as a simple particle N-body density in (a) and in terms of a phase space sheet interpolation in terms of tetrahedral cells in (b). c, d The GDE method and the sheet tessellation method provide direct access to the stream density, which is shown in Lagrangian \(\varvec{q}\)-space for c the GDE approach and d the sheet tessellation approach

3.2.2 The dark matter sheet and phase space interpolation

A different idea to reconstruct the Lagrangian submanifold from existing N-body simulations was proposed by Abel et al. (2012) and Shandarin et al. (2012), who noted that a tessellation of Lagrangian space, constructed by using the initial positions of the N-body particles at very early times as vertices (i.e., the particles generate a simplicial complex on the Lagrangian submanifold), is topologically preserved in Hamiltonian evolution. This means that initially neighbouring particles can be connected up as non-overlapping tetrahedra (in the case of a 3D submanifold of 6D phase space). Their deformation and change of position and volume reflect the evolution of the phase-space distribution (and thus changes in the density field). A visual impression of the difference between an N-body density and this tessellation-based density is given in Fig. 4. No holes can appear through Hamiltonian dynamics, but since the divergence of initially neighbouring points depends on the specific dynamics (notably the Lyapunov exponents that describe the divergence of such trajectories), the edge connecting two vertices can become a tangled curve due to the complex dynamics in bound systems, e.g., Laskar (1993), Habib and Ryne (1995). As long as the tetrahedra edges still approximate the submanifold well, the simplicial complex provides access to a vast amount of information about the distribution of matter in phase space in an evolved system that is difficult or even impossible to reconstruct from N-body simulations. Most notably, it yields an estimate of density that is local but defined everywhere in space, shot-noise free, and produces sharply delineated caustics of dark matter after shell-crossing (Abel et al. 2012), leading also to new rendering techniques for 3D visualisation of the cosmic density field (Kähler et al. 2012; Igouchkine et al. 2016; Kaehler 2017), and very accurate estimators of the cosmic velocity field (Hahn et al. 2015; Buehlmann and Hahn 2019).

Since the density is well defined everywhere in space just from the vertices, and reflects well the anisotropic motions in gravitational collapse, Hahn et al. (2013) have proposed that this density field can be used as the source density field when solving Poisson’s equation as part of the dynamical evolution of the system. The resulting method, where a few N-body particles define the simplicial complex whose cells together determine the density field, solves the artificial fragmentation problem of the N-body method for WDM initial conditions (Hahn et al. 2013). The complex dynamics in late stages of collapse, however, limits the applicability of a method with a fixed number of vertices. This problem was later solved by allowing for higher order reconstructions of the Lagrangian manifold from N-body particles (corresponding in some sense to non-constant-metric finite elements in Lagrangian space) and for dynamical refinement (Hahn and Angulo 2016; Sousbie and Colombi 2016). For systems that exhibit strong mixing (phase or even chaotic, such as dark matter haloes), following the increasingly complex dynamics by inserting new vertices quickly becomes prohibitive (e.g., Sousbie and Colombi 2016; Colombi 2021 report an extremely rapid growth of vertices over time in a cosmological simulation with only moderate force resolution). Stücker et al. (2020) have carried out a comparison of the density estimated from phase space interpolation and that obtained from the GDE and found excellent agreement between the two except in the centres of haloes. The comparison between the two density estimates, shown in Lagrangian space, is reproduced in the bottom panels of Fig. 4.
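
The idea of a tessellation-based density estimate can be illustrated in one dimension, where the tetrahedra reduce to line segments between Lagrangian neighbours. The sketch below is a simplified analogue and not the estimator of the cited works: it assumes equal mass per segment, spread uniformly along it, and the function name `sheet_density_1d` is hypothetical.

```python
import numpy as np

def sheet_density_1d(x, x_eval, segment_mass=1.0):
    """Density at x_eval from a 1D sheet whose vertices x are in Lagrangian order."""
    left = np.minimum(x[:-1], x[1:])
    right = np.maximum(x[:-1], x[1:])
    # mass is assumed to be uniformly distributed within each segment
    seg_rho = segment_mass / (right - left)
    # each point receives the density of every segment (stream) covering it
    inside = (x_eval[:, None] >= left[None, :]) & (x_eval[:, None] < right[None, :])
    return inside.astype(float) @ seg_rho

# A sheet that has folded once: vertex positions listed in Lagrangian order.
x_vertices = np.array([0.0, 1.0, 2.0, 1.5, 2.5])
rho = sheet_density_1d(x_vertices, np.array([0.5, 1.75, 2.25]))
# rho -> [1., 4., 1.]: three streams overlap at x = 1.75
```

The multi-stream region is recovered without any smoothing scale, which is the property exploited by the sheet methods; a segment of zero length (a vertex sitting exactly on a caustic) would need special handling that is omitted here.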

The path forward in this direction likely lies in the use of hybrid N-body/sheet methods that exploit the best of both worlds, as proposed by Stücker et al. (2020). Panel a of Fig. 5 shows a 1+1D cut through 3+3D phase space for the case of a CDM halo, comparing the result of a sheet-based simulation, where the cut results in a finite number of continuous lines (top), with the equivalent result for a thin slice from an N-body simulation (bottom). The general impact of spurious phase space diffusion driven by the N-body method is still not very well understood, with detailed comparisons between various solvers under way (e.g., Halle et al. 2019; Stücker et al. 2020; Colombi 2021).

For hot distribution functions, such as those of neutrinos, the phase space distribution of matter is not fully described by the Lagrangian submanifold. While a 6D tessellation is feasible in principle, it has undesirable properties due to the inherent shearing along the momentum dimensions. However, Lagrangian submanifolds can still be singled out to provide a foliation of general six-dimensional phase space by selecting multiple Lagrangian submanifolds that are offset from each other initially by constant momentum vectors, as proposed by Dupuy and Bernardeau (2014) and Kates-Harbeck et al. (2016).

Fig. 5

Comparison of evolved structures from N-body simulations with other discretisation approaches. a Comparison of a 1+1 dimensional phase space cut from simulations of a three-dimensional collapse of a CDM halo using a sheet tessellation with refinement (top, cf. Sousbie and Colombi 2016), and a reference N-body particle mesh simulation (bottom). The panels show an infinitely thin slice in the sheet case, and a finitely thin projection in the N-body case. b Simulations of collapse in 1+1D phase space with a particle mesh N-body method, the integer lattice method proposed by Mocz and Succi (2017), as well as two finite volume approaches, one where slabs in velocity space are allowed to continuously move against each other (‘moving mesh’). Images reproduced with permission from [a] Colombi (2021), copyright by the author; and [b] from Mocz and Succi (2017)

Another version of phase space interpolation has been discussed in the context of collisionless dynamics by Colombi and Touma (2008, 2014), the so-called ‘waterbag’ method, which allows for general non-cold initial data but is restricted to 1+1 dimensional phase space. In this approach one exploits that the value of the distribution function is conserved along characteristics. If one now traces out isodensity contours of f in phase space, one finds a sequence of n closed curves defining the level sets \(\left\{ (x,p)\;|\;f(x,p,t_0)=f_i\right\} \) of 1+1D phase space with \(i=1\dots n\) at the initial time \(t_0\). The curves can then be approximated using a number of vertices and interpolation between them. Moving the vertices along characteristics then guarantees that they remain part of the level set at all times, as phase space density is conserved along characteristics. The number of vertices can be adaptively increased in order to maintain a high quality representation of the contours at all times. The acceleration of the vertices can be conveniently defined in terms of integrals over the contours (cf. Colombi and Touma 2014).

3.3 Full phase-space techniques

Almost as old as the N-body approach to solve gravitational Vlasov–Poisson dynamics are approaches to directly solve the continuous problem for an incompressible fluid in phase space (cf. Fujiwara 1981 for 1+1 dimensions). By discretising phase space into cells of finite size in configuration and momentum space (\(\varDelta x\) and \(\varDelta p\) respectively) standard finite volume, finite difference methods or semi-Lagrangian techniques for incompressible flow can be employed. The main disadvantage of this approach is that memory requirements can be prohibitive since, without adaptive techniques or additional sophistications, the memory needed to evolve a three-dimensional system scales as \(\mathcal {O}(N_x^3\times N_p^3)\) to achieve a linear resolution of \(N_x\) cells per configuration space dimension and \(N_p\) cells per momentum space dimension. Only rather recently this has become possible at all as demonstrated for gravitational interactions by Yoshikawa et al. (2013) and Tanaka et al. (2017). The limited resolution that can be afforded in 3+3 dimensions leads to non-negligible diffusion errors even with high order methods, so that this direct approach is arguably best suited for hot mildly non-linear systems such as e.g. neutrinos (Yoshikawa et al. 2020, 2021), as the resolution required for colder systems is prohibitive. As a way to reduce such errors, Colombi and Alard (2017) proposed a semi-Lagrangian ‘metric’ method that uses a generalisation of the ‘GDE’ deformation discussed above to improve the interpolation step and reduce the diffusion error in such schemes.
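
The memory scaling quoted above can be made tangible with a short back-of-the-envelope calculation. The sketch assumes one single-precision value of f per cell and, for comparison, six single-precision phase-space coordinates per N-body particle; the function name and the choice of 64 cells per dimension are illustrative only.

```python
def phase_space_memory_bytes(n_x, n_p, bytes_per_cell=4):
    """Memory for a uniform 3+3D grid storing one value of f per cell."""
    return n_x**3 * n_p**3 * bytes_per_cell

# 64 cells per dimension in both configuration and momentum space
grid_bytes = phase_space_memory_bytes(64, 64)    # 2**38 bytes = 256 GiB
# N-body run with 64**3 particles, 6 single-precision numbers each
nbody_bytes = 64**3 * 6 * 4                      # 6 MiB
ratio = grid_bytes / nbody_bytes                 # roughly 4.4e4
```

Even at this very modest linear resolution the full phase-space grid needs several ten thousand times more memory than the corresponding particle set, which illustrates why adaptive or compressed representations are attractive.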

As another way to overcome the diffusion problem, integer lattice techniques have been discussed (cf. Earn and Tremaine 1992), which exploit that if the time step is matched to the phase-space discretisation, i.e., \(\varDelta t = m (\varDelta x / \varDelta p)\), then the configuration space advection is exact and a reversible Hamiltonian system can be obtained for the lattice model discretisation. While this approach does not overcome the \(\mathcal {O}(N^6)\) memory scaling problem of a full phase-space discretisation technique, recently Mocz and Succi (2017) have proposed important optimisations that might allow \(\approx \mathcal {O}(N^4)\) scaling by overcomputing, but that, to our knowledge, have not been demonstrated yet in 3+3 dimensional simulations. Results obtained by Mocz and Succi (2017) comparing the various techniques are shown in Fig. 5.
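
The exactness of the configuration space advection on a matched lattice can be seen in a small 1+1D sketch. It assumes periodic boundaries in x and shows only the drift: with \(\varDelta t = m (\varDelta x / \varDelta p)\), a cell with momentum index j moves by exactly j cells, so the update is a pure permutation of array entries. The function name `lattice_drift` and the array layout `f[p, x]` are assumptions for illustration.

```python
import numpy as np

def lattice_drift(f, p_index):
    """Shift each momentum row of f[p, x] by its integer momentum index (periodic in x)."""
    out = np.empty_like(f)
    for row, j in enumerate(p_index):
        out[row] = np.roll(f[row], j)   # exact advection: no interpolation needed
    return out

rng = np.random.default_rng(0)
f0 = rng.random((8, 16))                # 8 momentum cells, 16 spatial cells
p_index = np.arange(-4, 4)              # momentum p_j = j * dp
f1 = lattice_drift(f0, p_index)
f_back = lattice_drift(f1, -p_index)    # reversing the momenta undoes the drift
```

Because no interpolation is involved, the drift introduces no numerical diffusion and is reversible to the last bit, which is the property the lattice approach is built on.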

3.4 Schrödinger–Poisson as a discretisation of Vlasov–Poisson

An entirely different approach to discretise the Vlasov–Poisson system by exploiting the quantum-classical correspondence has been proposed by Widrow and Kaiser (1993). Hereby one exploits that full information about the system, such as density, velocity, etc., can be recovered from the (complex) wave function, and phase space is discretised by a (here tuneable, not physical) quantisation scale \(\hbar =2\varDelta x\varDelta p\). Since the Schrödinger–Poisson system converges in the limit of \(\hbar \rightarrow 0\) to Vlasov–Poisson (Zhang et al. 2002), it can be used as a UV modified analogue model also for classical dynamics if one restricts attention to (i.e. smooths on) scales larger than \(\hbar \). It is important to note that the phase of the wave function is intimately related to the Lagrangian submanifold: both are given by a single scalar degree of freedom. For this reason, the Schrödinger–Poisson analogue has the advantage that it provides a full phase space theory with only a three-dimensional field (the wave function) that needs to be evolved. Following the first implementation by Widrow and Kaiser (1993), this model has found renewed interest recently (Uhlemann et al. 2014; Schaller et al. 2014; Kopp et al. 2017; Eberhardt et al. 2020; Garny et al. 2020). It is important to note that the underlying equations are identical to those of ‘fuzzy dark matter’ (FDM) models of ultralight axion-like particles, which we discuss in more detail in Sect. 7.4, in the absence of a self-interaction term. In the case of FDM, the quantum scale \(\hbar /m_{\mathrm{FDM}}\) is set by the mass \(m_{\mathrm{FDM}}\) of the microscopic particle and is (likely) not a numerical discretisation scale dictated by finite memory.

4 Time evolution

As we have shown above, large-scale dark matter simulations have an underlying Hamiltonian structure, usually with a time-dependent Hamiltonian. Mathematically, such Hamiltonian systems have a very rigid underlying structure, where the phase-space area spanned by canonically conjugate coordinates and momenta is conserved over time. Consequently, specific techniques for the integration of Hamiltonian dynamical systems exist that preserve such underlying structure even in a numerical setting. For this reason, this section focuses almost exclusively on integration techniques for Hamiltonian systems as they arise in the context of cosmological simulations.

4.1 Symplectic integration of cosmological Hamiltonian dynamics

In the cosmological N-body problem, Hamiltonians arising in the Newtonian limit are typically of the non-autonomous but separable type, i.e. can be written

$$\begin{aligned} \mathscr {H} = \alpha (t) \, T(\varvec{P}_1,\dots ,\varvec{P}_N) + \beta (t)\, V(\varvec{X}_1,\dots ,\varvec{X}_N), \end{aligned}$$

where \(\varvec{X}_i\) and \(\varvec{P}_i\) are canonically conjugate, and \(\alpha (t)\) and \(\beta (t)\) are time-dependent functions that absorb all explicit time dependence (i.e. all factors of ‘a’ are pulled out of the Poisson equation for V). In cosmic time t one has \(\alpha =a(t)^{-2}\) and \(\beta =a(t)^{-1}\), which is not a clever choice of time coordinate, since the time dependence then appears in both terms, which complicates higher order symplectic integration schemes as we discuss below. A better choice is to not forget the relativistic origin of this Hamiltonian and consider time as a coordinate in extended phase space (cf. Lanczos 1986), using a parametric time \(\tilde{t}\) with \(\mathrm{d}\tilde{t} = a^{-2} \mathrm{d}t\) so that \(\alpha =1\) and \(\beta =a(t)\). This coincides with the “super-conformal time” first introduced by Doroshkevich et al. (1973) and extensively discussed by Martel and Shapiro (1998) under the name “super-comoving” coordinates.
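
As a short worked check of this change of time variable (a sketch using only the definitions above), the canonical equations in cosmic time, \(\dot{\varvec{X}}_i = \alpha \,\partial T/\partial \varvec{P}_i\) and \(\dot{\varvec{P}}_i = -\beta \,\partial V/\partial \varvec{X}_i\), become, upon multiplying by \(\mathrm{d}t/\mathrm{d}\tilde{t}=a^2\) and inserting \(\alpha =a^{-2}\), \(\beta =a^{-1}\),

$$\begin{aligned} \frac{\mathrm{d}\varvec{X}_i}{\mathrm{d}\tilde{t}} = a^2\alpha \,\frac{\partial T}{\partial \varvec{P}_i} = \frac{\partial T}{\partial \varvec{P}_i},\qquad \frac{\mathrm{d}\varvec{P}_i}{\mathrm{d}\tilde{t}} = -a^2\beta \,\frac{\partial V}{\partial \varvec{X}_i} = -a\,\frac{\partial V}{\partial \varvec{X}_i}, \end{aligned}$$

so that all explicit time dependence is carried by the potential term alone, as stated.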

Grouping coordinates and momenta together as \(\varvec{\xi }_j:=(\varvec{X},\varvec{P})_j\) and remembering that the equations of motion can be written in terms of Poisson brackets (see footnote 6) as \(\dot{\varvec{P}}_j = \left\{ \varvec{P}_j,\,\mathscr {H} \right\} \) and \(\dot{\varvec{X}}_j = \left\{ \varvec{X}_j,\,\mathscr {H} \right\} \), one can write the canonical equations as a first order operator equation

$$\begin{aligned} \dot{\varvec{\xi }}_j = \hat{\mathscr {H}}(t)\, \varvec{\xi }_j\quad \text {with}\quad \hat{\mathscr {H}}(t):= \left\{ \cdot ,\, \mathscr {H}(t)\right\} = \left\{ \cdot ,\, \alpha T\right\} + \left\{ \cdot ,\, \beta V\right\} =: \hat{D}(t) + \hat{K}(t), \end{aligned}$$

which defines the drift and kick operators \(\hat{D}\) and \(\hat{K}\), respectively. This first order operator equation has the formal solution

$$\begin{aligned} \varvec{\xi }_j(t) = \mathcal {T} \exp \left[ \int _0^t \mathrm{d}t^{\prime} \hat{\mathscr {H}}(t^{\prime})\right] \varvec{\xi }_j(0), \end{aligned}$$

where Dyson’s time-ordering operator \(\mathcal {T}\) is needed because the operator \(\hat{\mathscr {H}}\) is time-dependent. Upon noticing that the kick acts only on the momenta and it depends only on V (and therefore on the positions), and that the drift acts only on the positions and depends only on the momenta, one can seek for time-explicit operator factorisations that split the coordinate and momentum updates in the form

$$\begin{aligned} \mathcal {T} \exp \left[ \int _t^{t+\epsilon } \mathrm{d}t^{\prime} \hat{\mathscr {H}}(t^{\prime})\right] \simeq \exp \left[ \epsilon _n \hat{K}\right] \cdots \exp \left[ \epsilon _3 \hat{D} \right] \,\exp \left[ \epsilon _2 \hat{K} \right] \,\exp \left[ \epsilon _1 \hat{D} \right] + \mathcal {O}(\epsilon ^m). \end{aligned}$$

with appropriately chosen coefficients \(\epsilon _j\) that in general depend on (multiple) time integrals of \(\alpha \) and \(\beta \) (Magnus 1954; Oteo and Ros 1991; Blanes et al. 2009). This is a higher-order generalisation of the Baker–Campbell–Hausdorff (BCH) expansion in the case that \(\alpha \) and \(\beta \) are constants (Yoshida 1990). The cancellation of commutators in the BCH expansion by tuning of the coefficients \(\epsilon _j\) determines the order of the error exponent m on the right hand side of Eq. (39). It is important to note that if both \(\alpha \) and \(\beta \) are time-dependent then the generalised BCH expansion contains unequal-time commutators and the error is typically at best \(\mathcal {O}(\epsilon ^3)\). It is therefore much simpler to consider only the integration in extended phase space in super-conformal time, in which no unequal-time commutators appear and standard higher order BCH expansion formulae can be used. While some N-body codes (e.g., Ramses, Teyssier 2002) use super-conformal time, one finds numerous other choices of integration time for second order accurate integrators in the literature (e.g., Quinn et al. 1997; Springel 2005). In order to allow for generalisations to higher orders, we discuss here how to construct an extended phase-space integrator. Consider the set of coordinates \((\varvec{X}_j,\,a)\), \(j=1\dots N\), including the cosmic expansion factor, with conjugate momenta \((\varvec{P}_j,\,p_a)\) along with the new extended phase-space Hamiltonian in super-conformal time

$$\begin{aligned} \tilde{\mathscr {H}} := \sum _j \frac{{P}_j^2}{2M} + a V(\varvec{X}_{1\dots N}) + a^2\mathcal {H}(a) p_a. \end{aligned}$$

Then the second order accurate “leap-frog” integrator is found when \(\epsilon _1=\epsilon _3=\epsilon /2\) and \(\epsilon _2=\epsilon \) in Eq. (39) (with all higher coefficients \(\epsilon _{4\dots n}=0\)) after expanding the operator exponentials to first order into their generators, \(\exp [\epsilon \hat{D}]\simeq I + \epsilon \hat{D}\). The final integrator takes the form

$$\begin{aligned} \varvec{\xi }_j(\tilde{t}+\epsilon ) = \left( I+\frac{\epsilon }{2}\hat{D}\right) \left( I+\epsilon \hat{K}\right) \left( I+\frac{\epsilon }{2}\hat{D}\right) \varvec{\xi }_j(\tilde{t}) \end{aligned}$$

or explicitly as it could be implemented in code

$$\begin{aligned} \varvec{X}_j(\tilde{t}+\epsilon /2)&= \varvec{X}_j(\tilde{t}) + \frac{\epsilon }{2M} \; \varvec{P}_j(\tilde{t}) \end{aligned}$$
$$\begin{aligned} a(\tilde{t}+\epsilon /2)&= a(\tilde{t}) + \frac{\epsilon }{2} \; a(\tilde{t})^2 \, \mathcal {H}(a(\tilde{t})) \end{aligned}$$
$$\begin{aligned} \varvec{P}_j(\tilde{t}+\epsilon )&= \varvec{P}_j(\tilde{t}) - \epsilon \,a(\tilde{t}+\epsilon /2)\; \varvec{\nabla }_{\varvec{X}_j} V\left( \varvec{X}_{1\dots N}(\tilde{t}+\epsilon /2)\right) \end{aligned}$$
$$\begin{aligned} \varvec{X}_j(\tilde{t}+\epsilon )&= \varvec{X}_j(\tilde{t}+\epsilon /2) + \frac{\epsilon }{2M} \; \varvec{P}_j(\tilde{t}+\epsilon ) \end{aligned}$$
$$\begin{aligned} a(\tilde{t}+\epsilon )&= a(\tilde{t}+\epsilon /2) + \frac{\epsilon }{2}\; a(\tilde{t}+\epsilon /2)^2 \, \mathcal {H}(a(\tilde{t}+\epsilon /2))\;. \end{aligned}$$

Note that the supplementary equation \(\mathrm{d} a/\mathrm{d}\tilde{t} = \partial \tilde{\mathscr {H}}/\partial p_a = a^2\mathcal {H}(a)\) can in principle also be integrated inexpensively to arbitrarily high precision in general cases. For EdS one has \(\tilde{t} = -2/(H_0 \sqrt{a})\), and in \(\varLambda \)CDM

$$\begin{aligned} \tilde{t} = -\frac{2}{H_0 \sqrt{\varOmega _m a}} {}_2F_1\left( -\frac{1}{6},\frac{1}{2},\frac{5}{6};-f_\varLambda (a)\right) , \end{aligned}$$

where \(f_\varLambda := \varOmega _\varLambda / (\varOmega _{\mathrm{m}} a^{-3})\) as in Eq. (21), which has to be inverted numerically to yield \(a(\tilde{t})\). Since this is a symplectic integration scheme, it will conserve the energy associated with the Hamiltonian \(\tilde{\mathscr {H}}\).
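
The explicit drift-kick-drift update written out above translates almost line by line into code. The following is a minimal sketch, not taken from any particular simulation code: `dkd_step`, `grad_v` and `conf_hubble` are hypothetical names, the potential gradient is passed in as a callable, and the usage example assumes an EdS background with \(H_0=1\), for which \(\mathcal {H}(a)=H_0\,a^{-1/2}\) follows from the EdS expression for \(\tilde{t}\) given above.

```python
import numpy as np

def dkd_step(x, p, a, eps, mass, grad_v, conf_hubble):
    """One drift-kick-drift step of size eps in super-conformal time.

    grad_v(x) returns dV/dX for all particles; conf_hubble(a) returns the
    rate entering da/dt~ = a^2 * conf_hubble(a). Both are user-supplied.
    """
    # first half drift of positions and scale factor
    x_half = x + 0.5 * eps * p / mass
    a_half = a + 0.5 * eps * a**2 * conf_hubble(a)
    # full kick, with the force evaluated at the half-step state
    p_new = p - eps * a_half * grad_v(x_half)
    # second half drift
    x_new = x_half + 0.5 * eps * p_new / mass
    a_new = a_half + 0.5 * eps * a_half**2 * conf_hubble(a_half)
    return x_new, p_new, a_new

# Usage: EdS background (H0 = 1) with a toy harmonic potential V = x^2 / 2.
h0 = 1.0
eds_hubble = lambda s: h0 * s**-0.5
x, p, a = np.array([0.3]), np.array([0.1]), 0.1
eps, n_steps = 1e-3, 1000
for _ in range(n_steps):
    x, p, a = dkd_step(x, p, a, eps, 1.0, lambda q: q, eds_hubble)

# Analytic EdS scale factor from t~ = -2 / (H0 sqrt(a)) for comparison.
t_start = -2.0 / (h0 * np.sqrt(0.1))
a_exact = 4.0 / (h0 * (t_start + eps * n_steps)) ** 2
```

In this sketch the scale factor is advanced with the same first-order expansion as in the equations above; as noted in the text, one could instead evaluate \(a(\tilde{t})\) to high precision from the closed-form expressions.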

Equations (42a)–(42d) represent the drift-kick-drift (DKD) form of a second order integrator. It is trivial to derive also the respective kick-drift-kick (KDK) form. Based on this, it is possible to construct also higher order integrators, see e.g., Yoshida (1990) for a derivation of operators up to 8th order that involve, however, positive and negative time coefficients. Also, alternative symplectic formulations with purely positive coefficients are possible, see Chin and Chen (2001) for a 4th order method. An exhaustive discussion of symplectic and other geometric integrators and their properties can be found in Hairer et al. (2006) and Blanes and Casas (2016). In cosmological simulations, the second order leap frog is, however, the most commonly used integrator to date, arguably due to its robustness, simplicity, slim memory footprint, and easy integration with hierarchical time-stepping schemes (see below). We are not aware of production implementations of higher-order symplectic integrators used in cosmological simulations.

A long-time evolution operator can be constructed by many successive applications of the KDK or DKD propagators. Writing out the product, it is easy to see that the last and first half-step operators from two successive steps can often be combined into a single operator (if \(\hat{A}\) below is time independent, otherwise usually at second order). Then, in the long product of operators, combining

$$\begin{aligned} \dots \exp (\epsilon _B \hat{B})\exp (\frac{\epsilon _A}{2} \hat{A})\exp (\frac{\epsilon _A}{2} \hat{A})\exp (\epsilon _B \hat{B})\dots =\dots \exp (\epsilon _B \hat{B})\exp (\epsilon _A \hat{A})\exp (\epsilon _B \hat{B})\dots , \end{aligned}$$

implies that in continued stepping only two interleaved steps have to be made per time step, not three, and that the splitting into three sub-steps just serves to symmetrise the scheme and interleave the steps. Half-steps are only made at the very beginning and end of the stepping—or whenever one needs synchronous dynamical variables (e.g. for output or analysis).
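
The merging of adjacent half-steps can be verified numerically. The sketch below assumes the time-independent case (no scale factor), an arbitrary anharmonic toy force, and hypothetical helper names; it compares n explicit DKD steps with the fused sequence that uses only one kick and one full drift per step in the interior.

```python
import numpy as np

def drift(x, p, eps, mass=1.0):
    return x + eps * p / mass

def kick(x, p, eps, grad_v):
    return p - eps * grad_v(x)

def dkd_separate(x, p, eps, n, grad_v):
    """n full DKD steps, each with two explicit half drifts."""
    for _ in range(n):
        x = drift(x, p, 0.5 * eps)
        p = kick(x, p, eps, grad_v)
        x = drift(x, p, 0.5 * eps)
    return x, p

def dkd_merged(x, p, eps, n, grad_v):
    """Same evolution with adjacent half drifts fused into full drifts."""
    x = drift(x, p, 0.5 * eps)           # opening half drift
    for _ in range(n - 1):
        p = kick(x, p, eps, grad_v)
        x = drift(x, p, eps)             # two half drifts combined
    p = kick(x, p, eps, grad_v)
    x = drift(x, p, 0.5 * eps)           # closing half drift to synchronise
    return x, p

force_grad = lambda q: q**3              # toy anharmonic potential V = x^4 / 4
xs, ps = dkd_separate(1.0, 0.0, 0.05, 50, force_grad)
xm, pm = dkd_merged(1.0, 0.0, 0.05, 50, force_grad)
```

Both routines return the same state up to floating-point round-off, while the merged version performs roughly half as many drift operations.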

4.2 Multi-stepping, adaptive steps and separation of time-scales

4.2.1 Time step criteria

A challenge in cosmological simulations is the large dynamic range from quasilinear flow on large scales to very short dynamical times in the centres of dark matter haloes: we seek to simultaneously simulate large underdense regions of the universe, where particles move on long timescales, together with massive clusters, where density contrasts can reach \(10^4\)–\(10^5\) times the average density and timescales are very short. In the absence of an adaptive or hierarchical time-stepping scheme, the criteria discussed below yield a global timestep \(\varDelta t = \min _i \varDelta t_i\) dictated by the N-body particle with the smallest time step.

A simple condition for choosing a time step is the Courant–Friedrichs–Lewy (CFL) criterion, which requires that particles travel less than a fraction of one force resolution element, \(\varDelta x\), over the time step. Specifically

$$\begin{aligned} \varDelta t_i = C \frac{\varDelta x}{ \Vert \varvec{P}_i / M_i \Vert } \,, \end{aligned}$$

where \(0<C<1\) is a free parameter, usually \(C \sim 0.25\). While commonly used in the case of Vlasov–Poisson, we are not aware of explicit derivations of this value from stability criteria as in the case of hyperbolic conservation laws. A closely related criterion is \(\varDelta t_i = C \sqrt{\varDelta x/\Vert \varvec{A}_i\Vert }\), where \(\varvec{A}_i := -\varvec{\nabla }\phi |_{\varvec{X}_i}\) is the acceleration. This condition sets a global timestep that is commonly used in simulations where forces are computed via the PM algorithm (e.g., Merz et al. 2005). Other criteria are also possible; for instance, the ABACUS code (Garrison et al. 2016) uses, in addition to Eq. (44), a heuristic condition based on the global maximum value of the RMS velocity over the maximum acceleration of particles in a volume element.
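
Both mesh-based criteria reduce to a few array operations. The following sketch assumes momenta and accelerations stored as arrays of shape (N, 3) and returns the global minimum over particles; the function names are hypothetical and C = 0.25 is the typical value quoted above.

```python
import numpy as np

def cfl_timestep(momenta, masses, dx, c=0.25):
    """Global step from dt_i = C * dx / |P_i / M_i|, minimised over particles."""
    speed = np.linalg.norm(momenta / masses[:, None], axis=1)
    return c * dx / speed.max()

def acceleration_timestep(acc, dx, c=0.25):
    """Global step from dt_i = C * sqrt(dx / |A_i|), minimised over particles."""
    return c * np.sqrt(dx / np.linalg.norm(acc, axis=1).max())

momenta = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 1.0]])
masses = np.array([1.0, 2.0])
dt_cfl = cfl_timestep(momenta, masses, dx=0.1)                      # 0.25 * 0.1 / 5
dt_acc = acceleration_timestep(np.array([[0.0, 0.0, 4.0],
                                         [1.0, 0.0, 0.0]]), dx=0.01)  # 0.25 * 0.05
```

Taking the maximum speed (or acceleration) before dividing is equivalent to taking the minimum of the per-particle steps and avoids dividing by zero for particles at rest.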

In the case of tree or direct summation based methods, the scale played by the mesh grid resolution is taken over by the softening length. Therefore, a simple criterion is to estimate for each particle a time-scale

$$\begin{aligned} \varDelta t_i \simeq \eta \sqrt{\frac{\varepsilon }{\Vert \varvec{A}_i\Vert }} \,, \end{aligned}$$

where \(\varepsilon \) is the gravitational force softening scale, and \(\eta \) is a dimensionless accuracy parameter. This is the most common time-stepping criterion adopted in large-scale simulations and it is used, for instance, in PKDGRAV-3 (with \(\eta = 0.2\)) and GADGET.

Several authors have argued that a more optimal timestepping criterion should be based on the tidal field rather than the acceleration (Dehnen and Read 2011; Stücker et al. 2020; Grudić and Hopkins 2020). This is also motivated by the fact that a constant (global) acceleration does not change the local dynamics of a system, see e.g. Stücker et al. (2021a). Additionally, using tides avoids invoking the non-physical scale \(\varepsilon \), and one has

$$\begin{aligned} \varDelta t_i \simeq \frac{\eta }{ \sqrt{\left\| \; \mathsf{\varvec {T}}(\varvec{X}_i) \; \right\| }} \,, \end{aligned}$$

where \(\mathsf{\varvec {T}}=\varvec{\nabla }\otimes \varvec{\nabla }\phi \) is the tidal field tensor, and \(\Vert \cdot \Vert \) is e.g. the Frobenius matrix norm. This tidal criterion typically yields shorter timesteps in the innermost parts and longer timesteps in the outer parts of haloes compared to the standard criterion (45). A caveat is that it is not trivial to get a robust estimate of the tidal field, since it is one spatial order less smooth than the acceleration field entering (45) and so in principle time-step fluctuations due to noise might amplify integration errors.
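
The two per-particle criteria can be compared directly in code. The sketch below assumes accelerations of shape (N, 3) and tidal tensors of shape (N, 3, 3); the function names are hypothetical, and reusing \(\eta =0.2\) for the tidal criterion is an arbitrary choice for illustration rather than a recommended value.

```python
import numpy as np

def softening_timestep(acc, softening, eta=0.2):
    """dt_i = eta * sqrt(softening / |A_i|) for accelerations of shape (N, 3)."""
    return eta * np.sqrt(softening / np.linalg.norm(acc, axis=1))

def tidal_timestep(tidal, eta=0.2):
    """dt_i = eta / sqrt(||T_i||_F) for tidal tensors of shape (N, 3, 3)."""
    frob = np.sqrt(np.sum(tidal**2, axis=(1, 2)))   # Frobenius norm per particle
    return eta / np.sqrt(frob)

# Point mass with GM = 1 seen from unit distance along the x axis:
# phi = -1/r gives T = diag(-2, 1, 1) and |A| = 1.
tidal = np.array([np.diag([-2.0, 1.0, 1.0])])
acc = np.array([[-1.0, 0.0, 0.0]])
dt_tidal = tidal_timestep(tidal)                   # 0.2 / 6**0.25
dt_soft = softening_timestep(acc, softening=0.01)  # 0.2 * 0.1
```

Note that the tidal estimate contains no softening length, while the acceleration-based estimate changes whenever \(\varepsilon \) is changed, even if the orbit itself is unaffected.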

An additional global timestep criterion, independent of the dynamics of the N-body particles, is sometimes used in large-scale cosmological simulations when high resolution is not required. Such criteria are usually tied to the scale-factor evolution and are, e.g., of the form

$$\begin{aligned} \varDelta \log (a) < B \,, \end{aligned}$$

where \(B \sim 0.01\). These kinds of criteria are also usually employed in PM codes and COLA (which we will discuss in Sect. 4.4), which typically adopt timesteps equally spaced in the expansion factor or in its logarithm, or more complicated functions (e.g., \(\varDelta a/a = (a_1^{-2} + a_2^{-2})^{-0.5}\) with \(a_1\) and \(a_2\) being free parameters in Fast-PM). Different authors have advocated different options and numbers of steps, justified simply by convergence rates of a set of desired summary statistics (White et al. 2014; Feng et al. 2016; Tassev et al. 2013; Izard et al. 2016). The criterion in Eq. (47) is also commonly used together with other conditions even in high-resolution simulations, since it appears to be necessary for a precise agreement with linear theory predictions at high redshift.
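
A schedule that satisfies the \(\varDelta \log (a)\) bound with equal logarithmic steps can be generated as follows. This is a sketch with a hypothetical function name; the starting scale factor of 0.02 is an arbitrary example, not a value from any of the cited codes.

```python
import math

def log_a_steps(a_start, a_end, b=0.01):
    """Scale factors of a schedule with uniform steps in log(a), each at most b."""
    total = math.log(a_end / a_start)
    n = math.ceil(total / b)          # smallest number of steps respecting the bound
    dlog = total / n
    return [a_start * math.exp(k * dlog) for k in range(n + 1)]

schedule = log_a_steps(0.02, 1.0)     # from z = 49 to z = 0
n_steps = len(schedule) - 1           # 392 steps for B = 0.01
```

For comparison, halving B doubles the number of steps, so the cost of this criterion is set entirely by the chosen bound and the redshift range, independent of the particle dynamics.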

We note that different codes adopt different timestep criteria (and combinations thereof), usually heuristically motivated. The optimal choice seems to depend on details of the simulation (redshift, force accuracy, etc.), which suggests that there could be better strategies to choose the timestep. This could be very important, as the timestep has a significant impact on the overall computational cost of a simulation. For instance, by adjusting the timesteps, Sunayama et al. (2016) find a factor of 4 reduction in the CPU time of N-body simulations while still accurately recovering the power spectrum and halo mass function (after a correction of masses). As far as we know, no systematic study of the optimal general time-stepping strategy has been published for large-scale cosmological simulations taking into account the target accuracy needed for upcoming observations.

For some applications, a global timestep is sufficient to obtain accurate results in the mildly nonlinear regime, and it has been adopted in some large-scale simulation codes. However, as the resolution of a simulation increases, the minimum value of \(\varDelta t_i\) quickly decreases, usually as the result of a small number of particles on short-period orbits inside dark matter haloes. To prevent the shortest time-scale from dictating an intractably small global time step, it is desirable to have individually adaptive time-steps. Some care needs to be taken to consistently allow for this in a time integrator, as we discuss next.

4.2.2 Hierarchical / block time stepping

For systems of particles with widely different time-scales, a division into ‘fast’ and ‘slow’ particles is advantageous. Given a second order accurate integrator and a splitting of the Hamiltonian into \(\mathscr {H} = T + V_{\mathrm{slow}} + V_{\mathrm{fast}}\), then the following n-fold sub-cycling scheme is also second order accurate (Hairer et al. 2006)

$$\begin{aligned} \left( I+\frac{\epsilon }{2}\hat{K}_{\mathrm{slow}}\right) \left[ \left( I+\frac{\epsilon }{2n}\hat{K}_{\mathrm{fast}}\right) \left( I+\frac{\epsilon }{n}\hat{D}\right) \left( I+\frac{\epsilon }{2n}\hat{K}_{\mathrm{fast}}\right) \right] ^n \left( I+\frac{\epsilon }{2}\hat{K}_{\mathrm{slow}}\right) . \end{aligned}$$

The gain is that in one such KDK timestep, while the fast particles are kicked 2n times, the slow particles are kicked only twice. Since the force computation is the algorithmically slowest part of the update, this leads to a computational speed up.
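The sub-cycling scheme above can be sketched for a single unit-mass particle in one dimension. The toy problem below (two harmonic forces with assumed spring constants `K_SLOW` and `K_FAST`) is purely illustrative; it is chosen because the exact solution is known, so that the second-order convergence can be checked by halving the step.

```python
import numpy as np

def subcycled_kdk_step(x, p, eps, n_sub, force_slow, force_fast):
    """Half slow kick, n_sub fast KDK sub-steps of length eps/n_sub, half slow kick."""
    p = p + 0.5 * eps * force_slow(x)
    h = eps / n_sub
    for _ in range(n_sub):
        p = p + 0.5 * h * force_fast(x)
        x = x + h * p                      # drift (unit mass)
        p = p + 0.5 * h * force_fast(x)
    p = p + 0.5 * eps * force_slow(x)
    return x, p

K_SLOW, K_FAST = 0.1, 1.0                  # assumed toy spring constants

def integrate(n_steps, t_end=1.0, n_sub=4):
    """Integrate from (x, p) = (1, 0) to t_end with n_steps outer steps."""
    x, p = 1.0, 0.0
    eps = t_end / n_steps
    for _ in range(n_steps):
        x, p = subcycled_kdk_step(x, p, eps, n_sub,
                                  lambda q: -K_SLOW * q, lambda q: -K_FAST * q)
    return x, p

def error(n_steps, t_end=1.0):
    """Distance in phase space from the exact harmonic-oscillator solution."""
    omega = np.sqrt(K_SLOW + K_FAST)
    x, p = integrate(n_steps, t_end)
    return np.hypot(x - np.cos(omega * t_end), p + omega * np.sin(omega * t_end))

# Halving the outer step should reduce the error by roughly a factor of four
ratio = error(10) / error(20)
```

Per outer step, `force_slow` is evaluated twice and `force_fast` 2n times, as in the operator expression; an implementation would merge adjacent half kicks to save force evaluations.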

This sub-cycling idea can be generalised to the block time step scheme (BTS, sometimes also called ‘hierarchical time stepping’) to update particles on their relevant time scales (Hayli 1967, 1974; McMillan 1986) using a time-quantised recursive version of the sub-cycling scheme above. By fixing \(n=2\), but applying the formula recursively, i.e., splitting the ‘fast’ part itself into a ‘slow’ and ‘fast’ part and so on, one achieves a hierarchical time integration scheme with time steps \(\epsilon _\ell = 2^{-\ell } \epsilon _0\) on recursion level \(\ell \). The power-two quantisation means that while a particle on level \(\ell \) makes one timestep, a particle on level \(\ell +1\) makes two, and one on level \(\ell +2\) makes four in the same time interval. Since the scheme is self-similar, after every sub step with Eq. (48) on level \(\ell \), all particles on levels \(\ell ^{\prime}>\ell \) have carried out complete time steps, and can be re-assigned a new time bin. In this way, each particle can be assigned to a close to optimal time step “bin” which is consistent with its local timestep criterion. This multi-stepping approach is adopted in PKDGRAV-3 and GADGET.
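The power-of-two quantisation can be illustrated with a short sketch that maps a desired timestep \(\varDelta t_i\) to the coarsest level \(\ell \) with \(\epsilon _\ell \le \varDelta t_i\), and that lists which levels begin a new step at a given sub-step of the finest level. The function names and the cap `max_level` are hypothetical; a particle whose desired step is shorter than the step of the deepest allowed level would violate its criterion, which a real code has to handle separately.

```python
import numpy as np

def assign_time_bins(dt_desired, eps0, max_level=20):
    """Smallest level l >= 0 such that eps0 * 2**(-l) <= dt_desired."""
    dt_desired = np.asarray(dt_desired, dtype=float)
    level = np.ceil(np.log2(eps0 / dt_desired)).astype(int)
    return np.clip(level, 0, max_level)

def active_levels(substep, max_level):
    """Levels that start a new step at this sub-step index of the finest level."""
    return [l for l in range(max_level + 1)
            if substep % 2 ** (max_level - l) == 0]

# Four particles with different desired timesteps (assumed values)
dt_wanted = np.array([1.0, 0.6, 0.3, 0.1])
levels = assign_time_bins(dt_wanted, eps0=1.0)
eps_of_level = 1.0 * 2.0 ** (-levels)
```

With three levels (`max_level=2`), the finest level is active at every sub-step, level 1 at every second one, and level 0 only at the start of the base step.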

This scheme can be further optimized. As written above, the kick \(\hat{K}\) on level \(\ell \) involves the interaction with particles on all other levels, i.e. the ‘fast’ particles interact with the ‘slow’ particles on the ‘fast’ timescale, while it is likely sufficient that they do so on the ‘slow’ time scale. For this reason, a variation, e.g. implemented in GADGET-4 (Springel et al. 2021), is that the kick of particles on level \(\ell \) is computed using the gravitational force of particles only on levels \(\ell ^{\prime} \ge \ell \). Such hierarchical integration schemes have recently been extended to higher order integrators in the non-cosmological case (Rantala et al. 2021). In principle, secondary trees need to be built on every level of the hierarchy of timesteps; however, this would require a significant amount of computing time. Therefore, an optimization strategy, e.g. adopted in PKDGRAV-3, is to build a new secondary tree only if a timestep level contains a small fraction of particles compared to the previous level where the tree was built.

4.3 Symplectic integration of quantum Hamiltonians

The integration of a classical Hamiltonian in operator notation, Eq. (37), is basically identical to that of the Schrödinger–Poisson system, which is an effective description of non-relativistic scalar field dark matter, Eq. (92). One can therefore use the entire machinery of operator splitting developed above, with differences only in the form of the kick and drift operators, as they now act on the wave function rather than on the set of conjugate coordinates and momenta. As above, the best choice is again superconformal time, so that one has a quantum Hamiltonian acting on wave functions \(\psi \) with associated Poisson equation:

$$\begin{aligned} i\hbar \frac{\partial \psi }{\partial \tilde{t}}= & {} \hat{\mathscr {H}}\psi \qquad \text {with}\qquad \hat{\mathscr {H}} = \frac{\hat{p}^2}{2m}+a(\tilde{t})\,\hat{V}(\hat{q})\qquad \text {and}\qquad \nonumber \\ \nabla ^2 \hat{V}= & {} \frac{3}{2}H_0^2 m \varOmega _X ( \left| \psi \right| ^2-1). \end{aligned}$$

The formal solution is identical to that in Eq. (38) apart from a factor \(\mathrm{i}/\hbar \) in the exponent. Note also that the mass m is the actual microscopic particle mass and not a coarse-grained effective mass. The main difference and advantage compared to the classical case is that the drift and kick operators \(\hat{D}\) and \(\hat{K}\) need not be represented through their infinitesimal generators, but can be directly taken to be the operator exponentials

$$\begin{aligned} \hat{D}(\tilde{t},\tilde{t}+\epsilon ) = \exp \left( -\epsilon \,\frac{i}{\hbar } \frac{\hat{p}^2}{2m} \right) ,\qquad \text {and}\qquad \hat{K}(\tilde{t},\tilde{t}+\epsilon ) = \exp \left( -\epsilon a(\tilde{t}+\epsilon /2)\frac{i}{\hbar } \hat{V} \right) . \end{aligned}$$

The kick operator is purely algebraic, i.e. it is simply a scalar function multiplying the wave function. The same is true for the drift operator in Fourier space, where

$$\begin{aligned} \hat{\tilde{D}}(\tilde{t},\tilde{t}+\epsilon ) := \mathcal {F}\,\hat{D}(\tilde{t},\tilde{t}+\epsilon ) = \exp \left( -\epsilon \, \frac{\mathrm{i}\hbar }{2m}k^2 \right) \end{aligned}$$

is simply a scalar function; \(\mathcal {F}\) is the Fourier transform operator with inverse \(\mathcal {F}^{-1}\), \(\varvec{k}\) the Fourier-conjugate wave number (or momentum) to coordinate \(\varvec{x}\). One can thus formulate a split-step spectral integration scheme (e.g., Taha and Ablowitz 1984; Woo and Chiueh 2009), where drift operators are simply applied in Fourier space, i.e. before a drift, the wave function is transformed, and after it is transformed back. The time coefficients again have to be matched to cancel out commutators in the BCH relation, so that a second order DKD time step is e.g. given by

$$\begin{aligned} \psi (\varvec{x},t+\epsilon ) = \mathcal {F}^{-1} \hat{\tilde{D}}(\tilde{t}+\epsilon /2,\tilde{t}+\epsilon ) \mathcal {F} \hat{K}(\tilde{t},\tilde{t}+\epsilon ) \mathcal {F}^{-1} \hat{\tilde{D}}(\tilde{t},\tilde{t}+\epsilon /2) \mathcal {F}\,\psi (\varvec{x},t). \end{aligned}$$

The chaining of multiple steps eliminates two of the Fourier transforms for the steps that are not at the beginning or end of the time integration, leaving one forward and one backward transform per time step. The use of Fourier transforms and the spectrally accurate drift operator (51) which contains the exponentiated Laplacian \(\hat{p}^2\) to all orders significantly increases the convergence rate of the scheme even for large time steps.
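A minimal sketch of one such DKD step on a periodic one-dimensional grid is given below. It assumes \(\hbar =m=1\), a fixed scale factor, and a prescribed static potential instead of a self-consistent Poisson solve; all function and variable names are illustrative. Two properties can be checked directly: a free plane wave picks up exactly the phase \(\exp (-\mathrm{i}\hbar k^2\epsilon /2m)\), and the norm of the wave function is conserved since every factor is a pure phase.

```python
import numpy as np

def dkd_step(psi, V, k, eps, hbar=1.0, m=1.0, a_mid=1.0):
    """One second-order drift-kick-drift step for psi on a periodic 1D grid."""
    # Half drift: a pure phase in Fourier space
    half_drift = np.exp(-0.5 * eps * 1j * hbar * k**2 / (2.0 * m))
    psi = np.fft.ifft(half_drift * np.fft.fft(psi))
    # Full kick: a pure phase in real space, scale factor taken at mid-step
    psi = np.exp(-1j * eps * a_mid * V / hbar) * psi
    return np.fft.ifft(half_drift * np.fft.fft(psi))

N, L = 64, 1.0                                   # assumed grid size and box length
x = np.arange(N) * L / N
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)     # angular wave numbers

# Free plane wave with three wavelengths per box
k3 = 2.0 * np.pi * 3 / L
plane = np.exp(1j * k3 * x)
plane_out = dkd_step(plane, np.zeros(N), k, eps=0.1)

# Gaussian wave packet in an assumed static cosine potential
packet = np.exp(-((x - 0.5) / 0.1) ** 2).astype(complex)
V = np.cos(2.0 * np.pi * x / L)
packet_out = dkd_step(packet, V, k, eps=0.1)
```

When many steps are chained, the trailing half drift of one step and the leading half drift of the next can be merged into a single full drift, which is the saving in Fourier transforms described above.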

The limiting factor of a spectral approach is that the spatial discretisation is necessarily uniform for Fourier-based methods. The use of AMR techniques to locally increase the resolution therefore inhibits (to our knowledge) the simple use of spectral approaches, so that smaller stencils and finite-difference expansions become necessary. Since gravity and the expanding Universe contract gravitationally bound structures to ever smaller scales, methods with higher dynamic range are needed to probe the interior of haloes. To our knowledge, no spatially adaptive spectral method has been developed yet. Instead, methods with higher dynamic range resort to finite difference methods with AMR, which have been successfully used to model the interior dynamics of scalar field dark matter haloes (Schive et al. 2014; Mina et al. 2020; Schwabe et al. 2020). When discretising the drift operator, one has to resort to the generator formulation again, i.e. \(\hat{D}_\epsilon = 1+\epsilon \frac{\mathrm{i}\hbar }{2m}\nabla ^2 + \mathcal {O}(\epsilon ^2)\), which follows from expanding the operator exponential above with \(\hat{p}^2=-\hbar ^2\nabla ^2\), where then the Laplacian can be approximated with finite differences. This imposes strong CFL-like time-stepping constraints.

4.4 Acceleration methods: COLA and FastPM

One of the main short-comings of symplectic integration schemes is that they have to evolve a time-dependent Hamiltonian, which can require many time steps even during the quasi-linear early phase of structure formation. In contrast, perturbation theory is very accurate during this regime. For instance, first-order LPT (a.k.a. the Zel’dovich approximation) yields exact results for one-dimensional problems up to shell-crossing and, therefore, the solution could be obtained in a single time-step. This implies that (mostly in low resolution large-scale simulations) a considerable amount of computing time can be spent to accurately evolve the particles during the quasi-linear phase, since too large timesteps during this phase can lead to significant large-scale errors. This has motivated several methods aimed at incorporating results from perturbation theory in the time-integration of an N-body scheme. These approaches have been widely-adopted to more efficiently create large ensembles of simulations that achieve high accuracy on mildly non-linear scales. We review the main ideas and implementations next.

4.4.1 COLA: COmoving Lagrangian Acceleration

In the COLA approach (Tassev et al. 2013), the idea is to consider motion relative to a pre-computed LPT solution (specifically 1 and 2LPT in all implementations we are aware of), by considering the equations of motion of CDM, Eq. (24), relative to the motion in nLPT, as quantified by the order n truncated Lagrangian map \(\varvec{x}_j = \varvec{q}_j+\varvec{\varPsi }_{\mathrm{LPT},j}(t)\) for particle j. In all existing work on COLA that we are aware of, an ad-hoc modification of the leapfrog drift and kick operators is made to reflect this change in the equations of motion. One can however write such a transformation rigorously as a canonical transformation to new coordinates \((\varvec{X}_j,\varvec{P}_j)\) with a generating function \(\mathscr {F}_3(\varvec{p}_j,\varvec{X}_j,\tilde{t}) = ( M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j}-\varvec{p}_j)\cdot \varvec{X}_j\) so that \(\varvec{X}_j=\varvec{x}_j\) and \(\varvec{P}_j=\varvec{p}_j - M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j}\). The COLA Hamiltonian then becomes (where the dot now indicates a derivative w.r.t. superconformal time)

$$\begin{aligned} \mathscr {H}_{\mathrm{COLA}} = \sum _j\frac{(\varvec{P}_j+M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j})^2}{2 M} + a V(\varvec{X}_{1\dots N}) + M \sum _j \ddot{\varvec{\varPsi }}_{\mathrm{LPT},j}\cdot \varvec{X}_j. \end{aligned}$$

It is immediately obvious that this Hamiltonian has explicit time-dependence in both the kinetic and the potential part which will complicate the development of symplectic splitting schemes. The equations of motion now reflect the motion relative to the LPT solution of the form

$$\begin{aligned} \dot{\varvec{X}}_j = \frac{\varvec{P}_j}{M} + \dot{\varvec{\varPsi }}_{\mathrm{LPT},j}\qquad \text {and}\qquad \dot{\varvec{P}}_j = -a\,\varvec{\nabla }_{\varvec{X}_j} V - M\ddot{\varvec{\varPsi }}_{\mathrm{LPT},j}. \end{aligned}$$

Existing COLA implementations ignore the Dyson time-ordering and simply modify the drift and kick operators to become

$$\begin{aligned} \hat{D}_{\mathrm{COLA}}(\tilde{t},\tilde{t}+\epsilon ) \varvec{X}_j= & {} \hat{D}(\tilde{t},\tilde{t}+\epsilon ) \varvec{X}_j + \varvec{\varPsi }_{\mathrm{LPT},j}(\tilde{t}+\epsilon ) - \varvec{\varPsi }_{\mathrm{LPT},j}(\tilde{t}), \end{aligned}$$
$$\begin{aligned} \hat{K}_{\mathrm{COLA}}(\tilde{t},\tilde{t}+\epsilon ) \varvec{P}_j= & {} \hat{K}(\tilde{t},\tilde{t}+\epsilon ) \varvec{P}_j - M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j}(\tilde{t}+\epsilon ) + M\dot{\varvec{\varPsi }}_{\mathrm{LPT},j}(\tilde{t}), \end{aligned}$$

which is accurate at first order since none of the unequal time commutators can be expected to vanish (Tassev et al. 2013 discuss an ad-hoc improvement to reduce errors in their Appendix A.3.2). Despite the low order, this method allows for a very rapid approximate evolution in the quasi-linear regime, at the expense of having to store the fields needed to compute the nLPT trajectory of each fluid element. The above modifications are widely used, for instance, in the generation of large numbers of mock catalogues for estimates of clustering covariance matrices. In such cases, ensembles of thousands of simulations with low mass resolution, each of them with typically 10 time steps, are performed. MPI-parallel implementations of the COLA algorithm are publicly available in the code of Koda et al. (2016)Footnote 7 and in the form of the L-PICOLAFootnote 8 code (Howlett et al. 2015), and a modified sCOLA algorithm allowing a more efficient spatial decomposition and “zoom simulations” has been recently proposed (Tassev et al. 2015) and is implemented in the Simbelmyne code (Leclercq et al. 2020)Footnote 9. Given the low formal order of the modified operators, it might be worthwhile exploring non-symplectic integration schemes in COLA, which could be rigorously higher order (given that there is no obvious benefit from symplectic integration anyway) and improve the performance of the method.
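The structure of the modified operators can be illustrated with a deliberately simple toy problem. In the sketch below, the residual momentum \(\varvec{P}\) obeys \(\dot{\varvec{P}} = -a\varvec{\nabla }V - M\ddot{\varvec{\varPsi }}_{\mathrm{LPT}}\) as in the equations of motion above, so the kick subtracts the change in \(M\dot{\varvec{\varPsi }}_{\mathrm{LPT}}\), while the drift adds the change in \(\varvec{\varPsi }_{\mathrm{LPT}}\). The callables `force`, `psi`, and `psi_dot`, and the uniform-acceleration test case, are hypothetical stand-ins and contain no cosmology; the test case is chosen such that the "LPT" trajectory is exact, in which case the residual momentum stays zero and a single large step reproduces the trajectory.

```python
import numpy as np

def cola_kdk_step(X, P, t, eps, M, force, psi, psi_dot):
    """One KDK step for position X and residual momentum P relative to psi(t).

    force(X, t) returns -a * grad V; psi and psi_dot give the reference
    displacement and its time derivative (all hypothetical callables).
    """
    def kick(P, X, t0, dt):
        return P + dt * force(X, t0 + 0.5 * dt) - M * (psi_dot(t0 + dt) - psi_dot(t0))

    def drift(X, P, t0, dt):
        return X + dt * P / M + psi(t0 + dt) - psi(t0)

    P = kick(P, X, t, 0.5 * eps)
    X = drift(X, P, t, eps)
    P = kick(P, X, t + 0.5 * eps, 0.5 * eps)
    return X, P

# Toy case: uniform acceleration g, for which psi(t) = g t^2 / 2 is exact
M, g, q = 2.0, 0.3, 1.0                     # assumed mass, acceleration, start position
X1, P1 = cola_kdk_step(q, 0.0, 0.0, 10.0, M,
                       force=lambda X, t: M * g,
                       psi=lambda t: 0.5 * g * t**2,
                       psi_dot=lambda t: g * t)
```

In a cosmological application the reference displacement would be the 1LPT or 2LPT field stored per particle, and the integration only has to capture the (initially small) residual.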

4.4.2 FastPM

An alternative that does not require computing or storing the LPT displacement fields, thus saving computer memory, while still relying on PT input to speed up the evolution, was proposed as the FastPM method by Feng et al. (2016)Footnote 10. In this approach, the prefactors of the drift and kick operators receive an ad-hoc modification such that they contain the expected contribution of non-constant acceleration and velocities computed in the Zel’dovich approximation. Note, however, that it has been argued that no such modifications are needed to obtain an accurate time integration, as long as the time stepping is chosen appropriately (Klypin and Prada 2018a; see also Sunayama et al. 2016). The performance is similar to that of COLA, allowing approximate simulations with very few time steps that can be accurate on large scales. As for COLA, the order of convergence has, to our knowledge, not been discussed in the literature. The FastPM approach has recently been extended to include the modelling of massive neutrinos (Bayer et al. 2021), and has also been ported to TensorFlow (Modi et al. 2020b).

5 Gravity calculation

After having discussed the discretization in time and space (i.e. mass, for Lagrangian schemes) of the evolution equations, we now turn to the problem of computing the gravitational interactions of the simulated mass distribution. This step is usually the most time-consuming aspect of a modern N-body simulation and thus also where most numerical approximations are made and where various parallelization strategies have the largest impact. Depending on the problem at hand, the targeted numerical accuracy, and the computer architecture employed, several different methods exist that are in different senses ‘optimal’. Modern state-of-the-art codes typically exploit all these existing techniques. Optimal algorithmic complexity, \(\mathcal {O}(N)\) in the number N of particles, is achieved e.g. by the Fast Multipole Method (FMM), which is very promising for simulations with very large particle counts and is used e.g. in the PKDGRAV-3 and GADGET-4 codes, and by the geometric multigrid method, used e.g. in the RAMSES code. The newest codes also readily utilise thousands of GPUs to generate simulated Universes for the upcoming generations of cosmological observations.

In the following, we provide a brief overview of the main methods and the main ideas behind them. Regarding the general topic of gravity calculations in N-body, we also refer to other reviews for further details on these methods (Dehnen and Read 2011).

5.1 Mesh-based methods

A robust and fast method to solve for the gravitational interactions of a periodic system is provided by the particle-mesh (PM) method (Doroshkevich et al. 1980; Hockney and Eastwood 1981). Derived from the particle-in-cell (PIC) technique developed in plasma physics, PM methods are among the oldest numerical methods employed to study cosmological structure formation. The techniques described here can be employed not only for N-body discretisations, but are readily applicable also e.g. for full phase space or integer lattice methods (cf. Sects. 3.2 and 3.3), see also Miller and Prendergast (1968), and even in the case of Schrödinger–Poisson systems (Woo and Chiueh 2009).

5.1.1 Force and potential determination—spectral calculation

Considering a periodic domain of side length L, we want to solve the cosmological Poisson equation (Eq. 32b). Assume that both density \(\rho \) and potential \(\phi \) are periodic in \([-L/2,L/2)\) and can be expanded in a Fourier series, i.e.

$$\begin{aligned} \rho (\varvec{x})=\sum _{\varvec{n}\in \mathbb {Z}^3} \tilde{\rho }_{\varvec{n}}\exp \left( \mathrm{i} k_0\, \varvec{x}\cdot \varvec{n}\right) ,\quad \text {with}\quad k_0:=\frac{2\pi }{L} \end{aligned}$$

and identically for \(\phi (\varvec{x})\) with coefficients \(\tilde{\phi }_{\varvec{n}}\). It then follows from Poisson’s equation (Eq. 17) that their Fourier coefficients obey the algebraic relation

$$\begin{aligned} -k_0^2\left\| \varvec{n}\right\| ^2\,\tilde{\phi }_{\varvec{n}} = 4\pi G a^{-1} \left( \tilde{\rho }_{\varvec{n}} - \overline{\rho }\,\delta _D(\varvec{n}) \right) \quad \text {for all}\quad \varvec{n}\in \mathbb {Z}^3. \end{aligned}$$

This equation imposes the consistency condition \(\tilde{\rho }_{\varvec{n}=\varvec{0}}=\overline{\rho }\), i.e. the mean Poisson source must vanish. In practice, this is achieved in PM codes by explicitly setting to zero the \(\varvec{n}=0\) mode (a.k.a. the “DC mode”, in analogy to AC/DC electric currents). For the acceleration field \(\varvec{g} = -\nabla \phi \), one finds \(\tilde{\varvec{g}}_{\varvec{n}} = -\mathrm{i}k_0 \varvec{n} \tilde{\phi }_{\varvec{n}}\). The solution for potential and acceleration can thus be conveniently computed using the Discrete Fourier transform (DFT) as

$$\begin{aligned} \tilde{\phi }_{\varvec{n}} = \left\{ \begin{array}{cl} -\frac{4\pi G }{a k_0^2} \frac{\tilde{\rho }_{\varvec{n}}}{\Vert \varvec{n}\Vert ^2}&{} \quad \text {if}\quad \varvec{n}\ne \varvec{0} \\ 0 &{} \quad \text {otherwise } \end{array} \quad , \right. \qquad \tilde{\varvec{g}}_{\varvec{n}} = \left\{ \begin{array}{cl} \frac{4\pi G}{a k_0} \frac{\mathrm{i}\,\varvec{n}\tilde{\rho }_{\varvec{n}}}{\Vert \varvec{n}\Vert ^2}&{} \quad \text {if}\quad \varvec{n}\ne \varvec{0} \\ 0 &{} \quad \text {otherwise } \end{array} \right. . \end{aligned}$$

If one considers a uniform spatial discretisation of both potential \(\phi _{\varvec{m}}:=\phi _{i,j,k} := \phi (\varvec{m} h)\) and density \(\rho _{\varvec{m}}\), with \(i,j,k\in [0\dots N_g-1]\), mesh index \(\varvec{m}:=(i,j,k)^T\), and grid spacing \(h:=L/N_g\), then the solution can be directly computed using the Fast-Fourier-Transform (FFT) algorithm at a cost of \(\mathcal {O}(M\log M)\) for \(M=N_g^3\) grid points. Many implementations exist; the FFTW libraryFootnote 11 (Frigo and Johnson 2005), which supports multi-threading and MPI, is one of the most commonly used. In the case of the DFT, the Fourier sum is truncated at the Nyquist wave number, so that \(\varvec{n} \in (-N_g/2,N_g/2]^3\).
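The spectral solution above can be written in a few lines with NumPy's FFT routines. The sketch below is a serial, single-node illustration (a production code would use a distributed FFT); the function name and the defaults \(G=a=1\) are assumptions. It is checked against a single cosine density mode, for which the potential and acceleration are known analytically.

```python
import numpy as np

def solve_poisson_fft(rho, L, G=1.0, a=1.0):
    """Potential phi and acceleration g = -grad(phi) on a periodic N^3 mesh."""
    N = rho.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)        # k0 * n per dimension
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                                     # avoid 0/0; DC mode zeroed below
    phi_k = -4.0 * np.pi * G / a * np.fft.fftn(rho) / k2
    phi_k[0, 0, 0] = 0.0                                  # remove the mean (DC) mode
    phi = np.fft.ifftn(phi_k).real
    g = np.array([np.fft.ifftn(-1j * ki * phi_k).real for ki in (kx, ky, kz)])
    return phi, g

# Test density: mean plus one cosine mode along x (assumed amplitude 0.5)
N, L = 16, 2.0
k0 = 2.0 * np.pi / L
x = np.arange(N) * L / N
rho = np.ones((N, N, N)) + 0.5 * np.cos(k0 * x)[:, None, None]
phi, g = solve_poisson_fft(rho, L)
amp = 4.0 * np.pi * 0.5 / k0**2                           # analytic potential amplitude
```

For real-valued input, `np.fft.rfftn` and `np.fft.irfftn` would roughly halve memory and cost; the complex transforms are used here only to keep the indexing explicit.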

Note that instead of the exact Fourier-space Laplacian, \(-k_0^2 \Vert \varvec{n} \Vert ^2\), which is implicitly truncated at the Nyquist wave numbers, sometimes a finite difference version is used in PM codes such as FastPM (Feng et al. 2016) (cf. Sect. 4.4). Inverting the second order accurate finite difference Laplacian in Fourier space yieldsFootnote 12

$$\begin{aligned} \tilde{\phi }_{\varvec{n}}^{\mathrm{FD2}} = \left\{ \begin{array}{cl} -\frac{\pi G \varDelta x^2 }{a} \;\tilde{\rho }_{\varvec{n}}\;\left( \sin ^2\left[ \frac{\pi n_x}{N_g} \right] + \sin ^2\left[ \frac{\pi n_y}{N_g} \right] + \sin ^2\left[ \frac{\pi n_z}{N_g} \right] \right) ^{-1}&{} \quad \text {if}\quad \varvec{n}\ne \varvec{0} \\ 0 &{} \quad \text {otherwise. } \end{array} \right. \end{aligned}$$

This kernel has substantially suppressed power on small scales compared to the Fourier space Laplacian, which reduces aliasing (see the discussion in the next section). It also reduces the effect of anisotropies due to the mesh on grid scales.
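That the kernel above is the exact inverse of the second-order finite difference Laplacian can be verified numerically: applying the seven-point stencil to the resulting potential must return the Poisson source to round-off. The sketch below does this for a random density field; function names, grid size, and \(G=a=1\) are illustrative assumptions.

```python
import numpy as np

def solve_poisson_fd2(rho, dx, G=1.0, a=1.0):
    """Potential from the Fourier-space inverse of the 7-point Laplacian."""
    N = rho.shape[0]
    s = np.sin(np.pi * np.fft.fftfreq(N)) ** 2            # sin^2(pi n / N) per dimension
    S = s[:, None, None] + s[None, :, None] + s[None, None, :]
    S[0, 0, 0] = 1.0                                      # avoid 0/0; DC mode zeroed below
    phi_k = -np.pi * G * dx**2 / a * np.fft.fftn(rho) / S
    phi_k[0, 0, 0] = 0.0
    return np.fft.ifftn(phi_k).real

def laplacian_7pt(phi, dx):
    """Second-order periodic finite difference Laplacian."""
    out = -6.0 * phi
    for axis in range(3):
        out = out + np.roll(phi, 1, axis) + np.roll(phi, -1, axis)
    return out / dx**2

rng = np.random.default_rng(0)
rho = rng.random((8, 8, 8))                               # assumed random test density
dx = 0.25
phi = solve_poisson_fd2(rho, dx)
source = 4.0 * np.pi * (rho - rho.mean())                 # 4 pi G (rho - mean) / a
```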

Solving Poisson’s equation in Fourier space with FFTs becomes less efficient if boundary conditions are not periodic, or if spatial adaptivity is necessary. For isolated boundary conditions, the domain has to be zero padded to twice its size per linear dimension, which is an increase in memory by a factor of eight in three dimensions. This is a problem on modern architectures since memory is expensive and slow, while floating-point operations (FLOPs) are much cheaper in comparison. A further problem of FFT methods is their parallelization: a multidimensional FFT requires a global transpose of the array. This leads to a very non-local communication pattern and the need to transfer all of the data multiple times between compute nodes per force calculation.

Additionally, if high resolution is required, as is often the case in cosmology due to the nature of gravity as an attractive force, the size of the grid can quickly become the computational bottleneck. One possibility is to introduce additional higher resolution meshes (Jessop et al. 1994; Suisalu and Saar 1995; Pearce and Couchman 1997; Kravtsov et al. 1997; Teyssier 2002), deposit particles onto them and then solve using an adaptive “relaxation method” such as the adaptive multigrid method (see below), or by employing the periodic FFT solution as a boundary condition. Adaptive algorithms are typically more complex due to the more complicated data structures involved.

It is also possible to employ another (or many more) Fourier mesh extended over a particular region of interest in a so-called “zoom simulation”, cf. Sect. 6.3.4, if higher force resolution is required in a few isolated subregions of the simulation volume. A problem related to this method is that, for a finite grid resolution, Fourier modes above the Nyquist frequency will be incorrectly aliased to those supported by the Fourier grid (Hockney and Eastwood 1981), which causes a biased solution to the Poisson equation. The magnitude of this aliasing effect depends on the mass assignment scheme and can be reduced when PM codes are complemented with other force calculation methods, as discussed below in Sect. 5.3, since then the PM force is usually UV truncated.

Instead of adding a fine mesh on a single region of interest, it is possible to add it everywhere in space. This approach is known as two-level PM or PMPM, and has been used for carrying out Cosmo-\(\pi \), the largest N-body simulation to date (cf. Sect. 10). This approach has the advantage that, for a cubical domain decomposition, all the operations related to the fine grid can be performed locally, i.e. without communication among nodes in distributed-memory systems, which might result in significant advantages, especially when employing hundreds of thousands of compute nodes.

For full phase-space techniques, the PM approach also is preferable if a regular mesh already exists in configuration space onto which the mass distribution can then be easily projected. The Fourier space spectral solution of the Poisson equation can also be readily employed in the case of Schrödinger–Poisson discretisations on a regular grid. In this case, the Poisson source is computed from the wave function which is known on the grid, so that \(\rho _{\varvec{m}} = \psi _{\varvec{m}} \psi _{\varvec{m}}^*\).

5.1.2 Mass assignment schemes

Grid-based methods always rely on a charge assignment scheme (Hockney and Eastwood 1981) that deposits the mass \(M_i\) associated with a particle i at location \(\varvec{X}_i\) by interpolating the particle masses in a conservative way to grid point locations \(\varvec{x}_{\varvec{n}}\) (where \(\varvec{n}\in \mathbb {N}^3\) is a discrete index, such that e.g. \(\varvec{x}_{\varvec{n}} = \varvec{n}\,\varDelta x\) in the simplest case of a regular (cubic) grid of spacing \(\varDelta x\)). This gives a charge assignment of the form

$$\begin{aligned} \rho _{\varvec{n}} = \int _{\mathbb {R}^3} \mathrm{d}^3x^{\prime}\,\hat{\rho }(\varvec{x}^{\prime}) \,W_{3D}(\varvec{n}\,\varDelta x-\varvec{x}^{\prime})\quad \text {with}\quad \hat{\rho }(\varvec{x}):=\sum _{i=1}^N M_i \delta _D(\varvec{x}-\varvec{X}_i), \end{aligned}$$

where the periodic copies in the density were dropped since periodic boundary conditions are assumed in the Poisson solver. Charge assignment to a regular mesh is equivalent to a single convolution if \(M_i=M\) is identical for all particles. The most common particle-grid interpolation functions (cf. Hockney and Eastwood 1981) of increasing order are given for each spatial dimension byFootnote 13

$$\begin{aligned} W_{\mathrm{NGP}}(x)= & {} \frac{1}{\varDelta x}\left\{ \begin{array}{ll} 1 &{} \quad {\text {for}}\,\left| x \right| \le \frac{\varDelta x}{2}\\ 0 &{} \quad {\text {otherwise}} \end{array}\right. \end{aligned}$$
$$\begin{aligned} W_{\mathrm{CIC}}(x)= & {} \frac{1}{\varDelta x}\left\{ \begin{array}{ll} 1-\frac{\left| x\right| }{\varDelta x} &{}\quad {\text {for}}\,\left| x\right| < \varDelta x \\ 0 &{}\quad \text {otherwise} \end{array}\right. \end{aligned}$$
$$\begin{aligned} W_{\mathrm{TSC}}(x)= & {} \frac{1}{\varDelta x}\left\{ \begin{array}{ll} \frac{3}{4} - \left( \frac{x}{\varDelta x}\right) ^2 &{} \quad {\text {for}}\,\left| x\right| \le \frac{\varDelta x}{2}\\ \frac{1}{2}\left( \frac{3}{2} - \frac{\left| x\right| }{\varDelta x}\right) ^2 &{} \quad \text {for }\frac{\varDelta x}{2}\le \left| x \right| < \frac{3\varDelta x}{2}\\ 0 &{} \quad {\text {otherwise}} \end{array}\right. \end{aligned}$$
$$\begin{aligned} W_{\mathrm{PCS}}(x)= & {} \frac{1}{\varDelta x}\left\{ \begin{array}{ll} \frac{1}{6} \left[ 4 - 6\left( \frac{x}{\varDelta x}\right) ^2 + 3 \left( \frac{|x|}{\varDelta x}\right) ^3 \right] &{} \quad {\text {for}}\,\left| x\right| \le \varDelta x\\ \frac{1}{6}\left( 2 - \frac{\left| x\right| }{\varDelta x}\right) ^3 &{} \quad {\text {for}}\, \varDelta x \le |x| < 2 \varDelta x\\ 0 &{} \quad {\text {otherwise}} \end{array}\right. \end{aligned}$$
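The four one-dimensional kernels translate directly into vectorised functions. The sketch below (function names are illustrative) implements them with the \(1/\varDelta x\) normalisation used above; a useful sanity check is that each kernel forms a partition of unity, i.e. the weights a particle assigns to all grid points sum to one, which is what makes the deposit mass conserving.

```python
import numpy as np

def w_ngp(x, dx):
    s = np.abs(x) / dx
    return np.where(s <= 0.5, 1.0, 0.0) / dx

def w_cic(x, dx):
    s = np.abs(x) / dx
    return np.where(s < 1.0, 1.0 - s, 0.0) / dx

def w_tsc(x, dx):
    s = np.abs(x) / dx
    return np.where(s <= 0.5, 0.75 - s**2,
                    np.where(s < 1.5, 0.5 * (1.5 - s) ** 2, 0.0)) / dx

def w_pcs(x, dx):
    s = np.abs(x) / dx
    return np.where(s <= 1.0, (4.0 - 6.0 * s**2 + 3.0 * s**3) / 6.0,
                    np.where(s < 2.0, (2.0 - s) ** 3 / 6.0, 0.0)) / dx

# Grid points n * dx around the origin and the summed weight of one particle
dx = 0.5
grid = np.arange(-10, 11) * dx

def total_weight(kernel, x_particle):
    """Sum over grid points of W(x_n - x_particle) * dx; should equal one."""
    return np.sum(kernel(grid - x_particle, dx)) * dx
```

Note that for NGP a particle lying exactly halfway between two grid points would be counted twice with the closed interval used here; an implementation has to break this tie in one direction.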

The three-dimensional assignment function is then just the product \(W_{3D}(\varvec{x})=W(x)\,W(y)\,W(z)\), where \(\varvec{x}=(x,y,z)^T\). It can be easily shown that interpolating using the inverse of these operators from a field increases the regularity of the particle density field, and thus also has a smoothing effect on the resulting effective gravitational force. This can also be seen directly from the Fourier transform of the assignment functions which have the form (per dimension)

$$\begin{aligned} \tilde{W}_{n}(k) = \left[ \mathrm{sinc }\frac{\pi }{2}\frac{k}{k_\mathrm{Ny}}\right] ^n\quad \text {with}\quad \mathrm{sinc}\,x = \frac{\sin x}{x}. \end{aligned}$$

where \(n=1\) for NGP, \(n=2\) for CIC, \(n=3\) for TSC, \(n=4\) for PCS interpolation, and \(k_{\mathrm{Ny}}:=\pi /\varDelta x\) is the Nyquist wave number. NGP leads to a piecewise constant, CIC to a piece-wise linear, TSC to a piecewise quadratically (i.e., continuous value and first derivative), and PCS to piecewise cubically changing acceleration as a particle moves between grid points. The real space and Fourier space shape of the kernels is shown in Fig. 6. Note that the support is always \(n \varDelta x\), i.e. n cells, per dimension and thus increases with the order, and by the central limit theorem \(\tilde{W}_n\) converges to a normal distribution as \(n\rightarrow \infty \). Hence, going to higher order can impact memory locality and communication ghost zones negatively. Since an a priori unknown number of particles might deposit to the same grid cell, special care needs to be taken to make the particle projection thread safe in shared-memory parallelism (Ferrell and Bertschinger 1994).
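As an illustration of the deposit step, the following sketch implements a periodic three-dimensional CIC assignment. It uses `np.add.at`, which accumulates correctly when several particles contribute to the same cell; this is the serial analogue of the atomic or otherwise thread-safe updates required in shared-memory parallel codes. The function name and argument layout are assumptions.

```python
import numpy as np

def cic_deposit(pos, mass, N, L):
    """Deposit particle masses onto a periodic N^3 mesh with CIC weights.

    pos has shape (Np, 3); mass is a scalar or an array of shape (Np,).
    Returns the density, i.e. deposited mass per cell volume.
    """
    dx = L / N
    s = np.asarray(pos, dtype=float) / dx
    i0 = np.floor(s).astype(int)            # index of the lower neighbouring grid point
    frac = s - i0                           # fractional distance to it, in [0, 1)
    rho = np.zeros((N, N, N))
    for ox in (0, 1):
        wx = frac[:, 0] if ox else 1.0 - frac[:, 0]
        for oy in (0, 1):
            wy = frac[:, 1] if oy else 1.0 - frac[:, 1]
            for oz in (0, 1):
                wz = frac[:, 2] if oz else 1.0 - frac[:, 2]
                idx = (i0 + np.array([ox, oy, oz])) % N     # periodic wrap
                # Unbuffered accumulation: repeated indices are summed correctly
                np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), mass * wx * wy * wz)
    return rho / dx**3

# 100 random unit-mass particles, and one particle at the centre of a cell cube
N, L = 8, 1.0
rng = np.random.default_rng(1)
rho_random = cic_deposit(rng.random((100, 3)) * L, 1.0, N, L)
rho_single = cic_deposit(np.full((1, 3), 0.5 * L / N), 1.0, N, L)
```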

Fig. 6

Common particle-mesh mass assignment kernels in real space (panels a), and Fourier space (panels b) of increasing order: \(n=1\) NGP, \(n=2\) CIC, \(n=3\) TSC, \(n=4\) PCS. Note that the NGP kernel is not continuous, CIC is continuous but not differentiable, TSC is continuously differentiable, and PCS is twice differentiable. The support of the assignment functions is \(n\varDelta x\) per dimension, and they converge to a normal distribution for \(n\rightarrow \infty \). Due to their increasing smoothness, they also act as increasingly stronger low-pass filters.

As an alternative to these mass assignment kernels for particles, it is possible to project phase-space tessellated particle distributions (cf. Sect. 3.2) exactly onto the force grid (Powell and Abel 2015; Sousbie and Colombi 2016). In practice, when using such sheet tessellation methods, for a given set of flow tracers, the phase-space interpolation can be constructed and sampled with M “mass carrying” particles which can then be deposited onto the grid. Since the creation of mass carriers is a local operation, M can be arbitrarily large and thus the noise associated with N-body discreteness can be reduced systematically. This approach has been adopted by Hahn et al. (2013) and Angulo et al. (2013b) to simulate warm dark matter while suppressing artificial fragmentation, as we will discuss in greater detail in Sect. 7.3.

The same mass assignment schemes can be used to reversely interpolate values of a discrete field back to the particle positions \(\left\{ \varvec{X}_i\right\} \) by inverting the projection kernel. It has to be ensured that the same order is used for both mass deposit and interpolation of the force to the particle positions, i.e., that deposit and interpolation are mutually inverse. This consistency is important since, otherwise, (1) exact momentum conservation is not guaranteed, and (2) self-forces can occur, allowing particles to accelerate themselves (cf. Hockney and Eastwood 1981). It is important to note that due to the grid discretisation, particle separations that are unresolved by the discrete grid are aliased to the wrong wave numbers, which e.g. can cause certain Fourier modes to grow at the wrong rate. Aliasing can be ameliorated by filtering out scales close to the Nyquist frequency, or by using interlacing techniques where, by combining multiple shifted deposits, individual aliasing contributions can be cancelled at leading order (Chen et al. 1974; Hockney and Eastwood 1981). Such techniques are important also when estimating Fourier-space statistics (i.e., poly-spectra) from density fields obtained using the above deposit techniques (see Sect. 9 for a discussion).
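The self-force statement can be checked with a one-dimensional toy PM solver containing a single particle. In the sketch below (all names hypothetical, units with \(4\pi G/a=1\) and unit particle mass), the force interpolated back to the particle vanishes to round-off when CIC is used for both deposit and interpolation, while mixing an NGP deposit with CIC interpolation leaves a spurious force on the particle.

```python
import numpy as np

def deposit_1d(x, N, L, scheme):
    """Density of one unit-mass particle on a periodic 1D mesh."""
    dx = L / N
    s = x / dx
    rho = np.zeros(N)
    if scheme == "ngp":
        rho[int(np.floor(s + 0.5)) % N] += 1.0
    else:                                   # "cic"
        i0 = int(np.floor(s))
        f = s - i0
        rho[i0 % N] += 1.0 - f
        rho[(i0 + 1) % N] += f
    return rho / dx

def interpolate_1d(field, x, N, L, scheme):
    """Interpolate a mesh field to position x with the same kernels."""
    s = x / (L / N)
    if scheme == "ngp":
        return field[int(np.floor(s + 0.5)) % N]
    i0 = int(np.floor(s))
    f = s - i0
    return (1.0 - f) * field[i0 % N] + f * field[(i0 + 1) % N]

def self_force(x, N, L, deposit_scheme, interp_scheme):
    """Acceleration a single particle exerts on itself through the mesh."""
    rho = deposit_1d(x, N, L, deposit_scheme)
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
    rho_k = np.fft.fft(rho - rho.mean())
    g_k = np.zeros(N, dtype=complex)
    g_k[1:] = 1j * rho_k[1:] / k[1:]        # g_k = -i k phi_k with phi_k = -rho_k / k^2
    g = np.fft.ifft(g_k).real
    return interpolate_1d(g, x, N, L, interp_scheme)

f_consistent = self_force(0.37, 16, 1.0, "cic", "cic")
f_mixed = self_force(0.37, 16, 1.0, "ngp", "cic")
```

The cancellation relies on the mesh force kernel being antisymmetric, which holds here because the spectral gradient is odd in the wave number.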

5.1.3 Relaxation methods and multi-scale

In order to overcome the limitations of Fourier space solvers (in particular, the large cost of the global transpose on all data, along with the lack of spatial adaptivity), a range of other methods have been developed. The requirement is that the Poisson source is known on a grid, which can also be an adaptively refined ‘AMR’ grid structure. On the grid, a finite difference version of the Poisson equation is then solved, e.g., for a second-order approximation in three dimensions the solution is given by the finite difference equation:

$$\begin{aligned} \phi _{i-1,j,k}+\phi _{i+1,j,k}+\phi _{i,j-1,k}+\phi _{i,j+1,k}+\phi _{i,j,k-1}+\phi _{i,j,k+1}-6\phi _{i,j,k} = \varDelta x^2\,f_{i,j,k} \,, \end{aligned}$$

where indices refer to grid point locations as above, \(\varDelta x\) is the grid spacing, and \(f_{i,j,k} := 4\pi G (\rho _{i,j,k}-\overline{\rho })/a\) is the Poisson source. This can effectively be written as a matrix inversion problem \(\mathsf{\varvec {A}} \phi = f\) where the finite difference stencil gives rise to a sparse matrix \(\mathsf{\varvec {A}}\) and the solution sought is \(\phi =\mathsf{\varvec {A}}^{-1}f\). Efficient methods exist to solve such equations. A particularly powerful one, that can directly operate even on an AMR structure, is the adaptive multigrid method (Brandt 1977; Trottenberg et al. 2001), which is used e.g., by the RAMSES code (Teyssier 2002). It combines simple point relaxation (e.g., Jacobi or Gauss–Seidel iterations) with a hierarchical coarsening procedure which spreads the residual correction exponentially fast across the domain. Some additional care is required at the boundaries of adaptively refined regions. Here the resolution of the mesh changes, typically by a linear factor of two, and interpolation from the coarser grid to the ghost zones of the fine grid is required. In the one-way interface type of solvers, the coarse solution is obtained independently of the finer grid, and then interpolated to the finer grid ghost zones to serve as the boundary condition for the fine solution (Guillet and Teyssier 2011), but no update of the coarse solution is made based on the fine solution. This approach is particularly convenient for block-stepping schemes (cf. Sect. 4.2.2) where each level of the grid hierarchy has its own time step by solving e.g. twice on the fine level while solving only once on the coarse. A limitation of AMR grids is however that the force resolution can only change discontinuously by the refinement factor, both in time—if one wants to achieve a resolution that is constant in physical coordinates—and in space—as a particle moves across coarse-fine boundaries. 
On the other hand, AMR grids contain self-consistently an adaptive force softening (see Sect. 8.2), if the refinement strategy is tied to the local density or other estimators (Hobbs et al. 2016).
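The point relaxation that multigrid builds upon can be sketched for the seven-point finite difference equation above. This is a minimal single-level illustration, assuming a small uniform grid with fixed (Dirichlet) boundary values rather than periodic or AMR boundaries, and no coarsening hierarchy; all function names are hypothetical. Each Gauss–Seidel update simply solves the stencil equation for the central value using the latest neighbour values.

```python
def make_grid(n, value=lambda i, j, k: 0.0):
    """Build an n^3 nested-list grid filled with value(i, j, k)."""
    return [[[value(i, j, k) for k in range(n)] for j in range(n)] for i in range(n)]

def stencil(phi, i, j, k):
    """Left-hand side of the 7-point finite difference equation at one grid point."""
    return (phi[i - 1][j][k] + phi[i + 1][j][k] + phi[i][j - 1][k] + phi[i][j + 1][k]
            + phi[i][j][k - 1] + phi[i][j][k + 1] - 6.0 * phi[i][j][k])

def interior(n):
    """Iterate over all interior grid points (boundary values stay fixed)."""
    return ((i, j, k) for i in range(1, n - 1)
            for j in range(1, n - 1) for k in range(1, n - 1))

def gauss_seidel_sweep(phi, f, dx):
    """One in-place Gauss-Seidel sweep: solve the stencil equation for the centre value."""
    for i, j, k in interior(len(phi)):
        neighbours = stencil(phi, i, j, k) + 6.0 * phi[i][j][k]
        phi[i][j][k] = (neighbours - dx * dx * f[i][j][k]) / 6.0

def max_residual(phi, f, dx):
    """Largest absolute mismatch between the stencil and dx^2 f over the interior."""
    return max(abs(stencil(phi, i, j, k) - dx * dx * f[i][j][k])
               for i, j, k in interior(len(phi)))

def relax(phi, f, dx, tol=1e-12, max_sweeps=10000):
    """Sweep until the residual drops below tol; return the number of sweeps used."""
    for sweep in range(max_sweeps):
        if max_residual(phi, f, dx) < tol:
            return sweep
        gauss_seidel_sweep(phi, f, dx)
    return max_sweeps
```

On its own, such relaxation damps short-wavelength errors quickly but long-wavelength errors slowly, which is why multigrid combines it with the hierarchical coarsening described above.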

Depending on the fragmentation of the finer levels due to the dynamic adaptivity, other solvers can be more efficient than multigrid, such as direct relaxation solvers (Kravtsov et al. 1997) or conjugate gradient methods. However, it is in principle more accurate to account for the two-way interface and allow for a correction of the coarse potential from the fine grid as well, as discussed e.g. by Johansen and Colella (1998), Miniati and Colella (2007). Note that, once a deep grid hierarchy has developed, global Poisson solves in each fine time step are usually prohibitive for numerical algorithms. For this reason optimizations are often employed to solve for the gravitational acceleration of only a subset of particles in multi-stepping schemes. In the case of AMR, some care is necessary to interpolate boundary conditions also in time to avoid possible spurious self-interactions of particles.

5.2 Direct P2P summation

As discussed above, mesh-based methods bring along an additional discretisation of space. This can be avoided by computing interactions directly at the particle level from Eqs. (32b33). In this case, the gravitational potential at particle i’s location, \(\varvec{X}_i\), is given by the sum over the contributions of all the other particles in the system along with all periodic replicas of the finite box, i.e.

$$\begin{aligned} \phi (\varvec{x}_i) = - a^{-1} \sum _{\varvec{n}\in \mathbb {Z}^3} \left[ \sum _{{\mathop{j\!=\!1}\limits_ {{i\!\ne\! j}}}}^N\frac{G M_j}{\Vert \varvec{X}_i-\varvec{X}_j-\varvec{n}L \Vert } + \varphi _{\mathrm{box},L}(\varvec{X}_i-\varvec{n}L)\right] . \end{aligned}$$

Note that we neglected force softening for the moment, i.e. we set \(W(\varvec{x})=\delta _D(\varvec{x})\). Here \(\varphi _{\mathrm{box},L}\) is the potential due to a box \([0,L)^3\) of uniform background density \(\overline{\rho }=\varOmega _m\rho _c\) that guarantees that the density \(\rho -\overline{\rho }\) sourcing \(\phi \) vanishes when integrated over the box.

This double sum is slowly convergent with respect to \(\varvec{n}\), and in general there can be spurious forces arising from a finite truncation [but note that the sum is unconditionally convergent if the box has no dipole, e.g., Ballenegger (2014)]. A fast and exact way to compute this expression is provided by means of an Ewald summation (Ewald 1921), in which the sum is replaced by two independent sums, one in Fourier space for the periodic long-range contribution, and one in real space for the non-periodic local contribution, which both converge rapidly. It is then possible to rewrite Eq. (64) employing the position of the nearest replica, which results in pairwise interactions with a modified gravitational potential. This potential needs to be computed numerically; in GADGET3, for instance, it is tabulated and then interpolated at runtime, whereas in GADGET4, the code relies on a look-up table of a Taylor expansion with analytic derivatives of the Ewald potential. We summarise in more detail how this is achieved in Sect. 5.3, where we discuss in particular how the FFT can be efficiently used to execute the Fourier summation.

This direct summation of individual particle-particle forces is \(\mathcal {O}(N^2)\), that is, quadratic in the number of particles, and thus quickly becomes computationally prohibitive. In addition, since it is a highly non-local operation, it would require a considerable amount of inter-process communication. In practice, this method is sometimes used to compute short-range interactions, where the operation is local and can exploit the large computational power provided by GPUs. This is, for instance, the approach followed by the HACC code (Habib et al. 2016), when running one of the largest simulations to date with 3.6 trillion particles, and also by the ABACUS code (Garrison et al. 2018). Direct summation enabled by GPUs has also been adopted by Rácz et al. (2019) for compactified simulations, where there is an additional advantage that only a small subset of the volume has to be followed down to \(z=0\) (cf. Sect. 6.3.5).
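The quadratic cost is apparent from a minimal sketch of direct summation. The version below makes simplifying assumptions that are not part of the text above: isolated boundary conditions (no periodic replicas and no Ewald correction), no factor of \(a^{-1}\), and an optional Plummer-type softening length `eps`, which is a hypothetical parameter introduced here for illustration. Each pair is visited once and applied to both particles with opposite sign, so the total momentum change vanishes by construction.

```python
def direct_accelerations(pos, mass, G=1.0, eps=0.0):
    """Pairwise O(N^2) gravitational accelerations for isolated point masses.

    pos  : list of 3-component positions
    mass : list of particle masses
    eps  : optional Plummer softening length (0 gives the unsoftened force)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):          # each pair is evaluated exactly once
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + eps * eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3   # pull of j on i
                acc[j][k] -= G * mass[i] * d[k] * inv_r3   # equal and opposite pull on j
    return acc
```

The double loop performs \(N(N-1)/2\) pair evaluations, which is the scaling that the tree and multipole methods discussed below are designed to avoid.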

5.3 Particle mesh Ewald summation, force splitting and the P\(^3\)M method

Beyond the poor \(\mathcal {O}(N^2)\) scaling of the direct P2P summation (for which we discuss the solutions below), another important limitation of the naïve direct summation is the infinite periodic contribution in Eq. (64). At the root of the solution is the Ewald summation (Ewald 1921), first used for cosmological simulations by Bouchet and Hernquist (1988), in which the total potential or acceleration is split into a short-range and a long-range contribution. The short-range contribution is summed in real space, while the long-range contribution is summed in Fourier space where, owing to its periodic character, it converges much faster. One thus introduces a ‘splitting kernel’ S so that

$$\begin{aligned} \phi (\varvec{x}) = \phi _{\mathrm{lr}}(\varvec{x})+ \phi _{\mathrm{sr}}(\varvec{x}) := S*\phi + (1-S)*\phi . \end{aligned}$$

The long-range contribution \(\phi _{\mathrm{lr}}\) can be computed using the PM method on a relatively coarse mesh. On the other hand, the short-range contribution \(\phi _{\mathrm{sr}}\) can be computed from the direct force between particles only in their immediate vicinity, since the particles further away contribute through the PM part. Instead of the direct force, which then gives rise to the P\(^3\)M method, modern codes often use a tree method (see next section) for the short-range force [this is e.g., what is implemented in the GADGET2 code by Springel (2005), see also Wang (2021)].

The splitting kernel effectively spreads the mass over a finite scale \(r_s\) for the long-range interaction, and corrects for the residual with the short-range interaction on scales \(\lesssim r_s\). Many choices are a priori possible; Hockney and Eastwood (1981), e.g., propose a sphere of uniformly decreasing density, or a Gaussian cloud. The latter is, e.g., used in the GADGET codes.

In terms of the Green’s function of the Laplacian \(G(\varvec{r}) = -1/(4\pi \Vert \varvec{r}\Vert )\), the formal solution for the cosmological Poisson equation reads \(\phi = \frac{4\pi G}{a} \left( \rho -\overline{\rho }\right) *G\). For a Gaussian cloud of scale \(r_s\), one has in real and Fourier space

$$\begin{aligned} S(r; r_s) = (2\pi r_s^2)^{-3/2} \exp \left( -\frac{r^2}{2r_s^2} \right) ,\quad \tilde{S}(k; r_s) = \exp \left[ -\frac{1}{2}k^2 r_s^2\right] . \end{aligned}$$

The ‘dressed’ Green’s functions \(G_{\mathrm{lr}} = G*S\) and \(G_\mathrm{sr} = G*(1-S)\) then become explicitly in real and Fourier space

$$\begin{aligned} G_{\mathrm{lr}}(r; r_s)&= - \frac{1 }{4\pi \,r} \,\mathrm{erf}\left[ \frac{r}{\sqrt{2}r_s} \right] , \quad&\tilde{G}_{\mathrm{lr}}(k; r_s)&= -\frac{1}{k^2}\exp \left[ -\frac{1}{2}k^2r_s^2 \right] , \end{aligned}$$
$$\begin{aligned} G_{\mathrm{sr}}(r; r_s)&= - \frac{1 }{4\pi \,r} \,\mathrm{erfc}\left[ \frac{r}{\sqrt{2}r_s} \right] ,&\tilde{G}_{\mathrm{sr}}(k; r_s)&= -\frac{1}{k^2}\left( 1-\exp \left[ -\frac{1}{2}k^2r_s^2 \right] \right) . \end{aligned}$$

Instead of the normal Green’s functions, one thus simply uses these truncated functions and obtains a hybrid solver. In order to use this approach, one chooses a transition scale of order the grid scale, \(r_s\sim \varDelta x\), and then replaces the PM Green’s function with \(G_{\mathrm{lr}}\). Instead of the particle-particle interaction in the direct summation or tree force (see below), one uses \(G_\mathrm{sr}\) for the potential, and \(\varvec{\nabla } G_{\mathrm{sr}}\) for the force.
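The real-space expressions for the Gaussian split can be checked numerically with a short sketch. The function names below are hypothetical; the radial derivative of \(G_{\mathrm{sr}}\) is obtained here by differentiating the erfc expression above by hand, and is not a formula quoted from any specific code. The sketch shows that the two dressed Green’s functions add up to the Newtonian one, and that the short-range part becomes negligible a few \(r_s\) away from the source.

```python
import math

def green_newton(r):
    """Green's function of the Laplacian, G(r) = -1 / (4 pi r)."""
    return -1.0 / (4.0 * math.pi * r)

def green_long_range(r, r_s):
    """Long-range part G_lr = G * erf(r / (sqrt(2) r_s))."""
    return green_newton(r) * math.erf(r / (math.sqrt(2.0) * r_s))

def green_short_range(r, r_s):
    """Short-range part G_sr = G * erfc(r / (sqrt(2) r_s))."""
    return green_newton(r) * math.erfc(r / (math.sqrt(2.0) * r_s))

def short_range_radial_derivative(r, r_s):
    """dG_sr/dr from differentiating the erfc expression analytically."""
    u = r / (math.sqrt(2.0) * r_s)
    bracket = math.erfc(u) + 2.0 * u / math.sqrt(math.pi) * math.exp(-u * u)
    return bracket / (4.0 * math.pi * r * r)
```

Since the short-range kernel decays like a Gaussian, a code can truncate the real-space sum at a cut-off radius of a few \(r_s\); the exact cut-off is a tunable accuracy parameter.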

While the long range interaction already includes the periodic Ewald summation component if solved with Fourier space methods, when evaluating the periodic replica summation for the short-range interaction, the evaluation can be restricted to the nearest replica in practice due to the rapid convergence with the regulated interaction. In addition, since PM forces are exponentially suppressed on scales comparable to \(r_s\) which is chosen to be close to the grid spacing \(\varDelta x\), aliasing of Fourier modes is suppressed.

Note that another, more aggressive near-far field combination is adopted by the ABACUS code. In this approach, the computational domain is first split into a uniform grid with \(K^3\) cells. Interactions of particles separated by less than approximately 2L/K are computed using direct summation (neglecting Ewald corrections); all other interactions are computed using a high-order multipole (\(p=8\)) representation of the force field in the K-grid. Since two particles only interact via either the near- or far-field forces, and the tree structure is fixed to the K-grid, this allows for several optimizations and out-of-core computations. The price is discontinuous force errors with a non-trivial spatial dependence, as well as reduced accuracy due to the lack of Ewald corrections. This, however, might be acceptable for some applications and, as we will see in Sect. 8.5, ABACUS performs well when compared to other state-of-the-art codes.

5.4 Hierarchical tree methods

Assuming that it is acceptable to compute gravitational forces with a given specified accuracy, there are ways to circumvent the \(\mathcal {O}(N^2)\) and non-locality problem of direct summation. A common approach is to employ a hierarchical tree structure to partition the mass distribution in space and compute the gravitational potential jointly exerted by groups of particles, whose potential is expanded to a given multipole order (Barnes and Hut 1986). Thus, instead of particle-particle interactions, particle-node interactions are evaluated. Since the depth of such a tree is typically \(\mathcal {O}(\log N)\), the complexity of the evaluation of all interactions can be reduced to \(\mathcal {O}(N\log N)\). This can be further reduced to an ideal \(\mathcal {O}(N)\) complexity with the fast multipole method (FMM, see below).

There are several alternatives for constructing tree structures. The most common choice is a regular octree in which each tree level is subdivided into 8 sub-cells of equal volume; this is for instance used by GADGET. Another alternative, used for instance in old versions of PKDGRAV, is a binary tree in which a node is split into only two daughter cells. This in principle has the advantage of adapting more easily to anisotropic domains and of providing a smoother transition among levels, at the expense of a higher cost in walking the tree or the need to go to higher order multipole expansions at fixed force error. The tree subdivision continues until a maximum number M of particles per node is reached (\(M=1\) in GADGET2-3 but higher in GADGET4 and PKDGRAV).

The main advantage brought by tree methods is that the pairwise interaction can be expanded perturbatively and grouped among particles at similar locations, thus reducing dramatically the number of calculations that needs to be carried out. The key philosophical difference with respect to direct summation is that one seeks to obtain the result at a desired accuracy, rather than the exact result to machine precision. This difference allows a dramatic improvement in algorithmic complexity. Another key aspect is that hierarchical trees are well suited for hierarchical (adaptive) timesteps.

Tree methods have for a long time been extraordinarily popular for evaluating the short range interactions also in hybrid tree-PM methods, as pioneered by Bagla (2002); Bagla and Ray (2003), or more recent FMM-PM (Gnedin 2019; Wang 2021; Springel et al. 2021) approaches, thus supplementing an efficient method for periodic long-range interactions with an efficient method which is not limited to the uniform coarse resolution of FFT-based approaches (or also discrete jumps in resolution of AMR approaches). We discuss some technical aspects of these methods next.

5.4.1 Hierarchical multipole expansion

In the ‘Barnes & Hut tree’ algorithm (Appel 1985; Barnes and Hut 1986), particle-node interactions are evaluated instead of particle-particle interactions. Let us consider a hierarchical octree decomposition of the simulation box volume \(\mathcal {V}:=[0,L_{\mathrm{box}}]^3\) at level \(\ell \) into cubical subvolumes, dubbed ‘nodes’, \(\mathcal {S}^\ell _{i=1\dots N_\ell }\) of side length \(L_{\mathrm{box}}/2^\ell \), where \(N_\ell =2^{3\ell }\), so that \(\bigcup _i \mathcal {S}^\ell _i = \mathcal {V}\) and \(\mathcal {S}^\ell _i\cap \mathcal {S}^\ell _{j\ne i} = \emptyset \) on each level gives a space partitioning. Let us consider the gravitational potential due to all particles contained in a node \(\varvec{X}_j\in S^\ell _i\). The partitioning is halted when only one particle (or, more typically, a few particles) is left in a node. We shall assume isolated boundary conditions for clarity, i.e. we neglect the periodic sum in Eq. (64). Thanks to the partitioning, the gravitational interaction can be effectively localised with respect to the ‘tree node’ pivot at location \(\varvec{\lambda }\in \mathcal {S}^\ell _i\), so that the distance \(\Vert \varvec{X}_j - \varvec{\lambda } \Vert \le \sqrt{3} L_{\mathrm{box}}/2^\ell =: r_\ell \) is by definition bounded by the ‘node size’ \(r_\ell \) and can serve as an expansion parameter. To this end, one re-writes the potential due to the particles in the node subvolume \(\mathcal {S}^\ell _i\)

$$\begin{aligned} \phi ^\ell _i(\varvec{x}) \propto \sum _{\varvec{X}_j\in \mathcal {S}_i^\ell } \frac{M_j}{\Vert \varvec{x}-\varvec{X}_j\Vert } = \sum _{\varvec{X}_j\in \mathcal {S}_i^\ell } \frac{M_j}{\Vert (\varvec{x}-\varvec{\lambda })-(\varvec{X}_j-\varvec{\lambda })\Vert } = \sum _{\varvec{X}_j\in \mathcal {S}_i^\ell } \frac{M_j}{\Vert \varvec{d}+\varvec{\lambda }-\varvec{X}_j\Vert } \end{aligned}$$

where \(\varvec{d}:=\varvec{x}-\varvec{\lambda }\). This can be Taylor expanded to yield the ‘P2M’ (particle-to-multipole) kernels

$$\begin{aligned} \begin{aligned} \frac{1}{\Vert \varvec{d}+\varvec{\lambda }-\varvec{X}_j\Vert } =&\underbrace{\frac{1}{\Vert \varvec{d}\Vert }}_{\text {monopole}} + \underbrace{\frac{d_k}{\Vert \varvec{d}\Vert ^3} \left( X_{j,k}-\lambda _k\right) }_{\text {dipole}\; \mathcal {O}(r_\ell /d^2)} + \\&\quad + \underbrace{\frac{1}{2}\frac{d_kd_l}{\Vert \varvec{d} \Vert ^5} \left( 3(X_{j,k}-\lambda _k)(X_{j,l}-\lambda _l) -\delta _{kl} \Vert \varvec{X}_j-\varvec{\lambda } \Vert ^2 \right) }_{\text {quadrupole}\;\mathcal {O}(r_\ell ^2/d^3)} +\dots , \end{aligned} \end{aligned}$$

which converges quickly if \(\Vert \varvec{d}\Vert \gg r_\ell \). The multipole moments depend only on the vectors \((\varvec{X}_j-\varvec{\lambda })\) and can be pre-computed up to a desired maximum order p during the tree construction and stored with each node. In doing this, one can exploit that multipole moments are best constructed bottom-up, as they can be translated in an upward-sweep to the parent pivot and then co-added—this yields an ‘upwards M2M’ (multipole-to-multipole) sweep. Note that if one sets \(\varvec{\lambda }\) to be the centre of mass of each tree node, then the dipole moment vanishes. The complexity of such a tree construction is \(\mathcal {O}(N\log N)\) for N particles.
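The expansion above can be verified numerically with a small sketch that pre-computes the moments about a pivot (a ‘P2M’ step) and then evaluates the truncated series at a distant point (an ‘M2P’ step). The function names and the restriction to quadrupole order are choices made for this illustration only. The quantity evaluated is \(\sum _j M_j/\Vert \varvec{x}-\varvec{X}_j\Vert \), i.e. the potential without its constant prefactor.

```python
import math

def p2m(particles, pivot):
    """Monopole, dipole and quadrupole moments of (mass, position) pairs about pivot."""
    M, D = 0.0, [0.0] * 3
    Q = [[0.0] * 3 for _ in range(3)]
    for m, X in particles:
        s = [X[k] - pivot[k] for k in range(3)]      # offset X_j - lambda
        s2 = sum(c * c for c in s)
        M += m
        for k in range(3):
            D[k] += m * s[k]
            for l in range(3):
                Q[k][l] += m * (3.0 * s[k] * s[l] - (s2 if k == l else 0.0))
    return M, D, Q

def m2p(x, pivot, moments, order=2):
    """Evaluate sum_j M_j / |x - X_j| from the stored moments, truncated at `order`."""
    M, D, Q = moments
    d = [x[k] - pivot[k] for k in range(3)]          # d = x - lambda
    dn = math.sqrt(sum(c * c for c in d))
    value = M / dn                                   # monopole
    if order >= 1:
        value += sum(d[k] * D[k] for k in range(3)) / dn ** 3
    if order >= 2:
        value += 0.5 * sum(d[k] * d[l] * Q[k][l]
                           for k in range(3) for l in range(3)) / dn ** 5
    return value

def exact_sum(x, particles):
    """Reference value from summing over the individual particles."""
    return sum(m / math.sqrt(sum((x[k] - X[k]) ** 2 for k in range(3)))
               for m, X in particles)
```

Evaluating `p2m` with the centre of mass as pivot returns a dipole that vanishes to rounding error, in line with the remark above.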

When evaluating the potential \(\phi (\varvec{x})\) one now proceeds top-down from the root node at \(\ell =0\) in a ‘tree walk’ and evaluates M2P (multipole-to-particle) interactions between the given particle and the node. Since one knows that the error in \(\phi ^\ell _i(\varvec{x})\) is \(\mathcal {O}\left( (r_\ell /d)^p \right) \), one defines a maximum ‘opening angle’ \(\theta _{\mathrm{c}}\) and accepts the multipole expansion \(\phi ^\ell _i(\varvec{x})\) as an approximation for the potential due to the mass distribution in \(\mathcal {S}^\ell _i\) only if the respective opening angle obeys

$$\begin{aligned} \frac{r_\ell }{\Vert \varvec{d}\Vert } <\theta _{\mathrm{c}}. \end{aligned}$$

Otherwise the procedure is recursively repeated with each of the eight child nodes. Since the depth of a (balanced) octree built from a distribution of N particles is typically \(\mathcal {O}(\log N)\), a full potential or force calculation has an algorithmic complexity of \(\mathcal {O}(N\log N)\) instead of the \(\mathcal {O}(N^2)\) of the direct summation. The resulting relative error in a node-particle interaction is (Dehnen 2002)

$$\begin{aligned} \delta \phi \le \frac{\theta _c^{p+1}}{1-\theta _c} \frac{M_\mathrm{node}}{\Vert \mathbf {d}\Vert }, \end{aligned}$$

where \(M_{\mathrm{node}}\) is the node mass (i.e. the sum of the masses of all particles in \(\mathcal {S}^\ell _i\)), and p is the order of the multipole expansion. The criterion in Eq. (70) is purely geometric: it is independent of the magnitude of \(M_\mathrm{node}\) and the multipole moments, as well as of the actual value of the gravitational acceleration. It is also independent of the magnitude of the interaction, i.e. it neglects that far nodes contribute more than nearby ones to the total interaction.
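The tree walk with a geometric opening criterion can be sketched as follows. This is a deliberately simplified illustration rather than the scheme of any specific code: it assumes monopole-only nodes with the centre of mass as pivot, one particle per leaf, isolated boundaries, and distinct particle positions (coincident particles would recurse indefinitely). The class `Node` and the function `tree_potential` are hypothetical names. A node is accepted when \(r_\ell /\Vert \varvec{d}\Vert <\theta _{\mathrm{c}}\) with \(r_\ell =\sqrt{3}\) times the side length, and opened otherwise.

```python
import math

class Node:
    """Cubic octree node storing its monopole (total mass and centre of mass)."""

    def __init__(self, centre, size):
        self.centre, self.size = centre, size
        self.mass, self.com = 0.0, [0.0, 0.0, 0.0]
        self.children = None      # eight child nodes once the node has been split
        self.particle = None      # (mass, position) while this is a one-particle leaf

    def child_for(self, x):
        """Select the octant containing x (bit k is set if x[k] >= centre[k])."""
        index = sum(1 << k for k in range(3) if x[k] >= self.centre[k])
        return self.children[index]

    def insert(self, m, x):
        if self.children is None and self.particle is None:
            self.particle = (m, x)                  # empty leaf: store the particle
        else:
            if self.children is None:               # occupied leaf: split, push down
                q = self.size / 4.0
                self.children = [
                    Node([self.centre[k] + (q if (i >> k) & 1 else -q)
                          for k in range(3)], self.size / 2.0)
                    for i in range(8)]
                old, self.particle = self.particle, None
                self.child_for(old[1]).insert(*old)
            self.child_for(x).insert(m, x)
        total = self.mass + m                       # update the node monopole
        self.com = [(self.mass * c + m * xk) / total for c, xk in zip(self.com, x)]
        self.mass = total

def tree_potential(node, x, theta, G=1.0):
    """Monopole tree walk: accept a node if sqrt(3) * size / d < theta, else open it."""
    if node.mass == 0.0:
        return 0.0
    d = math.sqrt(sum((x[k] - node.com[k]) ** 2 for k in range(3)))
    if node.children is None:                       # leaf: direct particle interaction
        return -G * node.mass / d if d > 0.0 else 0.0
    if math.sqrt(3.0) * node.size < theta * d:      # well separated: use the monopole
        return -G * node.mass / d
    return sum(tree_potential(child, x, theta, G) for child in node.children)
```

With `theta = 0` no node is ever accepted and the walk reduces to direct summation, which gives a convenient correctness check; larger values trade accuracy for fewer interactions.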

An alternative method, proposed by Springel et al. (2001b), is to use a dynamical criterion by comparing the expected acceleration with the force error induced by a given node interaction. Specifically, when evaluating the particle-node interactions for particle j one sets

$$\begin{aligned} \theta _{\mathrm{c},j} = \left( \alpha \Vert \varvec{A}_j\Vert \frac{\Vert \varvec{d}\Vert ^2}{G M_{\mathrm{node}}}\right) ^{1/p}, \end{aligned}$$

where \(\Vert \varvec{A}_j\Vert \) is the modulus of the gravitational acceleration (which could be estimated from the force calculation performed in a previous step), and \(\alpha \) is a dimensionless parameter that controls the desired accuracy. Note, however, that for relatively isotropic mass distributions, the uncertainty of a given interaction might not be representative of the uncertainty in the total acceleration.
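As a small illustration, the dynamical criterion can be written as a per-interaction opening angle that is then used in the same acceptance test as the geometric one. The function and argument names are hypothetical, and the previous-step acceleration `acc_norm` is assumed to be available from the caller.

```python
def dynamic_opening_angle(alpha, acc_norm, d, node_mass, p, G=1.0):
    """Opening angle theta_c,j = (alpha |A_j| d^2 / (G M_node))^(1/p)."""
    return (alpha * acc_norm * d * d / (G * node_mass)) ** (1.0 / p)

def node_is_acceptable(r_node, d, theta_c):
    """Accept the multipole approximation if r_node / d < theta_c."""
    return r_node / d < theta_c
```

For a fixed distance, a more massive node yields a smaller opening angle and is therefore opened more readily, while particles that already experience a large acceleration tolerate coarser nodes.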

We highlight that the expressions (68)–(69) are valid for the non-periodic particle-node interactions, but for periodic boundary conditions additional terms arise owing to the modified Green’s function as seen in Eq. (64). The Green’s function is also modified in the case when tree interactions are combined with other methods such as PM in a tree-PM method (see Sect. 5.3). This implies in principle also modified error criteria (or opening angles), however, this is often neglected.

So far, performing the multipole expansion only to monopole order (with nodes centered at the center of mass) has been a popular choice for N-body codes. The reason behind this is that a second-order accurate expression is obtained with very low memory requirements (one needs to simply store the centre of mass of tree nodes instead of the geometric centre), which is enough when moderate accuracy is sought. However, in search of higher accuracy, a larger number of codes have started to also consider quadrupole and octupole terms, which require more memory and computation but allow a less aggressive opening criterion. This has been advocated as the optimal combination that provides the most accurate estimate at a fixed computational cost (Dehnen and Read 2011; Potter and Stadel 2016), although the precise optimal order depends on the required accuracy (Springel et al. 2021). In the future, further gains from higher order terms can be obtained as computer architectures evolve towards higher FLOP/byte ratios.

A problem for tree codes used for cosmological simulations is that on large scales and/or at high redshift the mass distribution is very homogeneous. This is a problem since the net acceleration of a particle is then the sum of many terms of similar magnitude but opposite sign that mostly cancel. Thus, obtaining accurate forces requires a low error tolerance, which increases the computational cost of a simulation. For instance, the Euclid Flagship simulation (Potter and Stadel 2016), which employed a pure Tree algorithm (cf. Sect. 10), spends a considerable amount of time on the gravitational evolution at high redshift. Naturally, this problem is exacerbated the larger the simulation and the higher the starting redshift.

A method to address this problem, that was proposed by Warren (2013) and implemented in the 2HOT code, is known as “background subtraction”. The main idea is to add the multipole expansion of a local uniform negative density to each interaction, which can be computed analytically for each cubic cell in a particle-node interaction. Although this adds computational cost to the force calculation, it results in an important overall reduction of the cost of a simulation since many more interactions can be represented by multipole approximations at high redshift. As far as we know, this has not been widely adopted by other codes.

A further optimization that is usually worth carrying out on modern architectures is to prevent tree refinement down to single particles (for which anyway all multipoles beyond the monopole vanish). Since the most local interactions end up being effectively direct summation anyway, one can get rid of the tree overhead and retain a ‘bucket’ of \(10^{2-3}\) particles in each leaf node rather than a single individual particle. All interactions within the node, as well as those which would open child nodes are carried out in direct summation. While algorithmically more complex, such a direct summation is memory-local and can be highly optimized and e.g. offloaded to GPUs, providing significant speed-up over the tree.

5.4.2 Fast-multipole method

Despite the huge advantage with respect to direct summation, a single interaction of a particle with the tree is still computationally expensive as it has an \(\mathcal {O}(\log N)\) complexity for a well-balanced tree. Furthermore, trees as described above have other disadvantages; for instance, gravitational interactions are not strictly symmetric, which leads to a violation of momentum conservation. A solution to these limitations is provided by fast multipole methods (FMM), originally proposed by Greengard and Rokhlin (1987) and extended to Cartesian coordinates by Dehnen (2000, 2002). These algorithms take the idea of hierarchical expansions one step further by realising that significant parts of the particle-node interactions are redundantly executed for particles that are within the same node. In order to achieve an \(\mathcal {O}(1)\) complexity per particle, the node-node interaction should be known and translated to the particle location. This is precisely what FMM achieves by symmetrising the interaction to node-node interactions between well-separated nodes, which are separately Taylor expanded inside of the two nodes. Until recently, FMM methods have not been widespread in cosmology, presumably due to a combination of higher algorithmic and parallelization complexity. The advantages of FMM are becoming evident in modern N-body codes, which simulate extremely large numbers of particles and seek high accuracy, and thus FMM has been adopted in PKDGRAV, GADGET-4, and SWIFT. We only briefly summarize the main steps of the FMM algorithm here, and refer the reader to the reviews by, e.g., Kurzak and Pettitt (2006), Dehnen and Read (2011) for details on the method.

The FMM method builds on the same hierarchical space decomposition as the Barnes & Hut tree above and shares some operators. For the FMM algorithm, three steps are missing from the tree algorithm outlined in the previous section. The first is a ‘downward M2L’ (multipole-to-local) sweep, which propagates the interactions back down the tree after the upward M2M sweep, thereby computing a local field expansion in the node. This expansion is then shifted in ‘downward L2L’ (local-to-local) steps to the centers of the child nodes, and to the particles in a final ‘L2P’ (local-to-particle) translation. As one has to rely on the quality of the local expansion in each node, FMM requires significantly higher order multipole expansions compared to standard Barnes & Hut trees to achieve low errors. Note that for a Cartesian expansion in monomials \(x^ly^mz^n\) at a fixed order \(p=l+m+n\), one has \((p+1)(p+2)/2\) multipole moments, i.e. \((p+1)(p+2)(p+3)/6\) for all orders up to and including p. The memory needed for each node thus scales as \(\mathcal {O}(p^3)\), and a standard implementation evaluating multipole pair interactions scales as \(\mathcal {O}(p^6)\). For expansions in spherical harmonics, one can achieve \(\mathcal {O}(p^3)\) scaling (Dehnen 2014). Note that for higher order expansions one can rely on known recursion relations to obtain the kernel coefficients (Visscher and Apalkov 2010), allowing arbitrary order implementations. Recently, it was demonstrated that a trace-free reformulation of the Cartesian expansion has a slimmer memory footprint (Coles and Bieri 2020) (better than 50% for \(p\ge 8\)). The same authors provide convenient Python scripts to auto-generate code for optimized expressions of the FMM operators symbolically. It is important to note that the higher algorithmic complexity lends itself well to recent architectures which favour high FLOP-to-byte ratio algorithms (Yokota and Barba 2012).
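The moment counts quoted above can be confirmed by explicitly enumerating the Cartesian monomials. The short sketch below uses hypothetical function names and only checks the combinatorics; it does not implement any FMM operator.

```python
def monomials(p):
    """All exponent triples (l, m, n) with l + m + n = p, i.e. monomials x^l y^m z^n."""
    return [(l, m, p - l - m) for l in range(p + 1) for m in range(p + 1 - l)]

def count_at_order(p):
    """Closed-form number of Cartesian multipole moments at fixed order p."""
    return (p + 1) * (p + 2) // 2

def count_up_to_order(p):
    """Closed-form number of moments for all orders up to and including p."""
    return (p + 1) * (p + 2) * (p + 3) // 6
```

For example, at order \(p=2\) there are 6 monomials, and an expansion retaining all orders up to \(p=8\) stores 165 moments per node, which illustrates the \(\mathcal {O}(p^3)\) memory growth.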

While the FMM force is symmetric, Springel et al. (2021) report however that force errors can be much less uniform in FMM than in a standard tree approach, so that it might be required to randomise the relative position of the expansion tree w.r.t. the particles between time steps in order to suppress the effect of correlated force errors on sensitive statistics for cosmology. In principle also isotropy could be further improved with random rotations. Note that errors might have a different spatial structure with different expansion bases however.

The FMM method indeed has constant time force evaluation complexity for each N-body particle. This assumes that the tree has already been built, or that building the tree does not have \(\mathcal {O}(N\log N)\) complexity (which is only true if it is not fully refined but truncated at a fixed scale). Note however that for FMM solvers, it is preferable to limit the tree depth to a minimum node size or at least use a larger number of particles in a leaf cell for which local interactions are computed by direct ‘P2P’ (particle-to-particle) interactions. Also, tree construction has typically a much lower pre-factor than the ‘tree walk’. Note further that many codes use some degree of ‘tree updating’ in order to avoid rebuilding the tree in every timestep.

In order to avoid explicit Ewald summation, some recent methods employ hybrid FFT-FMM methods, where essentially a PM method is used to evaluate the periodic long range interactions as in tree-PM and the FMM method is used to increase the resolution beyond the PM mesh for short-range interactions (Gnedin 2019; Springel et al. 2021).

6 Initial conditions

In previous sections we have discussed how to discretise a cold collisionless fluid, compute its self-gravity, and evolve it in time. Closing our review of the main numerical techniques for cosmological simulations, in this section we present details on how to compute and set up their initial conditions.

The complicated non-linear structures that are produced at late times in cosmological simulations develop from minute fluctuations around homogeneity in the early Universe that are amplified by gravitational collapse. While the fluctuations remain small (i.e. at early times, or on large scales) they permit a perturbative treatment of the underlying non-linear coupled equations. Such perturbative techniques belong to the standard repertoire of analytic techniques for the study of the cosmic large-scale structure, see e.g., Bernardeau et al. (2002) for a review, and Sect. 2.5 for a concise summary. At late times and on smaller scales, shell-crossing and deeply non-linear dynamics limit the applicability of perturbative techniques. While some attempts have been made to extend PT beyond shell-crossing (Taruya and Colombi 2017; Pietroni 2018; Rampf et al. 2021a), or by controlling such effects in effective fluid approaches, e.g., Baumann et al. (2012), the evolved non-linear universe is still the domain of simulations. At the same time, PT of various flavours is used to set up the fields that provide the initial conditions for fully non-linear cosmological simulations.

6.1 Connecting simulations with perturbation theory

The physics governing the infant phase of the Universe, which is dominated by a hot plasma tightly coupled to radiation and linked through gravity to dark matter and neutrinos, is considerably more complex than the purely geodesic evolution of collisionless gravitationally interacting particles outlined in Sect. 2. Since density fluctuations are small, this phase can be treated accurately by perturbative techniques at leading order. State-of-the-art linear-order Einstein–Boltzmann (EB) codes that numerically integrate these coupled multi-physics systems are e.g., Camb (Lewis et al. 2000) and Class (Lesgourgues 2011; Blas et al. 2011). These codes usually evolve at least dark matter, baryons, photons and (massive) neutrinos and output Fourier-space transfer functions for density \(\delta _X\) and velocity divergence \(\theta _X\) for each of the species X as well as the total matter density fluctuations at specifiable output times. Typically equations are integrated in synchronous gauge, in a frame comoving with dust. The use of the output of these Einstein–Boltzmann solvers for non-linear simulations that (in the case of N-body simulations) model only Newtonian gravity and no relativistic species, let alone baryon-photon coupling, still requires some numerical considerations that we discuss next. The inclusion or non-inclusion of relativistic species makes a difference of several per cent in the background evolution and therefore the growth rate between redshifts \(z=100\) and \(z=0\) (Fidler et al. 2017b), implying that it is crucial to be aware of what physics is included in the calculations for the initial conditions and in the non-linear cosmological code.
Usually, it is sufficiently accurate to combine output for density perturbations in synchronous gauge with Newtonian gauge velocities when working with Newtonian equations of motion, but also self-consistent gauge choices exist which allow the correct inclusion of relativistic effects even within Newtonian simulations.

The main approaches adopted when using the output from an EB solver to set up initial conditions are illustrated in Fig. 7 and are:

Forward method In this approach, the output of the Einstein–Boltzmann code at some early time \(z_{\mathrm{start}} \gtrsim 100\) is used. To avoid errors of several per cent at low redshift on all scales, relativistic components must be included in the background evolution of the non-linear solution in this case. Also the significant evolution of the horizon between \(z>100\) and 0 means that for very large-scale simulations, relativistic corrections can become important and should be included as well for high precision. Since all corrections beyond the background are significant only on large scales, they remain perturbative. In the Cosira approach (Brandbyge et al. 2017), which is also used e.g., in the Euclid flagship simulations (Potter et al. 2017), they are added as corrections at the linear level (subtracting essentially the Newtonian linear gravity from the relativistic), convolving the resulting correction with the random phases of the initial conditions, and adding this realisation-specific correction to the gravity solver step of the N-body code.

Fig. 7

Different setups used to initialise N-body simulations with the output from a linear Einstein–Boltzmann (EB) solver such as Camb or Class to bridge the gap between missing physics in N-body codes on the one hand, and missing non-linearities in the EB codes on the other hand. ‘EB linear’ represents the linear full-physics evolution through the EB code, ‘reduced linear’ a reduced linear model (including physics captured in the N-body code), ‘non-linear LPT’ the non-linear evolution using Lagrangian perturbation theory to some finite order valid prior to shell-crossing, and ‘N-body’ the full non-linear evolution. In the ‘forward’ approach additional fields (e.g., neutrinos, relativistic corrections) need to be added to match the EB solution at \(a_{\mathrm{target}}\)

Care has to be taken that these corrections never significantly contribute to non-linear terms as the corresponding back-reaction on the linear field is neglected. While requiring a direct integration of the Einstein–Boltzmann solver with the N-body code, this approach can readily include also a treatment of massive neutrinos at linear order (Tram et al. 2019, and see discussion in Sect. 7.8.2). If all relevant physics is included, this approach guarantees very good agreement between Einstein–Boltzmann and N-body on linear scales. Due to the very nature of this approach, it does not allow the use of high-order Lagrangian perturbation theory since only linear quantities are known at the starting time \(z_{\mathrm{start}}\), which therefore has to be pushed to early times in order to not itself affect the simulation results (see discussion in Sect. 8.3). If fields \(\delta _X\) and \(\theta _X\) are known for a species X at the starting time, then they can be converted into leading order consistent Lagrangian maps with respective velocities through the relations

$$\begin{aligned} \varvec{x}(\varvec{q};\,z_{\mathrm{start}}) = \varvec{q} - \varvec{\nabla }\nabla ^{-2} \delta ^{\mathrm{EB}}_X(\varvec{q};\,z_{\mathrm{start}})\qquad \text {and}\qquad \varvec{v}(\varvec{q};\,z_{\mathrm{start}}) = \varvec{\nabla }\nabla ^{-2} \theta ^\mathrm{EB}_X(\varvec{q};\,z_{\mathrm{start}}). \end{aligned}$$

Backward method An alternative to the forward method that allows one to couple a non-linear simulation with reduced physical models to the EB solutions is given by the ‘backward method’ (not to be confused with the ‘backscaling’ method below). Here, the linearised set of equations solved by the non-linear code is used to integrate backwards in time, i.e. from \(z_{\mathrm{target}}\) where the output of the EB code is known, to the starting time of the non-linear simulation \(z_{\mathrm{start}}\). It is thus possible to reduce the multi-species universe e.g., to just two species, total matter and massive neutrinos (Zennaro et al. 2017), at \(z_{\mathrm{target}}\), evolve them backwards in time under the linearised physics of the N-body code, and then provide ICs using the prescription (73). The leading order evolution of the ‘active’ fluids takes into account scale-dependent growth and agrees reasonably well at high redshifts with the full EB solution. The limitation of this approach is that any decaying modes that are present at \(z_{\mathrm{target}}\) must still be small at \(z_{\mathrm{start}}\). This can be achieved well for neutrinos [with sub-per-cent errors for \(z_{\mathrm{start}}\lesssim 100\) with \(\sum m_\nu \lesssim 0.3\,\mathrm{eV}\), see Zennaro et al. (2017)] due to their small contribution to the total energy budget, but is more limited e.g., for baryons. Again, this approach is then restricted to first-order accurate displacements and velocities, as only linear fields are known.

Backscaling method The arguably simplest approach, the back-scaling method, avoids the complications of differences of physics. At the same time, it is the one rigorously consistent with Lagrangian perturbation theory. It is also the traditionally used approach to set up N-body ICs. In this method, one uses the Einstein–Boltzmann code to evolve the linear multi-physics equations to a target redshift \(z_{\mathrm{target}}\) and then re-scales the total matter perturbation \(\delta ^{\mathrm{EB}}(z_{\mathrm{target}})\) as output by the EB-code to arbitrary times using the linear theory growth factor defined by Eq. (20) as

$$\begin{aligned} \tilde{\delta }_m(k;\, z_{\mathrm{start}}) = \frac{D_+(z_\mathrm{start})}{D_+(z_{\mathrm{target}})} \, \tilde{\delta }_m^\mathrm{EB}(k;\,z_{\mathrm{target}}). \end{aligned}$$

The main advantage of this approach is that by definition the correct linear theory is obtained in the vicinity of \(z=z_\mathrm{target}\) including e.g., relativistic and neutrino effects, without having to include this physics into the N-body code. This comes at the price that at early times, it might not in general agree with the full EB solution since a plethora of modes captured by the higher-dimensional EB system are reduced to scale-independent growth under the linear growing mode only. However, this is not a problem if non-linear coupling is unimportant, which is an excellent assumption precisely at early times. Backscaling can also be rigorously extended to simulations including multiple fluids coupled through gravity (Rampf et al. 2021b) by including additional isocurvature modes. It can in principle also account for scale-dependent evolution if it can be modelled at the \(D_+(k;\,a)\) level. With respect to the total matter field in \(\varLambda \)CDM, this method agrees exactly with the ‘backward method’ above if the decaying mode is neglected. Arguably the biggest advantage of the backscaling method is that it connects naturally with high order Lagrangian perturbation theory, as we will discuss next.
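As an illustration of the back-scaling relation, the sketch below rescales a target-redshift amplitude with the ratio of growth factors. It is written under stated assumptions: the growth factor is taken to be the standard growing-mode integral for a flat universe containing only matter and a cosmological constant, \(D_+(a)\propto E(a)\int _0^a \mathrm{d}a^\prime /[a^\prime E(a^\prime )]^3\) with \(E^2=\varOmega _m a^{-3}+\varOmega _\varLambda \), which may differ in normalisation or content from the definition referenced in the text, and the function names are hypothetical.

```python
import numpy as np

def growth_factor(a, omega_m, n_steps=4096):
    """Unnormalised growing mode D_+(a) for flat LCDM without radiation (assumed form)."""
    omega_l = 1.0 - omega_m

    def efunc(x):
        return np.sqrt(omega_m * x**-3 + omega_l)

    x = np.linspace(1e-8, a, n_steps)
    y = 1.0 / (x * efunc(x)) ** 3
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # trapezoidal rule
    return efunc(a) * integral

def backscale(delta_k_target, z_start, z_target, omega_m):
    """Rescale a linear Fourier amplitude from z_target to z_start with D_+ only."""
    a_start = 1.0 / (1.0 + z_start)
    a_target = 1.0 / (1.0 + z_target)
    ratio = growth_factor(a_start, omega_m) / growth_factor(a_target, omega_m)
    return ratio * np.asarray(delta_k_target)
```

Only the ratio of growth factors enters, so the overall normalisation of \(D_+\) drops out. In the Einstein–de Sitter limit (\(\varOmega _m=1\)) the ratio reduces to \(a_{\mathrm{start}}/a_{\mathrm{target}}\).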

6.2 Initial conditions from Lagrangian perturbation theory

With the increasing precision requirements of N-body simulations over the last 20 years, it quickly became clear that first order accurate initial conditions are insufficient. Those are ICs that follow Eq. (73), which together with the back-scaled input spectrum from Eq. (74) amounts to the Zel’dovich approximation (ZA; Zel’dovich 1970). The reason is that linear ICs suffer from significant transients—decaying truncation errors between the ICs and the true non-linear evolution (Scoccimarro 1998; Crocce et al. 2006)—since higher order non-Gaussian moments of the density distribution are not accurately captured (see discussion in Sect. 8.3). Higher order is needed to follow correctly the higher order moments of the density distribution, i.e. first order captures only the variance, second in addition also the skewness and third even the kurtosis (Munshi et al. 1994). These ‘transients’ can only be suppressed by going to very early starting times, when the linear Gaussian approximation is accurate, at the cost of typically larger numerical errors (Michaux et al. 2021). The alternative to early starts is to go to higher order perturbation theory beyond the ZA, where the displacement field \(\varvec{\varPsi }\) of the Lagrangian map, \(\varvec{x}(\varvec{q},\,\tau )=\varvec{q}+\varvec{\varPsi }(\varvec{q},\,\tau )\), is expanded in a Taylor series to order n in the linear theory growth factor \(D_+\) yielding the nLPT approximation

$$\begin{aligned} \varvec{\varPsi }(\varvec{q},\tau ) = \sum _{j=1}^n D_+(\tau )^j \; \varvec{\varPsi }^{(j)}(\varvec{q}). \end{aligned}$$

This includes only growing modes, but remains regular as \(D_+\rightarrow 0\), i.e. \(a\rightarrow 0\), with the key property that \(\varvec{x}\rightarrow \varvec{q}\), i.e. the initial state is indeed a homogeneous unperturbed universe. Since the density is uniform in the \(a\rightarrow 0\) limit (where the limit is taken in the absence of radiation, otherwise one can use \(a\sim 10^{-2}\) during matter domination for simplicity), the growing mode perturbations are at leading order encapsulated in a single potential \(\phi ^{(1)}(\varvec{q})\) which can be connected to the back-scaling relation from above to the EB code output. This yields the famous ‘Zel’dovich approximation’

$$\begin{aligned} \varvec{\varPsi }^{(1)} = -\varvec{\nabla }\phi ^{(1)} \quad \text {with}\quad \phi ^{(1)} := -\nabla ^{-2} \lim _{a\rightarrow 0} \frac{D_+(a)}{a}\,\delta _m(\varvec{q};\, a), \end{aligned}$$

that can be used to set up simulation initial conditions by displacing Lagrangian fluid elements (e.g., N-body particles) consistent with \(\varvec{\varPsi }\) and giving them velocities according to \(\dot{\varvec{\varPsi }}\). Inserting this ansatz order by order in Eq. (25) returns the well known order-truncated n-th order LPT forms. Specifically, the 2LPT contribution to the displacement field has the form

$$\begin{aligned} \varvec{\varPsi }^{(2)} = \varvec{\nabla }\phi ^{(2)}\quad \text {with}\quad \phi ^{(2)} = -\frac{3}{14}\nabla ^{-2} \left[ {\phi }^{(1)}_{,ii} {\phi }^{(1)}_{,jj} - {\phi }^{(1)}_{,ij}{\phi }^{(1)}_{,ij}\right] , \end{aligned}$$

while at third order, i.e. 3LPT, the displacement field starts to have both longitudinal (i.e. irrotational) and transverse (i.e. solenoidal) components (Catelan 1995; Rampf and Buchert 2012)

$$\begin{aligned} \varvec{\varPsi }^{(3)} = \varvec{\nabla }\phi ^{(3)}+\varvec{\nabla }\times \varvec{A}^{(3)} \quad&\text {with}&\quad \phi ^{(3)} = \frac{1}{3}\nabla ^{-2} \left[ \det \phi ^{(1)}_{,ij} \right] -\frac{5}{21}\nabla ^{-2}\left[ \phi ^{(2)}_{,ii}\phi ^{(1)}_{,jj}-\phi ^{(2)}_{,ij} \phi ^{(1)}_{,ji}\right] \nonumber \\ \quad&\text {and}&\quad \varvec{A}^{(3)} = \frac{1}{7}\nabla ^{-2}\left[ \varvec{\nabla }\phi ^{(2)}_{,i}\times \varvec{\nabla }\phi ^{(1)}_{,i}\right] . \end{aligned}$$

The transverse part appears at 3LPT order and preserves the potential nature of the flow in Eulerian space. Newtonian gravity (i.e., Hamiltonian mechanics coupled to only a scalar potential) cannot produce vorticity (i.e., it exactly preserves any that might be present in the initial conditions, which however in cosmological context would appear as a decaying mode that blows up as \(a\rightarrow 0\)). For the truncated nLPT series this is only true at the respective order, i.e. \(\varvec{\nabla }_x \times \varvec{v} = \varvec{\nabla }_x \times \dot{\varvec{\varPsi }} =\mathcal {O}(D_+^n)\) (Uhlemann et al. 2019). For systems of more than one pressureless fluid, this approach can be readily generalised taking into account isocurvature modes to all, and decaying modes to first order currently (Rampf et al. 2021b; Hahn et al. 2021). Note that the relations given above for nLPT have small corrections in \(\varLambda \)CDM (Bouchet et al. 1995; Rampf et al. 2021b), which are however not important when initial conditions are generated at \(z\gg 1\), when it is safe to assume an Einstein–de Sitter cosmology.
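The second-order potential \(\phi ^{(2)}\) given above can be computed from \(\phi ^{(1)}\) with a handful of Fourier transforms. The following sketch is illustrative only: it assumes a periodic cubic grid, uses a hypothetical function name, and deliberately omits the de-aliasing of the quadratic products, so it should not be read as how any production code evaluates this term.

```python
import numpy as np

def second_order_potential(phi1, box_size):
    """phi2 = -(3/14) lap^-1 [ phi1_,ii phi1_,jj - phi1_,ij phi1_,ij ] on a periodic grid."""
    n = phi1.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    k = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = k[0] ** 2 + k[1] ** 2 + k[2] ** 2
    phik = np.fft.fftn(phi1)

    def d2(i, j):
        # Second derivative d_i d_j phi1: (i k_i)(i k_j) = -k_i k_j in Fourier space
        return np.real(np.fft.ifftn(-k[i] * k[j] * phik))

    laplacian = d2(0, 0) + d2(1, 1) + d2(2, 2)
    source = laplacian**2
    for i in range(3):
        for j in range(3):
            source -= d2(i, j) ** 2          # subtract the full Hessian contraction

    source_k = np.fft.fftn(source)
    k2[0, 0, 0] = 1.0                        # placeholder; mean mode removed below
    phi2_k = -3.0 / 14.0 * (-source_k / k2)  # inverse Laplacian -> -1 / k^2
    phi2_k[0, 0, 0] = 0.0
    return np.real(np.fft.ifftn(phi2_k))
```

Two analytic cases are useful as checks. A single plane wave gives a vanishing source term, as expected for one-dimensional collapse, and \(\phi ^{(1)}=\cos x+\cos y\) in a box of side \(2\pi \) gives \(\phi ^{(2)}=\tfrac{3}{14}\cos x\cos y\).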

In Fig. 8 we show the power spectrum of the source terms, \(\nabla ^2 \phi ^{(n)}\), for 2LPT and 3LPT. As expected, we see that the higher-order fields have significantly smaller amplitude than the linear-order density power spectrum. However, these higher-order contributions are required to improve the faithfulness of the simulated field and are in fact important for correctly predicting certain cosmological statistics. In addition, note that when computing the high-order potentials \(\phi ^{(n)}\) and \(\varvec{A}^{(n)}\) (with \(n\ge 2\)) some care has to be taken to avoid aliasing (Orszag 1971; Michaux et al. 2021; Rampf and Hahn 2021), which can be important even on large scales, as shown in the bottom panels of Fig. 8.
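To make the aliasing issue concrete, the one-dimensional sketch below compares a naive grid-point product of two fields with a product computed using zero-padding by a factor 3/2 (the padding rule usually attributed to Orszag 1971 for quadratic non-linearities). This is a toy illustration with hypothetical function names; whether the cited IC codes de-alias in exactly this way is not asserted here.

```python
import numpy as np

def product_aliased(f, g):
    """Naive product on the original grid; modes beyond Nyquist fold back."""
    return f * g

def product_dealiased(f, g):
    """Product evaluated on a 3/2-padded grid, then truncated to the original modes."""
    n = f.size
    m = 3 * n // 2

    def pad(u):
        uk = np.fft.rfft(u)
        uk_pad = np.zeros(m // 2 + 1, dtype=complex)
        uk_pad[: n // 2] = uk[: n // 2]          # keep modes below the coarse Nyquist
        return np.fft.irfft(uk_pad, n=m) * (m / n)

    prod_k = np.fft.rfft(pad(f) * pad(g))
    out_k = np.zeros(n // 2 + 1, dtype=complex)
    out_k[: n // 2] = prod_k[: n // 2]           # discard everything above the cut
    return np.fft.irfft(out_k, n=n) * (n / m)
```

For \(f=g=\cos (5x)\) on 16 grid points, the exact product contains a mode at wavenumber 10 that the grid cannot represent. The naive product folds it back to wavenumber 6, whereas the padded product removes it and retains only the resolved mean value of 1/2.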

Fig. 8

Image reproduced with permission from Michaux et al. (2021), copyright by the authors

The power spectrum of various fields contributing to displacements in Lagrangian Perturbation Theory up to 3rd order. The top panel shows the power spectrum of \(\nabla ^2 \phi ^{(2)}\) that contributes to the 2nd-order displacements (cf. Eq. 77), whereas the bottom left and right panels show the spectra of \(\nabla ^2 \phi ^{(3a)}\) and \(\nabla ^2 \phi ^{(3b)}\), which correspond to the first and second terms of the 3rd-order LPT contribution \(\phi ^{(3)}\), given in the 3LPT expression for \(\varvec{\varPsi }^{(3)}\) above. In each case we display the results measured in \(250 h^{-1}\mathrm{Mpc}\) and \(1000 h^{-1}\mathrm{Mpc}\) boxes, with and without correct de-aliasing, as indicated in the figure. The ratio between the aliased and de-aliased solutions with respect to the analytic expectation is shown in the bottom panels

Note that recently, Rampf and Hahn (2021) were the first to numerically implement the full nLPT recursion relations so that fields to arbitrary order can be used for ICs in principle, only limited by computer memory. This is included in the publicly-available MonofonIC Music-2 codeFootnote 15.

6.3 Generating Gaussian realisations of the perturbation fields

The previous results were generic. For numerical simulations, however, one has to work with a specific realisation of the Universe, which we discuss next. The specific case of a realisation of our own Universe is discussed as well.

6.3.1 Unconstrained realisations

Many inflationary cosmological models predict that scalar metric fluctuations are very close to Gaussian (Maldacena 2003; Acquaviva et al. 2003; Creminelli 2003). As we have shown above, Lagrangian perturbation theory, focusing on the fastest growing mode, is built up from a single initial scalar potential \(\phi ^{(1)}\) as defined in Eqs. (74) and (75), while the forward and backward approaches work with multiple fields \(\delta _m\), \(\theta _m\), and possibly others. These specify expectation values for a random realisation. Since these are all linear fields (i.e. we can assume that non-linear corrections can be neglected at \(a=0\) for back-scaling, and at \(a_{\mathrm{start}}\) for the forward and backward method) they will be statistically homogeneous and isotropic Gaussian random fields fully characterised by their two-point function. A general real-valued Gaussian homogeneous and isotropic random field \(\phi (\varvec{x})\) can be written as the Fourier integral

$$\begin{aligned} \phi (\varvec{x}) = \frac{1}{(2\pi )^3}\int _{\mathbb {R}^3}\mathrm{d}^3k\, \mathrm{e}^{\text {i}\varvec{k}\cdot \varvec{x}} \tilde{\varphi }(k)\,\tilde{W}(\varvec{k}) , \end{aligned}$$

where \(\tilde{W}(\varvec{k})\) is a complex-valued three-dimensional random field (also known as “white noise” since its power spectrum is independent of \(\varvec{k}\)) with

$$\begin{aligned} \tilde{W}(\varvec{k}) = \overline{\tilde{W}(-\varvec{k})},\qquad \langle \tilde{W}(\varvec{k})\rangle =0,\qquad \text {and}\qquad \langle \tilde{W}(\varvec{k})\,\overline{\tilde{W}(\varvec{k}^\prime )}\rangle =\delta _D(\varvec{k}-\varvec{k}^\prime ), \end{aligned}$$

and \(\tilde{\varphi }(k)\) is the (isotropic) field amplitude in Fourier space as computed by the Einstein–Boltzmann code. This is often given in terms of a transfer function \(\tilde{T}(k)\) which is related to the field amplitude as \(\tilde{\varphi }(k)\propto k^{n_s/2}\tilde{T}(k)\), where \(n_s\) is the spectral index of the primordial perturbations from inflation. If the power spectrum P(k) is given, then setting \(\tilde{\varphi }(k) = (2\pi )^{3/2}\sqrt{P(k)}\) yields the desired outcome

$$\begin{aligned} \langle \tilde{\phi }(\varvec{k})\,\overline{\tilde{\phi }(\varvec{k}^\prime )}\rangle =(2\pi )^3\,P(k)\,\delta _D(\varvec{k}-\varvec{k}^\prime ). \end{aligned}$$

In order to implement these relations numerically, the usual approach when generating initial conditions is to replace the Fourier integral (79) with a discrete Fourier sum that is cut off in the IR by a ‘box mode’ \(k_0=2\pi /L\) and in the UV by a ‘Nyquist mode’ \(k_{\mathrm{Ny}}:=k_0 N/2\) so that the integral can be conveniently computed by a DFT of size \(N^3\). Naturally, fluctuations on scales larger than the box and their harmonics cannot be represented (but see below in Sect. 6.3.5). This is usually not a problem as long as \(k_0\ll k_{\mathrm{NL}}\), where \(k_{\mathrm{NL}}\) is the scale where non-linear effects become important, since then non-linearities can be assumed to not be strongly coupling to unresolved IR modes and are also not sourced dominantly by the very sparsely populated modes at the box scale (which would otherwise break isotropy in the non-linear structure). For simulations evolved to \(z=0\) this implies typically box sizes of at least \(300\,h^{-1}\mathrm{Mpc}\) comoving.
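A minimal sketch of such a discrete realisation is given below. It draws unit-variance white noise in real space, so that the Hermitian symmetry of \(\tilde{W}\) is satisfied automatically, and scales each mode by the square root of the power spectrum. The normalisation is an assumption tied to one common discrete convention, \(P(k)=V\langle |\delta _{\varvec{k}}|^2\rangle /N^6\) for NumPy's unnormalised forward DFT on \(N^3\) cells in a volume \(V=L^3\), rather than the continuous \((2\pi )^3\) convention of the text. The function name and the flat test spectrum are hypothetical.

```python
import numpy as np

def gaussian_random_field(n, box_size, power_spectrum, seed=0):
    """Return a real Gaussian field with the requested P(k) and its DFT coefficients."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n, n, n))     # real white noise, unit variance per cell
    white_k = np.fft.fftn(white)               # <|white_k|^2> = n^3

    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)

    pk = np.zeros_like(kmag)
    nonzero = kmag > 0
    pk[nonzero] = power_spectrum(kmag[nonzero])  # the k = 0 mode is set to zero

    volume = box_size**3
    delta_k = white_k * np.sqrt(pk * n**3 / volume)
    return np.real(np.fft.ifftn(delta_k)), delta_k
```

The smallest and largest wavenumbers represented are the box mode \(k_0=2\pi /L\) and the Nyquist mode \(k_0 N/2\), and the mean of the field vanishes by construction because the \(k=0\) mode is removed.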

6.3.2 Reduced variance sampling

The noise field \(\tilde{W}\) naturally has a polar decomposition \(\tilde{W}(\varvec{k}) =: \tilde{A}(\varvec{k})\, \mathrm{e}^{\text {i}\theta (\varvec{k})}\), where A obeys a Rayleigh distribution (i.e. a \(\chi \)-distribution with two degrees of freedom) and \(\theta \) is uniform on \([0,2\pi )\). The power spectrum associated with W transforms as

$$\begin{aligned} \langle \tilde{W}(\varvec{k})\, \overline{\tilde{W}(\varvec{k}^\prime )} \rangle = \langle \tilde{A}(\varvec{k}) \,\tilde{A}(\varvec{k}^\prime )\rangle = \delta _D(\varvec{k}-\varvec{k}^\prime ), \end{aligned}$$

i.e., it is independent of the phase \(\theta \). In any one realization of \(\tilde{W}(\varvec{k})\), one expects fluctuations (“cosmic variance”) in \(\langle \tilde{W}(\varvec{k})\overline{\tilde{W}(\varvec{k}^\prime )}\rangle \) around the ensemble average of the order of the inverse square root of the number of discrete modes in some finite interval \(k\dots k+dk\) available in the simulation volume.

The amplitude of this cosmic variance can be dramatically reduced by simply fixing A to its expectation value (Angulo and Pontzen 2016), i.e., by ‘sampling’ A from a Dirac delta distribution centred on that value instead of from the Rayleigh distribution. This technique is commonly referred to as “Fixing”. Clearly, the resulting field has far fewer degrees of freedom and a very specific non-Gaussian character. In principle, this introduces a bias in the nonlinear evolution of such a field, e.g., the ensemble averaged power spectrum at \(z=0\) differs from that obtained from an ensemble of Gaussian realizations. However, using perturbation theory, Angulo and Pontzen (2016) showed that the magnitude of this bias in the power spectrum and bispectrum is always smaller (by a factor equal to the number of Fourier modes) compared to mode-coupling terms of physical origin. In fact, it was found empirically that simulations initialised with such ICs produce highly accurate estimates of non-linear clustering (including power spectra, bispectra, halo mass function, and others) and that the level of spurious non-Gaussianity introduced is almost undetectable for large enough simulation boxes (Angulo and Pontzen 2016; Villaescusa-Navarro et al. 2018; Klypin et al. 2020).

The main advantage of “Fixing” is that it avoids the need for giant boxes or large ensembles of simulations in order to obtain measurements with low cosmic variance. A further reduction in cosmic variance can be achieved by considering pairs of simulations where the second simulation is initialised with \(-\tilde{W}(\varvec{k})\) instead of \(\tilde{W}(\varvec{k})\) (Pontzen et al. 2016; Angulo and Pontzen 2016). This is equivalent to shifting the phase \(\theta \) by \(\pi \), i.e. to inverting the sign of the initial fluctuations, and averages out the leading order non-linear term contributing to cosmic variance (Pontzen et al. 2016). This technique has been adopted in several state-of-the-art dark matter and hydro-dynamical simulations (Angulo et al. 2021; Chuang et al. 2019; Euclid Collaboration 2019; Anderson et al. 2019).

The performance of this approach in practice is shown in Fig. 9 which compares the \(z=1\) power spectrum and multipoles of the redshift space correlation function. The mean of 300 \(L=3000 h^{-1}\mathrm{Mpc}\) simulations is displayed as a solid line whereas the prediction of a single pair of ‘fixed’ simulations is shown as red circles. In the bottom panels we can appreciate the superb agreement between both measurements, with relative differences being almost always below one per cent.
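The two operations can be sketched on a discrete white-noise field as follows. The assumptions are spelled out because conventions vary: the noise is taken to be the unnormalised DFT of unit-variance real-space white noise, so the amplitude is fixed to the root-mean-square value \(\sqrt{N^3}\) of the modes, and pairing is implemented as a sign inversion of the whole field, i.e. a phase shift by \(\pi \) as described by Pontzen et al. (2016). The function names are hypothetical.

```python
import numpy as np

def fix_amplitudes(white_k):
    """Keep the phase of every mode but set its modulus to the ensemble rms value."""
    amp = np.abs(white_k)
    phase = white_k / np.where(amp > 0, amp, 1.0)   # modes with zero amplitude stay zero
    return np.sqrt(white_k.size) * phase

def paired_partner(white_k):
    """Second member of the pair: all phases shifted by pi (overall sign inversion)."""
    return -white_k
```

Because the modulus of a mode and of its Hermitian partner are identical, fixing preserves the reality of the field in configuration space. The paired field is simply the negative of the original one, so initially overdense regions become underdense and vice versa.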

Fig. 9

Image reproduced with permission from Angulo and Pontzen (2016), copyright by the authors

The nonlinear dark matter clustering at \(z=1\) in simulations with different initial conditions. The left panel shows the real-space power spectrum whereas the right panel shows the monopole and quadrupole of the redshift space correlation function. Gray lines display the results for an ensemble of 300 simulations with random Gaussian initial conditions, whereas the symbols show the results from a single Paired-&-Fixed pair of simulations. In the bottom panels we show the difference between the ensemble mean and the Paired-&-Fixed results in units of the standard deviation of the ensemble measurements. Note the drastic reduction of noise of this method, especially on large scales

6.3.3 Numerical universes: spatially stable white noise fields

A key problem in generating numerical realisations of random fields is that certain properties of the field should not depend on the exact size or resolution of the field in the numerical realisation. This means that it is desirable to have a method to generate white Gaussian random fields which guarantees that the large-scale modes remain identical, no matter what the resolution of the specific realisation. A further advantage is gained if this method can be parallelised, i.e. if the drawing does not necessarily have to be sequential in order to be reproducible. This problem has found several solutions, a selection of which we discuss below.

N-GenIC (Springel et al. 2005)Footnote 16 and its derivatives produce spatially stable white noise by drawing white noise in Fourier space in a reproducible manner. That is, for two simulations with \(k_{\mathrm{Ny}}\) and \(k_{\mathrm{Ny}}^\prime >k_{\mathrm{Ny}}\), all modes that are representable in both, i.e. \(-k_{\mathrm{Ny}}\le k_{x,y,z}\le k_\mathrm{Ny}\), are identical, and for the higher resolution simulation, new modes are added in the ranges \(-k_{\mathrm{Ny}}^\prime \le k_{x,y,z} < -k_\mathrm{Ny}\) and \(k_{\mathrm{Ny}}< k_{x,y,z} \le k^\prime _{\mathrm{Ny}}\). The random number generation is parallelised by a stable 2+1 decomposition of Fourier space. A shortcoming of this approach of sampling modes is that drawing in Fourier space is inherently non-local so that it cannot be generalised to zoom simulations, where high resolution is desired only in a small subvolume, without generating the full field first.

Panphasia (Jenkins 2013)Footnote 17 has overcome this shortcoming by relying on a hierarchical decomposition of the white noise field in terms of cleverly chosen octree basis functions. Essentially, instead of drawing Fourier modes, one draws the coefficients of this hierarchical basis, which allows one to add as much small-scale information as desired at any location in the three-dimensional volume.

Music-1 (Hahn and Abel 2011) subdivides the cubical simulation volume into subcubes for which the random numbers can be drawn independently and in parallel in real space, since each cube carries its own seed. Refined regions are combined with white noise from the overlapping volume from the coarser levels, just as in the N-GenIC approach, thus enforcing the modes represented at the coarser level to be present on the finer level.
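The idea of giving each subcube its own seed can be illustrated in a few lines. The seeding scheme below, which derives the generator state from a base seed and the integer subcube index, is a hypothetical choice made for this sketch and is not claimed to be the scheme used by Music-1 or any other code. It only demonstrates the property that matters: the noise in a subcube does not depend on the order in which subcubes are drawn, nor on whether the others are drawn at all.

```python
import numpy as np

def subcube_noise(base_seed, index, size):
    """White noise for one subcube, seeded only by (base_seed, i, j, k)."""
    rng = np.random.default_rng([base_seed, *index])
    return rng.standard_normal((size, size, size))

def assemble_noise(base_seed, n_sub, size, order=None):
    """Fill an (n_sub * size)^3 grid subcube by subcube, in an arbitrary order."""
    n = n_sub * size
    field = np.empty((n, n, n))
    indices = [(i, j, k) for i in range(n_sub)
               for j in range(n_sub) for k in range(n_sub)]
    if order is not None:
        indices = [indices[t] for t in order]
    for i, j, k in indices:
        field[i * size:(i + 1) * size,
              j * size:(j + 1) * size,
              k * size:(k + 1) * size] = subcube_noise(base_seed, (i, j, k), size)
    return field
```

Since every subcube is generated independently, the loop could be distributed over parallel tasks without changing the result, and a single subcube can be regenerated in isolation when only a small region is needed.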

6.3.4 Zoom simulations

In cases in which the focus of a simulation is the assembly of single objects or particular regions of the universe, it is neither desirable nor affordable to achieve the necessary high resolution throughout the entire simulation volume. In such situations, ‘zoom simulations’ are preferable, where the (mass) resolution is much higher in a small sub-volume of the entire box. This can be achieved in principle by generating high-resolution initial conditions, and then degrading the resolution outside the region of interest [as followed e.g. by the ZInCo code (Garaldi et al. 2016)]. This approach is, however, limited by memory, and for deeper zooms, refinement methods are necessary. The basic idea is that in the region of interest nested grids are inserted on which the Gaussian noise is refined. Some special care must be taken when applying the transfer function convolution and when solving the Poisson equation. Such approaches were pioneered by Katz et al. (1994) and Bertschinger (2001), and then extended to higher accuracy and to 2LPT using a tree-based approach by Jenkins (2010), which is implemented in a non-public code used by the Virgo consortium, and using multi-grid techniques by Hahn and Abel (2011) in the publicly available MUSIC-1 codeFootnote 18. A recent addition is the GENET-IC codeFootnote 19 (Stopyra et al. 2021) that focuses on the application of constraints (cf. Sect. 6.3.7) to such zoom simulations but currently only supports first order LPT. An example of a particularly deep zoom simulation is the ‘voids in voids’ simulation shown in Fig. 13.

6.3.5 Super-sample covariance and ensembles

On the scale of the simulated box, the mean overdensity and its variance are expected to be non-zero for realistic power spectra. This is a priori in contradiction with periodic boundary conditions (but see Sect. 6.3.6) and so the mean overdensity of the volume in the vast majority of cosmological simulations is enforced to be zero for consistency. Hence, when an ensemble of simulations is considered, the variance of modes \(k<k_0\) is zero despite all of them having different initial white noise fields and providing fair ensemble averages for modes \(k_0\lesssim k\lesssim k_{\mathrm{Ny}}\). This implies that the component of the covariance that is due to large-scale overdensities and tides is underestimated (Akitsu et al. 2019). This missing component is sometimes referred to as super-sample covariance, in analogy to a similar effect present in galaxy surveys (Li et al. 2018), and can be an important source of error in covariance matrices derived from ensembles of simulations, especially if the simulated boxes are of small size (Klypin and Prada 2019). Such effects can be circumvented in “separate universe” simulations that are discussed in Sect. 6.3.6.

Furthermore, it is important to note that for Fourier summed realisations of \(\phi \), the correspondence between power spectra and correlation functions is broken since the discrete (cyclic) convolution does not equal the continuous convolution, i.e.,

$$\begin{aligned} \phi (\varvec{x}) \ne \phi ^K(\varvec{x})&:= \mathrm{DFT}^{-1}\left[ \tilde{W}(\varvec{k})\, \tilde{\phi }(k) \right] \quad \text {and} \\ \phi (\varvec{x}) \ne \phi ^R(\varvec{x})&:= \mathrm{DFT}^{-1}\left[ \tilde{W}\right] \circledast \phi (\Vert \varvec{x}\Vert ) \quad \text {where} \quad \phi (r) := \frac{1}{2\pi ^2}\int _0^\infty \frac{\sin kr}{kr} \tilde{\phi }(k) k^2 \mathrm{d}k, \end{aligned}$$

where ‘\(\circledast \)’ symbolises a discrete cyclic convolution. This implies that real space and Fourier space statistics on discrete finite numerical Universes coincide neither with one another nor with the continuous relation. It is always possible to consider such real space realisations \(\phi ^R\) instead of Fourier space realisations \(\phi ^K\) (Pen 1997; Sirko 2005). In the absence of super-sample covariance, the correct statistics is always recovered on scales \(k_0\ll k\ll k_{\mathrm{Ny}}\). Since the real-space kernel \( \phi (r)\) is effectively truncated at the box scale, it does not impose the box overdensity to vanish and therefore also samples density fluctuations on the scale of the box, which must be absorbed into the background evolution (cf. Sect. 6.3.6).

The real-space sampling, by definition, reproduces the two-point correlation function also in smaller boxes if one allows the mean density of the box to vary. In fact, Sirko (2005) argued that this approach yields better convergence on statistics that depend more sensitively on real-space quantities (such as the halo mass function that depends on the variance of the mass field on a given scale), and also an accurate description of the correlation function for scales \(r \gtrsim L_{\mathrm{box}}/10\). However, Orban (2013) demonstrated that the correct correlation function can still be recovered by accounting for the integral constraint in correlation function measurements, i.e. by realising that the \(\phi ^K\) sampling implicitly imposes \(\int _0^{R}r^2 \xi (r)\mathrm{d}r=0\) already over a scale R related to the box size \(L_\mathrm{box}\) instead of only in the limit \(R\rightarrow \infty \). A better estimator can therefore be obtained by simply subtracting this expected bias.

Note that an alternative approach to account for non-periodic boundary conditions has recently been proposed (Rácz et al. 2018, 2019). In such ‘compactified’ simulations an infinite volume is mapped to the surface of a four-dimensional hypersphere. This compactified space can then be partitioned onto a regular grid with which it is possible to simulate a region of the universe without imposing periodic boundary conditions. This approach has the advantage for some applications that it naturally provides an adaptive mass resolution which increases near the center of the simulation volume where a hypothetical observer is located.

6.3.6 Separate universe simulations

How small scales are affected by the presence of fluctuations on larger scales is not only important to understand finite-volume effects and ensembles of simulations, as we discussed in the previous section, but it is also a central question for structure formation in general. For instance, this response is an essential ingredient in models for biased tracers or in perturbation theory. These interactions can be quantified in standard N-body simulations. However, a method referred to as “separate universe simulations” provides a more controlled way to carry out experiments where perturbations larger than the simulated volume are systematically varied to yield accurate results even with modest simulation volumes. A key advantage of the separate universe technique is that it allows one to quantify the dependence of small-scale structure formation on large-scale perturbations. For instance, changing the effective mean density, e.g., of two simulations to \(+\delta _0\) and \(-\delta _0\), one can compute the response of the power spectrum by taking a simple derivative \(d P / d \delta _0 \simeq (P(k;+\delta _0) - P(k;-\delta _0))/2\delta _0\) (e.g., Li et al. 2014), which can be extended also to higher orders.
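The finite-difference estimate of the response quoted above amounts to a symmetric difference of two measured spectra. The sketch below uses a hypothetical function name and is meant to be fed with power spectra measured from a pair of separate universe runs. Any toy numbers used to exercise it are not simulation results.

```python
import numpy as np

def power_spectrum_response(p_plus, p_minus, delta0):
    """Central-difference estimate of dP/d(delta_0) from runs at +delta0 and -delta0."""
    p_plus = np.asarray(p_plus, dtype=float)
    p_minus = np.asarray(p_minus, dtype=float)
    return (p_plus - p_minus) / (2.0 * delta0)
```

A useful property of the symmetric difference is that contributions even in \(\delta _0\) cancel exactly, so the leading error of the estimate is of order \(\delta _0^2\).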

The main idea behind the separate universe formalism (Sirko 2005; Baldauf et al. 2011; Sherwin and Zaldarriaga 2012; Li et al. 2014; Wagner et al. 2015; Baldauf et al. 2016) is that a long wavelength density fluctuation can be absorbed in the background cosmology. In other words, larger-than-box fluctuations simply lead to a non-zero overdensity \(\delta _0=\mathrm{const.}\) of the box that must be absorbed into the background density in order to be consistent with the periodic boundary conditions of the box, i.e., one matches

$$\begin{aligned} \rho (a) [1 + D_+(a) \delta _0] =: \breve{\rho }(a) , \end{aligned}$$

and thus structure in a given region embedded inside a region of overdensity \(\delta _0\) (today, i.e. at \(a=1\)) is equivalent to that of an isolated region of the universe evolving with a modified set of cosmological parameters indicated by ‘\(\breve{}\)’. Specifically, the modified Hubble parameter, and matter, curvature, and dark energy density parameters become

$$\begin{aligned} \breve{H_0} := H_0 \varDelta _H, \qquad \breve{\varOmega }_m := \varOmega _\mathrm{m} \varDelta _H^{-2}, \qquad \breve{\varOmega }_K := 1 - \varDelta _H^{-2}, \qquad \breve{\varOmega }_{\mathrm{\varLambda }} := \varOmega _{\mathrm{\varLambda }} \varDelta _H^{-2} , \end{aligned}$$

where for a simulation initialised at \(a_{\mathrm{ini}}\)

$$\begin{aligned} \varDelta _H := \sqrt{1 - \frac{5\varOmega _m}{3} \frac{D_+(a_\mathrm{ini})}{a_{\mathrm{ini}}} \delta _0}. \end{aligned}$$

Note that although these expressions are exact, solutions only exist for \(\delta _0 <\frac{3}{5\varOmega _m} \frac{a_{\mathrm{ini}}}{D_+(a_\mathrm{ini})}\). For larger background densities, the whole region is expected to collapse. An important aspect is that the age of the universe should match between the separate universe box and that of the unperturbed universe, thus the scale factors are not identical but relate to each other as

$$\begin{aligned} \breve{a} := a \left( 1 - \frac{1}{3} D_+(a) \delta _0 \right) . \end{aligned}$$

Also, when initialising a simulation, the perturbation spectrum from which the ICs are sampled should be rescaled with the growth function \(\breve{D}_+(\breve{a})\) based on the ‘separate universe’ cosmological parameters.
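As a minimal sketch, the parameter mapping above can be written as a small helper. The function names and the dictionary keys are illustrative choices, and the growth factor values \(D_+(a_\mathrm{ini})\) and \(D_+(a)\) are assumed to be supplied by the user (e.g., from a separate growth-factor solver), normalised in the same way as in the expressions above.

```python
import math

def separate_universe_params(h0, omega_m, omega_l, delta0, d_ini, a_ini):
    """Map fiducial parameters to the 'separate universe' ones.

    d_ini is the growth factor D_+(a_ini), supplied by the caller.
    """
    arg = 1.0 - (5.0 * omega_m / 3.0) * (d_ini / a_ini) * delta0
    if arg <= 0.0:
        # No solution: the whole region is expected to collapse.
        raise ValueError("delta0 exceeds the bound for which a solution exists")
    delta_h = math.sqrt(arg)
    return {
        "H0": h0 * delta_h,
        "Omega_m": omega_m / delta_h ** 2,
        "Omega_K": 1.0 - 1.0 / delta_h ** 2,
        "Omega_L": omega_l / delta_h ** 2,
    }

def separate_universe_scale_factor(a, d_plus_a, delta0):
    """Scale factor of the separate universe at the same cosmic time as a."""
    return a * (1.0 - d_plus_a * delta0 / 3.0)

# Illustrative numbers only: flat fiducial model, D_+(a_ini)/a_ini = 1.28 assumed.
su = separate_universe_params(h0=70.0, omega_m=0.3, omega_l=0.7,
                              delta0=0.1, d_ini=0.0128, a_ini=0.01)
```

For a flat fiducial model the modified density parameters sum to unity by construction, and the physical matter density \(\varOmega _m H_0^2\) is left unchanged; both properties provide simple sanity checks of an implementation.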

Separate universe simulations have been successfully applied to many problems, which can be roughly split into two groups. The first group includes measurements of the value and mass dependence of local and nonlocal bias parameters for voids and halos. The second group includes quantification of the response of the nonlinear power spectrum and/or bispectrum to global processes. For instance, Barreira et al. (2019) studied the role of baryonic physics on the matter power spectrum. Other studies include measurements of the linear and quadratic bias parameters associated with the abundance of voids (Chan et al. 2020; Jamieson and Loverde 2019b); the correlation between large-scale quasar overdensities and small-scale Ly-\(\alpha \) forest (Chiang et al. 2017); halo assembly bias (Baldauf et al. 2016; Lazeyras et al. 2017; Paranjape and Padmanabhan 2017); and cosmic web anisotropy (Ramakrishnan and Paranjape 2020).

Naturally, a simple change in the mean density only modifies the isotropic expansion. More realistically, a given volume will also be exposed to anisotropic deformation due to a large-scale tidal field. Schmidt et al. (2018) (see also Stücker et al. 2021c; Masaki et al. 2020; Akitsu et al. 2020) have demonstrated that such a global anisotropy can be accounted for by a modification of the force calculation in numerical simulations. These simulations have been used to study the role of large-scale tidal fields in the abundance and shape of dark matter halos and the response of the anisotropic power spectrum, and will be very useful also for studies of coherent alignment effects of haloes and galaxies, which are important to understand intrinsic alignments in weak gravitational lensing. We note that the separate universe approach can also be generalised to study the impact of compensated isocurvature modes (where the relative fluctuations of baryons and dark matter change while the total matter fluctuations are left fixed) or of modifications to the primordial gravitational potential, as illustrated in Fig. 10.

Fig. 10

Image reproduced with permission from Voivodic and Barreira (2021), copyright by IOP

Schematic illustration of various kinds of Separate Universe simulations. Structure formation embedded into a large-scale fluctuation \(\delta _L\) is equivalent to that of a simulation with a modified background matter density, \(\rho _m\); structure formation inside a large-scale compensated isocurvature fluctuation, \(\sigma _L\), is equivalent to a modification in the background baryon and cold dark matter densities, \(\rho _c\) and \(\rho _m\); and finally structure formation inside a large potential fluctuation, as originated by primordial non-Gaussianity, can be captured in the Separate Universe formalism by a change in the amplitude of fluctuations \(A_s\)

Another limitation of the original separate universe formulation is that the long-wavelength mode is assumed to evolve only due to gravitational forces. This means that the scale on which \(\delta _0\) is defined has to be much larger than any Jeans or free streaming scale. This condition might be violated if, e.g., neutrinos are considered since their evolution cannot be represented as a cold matter component. This limitation was avoided in the approach of Hu et al. (2016), who introduced additional degrees of freedom in terms of “fake” energy densities tuned to mimic the correct expansion history and thus growth of the large-scale overdensity. This approach has been applied to study inhomogeneous dark energy (Chiang et al. 2017; Jamieson and Loverde 2019a) and massive neutrinos (Chiang et al. 2018).

6.3.7 Real universes: constrained realisations

So far we have discussed how to generate random fields which produce numerical universes in which we have no control over where structures, such as galaxy clusters or voids, would form. In order to carry out cosmological simulations of a realisation that is consistent with the observed cosmic structure surrounding us, it is therefore desirable to impose additional constraints. This could, for instance, shed light on the specific formation history of the Milky Way and its satellites, inform about regions where a large hypothetical DM annihilation flux is expected, or quantify the role of cosmic variance in the observed galaxy properties [see Yepes et al. (2014) for a review].

The simplest way to obtain such constrained initial conditions is by employing the so-called Hoffman–Ribak (HR) algorithm (Hoffman and Ribak 1991). Given an unconstrained realisation of a field \(\phi \), we seek to find a new constrained field \(\check{\phi }\) that fulfils M (linear) constraints. In general, this can be expressed in terms of kernels \(H_j\) by requiring \((H_j\star \check{\phi })(\varvec{x}_j) = \check{c}_j\), where \(\check{c}_j\) is the desired value of the j-th constraint centered on \(\varvec{x}_j\). The Gaussian field \(\check{\phi }\) obeying these constraints can then be found by computing

$$\begin{aligned} \tilde{\check{\phi }}(\varvec{k}) = \tilde{\phi }(\varvec{k}) + P(k) \, \tilde{H}_i(\varvec{k}) \,\xi _{ij}^{-1}(\check{c}_j-c_j), \end{aligned}$$


$$\begin{aligned} c_j = \frac{1}{(2\pi )^3}\int _{\mathbb {R}^3}\mathrm{d}^3k\, \overline{\tilde{H}_j(\varvec{k})}\,\tilde{\phi }(\varvec{k})\qquad \text {and} \qquad \xi _{ij} = \frac{1}{(2\pi )^3} \int _{\mathbb {R}^3}\mathrm{d}^3k\, \overline{\tilde{H}_i(\varvec{k})}\,\tilde{H}_j(\varvec{k})\,P(k), \end{aligned}$$

where \(c_j\) is the value of constraint j evaluated on the unconstrained field, and \(\xi _{ij}\) is the \(M\times M\) constraint covariance matrix (van de Weygaert and Bertschinger 1996; see also Kravtsov et al. 2002 and Klypin et al. 2003 for further implementation details). A possible constraint is, e.g., a Gaussian peak on scale R at position \(\varvec{x}_i\), so that \(\tilde{H}_i(\varvec{k})=\exp \left[ -k^2R^2/2+\text {i}\varvec{k}\cdot \varvec{x}_i\right] \). Constraints on differentials and integrals of the field can also be easily taken into account in the Fourier-space kernel.
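The structure of the Hoffman–Ribak correction can be illustrated with a one-dimensional toy example on a periodic grid. This is a sketch under stated assumptions rather than a production implementation: the grid size, the power spectrum shape, the constraint positions and target values are all hypothetical, and NumPy's FFT sign and normalisation conventions are used, so the phase factor of the Gaussian kernel carries the opposite sign to the expression in the text.

```python
import numpy as np

def apply_covariance(v, power):
    """Apply the circulant covariance with spectrum `power` to a real vector."""
    return np.fft.irfft(power * np.fft.rfft(v), n=v.size)

def gaussian_kernel(k, n, x0, radius):
    """Real-space Gaussian constraint kernel centred on x0 (sums to one)."""
    return np.fft.irfft(np.exp(-0.5 * (k * radius) ** 2 - 1j * k * x0), n=n)

def hoffman_ribak(phi, kernels, targets, power):
    """Return the constrained field for M linear constraints.

    phi: (N,) unconstrained field; kernels: (M, N) rows H_j;
    targets: (M,) desired constraint values; power: (N//2+1,) spectrum.
    """
    c = kernels @ phi                       # constraint values in the unconstrained field
    cov_h = np.array([apply_covariance(h, power) for h in kernels])  # rows: C H_i
    xi = kernels @ cov_h.T                  # xi_ij = H_i . C H_j
    weights = np.linalg.solve(xi, targets - c)
    return phi + weights @ cov_h

# Toy setup (all values illustrative).
n, box = 256, 100.0
k = 2.0 * np.pi * np.fft.rfftfreq(n, d=box / n)
power = k / (1.0 + (k / 0.5) ** 2) ** 2     # hypothetical P(k), with P(0) = 0

rng = np.random.default_rng(1)
white = rng.standard_normal(n)
phi = np.fft.irfft(np.sqrt(power) * np.fft.rfft(white), n=n)  # unconstrained realisation

kernels = np.array([gaussian_kernel(k, n, 30.0, 3.0),
                    gaussian_kernel(k, n, 70.0, 3.0)])
targets = np.array([3.0, -2.0])             # desired smoothed field values
phi_c = hoffman_ribak(phi, kernels, targets, power)
```

By construction, evaluating the kernels on `phi_c` returns the target values, while the unconstrained degrees of freedom remain those of the original random realisation.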

Using this algorithm, various simulations were able to reproduce the local distribution of galaxies and famous features of the local Universe such as the Local and Perseus-Pisces Superclusters, and the Virgo and Coma clusters as well as the Great Attractor (e.g., Sorce et al. 2014). This is illustrated in the top left panel of Fig. 11, which displays a simulation from the CLUES collaboration (Gottloeber et al. 2010; Carlesi et al. 2016). Traditionally, the observational constraints were mostly set by radial velocity data (which was assumed to be more linear than density, thus more directly applicable for constraining the primordial density field), for instance, from datasets such as “Cosmic Flows 2” (Tully et al. 2016).

Fig. 11

Images reproduced with permission from [top left] CLUES, copyright by S. Gottlöber, G. Yepes, A. Klypin, A. Khalatyan; [top right] from Wang et al. (2016), copyright by IOP; and [bottom] from Jasche and Lavaux (2019), copyright by IOP

Various N-body simulations of initial fields constrained by local observations. The top left panel shows one of the simulations of the local Universe carried out by the CLUES collaboration, where the initial conditions were constrained using the CosmicFlows observations (Tully et al. 2016). In the top right panel we show a \(200 h^{-1}\mathrm{Mpc}\) wide slice of the density field in the ELUCID simulation (Wang et al. 2016). The simulated dark matter is shown in a black-blue color scale whereas red/cyan symbols show the location of red/blue galaxies in the SDSS DR7 observations. Similarly, in the bottom panel we show a Mollweide projection of \(\log (2+\delta )\) of a PM simulation whose initial conditions were set by requiring agreement with the observations of the 2M++ catalogue (Lavaux and Hudson 2011) shown as red dots

A further technique related to constrained realisations has been termed ‘genetic modification’ (Roth et al. 2016; Stopyra et al. 2021). Its main idea is to impose constraints, not in order to be compatible with the Local Universe, but to perform controlled numerical experiments. For instance, the mass of a halo can be altered by imposing a Gaussian density peak at the halo location with the desired height and mass. In this way, it is possible to, e.g., isolate the role of specific features (e.g., formation time, major merger, etc.) in the formation of a halo or of the putative galaxy it might host, seeking a better understanding of the underlying physics.

The HR approach has several limitations; for instance, it does not account for the Lagrangian–Eulerian displacements. A general problem of all reconstruction methods is that small scales are difficult to constrain since those scales have shell-crossed in the \(z=0\) Universe, so that information from distinct locations has been mixed or even lost. Constrained simulations therefore resort to trial and error by running many realisations of the unconstrained scales until a desired outcome is achieved (e.g., a Milky Way–Andromeda pair). To speed up this process, Sorce (2020) recently proposed using pairs of simulations carried out with the Paired and Fixed method (cf. Sect. 6.3.2), and Sawala et al. (2021a) quantified the influence of unconstrained initial phases on the final Eulerian field. Another improvement in terms of a “Reverse Zel’dovich Approximation” has been proposed to estimate the Lagrangian position of local structure (Doumler et al. 2013a, b).

An alternative route to numerical universes in agreement with observations is followed in Bayesian inference frameworks (Kitaura and Enßlin 2008; Jasche et al. 2010; Ata et al. 2015; Lavaux and Jasche 2016). In this approach, LPT models or relatively low-resolution N-body simulations (Wang et al. 2014) are carried out as a forward model mapping a random field to observables. The associated large parameter space of typically millions of dimensions of the IC field is then explored using Hamiltonian Monte Carlo until a realisation is found that compares favourably with the observations. The white noise field can then be stored and used for high resolution simulations that can give insights into the formation history of the local Universe. For instance, Heß et al. (2013), Libeskind et al. (2018), and Sawala et al. (2021b) simulated the initial density field derived from observations of the local Universe, and Wang et al. (2016) and Tweed et al. (2017) created a constrained simulation compatible with the whole SDSS DR7 galaxy distribution, which is shown in the top right panel of Fig. 11.

6.4 Initial particle loads and discreteness

Given displacement and velocity fields from Lagrangian perturbation theory and a random realisation of the underlying phases, one is left with imposing these displacement and velocity perturbations onto a set of particles, i.e., a finite set of initially unperturbed Lagrangian coordinates \(\varvec{q}_{1\dots N}\). These correspond to N-body particles representing the homogeneous and isotropic initial state of the numerical universe. Drawing the \(\varvec{q}_j\) from a Poisson process would be the naive choice; however, its intrinsic power spectrum is usually well above that of the matter fluctuations at the starting time, so that it introduces a large amount of stochastic noise and is gravitationally unstable even in the absence of perturbations. Other choices are regular Bravais lattices (a regular simple cubic grid being the most obvious choice), which are gravitationally stable and have no stochasticity, but are globally anisotropic. Higher-order Bravais lattices such as body-centered or face-centered lattices are more isotropic than the simple cubic lattice. A gravitationally stable arrangement with broken global symmetry can be obtained by evolving a Poisson field under a repulsive interaction (White 1994) until it freezes into a glass-like distribution. The resulting particle distribution is more isotropic than a regular lattice, and has a white-noise power spectrum on scales smaller than the mean inter-particle separation, which falls off as \(k^{4}\) towards larger scales; it therefore has more noise than a Bravais lattice. Other alternative initial particle loads have also been proposed, among them quaquaversal tilings (Hansen et al. 2007), and capacity constrained Voronoi tessellations (CCVT, Liao 2018), both of which have a \(k^4\) power spectrum. Example initial particle distributions for various cases are shown in Fig. 12.
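To make the difference between the simplest particle loads concrete, the following sketch generates simple cubic, body-centered and face-centered cubic lattices as well as a Poisson load in a unit box, and evaluates the amplitude of the fundamental Fourier mode of each. The function names are illustrative, and glass generation is deliberately not attempted here since it requires an N-body integration with reversed forces.

```python
import numpy as np

# Basis vectors (in units of the cubic cell) for the three cubic Bravais lattices.
_BASIS = {
    "sc": [[0.0, 0.0, 0.0]],
    "bcc": [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
    "fcc": [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]],
}

def bravais_lattice(n_cells, kind="sc"):
    """Particle positions in the unit box for n_cells^3 cubic cells."""
    basis = np.asarray(_BASIS[kind])
    idx = np.arange(n_cells)
    cells = np.stack(np.meshgrid(idx, idx, idx, indexing="ij"), axis=-1).reshape(-1, 3)
    pos = (cells[:, None, :] + basis[None, :, :]) / n_cells
    return pos.reshape(-1, 3)

def poisson_load(n_part, seed=0):
    """Uniformly random (Poisson) particle positions in the unit box."""
    return np.random.default_rng(seed).random((n_part, 3))

def fundamental_mode(pos):
    """Density contrast amplitude of the k = 2*pi*(1, 0, 0) mode."""
    return np.mean(np.exp(-2j * np.pi * pos[:, 0]))

sc = bravais_lattice(4, "sc")
bcc = bravais_lattice(4, "bcc")
fcc = bravais_lattice(4, "fcc")
poisson = poisson_load(len(sc))
```

The lattices carry no power in the fundamental mode (up to round-off), whereas for the Poisson load `abs(fundamental_mode(poisson))**2` is of the order of the shot-noise level \(1/N\).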

Fig. 12

Image reproduced with permission from Liao (2018), copyright by the authors

Illustration in two dimensions of various particle distributions employed in the generation of initial conditions for cosmological numerical simulations. From left to right: a regular simple cubic grid lattice, a glass, and particle loads from quaquaversal tiling, and capacity constrained Voronoi tessellations

After the creation of the initial particle load, the displacement \(\varvec{\varPsi }(\varvec{q}_{1\dots N})\) and velocity fields \(\dot{\varvec{\varPsi }}(\varvec{q}_{1\dots N})\) are interpolated to the particle locations \(\varvec{q}_{1\dots N}\), thereby defining the initial perturbed particle distribution with growing mode velocities at the simulation starting time. In the case of the simple cubic lattice, the particle modes coincide directly with the modes of the DFT, so that no interpolation is necessary, while for the other Bravais lattices the fields can be obtained by Fourier interpolation. For the other pre-initial conditions typically CIC interpolation is used, cf. Sect. 5.1.2. Since CIC interpolation acts as a low-pass filter, the resulting suppression of power is usually corrected by de-convolving the displacement and velocity fields with the CIC interpolation kernel (Eq. 61b).
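The de-convolution step can be sketched as follows, assuming the commonly used Fourier-space form of the CIC kernel, \(W(\varvec{k}) = \prod _i \mathrm{sinc}^2(k_i \varDelta x/2)\); whether this matches Eq. (61b) in every detail is an assumption of this example. The function names are illustrative, and the routine would be applied to each component of the displacement and velocity fields separately.

```python
import numpy as np

def cic_window(n):
    """Fourier-space CIC kernel on an n^3 grid (assumed sinc^2 per dimension).

    np.sinc(x) = sin(pi x)/(pi x), and np.fft.fftfreq(n) = k * dx / (2 pi),
    so np.sinc(fftfreq) equals sin(k dx / 2) / (k dx / 2).
    """
    w1d = np.sinc(np.fft.fftfreq(n)) ** 2
    return w1d[:, None, None] * w1d[None, :, None] * w1d[None, None, :]

def deconvolve_cic(field):
    """Divide a real, cubic grid field by the CIC kernel in Fourier space."""
    n = field.shape[0]
    return np.fft.ifftn(np.fft.fftn(field) / cic_window(n)).real

# Round-trip check on a toy field: smooth with the kernel, then de-convolve.
rng = np.random.default_rng(0)
truth = rng.standard_normal((8, 8, 8))
smoothed = np.fft.ifftn(np.fft.fftn(truth) * cic_window(8)).real
recovered = deconvolve_cic(smoothed)
```

Note that the kernel stays well above zero up to the Nyquist frequency (about 0.4 per dimension), so the division is well conditioned, although it also amplifies any noise present in the highest modes.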

Since the symmetry of the fluid is always broken at the particle discretisation scale, the specific choice of pre-initial condition impacts the growth rate and growth isotropy at the scale of a few particles. While in the Bravais cases, this deviation from the fluid limit is well understood (Joyce et al. 2005; Marcos et al. 2006; Marcos 2008), for glass, CCVT and quaquaversal tilings such an analysis has not been performed. One expects that they are affected by a stochastic component in the growth rate with an amplitude comparable to that of the regular lattices. Such deviations of the discrete fluid from the continuous fluid accumulate during the pre-shell-crossing phase when the flow is perfectly cold so that over time the simulations, which are initialised with the fluid modes, relax to the growing modes of the discrete fluid.

7 Beyond the cold collisionless limit: physical properties and models for dark matter candidates

So far, in this review article we have focused on the case where all the mass in the universe corresponds to a single cold and collisionless fluid. This is also the assumption of the vast majority of “dark matter only” or gravity-only simulations, and it is justified by the fact that gravity dominates long-range interactions, and that dark matter is the most abundant form of matter in the Universe (\(\varOmega _{\mathrm{c}}/\varOmega _{\mathrm{m}}\approx 84\%\)).

In this section we discuss simulations where these assumptions are relaxed in several ways, either to improve the realism of the simulated system, or to explore the detectability and physics associated to the nature of the dark matter particle. In each case, we briefly discuss the physical motivation, its numerical implementation along with potential modifications necessary to the initial conditions, and summarise the main results.

In the first part of this section we will discuss simulations where dark matter is not assumed to be perfectly cold, but instead has a non-zero temperature, as is the case when it is made out of WIMPs, QCD axions, or generically Warm Dark Matter. We also discuss cases where dark matter is not a classical particle but is instead made of ultra-light axion-like particles, or of primordial black holes. We also consider cases where dark matter is not assumed to be perfectly collisionless, but instead displays microscopic interactions, such as in the case of self-interacting and decaying dark matter.

In the second part of this section we consider simulations that seek a more accurate representation of the Universe. Specifically, we discuss simulations where the mass density field is composed of two distinct fluids representing dark matter and baryons, and simulations that include massive neutrinos. For completeness, we also discuss simulations with non-Gaussian initial conditions and modified gravity.

7.1 Weakly-interacting massive particles

Historically, the favoured candidate for dark matter has been a weakly-interacting massive particle (WIMP). WIMP is a generic name for a hypothetical particle with a very cold distribution function due to its early decoupling and high mass (in the GeV scale). For many decades, WIMPs were a strong contender to be the cosmic DM, motivated by the observation that if the cross-section was set by the electroweak scale, then it would result in a relic abundance matching the measured density of dark matter in the Universe. This coincidence has been termed the “WIMP miracle” [see Bertone et al. (2004) for a classic review]. However, the non-detection of supersymmetric particles at the LHC to date (Bertone 2010), and the strong constraints from direct detection experiments begin to challenge the explanation of dark matter through thermally produced WIMP particles (Roszkowski et al. 2018). Nevertheless, massive WIMPs remain compatible with all observational constraints and are one of the best motivated dark matter candidates. See Bertone and Tait (2018) for a recent review of various experimental searches.

A concrete example of a WIMP in supersymmetric extensions to the Standard Model is the lightest neutralino. These particles are stable and weakly interacting and should have a mass \(\gtrsim 100\,\mathrm{GeV}\). On astrophysical scales, neutralinos can be described as a perfectly cold collisionless fluid. However, the finite temperature at which these particles decouple implies that they have small but non-zero random microscopic velocities. As a consequence, neutralinos can stream freely out of perturbations of sizes \(\sim 0.7\) pc, which means that the formation of halos of masses \(\lesssim 10^{-8}\,M_{\odot }\) is strongly suppressed, and the typical mass of the first halos to collapse is about one Earth mass (Hofmann et al. 2001; Green et al. 2004; Loeb and Zaldarriaga 2005).

For perfectly cold fluids, the distribution function reduces to a Dirac-\(\delta \)-distribution in momentum space, i.e., in any point in space there is a unique particle momentum. This corresponds to the “single stream” or “monokinetic” regime as usually referred to in fluid mechanics and plasma physics (see also the discussion in Sect. 2.5). For a warmer fluid, such as that describing a neutralino field, the cold limit is also an excellent approximation for the distribution function. This is because thermal random velocities are small compared to the mean velocities arising from gravitational instability, especially at late times, when the former have been adiabatically cooled by the expansion of the Universe whereas the latter keep increasing due to structure formation.

Consequently, numerical simulations of structure formation with neutralinos assume them to be perfectly cold, follow them with traditional N-body integration methods, and incorporate their free streaming effects only in the initial power spectrum suppressing the amplitude of small scale modes. Note however, that the very central parts of halos and caustics could be affected by intrinsic neutralino velocity dispersion, providing e.g., maximum bounds on the density.

One important challenge associated with these types of simulations is the huge dynamic range of the scales involved. For instance, resolving the full hierarchy of possible structures would require about \(10^{23}\) N-body particles. For this reason, neutralino simulations have focused on the formation of the first structures at high redshifts and over small volumes. Usually, these simulations involve zooming (cf. Sect. 6.3.4) into the formation of a small number of halos (Diemand et al. 2005; Ishiyama et al. 2009; Anderhalden and Diemand 2013; Ishiyama 2014; Angulo et al. 2014). Another alternative is to carry out a suite of nested zoom simulations (Gao et al. 2005), which has been recently extended to the free streaming mass by Wang et al. (2020) by re-simulating low density regions embedded into larger underdensities and so on. A selection of projected density fields from this simulation suite is shown in Fig. 13, which displays progressive zooms by factors of \(\sim 50\) in scale, from 10 Mpc to 50 pc.

Fig. 13

Progressive zooms onto smaller regions of a simulated nonlinear dark matter field at \(z=0\). From left to right, each image shows a smaller region by factors of 5, 40, and 100. Note that the rightmost panel shows a region approximately 150 pc wide, where the smallest visible clumps correspond to the smallest dark matter halos expected to form in a scenario where dark matter is made of \(\gtrsim 100\) GeV neutralinos. Image adapted from Wang et al. (2020)

There is a consensus from all these simulations that the first microhalos have a mass of about \(10^{-6}\,M_{\odot }\) and start collapsing at a redshift of \(z\sim 300\). At those mass scales and redshifts, structure formation is different from that at galactic scales: the spectrum of CDM density fluctuations has a slope close to \({-3}\), i.e. \(P(k) \propto k^{-3}\), which causes a large range of scales to collapse almost simultaneously. This also implies that the formation of the first microhalos is immediately followed by a rapid mass growth and many major mergers. Whether these microhalos can survive tidal forces inside Milky-Way halos is still an open question, which could have implications for the detectability of a potential dark matter self-annihilation signal.

Another, perhaps unexpected, outcome of these simulations is that the density profile of these microhalos is significantly different from that of halos on larger mass scales. Ever since standard CDM simulations (i.e., those without any free-streaming effects) reached adequate force and mass resolution, they revealed that the internal density profiles of collapsed structures were well described by a simple functional form (Navarro et al. 1997), referred to as NFW profile, in terms of a dimensionless radial coordinate \(x := r/r_s\) and density \(\varrho := \rho (r) / \rho _0\) (where \(r_s\) and \(\rho _0\) are parameters that vary from halo to halo)

$$\begin{aligned} \varrho (x) = \frac{1}{x (1 + x)^2} \end{aligned}$$

regardless of the mass of the halo, cosmic epoch, cosmological parameters, etc. (Ascasibar and Gottlöber 2008; Neto et al. 2007; Brown et al. 2020). Despite its importance and several proposed explanations that involve, for instance, a fundamental connection with the mass accretion history (Ludlow et al. 2014, 2016), maximum entropy or adiabatic invariance arguments (Taylor and Navarro 2001; Dalal et al. 2010a; Pontzen and Governato 2013; El Zant 2013; Juan et al. 2014), or even the role of numerical noise as the main driver (Baushev 2015), there is not yet a consensus on the physical explanation behind this result. This is even more puzzling when contrasted with analytic predictions, which suggest single power laws as the result of gravitational collapse [see e.g., the self-similar solution of secondary infall by Bertschinger (1985)].
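For reference, the NFW form \(\varrho (x) = 1/[x(1+x)^2]\) and its logarithmic slope, \(\mathrm{d}\ln \varrho /\mathrm{d}\ln x = -(1+3x)/(1+x)\), can be evaluated with the short sketch below (the function names are illustrative). The slope runs from \(-1\) in the centre to \(-3\) in the outskirts, passing through \(-2\) at \(x=1\), which makes explicit why a single power law cannot describe the profile over the full radial range.

```python
import numpy as np

def nfw_density(x):
    """Dimensionless NFW density rho(r)/rho_0 as a function of x = r/r_s."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (x * (1.0 + x) ** 2)

def nfw_log_slope(x):
    """Logarithmic slope d ln(rho) / d ln(x) of the NFW profile."""
    x = np.asarray(x, dtype=float)
    return -(1.0 + 3.0 * x) / (1.0 + x)

# Slopes from deep inside the scale radius to far outside it.
x = np.logspace(-3, 3, 7)
slopes = nfw_log_slope(x)
```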

In contrast, most neutralino simulations find that the initial internal structure of microhalos is better described by a single power law profile \(\sim r^{-1.5}\), as first pointed out by Diemand et al. (2005) (see also Ishiyama et al.