Test particle simulations of cosmic rays

Modelling of cosmic ray transport and interpretation of cosmic ray data ultimately rely on a solid understanding of the interactions of charged particles with turbulent magnetic fields. The paradigm over the last 50 years has been the so-called quasi-linear theory, despite some well-known issues. In the absence of a widely accepted extension of quasi-linear theory, wave-particle interactions must also be studied in numerical simulations where the equations of motion are directly solved in a realisation of the turbulent magnetic field. The applications of such test particle simulations of cosmic rays are manifold: testing transport theories, computing parameters like diffusion coefficients or making predictions for phenomena beyond standard quasi-linear theory, e.g. for cosmic ray small-scale anisotropies. In this review, we seek to give a low-level introduction to test particle simulations of cosmic rays, enabling readers to run their own test particle simulations. We start with a review of quasi-linear theory, highlighting some of its issues and suggested extensions. Next, we summarise the state-of-the-art in test particle simulations and give concrete recipes for generating synthetic turbulence. We present a couple of examples for applications of such simulations and comment on an important conceptual detail in the backtracking of particles.


Introduction
Cosmic rays (CRs), that is the population of charged, relativistic particles with non-thermal spectra, are ubiquitous in the Universe. They pervade systems of all sizes, from stellar systems to whole galaxies, from galaxy clusters to the intercluster medium. See Berezinsky et al. (1990); Strong et al. (2007); Grenier et al. (2015); Kotera and Olinto (2011) for reviews on Galactic and extra-galactic cosmic rays. CRs are not only responsible for genuinely non-thermal phenomena: the fluxes of CRs observed at Earth, the non-thermal emission of radio, X-ray and gamma-ray sources or the diffuse Galactic and extragalactic emission; but CRs oftentimes have energy densities comparable or even superior to other components, like the thermal gas, magnetic fields or radiation backgrounds. As such, CRs can contribute to the pressure equilibrium or even drive large-scale outflows, e.g. Everett et al. 2008;Hanasz et al. 2013;Simpson et al. 2016;Recchia et al. 2016. At the largest scales, it has been suggested that CRs (or gamma-rays from blazars) contribute to the heating of the Universe at redshifts as high as z ∼ 10 (Nath and Biermann 1993;Sazonov and Sunyaev 2015;Leite et al. 2017).
Any detailed modelling of CRs relies on understanding transport in coordinate and momentum space. For instance, modelling the locally observed CRs involves their propagation from the sources to the observer. It is believed that diffusion is the dominant process in shaping the spectra, both during shock or stochastic acceleration inside the sources and during their transport from the sources. Indeed for Galactic CRs the most important effect, that is the softening of the observed spectra with respect to the source spectra and the relative softness of so-called secondary species (e.g. boron) with respect to so-called primary species (e.g. carbon), can be explained with a rigidity-dependent diffusion coefficient. (Cf. Gabici et al. (2019) for a recent review of  the challenges to this picture.) Scrutinizing this picture and improving upon it requires a better, more refined understanding of spatial transport. A prominent example is the issue of small-scale anisotropies, that is the variation of the flux of CRs on angular scales as small as 5 • which is absent in simple diffusion models. (See Ahlers and Mertsch (2017) for a review on smallscale anisotropies).
What has been hampering progress are mainly two issues. First, the transport of high-energy, charged particles through a turbulent magnetised plasma is intrinsically non-linear: The temporal evolution of the phase space density of particles can be described by a Fokker-Planck equation with coefficients that depend on the small-scale magnetic field as will be reviewed below. At the same time, however, CRs contribute to the dielectric tensor of the plasma, thus affecting its dispersion relation. Broadly speaking, waves are damped if the phase space density is very isotropic, but they can grow if there is sufficient anisotropy. In general, sources are distributed inhomogeneously, this leads to anisotropy in momentum and growth of wave modes. This is called the streaming instability and can lead to self-confinement of CRs. While this fact was known already in the 1960's (Kulsrud and Pearce 1969;Kulsrud and Cesarsky 1971;Skilling 1975), only recently has it been incorporated into (simple) phenomenological models Evoli et al. 2018). Note that self-generated turbulence is also important close to the sources of Galactic CRs (Malkov et al. 2013;Ptuskin et al. 2008;Nava et al. 2016Nava et al. , 2019 and in providing the amplified magnetic fields necessary for shock acceleration to the highest energies (Bell and Lucek 2001).
The other issue is the lack of a fundamental microscopic theory for the transport of charged particles through turbulent magnetic fields. More than 50 years since its inception, quasi-linear theory (QLT) (Jokipii 1966;Kennel and Engelmann 1966;Hall and Sturrock 1967;Hasselmann and Wibberenz 1970) is still very much the paradigm theory. In QLT, the Fokker-Planck equation for the temporal evolution of the phase space density of CRs is derived in a perturbative approach where the force on a particle due to a turbulent magnetic field is evaluated along the unperturbed trajectory in a regular background field. The Fokker-Planck coefficients, most prominently the components of the spatial diffusion tensor, can be computed for a given model of turbulence, parametrised by the two-point function of the turbulent magnetic field. QLT's predictions are largely confirmed by data, e.g. the rigidity-dependence 1 1 Rigidity R is defined as the ratio of particle momentum over electric charge, R = pc/(Ze). For ultra-relativistic particles, it is of the diffusion coefficients, but its role in some of the observational anomalies is unclear. Famously, in QLT the interactions between plasma waves and particles are found to be resonant, meaning that particles of a certain gyro radius r g = v/Ω, Ω denoting the gyro frequency, are only affected by waves with a wavenumber k that satisfies kr g µ ≈ 1 (for low-frequency waves like Alfvén waves) where µ is the cosine of the pitch-angle, that is the angle between the particle momentum p and the regular magnetic field B , µ ≡ p · B /(|p|| B |).
On a conceptual level, the perturbative approach of QLT is strictly only valid if the magnitude of the turbulent magnetic field is much smaller than the magnitude of the regular component. Due to the resonance condition, this criterion must to a certain extent be rigidity-dependent. Furthermore, QLT suffers from a number of well-known pathologies. The most famous one is the 90 • problem: Due to the resonance condition, particles with pitch-angle close to 90 • (µ ≈ 0) can only be in resonance with very large wavenumbers k which for the usual turbulent spectra contain little power. In the limit µ → 0, the scattering rate vanishes and particles cannot change direction along the background field resulting in ballistic transport. This is obviously at variance with the diffusive transport inferred from observations. Note that the 90 • problem is absent in non-linear extensions of QLT where the resonance condition is broadened with respect to QLT. Finally, another issue with QLT arises when the anisotropic nature of turbulence is considered.
In the absence of any single, agreed-upon model that overcomes the difficulties of QLT, it therefore seems natural to consider alternative computational approaches. A conceptually simple one is to perform test particle simulations on a computer. To this end, a realisation of the turbulent magnetic field is generated and the equations of motion (Newton-Lorentz equations) are solved for test particles, that is the contributions of the CR particles to the electromagnetic fields are ignored. Given the trajectories of a large enough number of test particles, one can numerically compute the Fokker-Planck coefficients or simulate the distribution of arrival directions seen by an observer. This idea has been very popular ever since powerful enough computers have been available to allow for the computation of thousand if not millions of test particles. Yet, we have found the body of literature on this rather disjoint, with different groups employing incompatible prescriptions. It is the intention of this review to provide a simply related to the energy E and energy-per-nucleon (E/A) as R = pc/(Ze) E/(Ze) = (A/Z)(E/A)/e. low-level introduction to the uninitiated while also discussing some of the applications of test particle simulations.
This review will be structured as follows. In Sec. 2 we give a brief review of QLT, describing how the diffusion coefficients are evaluated, introducing some of the simplest and most popular turbulence models. We will also review a few of QLT's non-linear extensions. In Sec. 3, we explain the two main methods that have been employed in generating turbulent magnetic fields on a computer. We will reproduce the recipes from the literature in a way that should allow the interested reader to produce her/his own synthetic turbulence. In Sec. 4.3, we will clarify some of the issues related to backtracking-a technique based on solving the equations of motion backward in time. We will conclude with a short summary and outlook in Sec. 5.

Quasi-linear theory and extensions
For some 40 years, quasi-linear theory (QLT) (Jokipii 1966;Kennel and Engelmann 1966;Hall and Sturrock 1967;Hasselmann and Wibberenz 1970) has been the broadly accepted and widely employed theory of CR transport. Its success and popularity can be ascribed to its conceptual simplicity and validity in a number of important environments, including the solar wind, the interstellar medium and galaxy clusters. In addition, QLT is simple in principle and thus allows for a straight-forward computation of the transport parameters, albeit it can become arbitrarily complex in practice. Finally, these results can be found to agree with inferences from observations, e.g. the normalisation and power law shape of the Galactic diffusion coefficient.
At the heart of QLT is the evaluation of the turbulent magnetic field and its contribution to the Lorentz force along "unperturbed orbits", that is trajectories calculated in only a large-scale, regular magnetic field. Interactions of CRs with small-scale, magnetised turbulence result in resonant interactions, that is particles of Larmor radius r g and pitch-angle cosine µ interact predominantly with modes of wavenumber k that satisfies kr g µ ∼ 1. These resonant interactions lead to pitch-angle scattering and for a spectrum of magnetic turbulence with random phases, the particle performs a random walk in pitch-angle. The evolution of the phase space density can be described by a Fokker-Planck equation and the Fokker-Planck coefficients, e.g. the pitch-angle diffusion coefficient or the rate of secondorder Fermi acceleration, depend on the two-point correlation functions of the turbulent magnetic field. In addition, under the assumption of slow variation of the phase space density with position and time, pitch-angle diffusion results in spatial diffusion along the background magnetic field (Earl et al. 1988). Finally, QLT also allows computing the dipole anisotropy in the arrival directions of CRs for a given spatial gradient of the phase space density.
In the following, we review the foundations of QLT, starting from the derivation of the Fokker-Planck equation. After an introduction to the various turbulence geometries in use, we outline how the transport coefficients can be computed. Motivated by the shortcomings of QLT, we review some of its non-linear extensions.

Derivation of the Fokker-Planck equation
Charged particles in electric and magnetic fields E and B are subject to the Lorentz force, with e and v the charge and velocity of the particle and c the speed of light. It is customary to decompose the magnetic field into a large-scale, homogeneous, regular background field, B and a small-scale, turbulent, random field δB, that is B = B + δB with δB ≡ 0.
(Throughout this article, we use angled brackets to denote averages over an ensemble of turbulent magnetic fields.) Without loss of generality, we assume in the following that the regular field is oriented along the zdirection, B = B zẑ , unless stated otherwise. Largescale electric fields are usually ignored, E = 0, as the large mobility of charges in astrophysical plasmas is efficiently shielding against regular electric fields (that is on scales much larger than the Debye length). Smallscale electric fields δE are necessarily present, but from Faraday's induction law, their magnitude can be estimated to be |δE| ∼ (v A /c)|δB| with v A the Alfvén velocity and v A /c 1 in most astrophysical environments. To lowest order, we therefore ignore the effect of electric fields in the plasma frame. As the magnetic force is not performing any work on the particle, particle energy is consequently conserved.
A charged particle in a magnetic field forms a Hamiltonian system as long as dissipative processes (or any form of energy losses) can be ignored. A consequence of this is Liouville's theorem, that is the conservation of phase space volume under canonical transformations. As time evolution is a canonical transformation, phases space volume is conserved in time (Goldstein et al. 2002). Together with particle number conservation this implies the conservation of phase space density f = f (r, p, t). This is conveniently captured by what we will call Liouville's equation, encoding the incompressibility of the phase space flow. Here, are the equations of motion. Note that a necessary condition for a Hamiltonian system is that the forces are conservative and differentiable ("p-divergence-free"). A collisionless plasma under the influence of external electric and magnetic fields, E and B, is an example of a Hamiltonian system. Its Hamiltonian is (Jackson 1998) Here, P = p + (e/c)A is the canonical momentum, A the vector potential, m the particle mass, e its charge and Φ the electric potential. Therefore, the phase space density of this collisionless plasma satisfies eq. (2) and substituting the Lorentz force, eq. (1), in eq. (2) gives the Vlasov equation, which together with Maxwell's equations forms the basis of plasma kinetic theory. For a collisional plasma, a term needs to be added to the right-hand side, the famous collision operator. For a collisionless plasma (as appropriate for CRs) the right-hand side remains zero. Considering turbulent fields, e.g. in the magnetostatic approximation, E(r) = 0, B(r) = B + δB(r), the phase space density also becomes a random field, f = f + δf , with an expectation value, f , and fluctuations around it, δf , that satisfy δf = 0.
In any realistic astrophysical situation, it is of course impossible to know the small-scale turbulent field at all positions in order to exactly solve eq. (5). Instead, one can only hope to predict statistical moments of the phase space density for a statistical ensemble of turbulent magnetic fields. Traditionally, one is mostly interested in the first moment, the ensemble average, though see Mertsch and Ahlers (2019) for the computation of a second-order moment. Averaging eq. (5), we find, see e.g. Jokipii (1972), Note that unlike the phase space density f , the ensemble averaged phase space density f is not conserved, d f /dt = 0. (More on this in Sec. 4.3.) One way to glean some physical insight from eq. (6) is to identify its right-hand-side with a damping term, (Earl et al. 1988;Webb 1989), that is driving the phase space density towards isotropy at a rate ν, an approach that can also be motivated by gas kinetic theory (Bhatnagar et al. 1954). This way, eq. (6) can be solved and shown to lead to a spatial diffusion equation. The parallel diffusion coefficient can be identified as κ = 1/(3ν) whereas the perpendicular diffusion rate satisfies ν ⊥ /ν = 1 + Ω 2 /ν 2 , a result referred to as the "classical scattering limit" (Gleeson 1969). Here, Ω is the particle's gyro frequency. In QLT, however, a more systematic solution for f is sought through an equation for the temporal evolution of the fluctuations δf . Such an equation can be obtained by subtracting the ensemble-averaged Vlasov eq. (6) from the original Vlasov eq. (5), Here, we have chosen to ignore the difference which is second order in perturbed quantities, δB and δf . This assumes, of course, that |δB| | B | and therefore δf f . Eq. (8) can now be integrated with the method of characteristics, the formal solution being Here, δf 0 ≡ δf (r, p, t 0 ) denotes the phase space density at time t 0 and the subscript P (t ) indicates that positions and momenta in the square brackets are to be evaluated along the characteristics of eq. (8), that is the solutions of the equations of motions, eq. (3) with B replaced by the regular field B only. These solutions P are commonly referred to as "unperturbed orbits" or "unperturbed trajectories". For the homogeneous regular magnetic field B = B zẑ assumed here they are of course helices along the z-direction.
We can now substitute eq. (10) into eq. (6), where we have dropped the term ∝ δf 0 . At this stage, we can already see that the right-hand side will lead to diffusion terms (courtesy of the two momentum derivatives) and that it depends on the turbulent magnetic field's two-point function, integrated along the unperturbed trajectory P (t ). To make further progress, we consider the momentum p in spherical coordinates, that is p = p( 1 − µ 2 cos φ, 1 − µ 2 sin φ, µ).
The right-hand side of eq. (12) is still rather unwieldy and further progress requires a number of assumptions. In addition to 1. Smallness of perturbations, |δB| | B | (see above); these are: 2. Gyrotropy: The ensemble-averaged phase space density f does not depend on the azimuthal angle φ, so f (r, p, µ, φ, t) → f (r, p, µ, t). 3. Adiabatic approximation: The phase space density only varies on time-scales much larger than the correlation time of the turbulent magnetic field, τ c , 4. Finite correlation times: The correlation times of the turbulent magnetic field are much larger than the Larmor time, τ c Ω −1 . 5. Homogeneous and stationary turbulence.
Under these conditions, the ensemble averaged Vlasov equation ultimately results in a Fokker-Planck type equation (Fokker 1914;Planck 1917), also known as the Kolmogorov forward (Kolmogoroff 1931) or as the Smoluchowski equation (Bogolyubov and Krylov 1939), describing diffusion in pitch-angle, In summary, under the influence of a turbulent magnetic field, charged particles are performing a random walk in pitch-angle which in the ensemble average results in diffusion in pitch-angle (cosine).
Note that under the assumption of magnetostatic turbulence the Fokker-Planck coefficient D µp = D pµ and D pp vanish. We have furthermore assumed that v A /v 1 in order for D xx , D yy , D xy and D yx to be negligible. Not doing so, would have resulted in the additional terms to be added to the right-hand side of eq. (14).
It seems clear that transport in any perturbative theory with |δB| | B | must be ballistic at early enough times: Particles just gyrate around B and ∆z 2 = (vµ∆t) 2 while ∆x 2 = ∆y 2 = 0 when integrated over full gyroperiods. At late times, that is for t D −1 µµ , we would expect diffusive behaviour for the transport along the field.
In order to formalise this picture, we derive a spatial diffusion equation from the . To this end, we decompose f into an isotropic part, g, and an anisotropic part, h, f (p, µ, t) = g(p, t) + h(p, µ, t) , where g(p, t) = 1 2 and If g varies only slowly with time and position, g ∂g ∂t τ sc and g ∂g ∂z λ sc , where τ sc ∼ D −1 µµ and λ sc ∼ vτ sc are the scattering time and mean-free path, respectively, the phase space density will be very isotropic, h g. In this case, we can derive a spatial diffusion equation for the isotropic part g, with the parallel diffusion coefficient Furthermore, we would expect the anisotropic part h to be dominated by the dipole anisotropy, that is h ≈ h 1 µ with

Computation of transport coefficients
So far, we have not specified the functional form of the Fokker-Planck coefficients, e.g. the pitch-angle diffusion coefficient D µµ , and its dependence on the two-point correlation function of turbulenceP ij (k) that emerges in the derivation of the Fokker-Planck equation (14). An alternative derivation of the Fokker-Planck coefficients starts from postulating for CR transport to be a Markov process and for f to satisfy the same Fokker-Planck equation. An arbitrary Fokker-Planck coefficient D P Q can then simply be defined in terms of the mean displacements of the variables in question, P and Q. For instance, the pitch-angle diffusion coefficient can be derived as the t → ∞ limit of the running diffusion coefficient, This is a consequence of the Taylor-Green-Kubo formula (Taylor 1922;Green 1951;Kubo 1957), where the dots denote derivatives with respect to time. For instance, this allows computing the parallel diffusion coefficient κ without the detour of computing D µµ first and then applying the diffusion approximation, eq. (23). From the equations of motion, see eq.
(3), we finḋ and thus Here, we have defined and both the velocities and the magnetic fields are to be evaluated along unperturbed trajectories. Note that the fact that the Fokker-Planck coefficients only depend on the two-point function means that we can constrain ourselves to the Gaussian part of the turbulent magnetic field.

Turbulence geometries and spectra
To make further progress, we need to specify the turbulence correlations. In the derivation of the Fokker-Planck equation we had to assume that turbulence is homogeneous and stationary, that is its statistical moments are invariant under translations in space and time (see assumption 5). In this case, the field can be represented very economically in Fourier space. To this end, we introduce the Fourier transform pair Note that for the magnetic field to have real values, δB j (r) = δB * j (r), requires a relation between the Fourier components and their complex conjugates, The homogeneity and stationarity now guarantee that the two-point functions P ij = δB i (r, t)δB j (r , t) depend on the positions r and r and times t and t only through the differences ∆r ≡ (r − r ) and (t − t ). It is then easy to see that the two-point function in Fourier space is diagonal, withP ij (k, t − t ) being the Fourier transform of the Two-point function, In the following, we will refer toP ij (k, ∆t) as the turbulence correlation tensor. It contains all the (statistical) information on the magnetic turbulence that enters into the computation of the Fokker-Planck coefficients. This includes information on the turbulence geometry, for instance whether there is a preferred direction for the propagation of waves, information on the turbulence spectrum, that is the distribution of energy among different turbulent scales, as well as information on the time-dependence of the correlations. We will discuss a few parametrisations below.
Oftentimes, it is assumed thatP ij (k, ∆t) factorises into a correlation tensorP ij (k) ≡P ij (k, 0) independent of time and a time-dependent dynamical correlation function Γ(k, ∆t), In the magnetostatic approximation, we ignore any time-dependence altogether as we ignore the small electric fields. While in realityP ij may be arbitrarily complicated, three simple turbulence geometries have dominated much of the literature, both in analytical studies of transport coefficients and numerical test particle simulations. These three geometries are conceptually simple and particularly amenable to analytical computations of the components of the diffusion tensor and the other Fokker-Planck coefficients: 3D isotropic turbulence, slab turbulence and a composition of slab and 2D isotropic turbulence. In the following, we will give explicit formulas for the turbulence correlation tensor for these models in terms of a scalar power spectrum g(k), the spectral part of the turbulence tensors. Afterwards, we introduce two popular parametrisations for g(k) and conclude with an example for the computation of the pitch-angle diffusion coefficient.

3D isotropic turbulence
It is easy to show that for 3D isotropic turbulence the magnetostatic correlation tensor takes the form with k = |k|. The k-dependent real functions g 3D (k) and σ(k) allow modelling of the overall spectrum and of a wavenumber-dependent helicity, respectively. Note that for linearly polarised waves σ(k) ≡ 0. The normalisation of g 3D (k) is fixed by requiring

Slab turbulence
In slab turbulence, it is assumed that all quantities are independent of the coordinates perpendicular to the background field (in our case: x and y) and that the turbulent field has no z-component. Consequently, the wave vectors k ẑ and if we further demand turbulence to be axisymmetric, the turbulence correlation tensor reads for i, j ∈ x, y and zero otherwise. In our case, k = k z and k ⊥ = k 2 x + k 2 y . Again, σ(k ) allows for wavenumber-dependent helicity, but vanishes for linear polarisation. The normalisation is then While slab turbulence might seem rather restrictive a turbulence model, it is quite attractive due to its simplicity. In addition, it could be argued that it is of physical relevance in situations where the turbulence is self-generated by anisotropies in the distribution of CRs (Kulsrud and Pearce 1969;Skilling 1975): It has been shown (e.g. Tademaru 1969) that the modes with wavevectors along the background magnetic field grow fastest.

Composite (slab + 2D isotropic) turbulence
Motivated by observations of the turbulence in the solar wind (Matthaeus et al. 1990), the heliospheric community has adopted a composite model for the correlation tensor as a superposition of a slab component and a 2D isotropic component. The motivation for this composite turbulence model were observations of CR mean-free paths which were in conflict with the observed turbulent energy densities. In fact, the observed mean-free path was significantly larger than what was predicted for the measured turbulence level in a pure slab model. As 2D turbulence contributes to pitch-angle scattering (and therefore to the parallel mean-free path) only marginally, moving part of the turbulent energy density from the slab to the 2D component, the measured level of turbulence could be reconciled with the meanfree path. According to Bieber et al. (1994), a 80 % to 20 % split between 2D and slab turbulence, respectively, reconciles the available data sets.
For linearly polarised waves, we can writẽ withP slab ij (k) as in eq. (38) and for i, j ∈ x, y and zero otherwise. This turbulent 2D field only depends on the x-and y-coordinate, and has no z-component. The normalisation condition for the 2D component is

Turbulence spectra
Having reviewed three simple turbulence geometries, we need to specify the spectral shapes g(k) in order to compute transport coefficients. In cascade models of turbulence (Kolmogorov 1941;Iroshnikov 1963;Kraichnan 1965), energy is injected on the largest scales in the so-called energy range. Non-linear interactions transfer energy to smaller scales over the so-called inertial range. At very small scales, the turbulent energy is dissipated in the so-called dissipation range. The scale between the energy and inertial ranges is called the outer scale of turbulence and the scale between the inertial and the dissipation range is called the dissipation scale. For an introduction to turbulence theory, see e.g. Frisch (1995). Both turbulence theory and observations point at the existence of power law spectra in the inertial range. In fact, power law spectra have been observed in interplanetary and interstellar space (Armstrong et al. 1995). (For a review on interstellar turbulence, see Elmegreen and Scalo 2004). Both in numerical simulations and in analytical work, most authors have confined themselves to one of two spectra. The first one is a simple power law with spectral index q and low wavenumber cut-off k 0 , corresponding to the outer scale (2π/k 0 ), The alternative is a broken power law with a flat spectrum below the wavenumber k 0 and a power law slope q above, Here, s is parametrising the softness of the break and s → 0 corresponds to a sharp break. It is assumed that the broken power law form can potentially also capture turbulence in the energy range, that is for k < k 0 .

Slab turbulence with broken power law spectrum
By ways of example, we report the result for the pitchangle diffusion coefficient in slab turbulence and for the broken power law spectrum (Shalchi 2009), The function C(q, s) is fixed by the normalisation condition, see eq. (40), where Γ(·) denotes the gamma function. Substituting eq. (46) into eqs. (38) and (28), one encounters the resonance function see Schlickeiser (2002) for details. Eventually, this simplifies to Here, r g denotes again the particle's gyro radius. For relativistic particles r g ∝ R and if the rigidity is small enough such that µ 2 (r g k 0 ) 1, we observe that the rigidity-dependence of D µµ is of power law form reflecting the power law nature of the underlying turbulence spectrum. For Kolmogorov and Kraichnan type values, q = 5/3 and 3/2, the rigidity-dependence of the pitch-angle diffusion coefficient is D µµ ∝ R −1/3 and R −1/2 and the spatial diffusion coefficient κ ∼ 1/D µµ ∝ R 1/3 and R 1/2 , respectively.

Field-line random walk
The computation of the pitch-angle diffusion coefficient in eqs. (26), (28) and (49) is based on an evaluation of the turbulent part of the Lorentz force along trajectories around the homogeneous background field. As long as perturbations are small, this gives the dominant contribution to the parallel diffusion coefficient, eq. (23).
For perpendicular transport, however, there is another important contribution due to the fact that the field line is not perfectly homogeneous. Instead, the large-scale magnetic field evaluated for a particle along a field line changes direction with distance along this field line. Under certain conditions, this movement can be shown to be diffusive, see below. If the movement of the particle due to this effect is included in the computation of the mean-square displacements (or equivalently through the Taylor-Green-Kubo approach), this gives the so-called field-line random walk (FLRW) contribution to perpendicular transport. The contribution without this is oftentimes called the microscopic contribution.
For slab turbulence, the microscopic diffusion coefficient vanishes (the transport is in fact sub-diffusive), hence FLRW gives the only contribution. For other turbulence geometries, FLRW can also contribute, but might not be dominating.
Let's again assume the regular background field B = B zẑ to be dominating over the perturbations δB. The equation determining the field line and similarly for y(z). This can formally be integrated to obtain the mean square displacement in the perpendicular directions, e.g.
In slab turbulence, the integrand only depends on z and it is easy to show that the perpendicular meansquare displacement (∆r ⊥ ) 2 is ballistic at small z and diffusive for large z, e.g.
with the FLRW diffusion coefficient In other turbulence geometries, the integrand in eq. (51) also depends on x and y, such that an explicit solution is not possible without further assumptions. See Shalchi (2009) for a more detailed discussion.

Short-comings of QLT
Despite its popularity, QLT exhibits a number of issues which we will briefly review in the following.
The most well-known pathology of magnetostatic QLT is its inability to scatter particles through 90 • . While present in a number of turbulence geometries, it is easiest illustrated in slab turbulence where the dependence of the pitch-angle diffusion coefficient D µµ on the spectrum g slab (k) becomes very simple. In fact, inspecting eq. (49) we see that D µµ → 0 for µ → 0, thus particles cannot reverse directions and keep moving ballistically. As κ ∼ 1/D µµ , the parallel diffusion coefficient diverges.
The root cause for the 90 • problem is the narrow resonance condition in magnetostatic QLT, k µ r g = ±1, see eq. (48). Particles at finite µ are in resonance with waves of finite parallel wavenumber, k = ±1/(µr g ). For µ approaching 0, however, the resonant parallel wavenumber grows without bounds. With the turbulence spectra being falling power laws, however, there is only little power at small scales and the pitch-angle scattering rate vanishes. In practice, there is of course no power at all at scales below the dissipation scale.
Nature has of course no difficulty to scatter particles through 90 • , as evidenced by the isotropy of Galactic CRs. Therefore, the vanishing of D µµ at µ = 0 must be considered a theoretical issue. It was realised early on (Voelk 1975, see cf. Tautz et al. (2008 for other references) that the origin of the 90 • problem is actually the delta-like resonance function of QLT in the magnetostatic approximation and it was claimed that plasma wave effects or dynamical turbulence would in fact cure this issue. Other authors (Tautz et al. 2006b) have however pointed out that non-linear effects are likely more important. Non-linear theories, in particular, exhibit finite resonance widths, thus curing the 90 • problem.
Another important issue with QLT are its difficulties in describing perpendicular transport for slab turbulence. Whereas simulations find subdiffusive behaviour, (∆r ⊥ ) 2 ∝ √ ∆t, the answer from analytical models is not quite as clear and depends on what kind of assumptions enter the definition of the perpendicular displacements and which equations of motion are assumed. If we define the perpendicular diffusion coefficients as found in the derivation of the Fokker-Planck equation (14), we assume the equations of motion as in eq. (3), meaning that the turbulent field is evaluated along the unperturbed trajectories in the homogeneous background field B , see Fig. 1a. In this case, κ ⊥ vanishes (Schlickeiser 2002), again due to the narrow resonance condition. This assumption is of course strictly only true for small enough turbulent magnetic fields. If we instead make the assumption that particles follow field lines, see Fig. 1b, diffusive behaviour is found, (∆r ⊥ ) 2 ∝ ∆t. However, what has been ignored here is the diffusive nature of transport along the field line. If this is taken into account, see Fig. 1c, subdiffusive behaviour is found again. See also Sec. 2.5 above. Numerical simulations indeed confirm the subdiffusive behaviour. Whether the ambiguity of evaluating the perpendicular transport is an issue with QLT or of the additional assumptions made when evaluating (∆r ⊥ ) 2 is a matter of debate. Note that for non-slab geometries, diffusive behaviour is recovered. Finally, it has been noted (Shalchi 2009) that in other turbulence geometries there are also deviations between the QLT predictions and numerical results. Noteworthy are the deviations for composite geometry (Shalchi et al. 2004b).

Non-linear extensions
So far, we have only considered magnetostatic turbulence which for QLT implies the δ-like resonance function. Both dynamical turbulence and plasma wave damping lead to broadening of the resonance function. This has the potential of curing some of the deficiencies of QLT. In particular resonance broadening can prevent the singular behaviour of D µµ at µ = 0. (See Tautz et al. (2006b) for a discussion of the failure of QLT in undamped plasma wave models.) Another way to broaden the resonance function are non-linear theories. These replace the unperturbed orbits of QLT with perturbed orbits, that are more realistic at finite turbulence levels. Below, we review a number of non-linear theories and cite their respective resonance functions.

BAM model (Bieber and Matthaeus 1997)
Bieber and Matthaeus (1997) start from the velocity autocorrelation functions, V ij (t) ≡ v i (0)v j (t) that are required for computing diffusion coefficients with the TGK formalism, In QLT, particle trajectories are perfect helices and the velocities of a particle along its trajectory stay correlated forever. This leads to simple, oscillatory correlations, V xx (t) = V yy (t) ∝ cos Ωt and −V xy (t) = V yx (t) ∝ sin Ωt. In reality, however, velocities will not stay correlated indefinitely as particles will scatter in pitch-angle, and therefore these correlations should decay with time. In the BAM model, the decay is assumed exponential and thus the velocity correlation functions read Substituting those into eq. (54), one finds for the diffusion coefficients, This is of the same form as the classical hard-sphere scattering result (Gleeson 1969), see Sec. 2. In order to fix the perpendicular decorrelation rate, Bieber and Matthaeus (1997) consider FLRW and postulate that the distance z c over which the field lines decorrelate is z c = r 2 g /κ FLRW and thus ω ⊥ = v/z c = vκ FLRW /r 2 g . For a given turbulence geometry and spectrum, both κ and κ FLRW can be computed and the BAM model then allows determining κ ⊥ and κ A . In slab turbulence, however, the BAM model predicts diffusive behaviour in the perpendicular direction which is at variance with what is seen in simulations. Furthermore, in composite turbulence (slab+2D) the BAM model cannot deal with the superdiffusive behavior of FLRW seen in simulations (Shalchi 2009). We thus conclude that the BAM model does not agree with simulation results, at least for two of the most important turbulence geometries.

Non-linear guiding centre (NLGC) theory
Non-linear guiding centre (NLGC) theory (Matthaeus et al. 2003) improves upon the velocity correlation functions of the BAM model insofar as that the perpendicular velocities (i ∈ x, y) are assumed to fulfill where a is a free parameter that needs to be determined by fitting to simulations. This is inspired by the requirement for particle guiding centres to stay on field lines. In fact, for a = 1, eq. (61) reduces to the field line equation (50). The perpendicular diffusion coefficient is then evaluated with the Taylor-Green-Kubo formula (Taylor 1922;Green 1951;Kubo 1957). This gives four-point correlation functions v z (0)v z (t)δB x (0)δB x (t) with two factors of magnetic field strength and two factors of parallel velocity. In NLGC this is assumed to factorise into two two-point functions. The (parallel) velocity part has a simple exponential form, if pitch-angle diffusion is isotropic, i.e. D µµ ∝ (1 − µ 2 ). The Fourier transform of the two-point correlation for the magnetic field is further assumed to factorise into the power spectrum P xx and the so-called characteristic function, exp[ık ·∆x] . If the particle separations ∆x are assumed normaldistributed and diffusive, e.g. (∆x) 2 = 2κ ⊥ t, the characteristic function takes a simple Gaussian form and the perpendicular diffusion coefficient reads, With a power spectrum of the form P ⊥ (k, t) = P ⊥ (k)Γ(k, t) and a dynamical correlation function Γ(k, t) = exp[−γ(k)t] this simplifies to Note how the sought-for perpendicular diffusion coefficient appears on both sides of the equations. Oftentimes, κ ⊥ is therefore computed iteratively. For slab turbulence and in the magnetostatic case (γ = 0), the integral in eq. (63) can be computed analytically (Shalchi et al. 2004a;Zank et al. 2004). Comparing the parallel mean-free path λ = 3κ /v to the correlation length c , two limiting cases are noteworthy: For λ c and for λ c , the results for λ ⊥ from QLT and from the nonlinear closure approximation of Owens (1974) are recovered respectively. Note, however, that even though no assumption is made about the transport in the perpendicular directions (since k ⊥ = 0 in slab turbulence) perpendicular transport turns out to be diffusive, again at variance with numerical test particle simulations (see Sec. 4.1). For a composite slab+2D model, however, the NLGC theory agrees well with simulations if a = 1/3.

Weakly non-linear theory
In weakly non-linear theory (WLNT, Shalchi et al. 2004b), the first two steps of NLGC theory are followed: (1) the factorisation of the fourth-order correlation function of two velocities and two magnetic field factors into two second-order correlation functions for velocities and magnetic field strength separately; (2) the decomposition of the field strength correlation function into the magnetic power spectrum and a characteristic function. The crucial difference with respect to the BAM theory is the form of the velocity correlations. Instead of eqs. (55) to (57), the QLT velocity correlations are kept for the perpendicular motions and only the parallel velocities are assumed to decorrelate at a rate ω, where ω is identified with the pitch-angle scattering frequency, ω = 2D µµ /(1−µ 2 ). For the characteristic function, a Gaussian distribution is assumed in the perpendicular direction whereas for the parallel motion, any possible diffuse contribution is ignored altogether.
Comparing the resulting expression with those from QLT it appears that only additional exponential factors with a linear time-dependence in the exponent have been introduced. When performing the timeintegration these lead to resonance broadening which can be ascribed to pitch-angle scattering and perpendicular motion and the deviation of the particle orbits from purely helical motion. The resonance function is of the Breit-Wigner form. From this, the Fokker-Planck coefficient can be computed, in particular the pitch-angle diffusion coefficient and the perpendicular diffusion coefficient. Note however, that the perpendicular diffusion coefficient depends on the pitch-angle diffusion rate (or equivalently on the parallel diffusion coefficient). In order to probe the perpendicular diffusion independently when comparing to simulations, oftentimes the empirical parallel mean free path from the simulations is adopted. Tautz et al. (2008) use a broadening of the resonance condition in isotropic turbulence, parametrised by smoothing of the particle position along the magnetic field as motivated by second-order QLT (Shalchi 2005). The width of the particle position is computed from the usual QLT. As a consequence, D µµ now has its maximum at µ = 0. The authors find good agreement with the numerical simulations of Giacalone and Jokipii (1999). Also noteworthy is the work of Shalchi et al. (2009) who also present an analytical computation of pitch-angle diffusion coefficient and mean-free path for slab turbulence. It is shown that QLT is a good approximation for |µ| > δB/B z .

Generating turbulent magnetic fields on a computer
The most realistic way of generating a turbulent magnetic field on a computer to propagate particles in is of course to rely on simulations of this turbulence. This offers the opportunity to include (some of) the known complexity beyond the simple turbulence models described above, for instance anisotropic turbulence like the Goldreich-Sridhar picture (Sridhar and Goldreich 1994;Goldreich and Sridhar 1995). Given the large dynamical range required for most applications, it is however also the most computationally expensive. In the following, we will review such attempts and their results, before discussing the generation of synthetic turbulence.

Simulated turbulence
The most extensive set of simulations to date have been performed by Cohet and Marcowith (2016), CM16 from hereon, who tracked test particles through MHD turbulence generated with the RAMSES code (Teyssier 2002). They followed the pioneering work of Beresnyak et al. (2011) and Xu and Yan (2013) and discussed differences in setups and results. For the most part, CM16 ran the MHD part of their simulations on a 512 3 grid, and the box length of the simulation was taken to be five times larger than the turbulence injection scale L inj . This resulted in about one-and-a-half orders of magnitude in dynamical range between the coherence length of turbulence and the dissipation length, the latter being due to the finite numerical resolution. Turbulence was injected either by solenoidal or compressible forcing and the results differ significantly. It is hypothesised that this is due to the preferential driving of Alfvénic turbulence for the solenoidal and of fast-magnetosonic turbulence by compressible case, the latter leading to an isotropic turbulence cascade and being more efficient in CR scattering (Chandran 2000;Yan and Lazarian 2002).
CM16 studied in detail the dependence of parallel and perpendicular mean-free paths on the Alfvénic Mach number M a (which is defined as the ratio of the rms fluid velocity and the Alfvén speed in the total magnetic field, i.e. background plus turbulent). For the parallel mean-free path, a power law scaling with the Alfvénic Mach number λ ∝ (M A ) α is found. At small M a , the results differ strongly between solenoidal and compressible forcing, with the parallel mean-free path at M a = 0.3 being about two orders of magnitude larger in the former case. For the solenoidal case, λ is much larger than found by Xu and Yan (2013) and the dependence on M A is much stronger: Typically α is between −7 and −5 which is also in tension with expectations from QLT where λ ∝ M −2 A , e.g. (Sun 2011). Note that this scaling was also confirmed in test particle simulations of synthetic isotropic turbulence, notable beyond the limits of validity of QLT (Casse et al. 2002). For the compressible driving, λ ∝ (M A ) −2 as expected. The perpendicular mean-free path, on the other hand, is scaling like λ ⊥ ∝ M 2 A in QLT which is largely confirmed by CM16. This is being ascribed to the contribution from field-line random walk to the perpendicular transport. Another prediction for compressible MHD turbulence (Yan and Lazarian 2008) is λ ⊥ ∝ M 4 A , but this only applies for the limits λ L inj or λ L inj , whereas the simulations of CM16 are in between.
An equally crucial result is the dependence of the parallel and perpendicular mean-free paths on gyro radius r g (normalised with respect to the simulation scale L). Here, the results for λ again depend very sensitively on the driving at L inj : If the forcing is solenoidal, the rigidity dependence of λ can be very weak: The dependence is power law like in the range of rigidities tested, λ ∝ r δ g , and δ can become even negative, especially for large M a . In QLT this is only possible for turbulence spectra g(k) ∝ k −q with q > 2 while the power spectral indices found by CM16 are q ∼ 1.5, that is consistently smaller than 2. In the compressible case, the agreement with expectations is much better and the observed scaling is compatible with both δ = 1/3 and 1/2. (The dynamical range is too small to tell, in fact.) Perpendicular mean-free paths show less of a difference between the solenoidal and compressible cases and are largely consistent with a scaling ∝ r 1/2 g . For gyroradii larger than L inj , the transition to small-angle scattering with λ ∝ r 2 g is being observed, as expected.

Synthetic turbulence
Realistic modelling of CR transport requires a rather wide dynamical range for the turbulent modes. MHD simulations of turbulence usually cover no more than one and a half orders of magnitude between the coherence length and the dissipation scale (see e.g. Cohet and Marcowith 2016). An alternative to using simulated turbulence is to adopt one of the turbulence correlation tensorsP ij (k, t) discussed in Sec. 2.4 and to directly generate random realisations of a field with such a correlation structure on a computer. The turbulence generated in this way is usually referred to as "synthetic turbulence". The obvious drawback of this method is its reliance on a turbulence model instead of using the more realistic results from MHD simulations of turbulence. The advantages are the large dynamical range possible in principle, and the possibility of directly testing some of the results of QLT and its non-linear extensions which are more straight-forward to compute for simple turbulence models. When solving the equations of motion, we will need to evaluate the turbulent magnetic field δB at many different positions, possibly also at different times, the latter distinction becoming relevant when considering models of dynamical turbulence. In order to do this, we need to keep track not only of the amplitudes of the turbulent field, but also its phases which are random. This implies generating a random sequence of phases and storing them for the duration of the test particle simulation. On a computer, the turbulent magnetic field will be characterised by a finite number of real numbers, that is the corresponding magnetic field is band-limited. In the literature, two methods have been suggested, depending on whether the phases of a finite number of modes are stored or whether the turbulent magnetic field δB(r) is stored on a discrete grid. We will refer to the former as the harmonic method and to the latter as the grid method. Both methods have their advantages, but also disadvantages which we will discuss.

Harmonic method
In the harmonic method, pioneered by Giacalone and Jokipii (1999), the turbulent field is defined as a superposition of plane waves, Here, only the wavenumbers are discrete, and in order to cover as broad a dynamical range with as small a number N of modes as possible, the spacing in k is oftentimes assumed to be logarithmic. The alternative, but equivalent representation, makes explicit the interpretation as a superposition of N independent waves travelling in the directionsk n with amplitudes A n , polarisationsξ n , wavenumbers k n and phase factors β n . Each mode n is thus specified by six real numbers: one for A n , one forξ n (as it needs to be ⊥k n in order for δB to be divergence-free), one for k n , one for β n and two fork n . Of these,ξ n ,k n and β n are random variables and their statistical distributions are determined by the turbulence model. For instance, in isotropic turbulence (see Sec. 2.4.1), ξ n is uniformly distributed on the unit circle (such that ξ n ·k n = 0),k n is uniformly distributed on the unit sphere and β n is uniformly distributed in [0, 2π[. Giacalone and Jokipii (1999) suggested the following construction with polarisation vector ξ n = cos αx n + ı sin αŷ n , and   x y z   =   cos θ n cos φ n cos θ n sin φ n − sin θ n − sin φ n cos φ n 0 sin θ n cos φ n sin θ n sin φ n cos θ n     x y z   (71) These equations describe a superposition of waves with wavenumbers k n and (complex) amplitudes A(k n ). The direction of each mode is along the z -axis in a coordinate system generated from the lab system through a rotation by θ n around the y-axis and a subsequent rotation by φ n around the new z -axis. {θ n , φ n , 0} are thus the Euler angles defining the rotation of the lab system into the rotated system in the zyz convention and we denote the rotation matrix as M (θ n , φ n , 0). Note that the first term in the exponent of eq. (69) has been simplified in primed coordinates, k · x = k · x = k z z . The polarisation vectorξ n , cf. eq. (70), is then always in the x -y -plane and rotated with respect to the x -direction by the angle α.
It is easy to see that this construction does not guarantee isotropy as the wavevector always lies in the x-zplane. This was first noted by Tautz and Dosch (2013) who suggested the alternative construction, δB(r) from eq. (68) with a wave vector direction k n =   sin θ n cos φ n sin θ n sin φ n cos θ n   , and polarisation vector ξ n =   − sin φ cos α + cos θ n cos φ sin α cos φ n cos α n + cos θ n sin φ n sin α n − sin θ n sin α n   .
Note that in this construction, the turbulent field is isotropic, divergence-free and all three components of the magnetic field have the same energy density. The A n in turn are fully determined by the power spectrum of turbulence. Again, for an isotropic turbulence tensor and ignoring helicity, δB i (k)δB j (k ) = δ (3) (k−k )g(k)(δ ij −k i k j /k 2 ) and thus A n = g(k n ) is the discrete approximation for the desired power spectrum.
While the turbulence model fixes the A n and the statistical distributions of theξ n ,k n and β n , what is not fixed is the binning of the k n and the total number of modes, N . Both are usually constrained by the need to cover as wide a dynamical range as possible. Given our understanding from QLT that interactions are resonant, what is required in the magneto-static limit for one particle energy at a minimum is a spectrum spanning at least a factor of a few around the resonant wavenumber. In addition, power on larger scales can have an impact, depending exactly on what the observable is. This means that easily a few orders of magnitude in wavenumber range are required, even at minimum. Therefore, oftentimes a logarithmic spacing in k is adopted. This leaves open the question what the required number N of modes is. For the case of slab-turbulence, this question has been investigated using the convergence with number of modes of a "quasi-Lyapunov exponent" Tautz and Dosch (2013). On a more practical level, we note that the number oftentimes adopted are N = O(100) − O(1000) for a dynamical range k min /k max ∼ 10 4 .

Grid method
Standard grid method. An alternative way to set up turbulent magnetic fields on a computer is called the grid method (e.g. Qin et al. 2002). While in the harmonic method the amplitudes and phases of the turbulent modes are stored (e.g. in the combination {A n ,ξ n ,k n , φ n }, in the grid method the turbulent magnetic field itself δB(r) is stored on a spatial grid r i,j,k and can be interpolated between these grid points.
Here, we introduce the discretisations of the position r j , r m j = m∆r j , and wavenumbers k j , k m j = n∆k j = 2πm/(N j ∆r j ). In the following, we will consider three spatial dimensions, such that the threedimensional positions x and wave vectors k are indexed by the three integers n 1 , n 2 , n 3 and m 1 , m 2 , m 3 . The Fourier transform pair of eqs. (30) and (31), δB j (k) and δB j (x), then corresponds to the discrete Fourier transform pair δB m1,m2,m3 j and δB n1,n2,n3 j , δB m1,m2,m3 δB n1,n2,n3 for discretely sampled δB j (r) and δB j (k), δB j (k m1 1 , k m2 2 , k m3 3 ) = ∆r 1 ∆r 2 ∆r 3 δB m1,m2,m3 j A fast way of setting up a homogeneous scalar Gaussian random field in 3 dimensions with a given power spectrum works in harmonic space. The power spectrum only determines the amplitudes, but not the complex phases. To obtain a homogeneous Gaussian random field (with the correlation structure defined by the power spectrum), the phases must be complex normal distributed, arg(δB n ) ∼ N (0, 1) + ı N (0, 1). However, for a real turbulent field the phases need to further satisfy the relation implied by eq. (32). For a discrete field in one dimension, that is δB(N/2 − k n ) = δB * (k n ). Instead of enforcing the reality conditions by hand, it has proven convenient to use an efficient routine for the generation of a real Gaussian random field with no correlation structure, that is white noise, Fourier transform and then scale the complex amplitudes with the desired Fig. 2 Illustration of the idea of using nested grids. Note that in this illustration padding is not used and thus the grids are not overlapping.
power spectrum before transforming back. Note that modern Fourier transform libraries provide routines for reconstructing the full inverse Fourier transform from the Fourier transform at just the positive frequencies.
Knowing how to generate a scalar Gaussian random field, it might seem that we just need to combine three independent scalar fields into a 3D vector. However, in general this 3D random field will not be divergencefree. In order to guarantee that the field is divergencefree, only the polarisations perpendicular tok should be retained. This can be achieved by subtracting from eachB n the projection of it ontok.
The advantage of the grid method is most importantly its speed: Instead of performing a sum of N modes for a large number of test particles at each timestep of the test particle propagation, only an interpolation between the relevant grid points is needed. For a fine enough grid in 3D, a tri-linear interpolation is sufficient. (See, however, Schlegel et al. (2019).) In most cases, this is computationally more efficient. However, this gain in speed is achieved at the price of increased memory requirements. For example, a 3D field on a 2048 3 grid requires 192 GB of RAM, where we have ignored overhead. While certain nodes of computing clusters can have more RAM, as of the writing of this review, this is already beyond the reach but of the most powerful personal computers.
At any rate, a finite grid size implies issues with periodicity and accuracy of interpolation. The latter can be minimised by ensuring that the smallest wavenumber are a factor of a few larger than the grid spacing, λ min = (a few) ∆x. At the same time, a few of the largest modes should fit onto the extent L of the grid, L = (a few) λ max , in order to reduce possible periodicity issues. Thus with 2048 grid points, we can cover at most a dynamical range of λ max /λ min ∼ O(100). This is probably enough to capture the particle-wave resonance, even for broadened resonances. However, modes at scales larger than the resonant scale can also have an effect on particle transport, e.g. through FLRW, but cannot be taken taken into account for such a small dynamical range.
Nested grid method. In light of these considerations, it was suggested  to increase the dynamical range by using nested grids. This method was later also used by Mertsch and Funk (2015) and Savchenko et al. (2015). The idea is that the total dynamical range [k min , k max ] is divided into N intervals [k i , k i+1 ] with k 0 = k min and k N +1 = k max . Each interval is set up on a separate grid and these sub-grids are then periodically replicated over the whole computational domain. See Fig. 2 for an illustration of the method in 3D.
The total turbulent field is given by the sum of turbulent fields on each grid. For a power law power spectrum, P (k) ∝ k −q , the turbulent energy δB 2 i to be localised on a sub-grid i is As for the case of a single grid, it is advisable not to use the whole range of the grid for turbulent modes, but to use part of the range for padding. In Fig. 3, we illustrate the overlapping nested grids produced in this way.
In this way a much larger dynamical range can be achieved. For definiteness, we close the discussion of nested grids with an example for how to set up the (sub-)grids for a test particle simulation. In Fig. 3, we illustrate the nesting of four grids with 32 points each. On each grid i, we are only using 12 points to set up the turbulent modes with a dynamical range of k i+1 /k i = 12. The remaining 20 points are used for padding. For example, we can set the amplitude to zero 10 1 10 0 10 1 10 2 10 3 k/k 0 Illustration of the nested grid approach. Shown is the power spectrum and how it is partitioned onto four sub-grids i, each only contributing in a limit range of wavenumbers.
for the first (a−1) = 3 modes, have finite power between j i = a and b (corresponding to the wavenumbers k i and k i+1 ) and again no power for the remaining grid points. Note how the wavenumber grids are organised in order for the different grids to smoothly connect. The parameters of this examples have been chosen to allow for a clear presentation in Fig. 3. As a real application example, we might instead consider the propagation of 10 TeV test particles in a B 2 = 4 µG isotropic field with an outer scale of 0.1 kpc. The gyro radius in the 4 µG field is ∼ 2.7 × 10 −6 kpc, thus the dynamical range required is at least 0.1/(2.7 × 10 −6 ) 3.7 × 10 4 . This could be achieved by nesting five grids of 128 points each, each grid only covering a factor 16 in dynamical range. The remaining range of 128/16 = 8 would be used as padding. Note that without nesting, the dynamical range of 3.7 × 10 4 would have required a number of grid points per dimension of 131 072 or more which corresponds to 48 PB of RAM for a 3D-vector field of doubles!

Applications
Traditionally, test particle simulations have been used primarily for computation of diffusion coefficients which would then be compared with analytical results in order to test CR transport theories (Giacalone and Jokipii 1999;DeMarco et al. 2007;Snodin et al. 2016;Subedi et al. 2017). In addition, test particle simulations have been used (and are still being used) to study the deflection of ultra-high energy CRs in the Galactic magnetic fields where transport is certainly not diffusive (Karakula et al. 1971;Harari et al. 2000;Tinyakov and Tkachev 2002;Alvarez-Muniz et al. 2001;Harari et al. 2002;Kachelriess et al. 2006;Bretz et al. 2014;Farrar and Sutherland 2019). There are however situations where even Galactic transport is not diffusive or where the diffusive picture is questionable. These include the escape of Galactic CRs from the CR halo around the knee (DeMarco et al. 2007;Giacinti et al. 2015), near source transport Kachelrieß et al. 2015), stochastic acceleration (Fraschetti and Melia 2008;O'Sullivan et al. 2009;Winchen and Buitink 2018) and the study of CR anisotropies Schwadron et al. 2014;Mertsch and Funk 2015;Ahlers and Mertsch 2015;López-Barquero et al. 2016;Pohl and Rettig 2016;Kumar et al. 2019;Mertsch and Ahlers 2019). In the following, we will briefly review the use of test particle simulations and discuss the results for a few physics cases.

Computing transport coefficients
All the non-linear extensions that are meant to address QLT's issues need to make certain assumptions (see Sec. 2.7). While these assumptions may be well motivated, it is not clear a priori whether they result in an accurate description of CR transport. It is therefore of great interest to test these theories by comparing their results with those of numerical simulations.
A central prediction of the non-linear models are the parallel and perpendicular mean-free path or equiva-lently the parallel and perpendicular diffusion coefficients, κ and κ ⊥ . To a lesser extent, numerical simulations have also been employed to compute the pitchangle scattering diffusion coefficients D µµ and the offdiagonal, anti-symmetric elements of the diffusion tensor κ A describing drifts. Of course, checking if transport is diffusive in the first place (instead of subdiffusive or superdiffusive) is another important application of test particle simulations.
We start by recalling the definition of the instantaneous diffusion coefficients, The mean square displacements (∆x i ) 2 are directly accessible for a set of trajectories {r j } from test particle simulations Assuming again that the regular magnetic field B = B zẑ , we identify d = d zz and d ⊥ = d xx = d yy .
As far as the averaging on the RHS of eq. (80) is concerned, most authors have adopted an averaging over initial particle velocity and over magnetic field realisations. The former is necessary as the (instantaneous) diffusion coefficients do not retain any pitch-angle dependence, cf. eq. (23), and the latter is a consequence of QLT considering the ensemble-averaged phase space density. There is no agreement in the literature, however, on how many particle directions and how many field realisations are required to accurately compute diffusion coefficients.
For times much larger than the scattering time, the instantaneous diffusion coefficients should converge towards the asymptotic diffusion coefficients, κ and κ ⊥ . Depending on the normalised rigidity and on the level of turbulence, this only happens after many gyroperiods. Correspondingly, the computational expense can be very high. In order to increase the statistics at intermediate times, it was suggested (Casse et al. 2002) to not only use the initial position r j (0) as one endpoint of simulated trajectories in computing the mean squared distances, but to also consider intermediate intervals [t i , t i+1 ]. This improves the statistics of trajectories for intermediate times, however, it is not clear whether this will guarantee enough pitch-angle scattering.
We note that it is also possible to test the diffusion approximation by computing κ from the pitchangle diffusion coefficient D µµ . Note that in practice, oftentimes the scattering rate is derived from the already pitch-angle averaged correlation function µ(t)µ(0) instead of from the pitch-angle diffusion coefficient D µµ (µ).
We close by noting that already Giacalone and Jokipii (1999) explored alternatives for computing the diffusion coefficients. The solution of the diffusion equation for an initially localised distribution is a multivariate Gaussian with variances σ = 2κ t and σ ⊥ = 2κ ⊥ t in the parallel and perpendicular directions. Determining the spread of a set of trajectories from their common origin therefore allows computing the diffusion coefficients.
In Tbl. 1 we compare the prediction of κ and κ ⊥ from various transport theories to the results from numerical simulations.

CR anisotropies and backtracking
Another application of test particle simulations is the study of anisotropies. These are motivated by the discrepancy between the standard diffusive picture and observations, both on large-scale and small-scale anisotropies, hinting at limitations of the standard diffusive picture of Sec. 2.2.
In this standard picture, a small spatial gradient in the CR phase space density leads to the formation of a small dipole in the arrival directions, aligned with the direction of the regular or mean magnetic field. What matters for the formation of the dipole is the gradient over a few mean-free paths before observation and any anisotropy imprinted at larger distances will be destroyed by pitch-angle scattering. However, the phase space density f in the actual realisation of the turbulent field that we live in will in general differ from the ensemble average f , see the discussion in Sec. 2.1, and therefore, also the arrival directions seen by an observer will differ from the dipole predicted for the ensembleaveraged phase space density.
This reasoning has been applied by Mertsch and Funk (2015) to the CR anisotropy problem (Hillas 2005;Zirakashvili 2005;Erlykin and Wolfendale 2006;Ptuskin et al. 2006;Evoli et al. 2012;Pohl and Eichler 2013;Sveshnikova et al. 2013;Kumar and Eichler 2014;Schwadron et al. 2014;Ahlers 2016), that is the discrepancy between the measured dipole anisotropy and the one predicted in isotropic diffusion models. Test particle simulations can be used to explore the deviations of the phase space density and anisotropies from the ensemble average in particular realisations of the turbulence magnetic field.
To this end, particles are followed backward in time, starting at time t of observations and computing the trajectories back to an earlier time t 0 . For a given set of trajectories {r j } from test particle simulations, we can then use Liouville's theorem, that is the conservation of phase space density along trajectories, to connect the Table 1 Comparison of parallel and perpendicular transport in simulations and theories for different turbulence geometries. Here, we assume magnetostatic turbulence.
isotropic slab composite NLGC c κ ⊥ too high, but scaling with R and δB/Bz WLNT "Serious mathematical issues" (Tautz et al. 2006a) diffusive ⊥ diffusive ⊥ subdiffusive a Note that there have been hints for subdiffusion at low rigidities Casse et al. (2002); Candia and Roulet (2004) b Except for steep turbulence spectra where 90 • degree scattering becomes important c NLGC theory requires λ as an input. phase space density seen by an observer at time t and at the origin of the trajectories r ⊕ to the assumed phase space density f (t 0 ) at the other end of the trajectories.
More specifically, where r i (t ) and cp i (t ) are the positions and velocities of a particle with position r i (t) = r ⊕ and velocity cp i (t) at observation. In order to predict the phase space density seen by an observer at time t, some assumptions need to be made on the phase space density at the other ends of the trajectories, specifically at time t 0 . Usually, for f (t 0 ) the random fluctuations are ignored and the ensemble-averaged f (t 0 ) is adopted. Eq. (81) then becomes exact if the backtracking time (t−t 0 ) → ∞. This is motivated by the fact that ensemble averages of second moments of the phase space density, e.g. the dipole amplitude or the angular power spectrum, are insensitive to the fluctuations δf at t 0 (Ahlers and Mertsch 2015). For the ensemble-average a solution of the CR transport equation is adopted, e.g. a spatial gradient. It was shown by Mertsch and Funk (2015) that the intermittency effects due to the turbulent magnetic field can lead to a significant uncertainty in the prediction of the dipole amplitude and direction, both for the case without and with strong background field. Together with the projection effect due to a potential misalignment between CR gradient and magnetic field direction, this can bring the predicted dipole anisotropy back into agreement with the observations.
The same backtracking technique and Liouville's theorem can be used to also investigate the appearance of anisotropies on small scales (Abdo et al. 2008;Abbasi et al. 2011;Abeysekara et al. 2014;Aartsen et al. 2016;Abeysekara et al. 2018Abeysekara et al. , 2019 due to intermittency effects in small-scale turbulence (Giacinti and Sigl 2012; Ahlers and Mertsch 2015;López-Barquero et al. 2016;Pohl and Rettig 2016;Kumar et al. 2019). We refer the interested reader to the recent review by Ahlers and Mertsch (2017).

The validity of Liouville's theorem
It has been questioned whether backtracking can be used reliably to investigate the formation of (smallscale) anisotropies (López-Barquero et al. 2017) and whether Liouville's theorem is valid in the presence of pitch-angle scattering. We therefore provide a few comments on its validity.
First, we note that pitch-angle scattering is to be distinguished from collisions. In collisions the particle trajectories changes abruptly due to short-range forces, e.g. hard-sphere collisions in gas kinetic theory. In contrast, in collisionless plasmas each interaction between the particle and a wave-packet changes the particle's pitch-angle only very moderately due to the small turbulent magnetic field, δB 2 /B 2 z 1 (e.g. Kulsrud 2005). Thus, interactions with many wave-packets are needed for a particle to scatter (which can be defined as a particle changing direction by 180 • ). The particle trajectories are smooth since the Lorentz force mediating this change is differentiable.
Second, the validity of Liouville's theorem is not only the basis for numerical backtracking, but is also at the heart of kinetic theory, including QLT and its nonlinear extensions. If Liouville's theorem was not applicable to collisionless plasmas in the presence of smallscale turbulence, then we would also need to abandon the majority of microscopic particle transport theories and much of plasma theory, in fact.
It has been claimed (López- Barquero et al. 2017) that conservation of phase space density is equivalent to the conservation of the magnetic moment M = mv 2 ⊥ /(2B) of individual particles which can be checked by simulating test particles in random (electro)magnetic fields. We have elsewhere already argued against this view (Ahlers and Mertsch 2017): While conservation of phase space density requires only differentiability of forces, conservation of the magnetic moment requires the magnetic field to change only adiabatically, that is B/|∇B| r g and B/Ḃ Ω −1 where r g and Ω are the gyro radius and gyro frequency. Therefore, the conditions for the conservation of the magnetic moment are stricter and variability of the magnetic moment does not imply violation of Liouville's theorem. Note that, of course, magnetic moment M and pitchangle cosine µ are closely related for fixed particle energy, such that any pitch-angle scattering necessarily implies the violation of magnetic moment (Dalena et al. 2012;Weidl et al. 2015). The validity of Liouville's theorem is however not affected by this.
Due to the equivalence of phase space volume and (negative) information entropy, it can be said that in the ensemble-average information is lost. The increase of entropy also implies that the evolution of the system is irreversible, reflecting the diffusive nature of the process. However, it is important to realise that the loss of reversibility only occurs through the ensemble averaging. By contrast, in one particular realisation of the turbulent magnetic field, even though particles scatter, phase space volume is conserved, entropy does not increase and the equations of motion are reversible. It is possible to confirm this fact in numerical test particle simulations.

Summary and outlook
In this review, we have given an overview over test particle simulations of CRs that are used to check transport theories, compute their parameters and predict observables beyond the current reach of such theories. In the first part, we summarised the findings of the current paradigm theory, QLT, and its possible extensions. In deriving the Fokker-Planck eq. (14) and the diffusion eq. (22), we have reviewed the salient features of QLT, that is the evaluation of the force due to the turbulent magnetic field along unperturbed trajectories and the hierarchy of time scales involved. We have introduced the three most popular analytical turbulence geometries (3D isotropic, slab and composite) and, as an example, have reviewed the derivation of the pitchangle diffusion coefficient in slab geometry with a broken power law turbulence spectrum. Pointing out some of the shortcomings of QLT, in particular the so-called 90 • problem for slab turbulence, we have motivated the need to go beyond the simplest quasi-linear theories. For non-linear theories of CR transport, we have mostly limited ourselves to the BAM model, to NLGC theory and to WLNT.
The second part of this review was concerned with test particle simulations itself. First, we developed a technical but central part of running test particle simulations: the generation of the turbulent magnetic field. We have reviewed the two approaches that are regularly used, the harmonic method and the grid method. Both have advantages and disadvantages, but the grid method allows for a much faster evaluation if a large dynamic range in wavenumbers is to be considered. This is particularly true for the nested grid method. We have concluded by reviewing some of the applications of test particle simulations, the major motivation being the current lack of an agreed-upon microscopic transport theory that addresses the various issues that point beyond QLT. Extensions of QLT need to be tested against observations or simulations. Any theory necessarily relies on a certain turbulence model and since the nature of turbulence in the interstellar medium (to a lesser extent also in the interplanetary medium) is uncertain, comparing analytical approaches and numerical simulations based on the same assumed turbulence model is most reliable. We have sketched two important application cases, that is the computation of transport coefficients and the investigation of anisotropies. In doing so, we have stressed the validity of Liouville's theorem for the phase space density before ensembleaveraging which is the basis not only of the backtracking used in anisotropy studies, but also of the analytical approaches.
Given the conceptual simplicity of test particle simulations of CR transport and the availability of computational resources necessary, test particle simulations are one of the most important computational tools in studies of CR transport. It is however also necessary to point to the limitations of test particle simulations. First, as alluded to above, the questions of whether the results can be compared to data is hindered by our ignorance of the underlying turbulence model. Of course, analytical transport theories suffer from the same shortcoming. Turning this argument around, we can however hope to constrain the nature of magnetised interstellar turbulence by comparing the results from test particle codes with observations, for example for anisotropies. Also, with ever increasing computational resources, computing trajectories in simulated turbulence will become increasingly important, but for the time being synthetic turbulence is more useful in investigating a number of phenomenological questions.
Second, test particle simulations ignore feedback of the cosmic rays onto the magnetised turbulence, by definition. While there are first principle approaches, like particle-in-cell simulations, only now are they being used to study phenomena like the non-resonant streaming instability (Haggerty et al. 2019). Such approaches are appropriate for studying such processes in principle, but for the application of such instabilities to astrophysical phenomena, the large dynamical range between plasma skin widths and the relevant astrophysical scales is still challenging. We believe that careful hybrid approaches, combining kinetic cosmic rays with magnetohydrodynamic background plasma will prove most fruitful.