Field-Aligned Interpolation for Semi-Lagrangian Gyrokinetic Simulations

This work is devoted to the study of field-aligned interpolation in semi-Lagrangian codes. In the context of numerical simulations of magnetic fusion devices, this approach is motivated by the observation that gradients of the solution along the magnetic field lines are typically much smaller than along a perpendicular direction. In toroidal geometry, field-aligned interpolation consists of a 1D interpolation along the field line, combined with 2D interpolations on the poloidal planes (at the intersections with the field line). A theoretical justification of the method is provided in the simplified context of constant advection on a 2D periodic domain: unconditional stability is proven, and error estimates are given which highlight the advantages of field-aligned interpolation. The same methodology is successfully applied to the solution of the gyrokinetic Vlasov equation, for which we present the ion temperature gradient (ITG) instability as a classical test-case: first we solve this in cylindrical geometry (screw-pinch), and next in toroidal geometry (circular Tokamak). In the first case, the algorithm is implemented in Selalib (semi-Lagrangian library), and the numerical simulations provide linear growth rates that are in accordance with the linear dispersion analysis. In the second case, the algorithm is implemented in the Gysela code, and the numerical simulations are benchmarked with those employing the standard (not aligned) scheme. Numerical experiments show that field-aligned interpolation leads to considerable memory savings for the same level of accuracy; substantial savings are also expected in reactor-scale simulations.


Introduction
In a Tokamak, due to the large confining magnetic field, a fast homogenisation of the different physical quantities occurs along the magnetic field lines; this leads to very smooth and small variations along the field lines, whereas the scale length of the variations is very small (comparable to the gyro-radius) in a perpendicular direction. This should be taken into account for more efficient simulations. It is typically done by using field aligned coordinates in many gyrokinetic codes. However this approach has the drawback of needing a non-conformal correction after one turn, either in the poloidal or the toroidal direction, which yields a break of symmetry on one section of the torus. More importantly, field-aligned coordinates become singular when approaching the separatrix in a divertor configuration, with potentially serious consequences on the robustness of the numerical algorithm that employs them.
A very promising alternative, which is very flexible in regard to the choice of coordinates, has been introduced by Hariri and Ottaviani [1], and an equivalent approach by Stegmeier et al. [2]. The main idea is to compute the derivatives locally along the field lines, getting the needed values for finite differences by interpolation to the intersection points of a field line with the poloidal planes. We are interested here in a thorough numerical investigation of this idea in the context of gyrokinetic simulations using semi-Lagrangian methods. Pioneering in this sense is the recent work by Kwon, Yi, Piao and Kim [3], where "field-aligned interpolation" is employed in a semi-Lagrangian gyrokinetic code for full-f turbulence simulations. Our work complements the above on the numerical analysis side, and it focuses on the following topics: convergence analysis (i.e., stability proof and error estimates), numerical verification against analytical solutions, and benchmarking with the classical (not aligned) algorithm. The reader interested in the physics context can consult the review article [4], and exhaustive information about the semi-Lagrangian method which was introduced in the context of gyrokinetic simulations in [5] and the Gysela code are provided in [6].
In this work we use the so-called 'backward' semi-Lagrangian method, which consists of an advection phase, where the characteristic trajectories ending at the grid points are traced back in time from t + t to t, and an interpolation phase, where the particle distribution function is interpolated at the origin of these trajectories using the known grid values at time t. By virtue of the method of characteristics, the solution on the grid is therefore known at time t + t. Moreover, in the Gysela code the motion is split between the poloidal plane and the toroidal direction (and also the parallel velocity, but this will not play any role in this paper). In this context, the idea of taking derivatives along magnetic field lines can be naturally extended to semi-Lagrangian methods by replacing the advection and interpolation in the toroidal (ϕ coordinate in the torus geometry) direction by an advection and interpolation along magnetic field lines (combining a ϕ and θ motion).

Model Equations
We are interested in solving the gyrokinetic Vlasov equation (1. 2) The fields u and a , and therefore the characteristic trajectories in (1.2), are completely defined by the modified magnetic field B * (x, v ) = B(x) + (m/q)v ∇ × b(x), where b = B/B, and by the gyro-center Hamiltonian H (t, x, v , μ). Then, defining B * = b · B * = B + mv /(q B)b · ∇ × B, we have (see for example [7]) We now neglect μ and focus on the reduced phase-space (x, v ), where we define the phasespace velocity ξ = (u, a ). The phase-space divergence of this field is where B * (x, v ) is present because B * /m is the Jacobian determinant of the coordinate transformation that was used to obtain the gyrokinetic Vlasov equation. It is straightforward to see that and therefore: div(ξ ) = 0.
Since the phase-space velocity field is incompressible (i.e., divergence-free), an equivalent formulation of (1.1) is the conservation equation In the electrostatic case the gyro-center Hamiltonian reads , x), (1.4) where φ(t, x) is the electrostatic potential, and · α is the gyro-average operator. (In the zero-Larmor-radius limit, we simply have that φ α = φ.) In general one should solve one gyrokinetic Vlasov equation for each particle species, and couple these to a Poisson equation for the self-consistent φ. For simplicity, in this paper we model only one ion species kinetically, we assume an adiabatic response of the electrons, and we make use of the quasi-neutrality approximation [6]. Under these hypotheses the electrostatic potential φ(t, x) satisfies a non-linear integro-differential equation, which is linearized around a given equilibrium to obtain (1.5) where ∇ ⊥ = ∇ −b(b·∇) is the perpendicular gradient operator, · f represents an averaging operator over the whole magnetic flux surface passing through x, ρ th,i (x) is the thermal ion Larmor radius, and λ D,i (x) and λ D,e (x) are the Debye lengths for ions and electrons respectively. On the right-hand-side, q i is the charge of an ion particle, 0 is the vacuum permittivity and n i (t, x) is the instantaneous ion number density, while n 0 (x) is the ion density at equilibrium. n i (t, x) is obtained from f (t, x, v , μ) as (1.6) where α is the gyro-phase angle and ρ(x , μ, α) is the gyro-radius vector. n 0 (x) can be similarly obtained from the equilibrium distribution f eq (x, v , μ). The non-dimensional coefficient on the left-hand side of (1.5) may be reformulated as ρ 2 th,i /λ 2 D,i = n 0 m i / 0 B 2 = E 0 /2E B , that is, half the ratio between the rest energy density E 0 (x) = n 0 (x)m i c 2 and the magnetic energy density E B (x) = B(x) 2 /2μ 0 (here m i is the mass of an ion particle, c is the speed of light in vacuum and μ 0 is the vacuum permeability).
We refrain from giving more details on the equations here; the interested reader may refer for example to [4] and references therein. In fact, since our focus is on assessing the fieldaligned interpolation method, and not on incorporating all components needed for a realistic turbulence simulation, the equations presented in this section will be further simplified for implementation in the Selalib and Gysela codes (Sects. 4 and 5).

Magnetic Configurations
For our numerical simulations in Gysela (see Sect. 5), we will consider a circular magnetic equilibrium in a torus as defined in [6], with magnetic field where R 0 and B 0 are respectively the major radius and the toroidal magnetic field at the magnetic axis, R(r, θ) = R 0 + r cos θ , and q(r ) is the classical safety factor in the large aspect ratio limit (r/R 0 → 0). The unit vectors (r,θ ,φ) form an orthogonal basis of R 3 as long as R > 0. We notice that the magnetic field (1.7) depends on r and θ through R, but not on ϕ. In order to verify that ∇ · B = 0, we recall that the divergence of a vector a = a rr + a θθ + a ϕφ in toroidal components reads ∇ · a = 1 r R ∂ ∂r (r R a r ) + ∂ ∂θ (Ra θ ) + r ∂a ϕ ∂ϕ , and we observe that the magnetic field in (1.7) has B r = 0, ∂ θ (R B θ ) = 0 and ∂ ϕ B ϕ = 0. With regard to the unit vector b, we have b r = 0, b ϕ = sgn(B 0 )/ 1 + ζ 2 and b θ = ζ b ϕ , so that ∇ · b = −(b θ sin θ)/R = 0. The magnetic field lines, parametrized by (r, θ, ϕ), are defined by the equations dr ds = B r = 0, r dθ ds and so in tokamaks with a medium/large aspect ratio (i.e., with small r/R 0 ). The magnetic field lines are close to straight lines in the (θ, ϕ) plane for a given r . For our numerical simulations in Selalib (see Sect. 4) we shall consider the simplified case of a straight periodic cylinder, which amounts to taking R = R 0 in (1.7), and replacing the toroidal angular variable ϕ by a straight variable z = R 0 ϕ. Then We see that the magnetic field is characterized by its central axial component B 0 , the major radius R 0 and the rotational transform iota, which satisfies . (1.10) In order to verify that ∇ ·B = 0, we recall that the divergence of a vector a = a rr + a θθ + a zẑ in cylindrical components reads and we observe that the magnetic field in (1.9) has B r = 0, ∂ θ B θ = 0 and ∂ z B z = 0.
In a similar fashion we also notice that ∇ · b = 0 in this case. The magnetic field lines, parametrized by (r, θ, z), are defined by the equations Therefore the magnetic field lines are straight oblique lines in the (θ, z) plane for each given r .

Overview
The remainder of the paper is structured as follows. Section 2 describes the numerical algorithms that are employed for performing interpolation and differentiation in a 'field-aligned' fashion. Section 3 details the field-aligned semi-Lagrangian scheme in the simplified setting of the constant advection equation in 2D, and provides a rigorous proof of unconditional stability, together with an extensive error analysis. Section 4 presents a simplified gyrokinetic model in cylindrical geometry (screw pinch), which is implemented in Selalib (semi-Lagrangian library) and verified against an analytical solution. Section 5 presents a gyrokinetic model in toroidal geometry (circular Tokamak), which is implemented in the Gysela code and benchmarked against a standard (not aligned) version of the same code. Finally, Sect. 6 gives our conclusions and an outlook on possible future investigations.

Numerical Scheme for 2D Field-aligned Interpolation
To describe the 2D field-aligned interpolation method, we consider a magnetic flux surface at r = r 0 , parametrized by the angular coordinates (θ, ϕ) ∈ [0, 2π] × [0, 2π]. This setting is natural in toroidal geometry, and it also applies to a periodic cylinder by introducing the straight coordinate z = R 0 ϕ and considering straight magnetic field lines in the (θ, z) plane. Our goal is to interpolate a sufficiently regular function g(θ, ϕ), 2π-periodic in both coordinates, at an arbitrary location (θ , ϕ ). We assume that that the values g(θ i , ϕ j ) are known on the uniform 2D grid . By periodicity we can extend this to (i, j) ∈ Z 2 . There exists a unique index j ∈ Z and 0 ≤ β < ϕ such that We then define We will use information stored in the 1D slices g(θ = * , ϕ = ϕ j +k ) k=r,...,s to perform the aligned interpolation at (θ , ϕ ). Let us define a function fieldline θ (θ, ϕ, j) that gives a θ -value that corresponds to the intersection of the field line (or an approximation of the field line) that passes by the point (θ, ϕ) and the line (θ = * , ϕ j ). This function is the cornerstone of the method, as it provides a way to interpolate using values that are close to each other, because the locations of these values are aligned to the physical structures. The fieldline θ function is chosen such as all interpolated points h k are aligned on a single field line. The first stage of the method is to compute u θ ,ϕ (k) k=r,...,s by interpolating g at positions ( fieldline θ (θ , ϕ , j + k), ϕ j +k ) k=r,...,s . We currently employ cubic splines to interpolate along the θ direction on the 1D slices g(θ = * , ϕ = ϕ j +k ) k=r,...,s . The formula for fieldline θ that we have been using so far is the linear approximation which is the equation of a straight line. This expression is exact in the case of the screwpinch described in Sect. 4, and it is a good approximation in the case of the circular Tokamak we consider in Sect. 5 (because of its medium-large aspect-ratio). Should this assumption not be fulfilled, one can take a more accurate description for the field line. The fieldline θ function can be easily changed in the code: it is effectively a parameter of the method. The impact on the simulation would be a small additional cost compared to the very cheap linear approximation, along with an improvement in the accuracy of the method.
The second stage of the method consists in interpolating g(θ , ϕ ) using the values aligned on the parallel direction we just get: u θ ,ϕ (k) k=r,...,s . To achieve this, we use Lagrange polynomials of degree 2d + 1 LAG(2d+1) and take r = −d, s = d + 1. The pseudo-code implementation of the scheme is presented in Algorithm 1, and an illustration is given in Fig. 1.  Fig. 1 Illustration of the aligned interpolation scheme for a target point at position (θ , ϕ ); the squares are located at (θ = fieldline θ (θ , ϕ , j +k), ϕ = ϕ j +k ) k=r,...,s ; the values at square positions are interpolated using values known at black small points; the value at the red circle position (θ * , ϕ * ) is interpolated using values known at the square positions Algorithm 1: Aligned interpolation in 2D

Field-aligned Computation of Derivatives
In the Gysela code (Sect. 5) we need to evaluate ϕ derivatives of the electric potential φ(r, θ, ϕ) to compute the non-linear terms appearing in the advection equations, but also in the diagnostics that compute a set of macroscopic physical variables. In order to do so with a reduced number of points in the ϕ direction (authorized by the aligned interpolation approach), a scheme should be designed to get an accurate approximation of these derivatives. We have evaluated two alternatives to estimate ∂φ/∂ϕ: the first one relies on separately computing the polar derivative ∂φ/∂θ and the parallel derivative b · ∇φ, and then using the formula ∂φ/∂ϕ = b · ∇φ − (b θ /r )∂φ/∂θ R/b ϕ ; the second one performs a field-aligned interpolation similar to Algorithm 1 to compute two accurate values of φ(r, θ, ϕ ± ), and then employs the finite difference formula ∂φ/∂ϕ(r, θ, ϕ) ≈ [φ(r, θ, ϕ + ) − φ(r, θ, ϕ − )] /2 . Algorithm 2 describes the second solution, which is effectively used in the Gysela code.
In the Selalib code (Sect. 4) we do not need to evaluate z derivatives of the electric potential φ(r, θ, z), as in the equations everything is expressed in terms of derivatives in the poloidal plane (r, θ) or along the parallel direction b. To evaluate the electric field along b, we use for example a finite difference formula of order 6, which reads (after setting z = R 0 ϕ Input : g, Output : dg/dϕ Algorithm 2: Derivatives along ϕ with aligned scheme with coefficients whereφ(r i , fieldline θ (θ j , z k , k + ), z k+ ) is obtained by interpolation (for example cubic splines) from the values φ(r i , θ l , z k+ ), l = 0, . . . , N θ .

Theoretical Justification of the Approach for 2D Advection
Our drift-kinetic simulations with the Gysela code (presented in Sect. 5) will emphasize the practical advantages of the field-aligned approach over traditional tensor-product 2D interpolation schemes. Nevertheless, in order to trust the output of such a code when no analytical solutions are at hand, it is very desirable to have proven convergence (that is, consistency and stability) of the numerical methods employed. As it often happens in computational physics, we can provide such a proof only for a drastically reduced mathematical model; but even so, we gain useful insight and a certain degree of confidence in the final numerical scheme. Therefore, in this section we assess the convergence of our field-aligned semi-Lagrangian method when applied to the 2D constant advection equation for f : R + × R 2 → R, where the initial function f 0 : R 2 → R is 2π-periodic in θ and ϕ, and b = (b θ , b ϕ ) is the unit vector of a constant magnetic field (therefore b θ , b ϕ ∈ R such that b 2 θ + b 2 ϕ = 1). We assume that b ϕ = 0, because this hypothesis is required by the scheme. The exact solution reads while the numerical solution f n i, j ≈ f (t n , θ i , ϕ j ) is computed on a uniform grid with indices n, i, j ∈ Z and discretization parameters t ∈ R, θ = 2π N θ , and ϕ = 2π N ϕ , where N θ , N ϕ ∈ N * . Specifically, we have t n = n t and (θ i , ϕ j ) = (θ 0 + i θ, ϕ 0 + j ϕ) with θ 0 , ϕ 0 ∈ R. In our 'backward' semi-Lagrangian scheme, the solution at time t n+1 is obtained from the solution at time t n as is reconstructed from f n i, j through field-aligned interpolation. By virtue of the linearity of the interpolation operator (essentially a linear discrete convolution), the stability of the scheme will be assessed by means of a standard Von Neumann analysis: upon taking the semi-discrete Fourier transform on both sides of the previous equation, we will get where ρ : [−π, π] × [−π, π] → C is the 'Fourier symbol', or 'amplification factor', for a given choice of discretization parameters. We will then prove stability by showing that In the present version of our field-aligned semi-Lagrangian scheme, we make use of centered Lagrange interpolation along both directions b and θ . In particular, we assume odd order 2d b + 1 along b, and 2d θ + 1 along θ , with d b , d θ ∈ N. For any choice of (d b , d θ ), we prove that such a scheme is unconditionally stable, i.e., stable for all values of ( t, N θ , N ϕ ). Finally, we analyze the truncation error for single-mode initial conditions, proving that the scheme converges to the exact solution with order 2d θ + 2 in N θ and 2d b + 2 in N ϕ , as expected. Our error estimates correctly recover the asymptotic behavior for b θ → 0, where the scheme reduces to 1D Lagrange interpolation. In comparison to classical tensor-product 2D interpolation, we clarify how field-aligned interpolation allows for a reduced N ϕ in those situations where the gradients along b are smaller than along the ϕ direction, as typical in magnetic confinement devices. Because of the additional interpolations along θ , our estimates suggest a slight increase in the error constant along this direction, but we expect such an effect to be negligible in practice. In fact, the numerical experiments in Sects. 4 and 5 will confirm that this is more than compensated by the gain along ϕ.
We point out that periodic spline interpolation in the θ direction is used in the codes (Selalib in Sect. 4 and Gysela in Sect. 5) instead of Lagrange interpolation. In Sect. 4 our preference goes to spline interpolation simply because of its higher accuracy for a given polynomial degree. In Sect. 5 a more important motivation is the requirement of introducing the fewest possible modifications with respect to the standard (not field aligned) simulations, which employ cubic splines along both the θ and ϕ directions: although the advection field is altered according to changes in the operator splitting, the polar advection solver remains substantially unchanged because the original interpolation strategy in (r, θ) is not modified. When comparing the two kinds of simulations (standard vs. field aligned), we can therefore ascribe any significant difference in the numerical results solely to a) choice of operator splitting, and b) strategy of interpolation along the ϕ direction.
The outline of this section is as follows: Section 3.1 provides the explicit update formula of our field-aligned scheme, Sect. 3.2 gives a rigorous proof of unconditional stability, and Sect. 3.3 assesses the truncation error of our scheme and compares it with the standard (not field-aligned) algorithm.

Update Formula for Field-aligned Semi-Lagrangian Scheme
For any given grid point (θ i , ϕ j ), we trace the magnetic field line backward in time to obtain the foot of the characteristic (θ * i , ϕ * j ), where ϕ * j ∈ [ϕ j * , ϕ j * +1 ). Since b ϕ = 0 by assumption, the same magnetic field line intersects the grid lines at constant ϕ at the locations (θ * i,k , ϕ j * +k ) with k ∈ Z. The basic idea of the field-aligned semi-Lagrangian method is to use 1D interpolation along θ to obtain the intermediate values f n+1 i, j,k = f n (θ * i,k , ϕ j * +k ), and then 1D interpolation along b to obtain f n+1 i, j = f n (θ * i , ϕ * j ). Thanks to the constant b and uniform discretization, the concepts above will be succinctly formalized in the following discussion, leading to a very compact algorithm.
First, we consider the normalized displacement −b ϕ t/ ϕ along the ϕ direction and decompose it into its integer and fractional parts: For Lagrange interpolation along b, the integer shift r ϕ is used to correctly place the stencil on the grid, and α ϕ is the interpolation variable. We now turn to finding the displacements in the θ coordinate, which correspond to the intersections between the magnetic field line and the various grid lines at constant ϕ. For this purpose, we first define the flight times t k such that This is possible, as b ϕ = 0. At each ϕ j * +k -intersection we now have the normalized diplacements along the θ direction as −b θ t k / θ , which we also decompose into integer and fractional parts: For Lagrange interpolation along θ , the integer shifts r θ,k are used to correctly place each stencil k on the grid, and α θ,k are the interpolation variables. We are now ready to compute the intermediate values f n+1 i, j,k at each ϕ j * +k -intersection through Lagrange interpolation along θ , as and from these we compute the new solution f n+1 i, j , using Lagrange interpolation along b: Combining the last two equations leads to the compact update formula Here we recall that i, j ∈ Z, and that f 0 i, j is N θ -periodic in i and N ϕ -periodic in j. As a result, f n i, j is N θ -periodic in i and N ϕ -periodic in j for n ∈ N. For completeness, we also recall that L d k are the elementary Lagrange basis functions defined by

Fourier Symbol
We now turn to studying the Fourier symbol of our numerical scheme (3.1) and, for simplicity, we redefine i:= √ −1 as the imaginary unit. Because of its periodicity, the Fourier spectrum of f n i, j contains only N θ × N ϕ modes. Therefore, we can proceed by taking the 2D discrete Fourier transform of both sides of (3.1). For i 1 , j 1 ∈ Z we get where the Fourier symbol ρ : Thanks to the relation we can parametrize the symbol in the variables (λ, r ϕ , α ϕ ) as where · : R → Z is the floor function. In the spirit of the Von Neumann stability analysis, we are now led to compute the maximum absolute value S of the symbol above, and to prove that S ≤ 1.

Relation to Discrete Fourier Transform (DFT) for Rational λ
We suppose for the moment that λ ∈ Q, and we represent it as with m ∈ Z, q ∈ N * , and m, q coprime.
So, for any r ϕ ∈ Z, we observe that α θ,k can only assume at most q different values, all rational: where we have introduced the natural sequence Under this assumption for λ we can use the following identity for any complex sequence a k : Here δ : We can then write ρ λ,rϕ ,αϕ (ω θ , ω ϕ ) We now focus on the term between square brackets, a complex sequence w p ∈ C with p = 0, . . . , q − 1, and we represent it as a sum of q Fourier modes by means of a discrete Fourier transform (DFT) and its inverse: Incidentally, we notice that the sum of the Fourier coefficients defined in (3.3b) is equal to 1, as can be proven by looking at the term w 0 in (3.3a): By substituting (3.3a) into the Fourier symbol ρ λ,r ϕ ,α ϕ (ω θ , ω ϕ ) we can get rid of one of the sums, as well as of the Kronecker delta: Because exp(i2π) = 1, we now multiply the right-hand side by exp(i2π p 1 r θ,k ) = 1, use the fact that r θ,k + α θ,k = (r ϕ + k)λ, and then change p 1 with p to obtain By properly rearranging the complex exponential factors according to their dependence on the indexes p and k, we also get where we have introduced the frequencies ω p ∈ R for p = 0, . . . , q − 1 as We now turn to studying the absolute value of the Fourier symbol in (3.5), which is the sum over p of q complex terms. We apply the triangular inequality to such a sum, and use the fact that the modulus of a complex exponential is equal to 1, to obtain the estimate which can be factorized as The second factor on the right-hand side is typical of backward semi-Lagrangian schemes applied to the 1D advection equation. The stability analysis in [8], for example, has already shown that Therefore our attention will focus on the first factor, which must also be ≤ 1. If we can prove that t p ∈ R + for each p = 0, . . . , q − 1, then |t p | = t p and we can use our previous result in (3.4) to obtain For this purpose we first notice, because exp(i 2π p) = 1, that we can rewrite (3.3b) as (3.9) Our stability analysis will now proceed in three stages. In Sect. 3.2.3 we will prove that the Fourier coefficients t p are all real, and in Sect. 3.2.4 that they are non-negative. This implies (3.8) and therefore stability of our numerical scheme for any rational λ, according to (3.6) and (3.7). Finally, in Sect. 3.2.5 we will extend this result to the general situation of real λ.
Remark 1 This result of positivity of the DFT has direct connection with results of Ferretti [9,10] stating equivalence between semi-Lagrangian and Lagrange-Galerkin methods under some assumptions, one of it being the positivity of the (continuous) Fourier transform. Such link may be further studied.

Proving that the DFT is Real
We now prove that t p ∈ R for each p = 0, . . . , q − 1. Given that the t p coefficients are obtained through the DFT (3.3), this follows from the symmetry property w q− p = w * p for p = 1, . . . , q − 1.
If we introduce the complex function of real variable it suffices to prove that the imaginary part of S q,d (ω) is always zero in [0, 2πq]. To show this, we start from Euler's formula exp(i x) = cos(x) + i sin(x) and then make use of the symmetry of Lagrange basis functions on a uniform grid, namely Therefore we have proven that t p ∈ R for p = 0, . . . , q − 1.

Proving that the DFT is Non-negative
Now, it remains to see if we can prove that If this inequality is true for d = d θ , from (3.11) we obtain that t p ≥ 0 for p = 0, . . . , q − 1, and therefore (3.8) holds. From this follows the stability of our numerical scheme for any rational λ.
For q = 1, we have For q > 1, the situation is much more complicated and it requires a careful study of the function S q,d in the interval [0, 2πq]. We first notice that we can explicitly compute the values S q,d (2πn) for n ∈ N, because (3.10) simplifies to where we have used the identity (exp(i2π)) n = 1 together with the partition of unity of the Lagrange interpolant, namely d+1 =−d L d (α) = 1, for all α ∈ R. From the last equation we obtain that and therefore S q,d has at least q −1 zeros in (0, 2πq) and is strictly positive at the boundaries.
In the following discussion we will show that there are no other zeros in the same interval, and that the function is convex at all zeros (and therefore positive in some open interval around each zero). By continuity, this proves that S q,d (ω) ≥ 0 for all 0 ≤ ω ≤ 2πq. Our derivation is somewhat involved because most information will be extracted from S q,d , as in [11]. The derivative of (3.10) reads: Now, if we multiply by exp(idω) the term within square brackets in the expression for S q,d (ω), we can identify the sum therein as the polynomial expansion of a binomial power, and obtain therefore The coefficient that appears in front of the summation can be reformulated as which yields an expression for S q,d (ω) where all terms are real, apart from the complex exponential coefficients in the summation: Because S q,d is real valued, so must be its derivatives. In fact, the imaginary part of the summation above is zero thanks to the fact that w d (0) = 0 by definition, and w d ( We will now study separately the two factors sin 2d+1 (ω/2) and σ q,d (ω) that appear in (3.15). The former has zeros at 2nπ for n ∈ N, with 2d derivatives also zero at the same location.
In fact, if we look at the asymptotic behavior near any of the points ω = 2nπ, we have where we used the angle sum identity for sines, together with cos(nπ) = (−1) n and sin(nπ) = 0. A comparison with the Taylor expansions around the same locations yields We now turn to study the term σ q,d (ω) at the same locations. If we multiply (3.16) by (−1) n+d and then use the angle sum identity for cosines, again with cos(nπ) = (−1) n and sin(nπ) = 0, we obtain We then will use the following discrete form of a lemma relating real convex functions to positive Fourier transforms (see [12,13] and [14] for historical notes).
Lemma 1 Let q ≥ 2 be an integer and f j be a sequence of q + 1 real numbers with j = 0, . . . , q, such that f 0 = f q = 0 and Then, we have This leads to Proof We know that w d is a polynomial of degree 2d+2 whose roots are k, k = −d, . . . d+1. By Rolle's theorem and since w d is a polynomial of degree 2d + 1, w d vanishes exactly one time in each interval (k, k + 1). We also have w d (1/2 + x) = w d (1/2 − x) by symmetry, so the unique zero of w d in (0, 1) is 1/2. By Rolle's theorem and since w d is a polynomial of degree 2d, looking at the variation table, we can see that 0) and t 1 the unique zero of w d in (1, 2), and we have s −1 ∈ (t −1 , 1/2), We now turn to study the asymptotic behavior of S q,d (ω) as ω → 2nπ for n = 1, . . . , q − 1. Thanks to (3.19), we can substitute (3.17) into (3.15) to obtain an asymptotic expression for S q,d , which we integrate once using (3.13). Finally, we obtain the asymptotic equivalence Because the first non-zero derivative is of even order and positive, we conclude that S q,d is convex at each of its zeros 2nπ for n = 1, . . . , q − 1. If we can prove that S q,d has no other zeros in [0, 2qπ], it follows that S q,d ≥ 0 in the whole interval.
We have now the following lemma.

Statement of Unconditional Stability
The stability of our numerical scheme for a general λ ∈ R follows from the stability proof already given, thanks to the density of the rational numbers in R. Let A : R → R be the modulus of our Fourier symbol as a function of λ, for a given choice of r ϕ , α ϕ , ω θ and ω ϕ : Because of the floor function that appears in the Fourier symbol (3.2), A presents discontinuities of the first kind at a set of isolated rational locations, so that the minimum distance between two discontinuities is 1/(1 + d b + |r ϕ |). Everywhere else A is continuous, and specifically so at all irrational values of λ. We now have two cases: 1. If λ ∈ Q, we have already proven that A(λ) ≤ 1; 2. If λ ∈ R \ Q, the function A is continuous in some open interval (λ − δ, λ + δ) with δ > 0. We now suppose that A(λ) > 1 and show that this leads to a contradiction.

Because of continuity, there exists an open interval
Because of the density of Q in R, there exists λ * ∈ Q ∩ (λ − ε, λ + ε) such that A(λ * ) > 1, but this contradicts case 1. Therefore we obtain again that A(λ) ≤ 1.

Approximation Error for 1D Centered Lagrange Interpolation
We now focus on the truncation error due to 1D centered Lagrange interpolation on a uniform grid, of odd order 2d + 1 with d ∈ N. To this end we repeat here part of the analysis done in [15], with a slight refinement on the final error estimates. The results of this section will then be used for assessing the error of our 2D field-aligned semi-Lagrangian scheme, and to compare it to the classical (non field-aligned) scheme. Consider a function g : R → C smooth enough, which we sample on a uniform grid z j = j z with j ∈ Z and z ∈ R * + . Without loss of generality, we focus on the location z = α z with 0 ≤ α ≤ 1. If we write the interpolation error at α z in terms of divided differences, which we reformulate in Peano form, we obtain α, z is the B-spline function over the points α z and z, for = −d, . . . , d + 1, satisfying If we introduce the linear change of coordinates η(z) = (z + d z)/((2d + 1) z)), we can write Q 2d+2 α, z (z)dz = B 2d+2,α (η)dη, where B 2d+2,α is the B-spline function over the 2d + 3 points We now suppose that the function g is harmonic, i.e., g(z) = exp(i(ωz + φ)) with ω ∈ R and φ ∈ [0, 2π], and we proceed with estimating the maximum magnitude of the interpolation error over all possible values of α and φ. Since ∂ 2d+2 z g(z) = (iω) 2d+2 g(z), the absolute value of the error is The maximum magnitude of the integral factor is difficult to compute, nevertheless we can obtain an upper bound by using the triangular inequality for integrals, as with the understanding that such an estimate is sharp in the limit as z → 0, which does not depend on α or φ: The product in front of the integral can be written as where we have extracted the = 1 factor because it goes to zero in the limits as α → 0 and as α → 1, and we would like to retain such an asymptotic behavior in our estimates. All factors in the final multiplication are strictly positive and achieve their maximum value for α = 1/2, and the missing = 1 term has value (1 − 1/2) 2 = 1/4. Therefore we can write the upper bound which reduces to an equality for α ∈ {0, 1/2, 1}. We point out that, although this expression converges to the correct limit for α → 0, it overestimates the linear rate of convergence. In other words, this upper bound is not sharp. Such an expression can be explicitly evaluated as where the central binomial coefficient can be approximated very accurately with the following upper bound, which is sharp in the limit as d → ∞ and can be obtained in various ways (e.g., from Stirling's formula): Putting everything together we obtain the following upper bound, which is sharp in the limit as z → 0 and d → ∞: This formula will be used directly to construct an error bound for the 2D classical semi-Lagrangian scheme, which in turn will be the base of comparison for our 2D field-aligned method. Based on this estimate, we observe that: 1. The approximation error decreases with order 2d + 2 in the discretization parameter z, as expected; 2. In practical applications one seeks the largest value z that yields an error smaller than a certain threshold ε 1; regardless of the order of the polynomial, this always implies that ω z < 2, that is, at least π grid points must fit within the characteristic wavelength of g(z); 3. When the interpolation procedure is part of a semi-Lagrangian scheme, the time step size t must be taken into consideration, because it directly effects the value of α; particularly important is the fact that, in the limit of t/ z → 0, we have α → 0 and therefore the error also goes to zero. For an extended discussion over the role of t in the convergence of semi-Lagrangian schemes we refer to [15].

Error Estimate for Field-aligned Semi-Lagrangian Scheme
We let f (t n ) be the exact solution, and f (n) the numerical solution, at time t n . We introduce some notation (see [15]): is the discretization (sampling) operator on a uniform 2D grid, and T (resp. T ) is the numerical (resp. exact) transport operator in direction b, over one time step t. The (global) error then reads where we identify in (Π T − T Π) f (t n ) the "truncation error" introduced by the numerical scheme between time t n and t n + t. Since the scheme is proven to be unconditionally stable, the error cannot grow in the L 2 -norm when transported by the numerical scheme, i.e., T e (n) 2 ≤ e (n) 2 . The triangular inequality then yields and if we proceed recursively up to time t 0 , where e (0) = 0 by construction, we obtain the upper bound that is, the norm of the (global) error at time t n cannot be larger than the sum of the norms of the previous n truncation errors. Here we made use of the discrete L 2 -norm, defined as The upper bound (3.23) provides us with an error estimate at time t n , if an upper bound for the truncation error is available. Similarly to the analysis in the previous section, we now compute the truncation error for harmonic initial condition f 0 (θ, ϕ) = exp(i(n ϕ ϕ + m θ θ)), for which the exact solution is simply Under this assumption, the local truncation error for our field-aligned semi-Lagrangian scheme can be decomposed into two parts, as where A 1 is the approximation error introduced by Lagrange interpolation in direction b, and A 2 is the approximation error of Lagrange interpolation along θ , which is then interpolated along b: Here we recall the following definitions: Furthermore, in the following calculation we will write θ i θ = 2πi θ /N θ and ϕ i ϕ = 2πi ϕ /N ϕ . We first compute A 2 : similarly to the previous section, we formulate the interpolation error in integral form and obtain We then have We then get As in the analysis for 1D interpolation, we can provide an error bound for the product term and write therefore Nϕ dσ, which leads to and therefore Now we want an upper bound for the L 2 -norm of the global error at time t n . We have Our estimates for |A 1 | and |A 2 | are independent of the grid indices i θ and i ϕ , and therefore they also apply to the maximum over the domain. Moreover, we observe that such estimates apply to any time instant, because they are invariant to the rigid translation that the exact solution undergoes in time. Accordingly, our upper bound for the global error of the fieldaligned semi-Lagrangian scheme is simply e (n) 2 ≤ n|A 1 | + n|A 2 |, with |A 1 | bounded by (3.26) and |A 2 | bounded by (3.25): We notice that for sufficiently small values of b θ we have r θ,k = 0 and α θ,k (1 − α θ,k ) ∝ |b θ |. Therefore in the limit as b θ → 0 the first error term goes to zero and we recover the classical error bound for 1D semi-Lagrangian schemes with n b = n ϕ . In the following discussion we will assume that b θ = 0. We now want to assess the consistency of the scheme, that is, whether at a fixed time t n = n t = T the global error goes to zero in the limit as t, θ, ϕ → 0 (or equivalently in the limit as n, N θ , N ϕ → ∞). If we assume that the three parameters converge to zero according to the algebraic relationships we can distinguish between two different cases: 1. If 0 < γ ≤ 1, the Courant numbers along θ and ϕ either grow as t → 0 (for γ < 1) or they are constant (γ = 1), therefore it is appropriate to use the upper bound 4α(1−α) ≤ 1. Moreover, we use the identities is the central local maximum of the Lebesgue function for Lagrange interpolation on 2d + 2 equispaced nodes [16,17]. Such a maximum is obtained for α = 1/2, and corresponds to the Landau constant [18,19]. The asymptotic behavior G d ∼ log(d)/π for d → ∞ was predicted by Landau [20], and various bounds valid for all d have been given by many authors (e.g., see [21]). Here we report the computed values of practical interest: 2. If γ > 1, the Courant numbers along θ and ϕ go to zero, and therefore for t sufficiently small we have one of these two situations: For the sake of brevity, we only consider case (a); since our Lagrange interpolant is constructed on an even number of equispaced nodes, it can be shown that the final result of this discussion is identical for case (b). As α ϕ goes to zero, we now have In general for b θ = 0 we have α θ,k = 0 for k = 0. Therefore, if we use the upper bound 4α θ,k (1 − α θ,k ) ≤ 1 and the asymptotic equivalence we obtain the estimate where the O( t)α ϕ term represents the error that results from truncating the MacLaurin Finally, we use this last estimate together with the identities to get the approximate upper bound (not valid in the limit as b θ → 0) The magnitude of λ = b θ N θ /(b ϕ N ϕ ) is discussed in the next section, where a comparison with the classical scheme is presented. Here we compute C d b , which grows logarithmically with d b and has values For all possible values of γ , we have shown that our field-aligned semi-Lagrangian scheme is consistent (i.e., the error goes to zero as t, θ, ϕ → 0). Given stability and consistency, we have proven convergence of our method.
Remark 2 We point out that the limit as γ → 0 corresponds to a constant t, i.e., no time refinement: the scheme correctly converges to the exact solution as N θ and N ϕ are increased, and our first estimate (3.28) applies. Conversely, the limit as γ → ∞ corresponds to constant N θ and N ϕ , i.e., no spatial refinement: our second estimate (3.29) applies, because it is independent of γ , and the error goes to a constant value, without diverging, as t is reduced.

Comparison with Classical (Not Aligned) Approach
If a standard tensor-product 2D interpolation is used, one could show that the two 1D interpolation operators exactly commute, and their corresponding approximation errors independently contribute to the local truncation error. As a result, an upper bound for the L 2 -norm of the global error is simply By comparing (3.27) with (3.30), we immediately notice that the second error term is much smaller in the field-aligned case if |n b | < |n ϕ |, that is, if the gradients along b are smaller than the gradients along ϕ. Specifically, if we assume that d ϕ = d b = d, the error is reduced by a factor (n b /n ϕ ) 2d+2 . Vice versa, if we seek to reduce the number of points along the ϕ direction for a given error level, then the field aligned scheme allows us to use only N ϕ |n b /n ϕ | points. The first error term is more difficult to compare, because it has a more complicated form in the field aligned case. In general terms, we can say that the error constant is somewhat larger because of the additional interpolations that are required; in order to quantify this overhead, we now look at the rate of convergence of the classical scheme for the various values of γ : therefore the first error term of the field-aligned scheme in (3.28) is larger by a factor equal to the Landau constant G d b , which is smaller than 2 for the cases of practical interest; 2. For γ > 1 we have the estimate therefore the first error term of the field-aligned scheme in (3.29) is multiplied by a factor . We have already shown that C d < 3 for d ≤ 9, therefore C d b /4 < 1 for the cases of practical interest. It now remains to see if |λ| ≥ 1, which is not an obvious task given that both |b θ /b ϕ | and N ϕ /N θ are small numbers in practice. Here we assume that |n b /n ϕ | 1, which is the condition that justifies the use of a field-aligned approach. An appropriate mesh for the classical scheme would require c = N θ /N ϕ ≈ |m θ /n ϕ |, which yields the estimate while an appropriate mesh for the field-aligned scheme would be coarser in ϕ, specifi- Therefore, we can say that |λ| ≥ 1 in all situations where the mesh is not overly refined in ϕ. This leads to C d b /|4λ| < 1: for sufficiently small t the first error term is smaller for the field-aligned scheme than for the classical scheme.
Overall we can conclude that the field-aligned semi-Lagrangian scheme allows for important computational savings, of the order of |n ϕ /n b |, for those situations where the gradients are smaller along b than along ϕ. The price to pay is an increased error constant for convergence in N θ . Such an increase is negligible for 0 ≤ γ ≤ 1, which are the conditions where refinement usually occurs. In the unusual situation where the t refinement dominates (γ > 1), the increase may be substantial only if the mesh is overly refined in ϕ (which are not the conditions in which we intend to use the field-aligned scheme). In all cases, the order of convergence is unaffected. approach and on n b /N ϕ for the field-aligned algorithm. Since the new method targets applications with strong plasma anisotropy, such that n b n ϕ , it allows working with a much smaller N ϕ for an equivalent error. Since the CPU time is roughly proportional to N ϕ , one can see the advantage of the new method for strongly anisotropic problems, both in terms of CPU time and in terms of storage.
In order to validate the field-aligned approach for application to plasma kinetics, in this section we present a first simplified gyrokinetic simulation, developed in the framework of the Selalib library [23]. The model consists of a 4D drift-kinetic equation in cylindrical geometry, with an oblique magnetic field as defined in (1.9); it is a generalization of the case with a uniform magnetic field in the z direction (corresponding to ι(r ) = 0 and thus ζ(r ) = 0 in (1.9)), which has been first developed in [5] and then reproduced in [24] for example. The complete derivation of the model is given in "Appendix A". By virtue of operator splitting, one of the equations to be solved is exactly the same as in Sect. 3, therefore we use the same algorithm presented there, but with one notable difference: instead of Lagrange interpolation in the θ direction, we prefer using spline interpolation, as this benefits from a lower error for a given polynomial degree.
In the uniform case, we are able to check the linear phase behaviour by solving numerically the dispersion relation and compare the simulation output with it (see also [25,26]). Note that the dispersion relation depends on k : this permits to compare simulations in the oblique and uniform case, in order to check the correctness of the simulations in the oblique case, as we will see. Another (more straightforward) way to validate the code will be to double the number of points along the ϕ direction (in practice, we will compare N ϕ = 32 with N ϕ = 64) and to observe that the results do not significantly change (convergence of numerical discretizations). We could also have compared the code with a standard (i.e., not using a field aligned interpolation) approach with a refined mesh, but this would have required to develop the corresponding code. The latter approach is not tackled in Sect. 4, but it will be employed in Sect. 5 in the framework of the Gysela code.

Model Equations
We look for f = f (t, r, θ, z, v ) satisfying so that Here, we have z = R 0 ϕ and |B 0 | = 1. The self-consistent potential φ = φ(t, r, θ, z) solves the quasi-neutrality equation without zonal flow When ι = b θ /r b z /R 0 = 0, we recover the classical drift kinetic model given in [5,24] for example. A similar model has been simulated in [27], with ι = 0.8 as an example, using a Particle in Cell method.
We note that all quantities appearing in these equations are non-dimensional. The equations themselves can be derived from (1.1), (1.3) and (1.5) by neglecting terms in power of ιr/R 0 (see "Appendix A").
The boundary conditions on f are the following: -Periodicity along θ , z and v ; -Zeroth-order extrapolation along r , i.e., we give values to f outside the domain (for interpolation at the foot of the characteristic) according to the scheme The boundary conditions on φ are the following: -Periodicity along θ and z; -Neumann mode 0 (see [24]) at r = r min , that is, if we decompose φ into its Fourier modes φ k along θ : -homogeneous Neumann boundary condition for the Fourier mode 0 (∂ r φ 0 (t, r min ) = 0), i.e., 2π 0 ∂ r φ(t, r min , θ)dθ = 0; -homogeneous Dirichlet boundary condition for all other Fourier modes ( φ k (t, r min ) = 0 ∀k), i.e., ∂ θ φ(t, r min , θ) = 0.
-Homogeneous Dirichlet at r = r max , that is φ(t, r max , θ) = 0; The initial function is given by where the equilibrium function is The radial profiles {T i , T e , n 0 } have the analytical expressions where the constants are The dispersion relation reads (see "Appendix B") Here Z is the so-called 'plasma dispersion function' [28], defined as We notice that the dispersion relation depends on m and k but not directly on n. This means that taking different values of ι and n but with same m and k will lead to the same dispersion relation.

Numerical Methods
For time-stepping of (4.1) we use a predictor-corrector scheme closely related to the explicit midpoint rule for ordinary differential equations: starting from the solution at time t, the fields at time t + t/2 are evaluated to first order accuracy in t (predictor), and are then used to update the solution at time t + t to second order accuracy (corrector). Both the predictor and the corrector algorithms are splitting methods, where the 4D gyrokinetic Vlasov equation (4.1) is decomposed into three separate advection equations: A. 2D advection on a magnetic flux surface (θ, z), with constant velocity v b, C. 2D advection on a poloidal plane (r, θ), Hereφ(r, θ, z) is a constant-in-time approximation of the time varying field φ(t, r, θ, z). The three equations above are all solved using backward semi-Lagrangian methods. Specifically, for equation A we use the field-aligned algorithm described in Sect. 3, with a slight modification: we use cubic spline interpolation in the θ direction, and 5th order Lagrange interpolation (field-aligned) in the z direction. For equation B we use 1D cubic spline interpolation, and the parallel gradient ofφ is computed by 6th order finite differences (field-aligned) in the z direction. For equation C we use 2D tensor-product cubic spline interpolation in the (r, θ) plane, and since the flow field is not uniform, we calculate the feet of the 2D characteristic trajectories by means of the Verlet algorithm: letẊ = u 1 (X, Y ) andẎ = u 2 (X, Y ) be the characteristic equations of C, and (X n+1 , Y n+1 ) = (r i , θ j ) be the final position of one characteristic trajectory at time t n+1 ; the foot (X n , Y n ) of the characteristic is calculated as We use Lie splitting (1st order) as predictor and Strang splitting (2nd order) as corrector; the complete time-stepping algorithm then reads: 1. Computeφ from f n by solving the quasi-neutrality equation (4.2); 2. Computef n+1/2 using Lie splitting:f n+1/2 = C( t 2 )B( t 2 )A( t 2 ) f n ; 3. Computeφ fromf n+1/2 by solving (4.2) again; 4. Compute f n+1 using Strang splitting: The quasi-neutrality equation (4.2) is an elliptic partial differential equation in the variables (r, θ), therefore it can be solved independently on each poloidal plane z = z * . Taking advantage of the linearity of the differential operator and of the periodicity of the domain, we apply the discrete Fourier transform in θ to both sides of (4.2). Since the factor in front of ∂ 2 θ does not depend on θ and the boundary conditions in r are homogeneous, each Fourier coefficientφ m (r ) solves a separate 1D boundary value problem on [r min , r max ] and is independent of the other coefficients: For each mode m, this ordinary differential equation is collocated at the grid points r = r i , and the derivatives are approximated by 2nd-order central finite differences. Once the proper boundary conditions are taken into account (see previous section), calculating {φ m (r i ) : ∀i} requires the solution of a tridiagonal linear system of size N r . When all modes m are computed, the potential on the polar plane z = z * is reconstructed.

Numerical Results
We consider the parameters of [25] (MEDIUM case) Given the magnetic field (1.9), we recall that b θ = ζ b z and ζ = ιr/R 0 , therefore k = (ιm + n)b z /R 0 . We use a large value of R 0 /r max so that ζ 1 everywhere in the domain, and the reduced model (4.1) is a good approximation of the gyrokinetic Vlasov equation (see "Appendix A"). As our first test-case we consider a straight magnetic field with ι = 0 and excite the mode (m, n) = (15, 1), which leads to k = 1/R 0 . In our second test-case we consider a twisted magnetic field with ι = 0.8 and choose (m, n) = (15, −11), which leads to k = (0.8 · 15 − 11) b z /R 0 = b z /R 0 . We note that we have for the second case b z = 1/ 1 + ζ 2 with 0 ≤ ζ = ιr/R 0 ≤ ιr max /R 0 ≤ 0.05, so that |b z − 1| ≤ 1.25 · 10 −3 . The two cases have the same value of m and, thanks to the fact that b z ≈ 1, almost identical values of k . Thus the dispersion relation, which only depends on m and k , yields almost the same result in both cases, which means that the two simulations should give very similar results in the poloidal plane, at least in the linear phase (as it is also observed in [27]). The test-case parameters are summarized in Table 1, together with the resulting frequencies calculated from the analytical dispersion relation. In all simulations we take v max = 7.32, N r = 255, N θ = 512 and N v = 128; we use N ϕ = 32 for ι = 0, and N ϕ ∈ {32, 64} for ι = 0.8. ι is the (constant) magnetic rotational transform, m and n are the polar and axial mode numbers of the initial conditions, k is the resulting parallel wave number, and ω is the complex frequency calculated from the dispersion relation. The first three significant digits of ω (both real and imaginary parts) are the same in both test-cases  Fig. 2 we report the time evolution of the discrete L 2 -norm of the electrostatic potential φ over configuration space. To this end we approximate the continuous L 2 -norm of φ(t, r, θ, z), defined as with a quadrature formula based on the numerical solution φ n i jk ≈ φ(t n , r i , θ j , z k ): 3) The linear phase (left plot) is in accordance with the dispersion relation, and differences between the three runs become significant only in the non-linear phase (right plot).
On Fig. 3 we see poloidal cuts P T (r, θ):= f (t = T, r, θ, z = 0, v = 0) of the distribution function, at time T = 4000 (end of the linear phase) and time T = 6000 (non-linear phase). Again there is accordance between the figures with more visible differences in the non-linear phase. Note that the solution becomes very complex at T = 6000 with lot of small scales which are difficult to capture; we already observe some diffusion effect, due to the finite grid size; this is not the case at time T = 4000, where convergence still seems to occur.
On Fig. 4 we see magnetic-surface cuts S T (θ, z):= f (t = T, r = (r min +r max )/2, θ, z, v = 0) of the distribution function, again for T = 4000 and T = 6000. Since the number of points N ϕ is purposely low, in order to produce meaningful plots the numerical solution had to be reconstructed on a finer mesh using our field-aligned interpolation algorithm. At the end of the linear phase (T = 4000) we clearly see the structure of the mode (m, n) = (15, 1) in the straight case and (m, n) = (15, −11) in the oblique case. Later in the non-linear phase (T = 6000) the solution presents extensive filamentation, which is not properly resolved on the grid; nevertheless, we notice that in the case of closed flux-surfaces (ι = 0.8) these chaotic structures are roughly aligned with the magnetic field.
On the basis of these numerical experiments we can conclude that field-aligned interpolation performs as well as expected, and enables us to run simulations in cylindrical geometry with an oblique magnetic field (screw-pinch configuration) using limited computational resources. Simulations in a more realistic toroidal geometry will be performed in the next section, where we will compare the efficiency of the new field-aligned approach to its standard (not aligned) counterpart.

GYSELA Model
The Gysela code computes the 5D ion gyro-center distribution f (t, r, θ, ϕ, v , μ) by solving the gyrokinetic Vlasov equation (1.1) in the electrostatic limit (1.4). All quantities appear-ing in the code are non-dimensional. Temperatures are normalized to T e0 , i.e., the initial electron temperature at the mean radius r p = (r min + r max )/2, the electric potential is normalized to K T e0 /q i , where K is the Boltzmann constant, and the magnetic field is normalized to |B 0 |, i.e., its intensity at the magnetic axis. Time is normalized to the inverse of the ion cyclotron frequency ω c,i = q i |B 0 |/m i and velocities are given in units of the ion sound speed v T 0 = √ K T e0 /m i . Consequently, lengths are normalized to the ion "sonic" Larmor radius ρ s = m i v T 0 /(q i |B 0 |) and the magnetic moment μ to K T e0 /|B 0 |. Finally, number densities are normalized to N 0 , i.e., the 'reference' equilibrium density at the mean radius r p , the distribution function f is normalized to N 0 /(v T 0 ) 3 , and electric currents are normalized to μ 0 |B 0 |/ρ s . Under these assumptions we simplify the phase-space flow field (1.3) as In tokamak configurations, the plasma quasineutrality approximation is often made [5]. Electron inertia is ignored, which means that an adiabatic response of the electrons is assumed. Three additional simplifying hypotheses are used in the quasi-neutrality equation (1.5) for the electrostatic potential φ(t, r, θ, ϕ): 1. In the differential operator on the left-hand side spatial variations of the magnetic field intensity are neglected, so that B(r, θ) ≈ 1 (normalized intensity at magnetic axis); 2. Everywhere on the left-hand side any dependence of the equilibrium density on the polar angle θ is neglected, and the nominal radial density profile n 0 (r ) is employed instead (which is used to construct the kinetic equilibrium, as described later); 3. In the integral operator on the right-hand side the so-called "Pade approximation" is used [6].
Further, we define the operator ∇ ⊥ = (∂ r , 1 r ∂ θ ) and let T e (r ) be the electron temperature. Under all the given assumptions, Gysela's quasi-neutrality equation reads where J 0 is the Bessel function of first order and k ⊥ is the transverse component of the wave vector. f eq (r, θ, v , μ) is the reference state for the quasi-neutrality approximation (that is, the plasma is neutral when f ≡ f eq ), and it is also an equilibrium solution of the gyrokinetic Vlasov equation (1.1) in toroidal geometry, in the limit of vanishingly small electric field. We notice that the solution φ to (5.2) couples back into the characteristic equations (5.1) through the spatial derivatives of its gyro-average φ α (t, r, θ, ϕ, μ).
A detailed description of the model equations can be found in [6]. As usual, θ ∈ [0, 2π] and ϕ ∈ [0, 2π]; the radial domain is r ∈ [r min , r max ] and the parallel velocity domain is v ∈ [−v max , v max ]. For simplicity, all simulations in this section are carried out with the 4D toroidal version of Gysela, using a single value of μ = 0. The safety factor profile is q(r ) = q ø + [q a − q ø ] (r/a) 2 , i.e., parabolic with free parameters q ø and q a , where a is the minor radius of the torus. Periodic boundary conditions along θ and ϕ are considered. For interpolation outside the computational domain of the Vlasov solver, zeroth-order extrapolation along r and v is performed: for example, we assume that whenever we interpolate f at r < r min then f (t, r, θ, ϕ, v ) = f (0, r min , θ, ϕ, v ). The initial conditions for all simulations presented in this section are of the form where f eq is a so-called "canonical Maxwellian", which is a global kinetic equilibrium obtained as a phase-space transformation of an isotropic (but inhomogeneous) Maxwellian distribution [29,30]. Specifically, we use the definition of f eq from [31, pp. 102-103], constructed from the nominal radial profiles of density and temperature, given as with parameters κ n 0 = 25, r n 0 = 0.08, κ T i = 3, r T i = 0.08. We recall that n 0 (r p ) = 1 after normalization; by assuming T e ≡ T i we also have T i (r p ) = T e (r p ) = 1. The expression between square brackets in (5.3) represents a perturbation of the equilibrium distribution [5, p. 398]: g(r ) and h(v ) are Gaussian functions centered at r = r p and v = 0 respectively, which ensure that the perturbation vanishes (in the sense that it is smaller than unit roundoff for double precision arithmetic) at the boundaries of the r and v domains; δp(θ, ϕ) is a bath of accessible Fourier modes in the form where mn ∈ [0, max ] and χ mn ∈ [0, 2π] are the amplitude and phase for mode (m, n), randomly sampled from a uniform distribution over the given intervals, with max = 10 −4 .

Parallel Algorithms
The algorithms and the parallelization strategies used in the Gysela code have been already described in previous works [32][33][34]. Algorithm 3 sketches the main features concerning the Vlasov solver that we are interested in here. The usual way to perform a single Vlasov solving in the Gysela code [5] consists of a series of directional advections: (v /2,φ/2,r θ,φ/2,v /2). Each directional advection is performed with the semi-Lagrangian scheme. This procedure is named Strang-splitting and converges in O( t 2 ). It decomposes the Vlasov solver into four 1D advections and one central 2D advection (in the poloidal plane (r, θ)). This solver uses two parallel domain decompositions for the distribution function f . The main rationale that justifies this approach is that advections along a given dimension need all points along this dimension in f . This constraint comes from the spline interpolants that we use actually. Therefore, the 1D advections along ϕ and v are performed with a domain decomposition that retains all points of f along these two dimensions (ϕ, v ) locally in the MPI process. Then, a transpose of the distributed data structure f is performed that involves large collective communications. Then, the 2D advection along both r and θ dimensions can be done, this step uses a local subdomain in ϕ, v and μ directions. After a second tranposition of f , two 1D advections are again performed. In order to depart from the original algorithm to accommodate the aligned strategy, one can list the different constraints that must be taken into account. First, to use the aligned advection approach in (θ, ϕ) plane, it is of outmost importance to treat these two directions in a single step, it permits to apply easily the scheme introduced in Sect. 2.1. Second, 2D advections in (r, θ) can not be suppressed or transformed into a simple advection along the r direction, because the non-linear terms in r and θ interact tightly. Third, to evaluate a new The proposed Algorithm 4 fulfills these constraints. The advections along v are unchanged. The aligned advections along (θ, ϕ) (lines 2,4,7,9) replace the previous advections along the ϕ direction. All advective terms (except the non-linear ones) along the θ direction are treated in this aligned advection in (θ, ϕ). The 2D advections along (r, θ) are modified in order to keep only nonlinear terms in the θ direction. All other advective terms along θ are transfered to the 2D advection operator in (θ, ϕ). Finally, this solution uses a new parallel decomposition at one single location only (distributing over MPI processes along μ, v ) in the 2D aligned advection (lines 4 and 9). Compared to the standard algorithm, the extra transpose and redistribute steps constitute a communication overhead. Another overhead comes from the computation of the feet of the characteristics (lines 2 and 7) that are performed in an already known parallel decomposition in order to have access to needed values that are stored with these parallel decompositions.
We then have a robust parallel solution that does not require an entire overhaul of the Gysela code. Nevertheless some extra communications are created that we measure in the following discussion. In a future work, we will be able to cut costs with a more sophisticate implementation. Indeed, several fixes can be foreseen. One among other solutions is described shortly hereafter. First, one can execute aligned advections (lines 4 and 7) using the usual parallel decomposition of line 6 (∀(μ, ϕ, v ) = [local], ∀(r, θ) = [ * ]). It will require tricky (but not so costly) communication patterns to deal with the parallel decomposition along ϕ direction. Second, when this first change will be made we can mix the computations of feet (lines 2 and 7) with the corresponding 2D aligned advections and then eliminate the transposes of lines 5 and 8. Finally, with this new solution to come, we will reduce the communication volume: avoid transfer of the feet (lines 3 and 8) and remove the need of specific data distribution (lines 4 and 9) and associated data redistribution.
We now describe in detail the equations to be solved at each step of Algorithm 4. Following [35, Section 2.1] we define b * (x, v ):=B/B * + (v /B * )(J/B), where J(x) = ∇ × B is the equilibrium plasma current, and reformulate the phase-space flow field (5.1) as where v b * represents the streaming velocity, v D the curvature drift velocity and v E the E×B drift velocity: After normalization, the 5D gyrokinetic Vlasov equation (1.1) with flow field (5.4) is expressed with toroidal coordinates (r, θ, ϕ) and decomposed into three separate advection equations: A. 1D advection along v , which is left untouched compared to the original algorithm, B. 2D advection on a magnetic flux surface (θ, ϕ), where field-aligned interpolation is used, C. 2D advection on a poloidal plane (r, θ), where only the nonlinear E × B velocity is retained along θ , It is important to notice that Equation B above is more complicated than the oblique advection of Sects. 3 and 4. However, this approach permits to avoid large modifications of the Gysela code which was a constraint. An illustration is given in Fig. 5, where we plot the streamlines for the advection field of Equation B, in the limit of vanishing electric field (v E = 0). We consider the same geometrical parameters of the test-cases in the next section (namely a = 40, [r min , r max ] = [0.1 a, a], and R 0 = 3 a), and we set q ø = 1 and q a = 2.5 in our parabolic safety factor profile q(r ), which yields a local value q(r p ) = 1.45375. For brevity we set μ = 0, select a single magnetic flux surface at r = r p , and focus on four different values of v ∈ {−2, −1, 1, 2}. (We recall that all velocities are normalized to the thermal velocity v T 0 .) For comparison we also plot the streamlines of the magnetic field passing through the same points at ϕ = 0: as expected, the misalignment between the advection velocity and the magnetic field grows with v , but remains reasonably small throughout this range of velocity. Finally, in the background is shown a set of straight lines with dθ/dϕ ≡ ι(r p ), which approximate the magnetic field on this flux surface: these are used for fieldaligned interpolation of the distribution function f (Sect. 2.1), as well as for field-aligned differentiation of the electric potential φ (Sect. 2.2).

Numerical Results with Gysela
In a gyrokinetic simulation with kinetic ions and adiabatic electrons it is to be expected that the smallest length scale is of the order of the ion Larmor radius ρ s . This is due in part to the gyroaverage operator in configuration space, and in part to the averaging over μ that takes place when computing the charge density. Since ρ s is also the quantity used for normalization of all lengths, we can say that a well-refined numerical simulation requires r 1 and r max θ 1. A fundamental non-dimensional parameter in magnetic fusion devices is the ratio ρ * = ρ s /A, where A is the minor radius of the device. (In terms of nondimensional quantities, the minor radius of the device is a = A/ρ s and therefore ρ * = 1/a.) The number of degrees of freedom needed to represent a poloidal cut of the solution scale with (ρ * ) −2 , therefore smaller values of ρ * lead to larger numerical simulations.
In order to have accurate and converged simulations, in this section we use a setup with a relatively large value of ρ * = 1/40, and we consider a single μ-value of μ = 0. Strictly speaking, in such a situation there is neither gyro-averaging nor μ-averaging, therefore there is no physical lower bound on the characteristic length scales; nevertheless, the solution is still well resolved at the end of our simulations. We investigate two physical cases with geometrical parameters a = 40, r min = 0.1 a, r max = 1.0 a, R 0 = 3 a.
that differ in their safety factor profiles q(r ). The parallel velocity domain is truncated at v max = 6.3. Benchmarks have been realized with the 4D toroidal version of the Gysela code, on a fine computational domain of size In order to keep the time integration error low compared to interpolation errors, a small time step t = 1 was chosen.
A first case with an almost constant safety factor q(r ), slowly varying between q(0) = 1 and q(a) = 1.1, is illustrated by Figs. 6a and 7. A second case with a safety factor strongly depending on r , varying between q(0) = 1 and q(a) = 2.5, is illustrated by Figs. 6b and 8. The second case could be slightly more difficult to handle for the aligned approach, because the b direction depends on the r position through equation (1.10). Indeed, for each hyperplane at a given r , the aligned advection algorithm uses possibly a different direction than for another r value. Figure 6   One can see on Fig. 6a that the standard approach with N ϕ = 128 gives a similar result compared to the aligned method with N ϕ = 32. The two other curves with standard method and N ϕ = 32 and N ϕ = 64 are not converged along the ϕ direction and give substantially different potential energy evolutions. Figure 7 corroborates this fact by showing different cuts of the electric potential: the graphs at the middle and bottom rows show quite identical structures. It is important to notice that we have reconstructed finely the graph with N ϕ = 32 in order to recover a fine resolution on the plots (through 4 aligned interpolations per original grid point, leading to a virtual N ϕ = 128). In order to do that, we use Algorithm 1 with (θ , ϕ ) being the grid points on the fine mesh. As stated in [35], global conservation of mass and energy in toroidal geometry is quite difficult to achieve in practice, due to boundary conditions within Gysela. Therefore, these quantities have not been used to estimate the benefits of the aligned method in this paper. Figures 6b and 8 show results for the second simulation with a strongly varying safety factor. Conclusions are quite analogous as the first simulation. On the left-hand side, one can see elongated structures along the parallel direction, which constitute the rationale that justifies why the aligned method reduces interpolation approximation errors. For these two simulations, we conclude that the aligned approach works well and permits to reduce by a factor of 4 the number of grid points in the ϕ direction for these cases at ρ = 1/40. From  the previous analysis, we also expect that, as ρ * is further reduced to approach the ITER values of the order of 10 −3 [36], it would not be necessary to increase the number of grid points in the ϕ direction in order to achieve comparable precision. Thus, our method could allow a saving of the order of 100 in grid points when employed in the context of realistic simulations of reactor scale devices.

Execution Times Comparison
As a matter of comparison between the standard and aligned methods, Table 2 gives typical execution times of Gysela for four short runs that employ the same configuration and grid size already described in Sect. 5.3 (N r = 256, N θ = 256, N v = 48). For the aligned scheme we take N ϕ = 32, while for the standard scheme we consider three different simulations with N ϕ ∈ {32, 64, 128}. The time breakdown of specific regions of the code are shown in addition to the total run time.
Let us compare the timings for the aligned and standard methods at N ϕ = 32. Firstly we observe that the execution times for the transposes is much higher with the aligned scheme, mainly because there are four transpose steps required (Algorithm 4) instead of two (Algorithm 3). The advection steps are also slightly more expensive with the aligned scheme, because the 1D advection along ϕ is replaced by the 2D advection aligned in (θ, ϕ). In fact, the 2D field-aligned interpolation of Algorithm 1 requires additional computations compared to simple 1D interpolations. The improvements and optimizations addressed at the end of Sect. 5.2 can contribute to decrease these overheads in the future.
Nevertheless, one can see that the aligned strategy with N ϕ = 32 is already competitive against the standard approach with N ϕ = 64 in terms of total run time, with the big benefit of requiring two times less memory to store the distribution function. Since Sect. 5.3 has shown that the aligned approach with N ϕ = 32 is more accurate than the standard approach with N ϕ = 64 (at least in the linear phase), we can conclude that there is a clear gain in using field-aligned interpolation in Gysela.

Conclusions
We have described a semi-Lagrangian method based on field-aligned interpolation, for the solution of the gyrokinetic Vlasov equation. The application of interest is the numerical simulation of magnetically confined plasmas in fusion devices. Thanks to the smooth variation of the solution in the direction of the magnetic field, field-alignment enhances the accuracy of the interpolation: for a given level of accuracy, this allows us to reduce the number of discretization points along the toroidal direction.
In the simplified setting of 2D constant advection, we have given a rigorous proof of convergence, as well as extensive error estimates which underline the advantages of fieldaligned interpolation. We have implemented the scheme into two semi-Lagrangian codes, Selalib and Gysela, for the solution of the 4D gyrokinetic Vlasov equation in the zero-Larmor-radius limit. We have used the ion temperature gradient (ITG) instability as a standard verification test-case in cylindrical (screw-pinch) and toroidal (circular Tokamak) geometries. In our benchmarks against the standard (not aligned) interpolation scheme, we have observed large reductions in memory footprint (up to a factor of 4), as well as moderate (but improvable) simulation speed-ups. Our estimates suggest that these gains will be even larger in reactorscale simulations.
Field-aligned interpolation does not pose constraints on the 2D poloidal grids, and the use of magnetic flux coordinates is not necessary. Accordingly, the magnetic axis, as well as the X-point in a divertor configuration, do not pose theoretical problems. Therefore our semi-Lagrangian algorithms can be extended to more complex magnetic geometries, enabling the global simulation of diverted Tokamaks and Stellarators. which leads to From this it follows that B * = |B 0 | 1 + ζ 2 + ζ + ζ /r 1 + ζ 2 v .
We then write so that Before proceeding, we replace ∂ z φ with derivatives along the directions (r,θ , b): and B * · ∇φ = B * + ζ 3 /r 1 + ζ 2 v b · ∇φ − Finally, the phase-space flow field in the (r,θ , b) basis reads where B * = |B 0 | 1 + ζ 2 + v (ζ + ζ /r )/(1 + ζ 2 ). If we now let ζ → 0 and ζ → 0 while keeping our basis unchanged, the equations above reduce to ⎧ ⎨ ⎩ u(t, r, θ, z, v ) = − ∂ θ φ a (t, r, θ, z, v ) = −b · ∇φ, which correspond to (4.1). We notice that under this approximation we have let B * → |B 0 |, another mode (m, n * ) and compute a profile ι * (r ) that yields the same value of k (r ) and hence the same dispersion relation. In practice it is not easy to obtain this condition exactly, but a very good approximation may be achieved as long as b z (r ) ≈ 1.