Better Integrators for Functional Renormalization Group Calculations

We analyze a variety of integration schemes for momentum-space functional renormalization group (FRG) calculations with the goal of finding an optimized scheme. Using the square-lattice $t-t'$ Hubbard model as a testbed, we define and benchmark measures of quality. Most notably, we define an error estimate for the solution of the ordinary differential equation that circumvents the issues introduced by the divergence at the end of the FRG flow. Using this measure to control for accuracy, we find that a threefold reduction in the number of required integration steps is achievable by the choice of integrator. We herewith publish a set of recommended choices for the functional renormalization group, shown to decrease the computational cost of FRG calculations and representing a valuable basis for further investigations.


Introduction
The functional renormalization group (FRG) has proven itself to be a versatile theoretical framework to study competing ordering tendencies and other many-body phenomena, both in itinerant fermionic [1][2][3] and quantum spin systems [4][5][6][7][8]. The underlying flow equations are a large system of coupled, non-linear ordinary differential equations (ODEs), the solution of which spurred the development of highly sophisticated numerical codebases [9][10][11]. So far, algorithmic improvements have mainly focused on the efficient representation of vertex functions and on optimizing the momentum integrations on the right-hand side (RHS) of the flow equations, with the numerical solution of the ODE itself treated as a necessity rather than an opportunity for improvement. Consequently, a simple Euler step was used in the majority of FRG calculations to date, see e.g. [3,4,10], with the adoption of higher-order solvers from the Runge-Kutta family being a fairly recent development in the field [11][12][13][14][15][16][17].
The appropriate choice of integrator will lead to improved accuracy at the same number of evaluations of the RHS, which constitutes the main computational bottleneck of any FRG calculation. Alternatively, one can significantly reduce this number while maintaining good accuracy. In contrast to many other optimizations, this gain does not introduce any further physical approximations, but rests purely on numerical mathematics. As a prototypical setup to study the influence of different integrators, we choose the momentum-space FRG in the simplest truncation scheme, only considering the four-point vertex Γ^(4) while neglecting self-energy effects. Additionally, we limit ourselves to the well-understood case of the square-lattice Hubbard model with hopping up to second-nearest neighbors [9, 13,[18][19][20][21][22][23][24][25][26][27][28][29][30][31].
We explicitly refrain from implementing more sophisticated approximation schemes [9, 13,26,29,32,33] or multi-band extensions [3,[34][35][36][37][38], not only for clarity of analysis, but to focus instead on benchmarking a large variety of standard ODE solvers, most of which are available in existing libraries [39,40]. Nevertheless, we expect our conclusions to provide a starting ground for the application of better integrators to more sophisticated physical approximations.
The paper is organized as follows: After an introduction of our notation for momentum-space FRG and the approximations employed in our implementation in Section 2.1, we provide an overview of various integration schemes in Section 2.2 and define tangible metrics to judge the quality of the different schemes in Section 2.3. In Section 3 we employ a two-stage elimination procedure. In a first step we sort out all integrators incapable of reproducing the physical ordering instability for the nearest-neighbor-only model. We subsequently analyze the remaining set over a larger parameter space including second-neighbor hopping to distinguish optimal choices.

FRG in brevity
The following discussion of FRG will be reduced to an absolute minimum and only serves as a means to introduce the notation used. For detailed derivations we refer the interested reader to standard literature [1][2][3].
In the following, we will focus solely on the square-lattice $t-t'$ Hubbard model defined by

$H_0 = -t \sum_{\langle i,j \rangle, \sigma} c^\dagger_{i\sigma} c_{j\sigma} - t' \sum_{\langle\langle i,j \rangle\rangle, \sigma} c^\dagger_{i\sigma} c_{j\sigma} - \mu \sum_{i,\sigma} n_{i\sigma}$, (1)

$H_{\mathrm{int}} = U \sum_i n_{i\uparrow} n_{i\downarrow}$, (2)

where $t$ ($t'$) denotes the (next-)nearest-neighbor hopping amplitude, $\mu$ is the chemical potential and $U$ the strength of the onsite Hubbard interaction. Due to the spin-rotation invariance of Eq. (1) and Eq. (2), we can constrain our derivation to the SU(2)-symmetric set of equations. To simplify the functional dependence of the FRG equations, we invoke the static approximation of the problem, i.e. consider only the zero Matsubara frequency component of the vertex functions, and neglect self-energy effects [3,21].
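For concreteness, the corresponding single-particle dispersion of the $t-t'$ square-lattice model (standard, though not spelled out above) can be evaluated as follows; this is an illustrative sketch, not part of the paper's Julia implementation:

```python
import math

def dispersion(kx, ky, t=1.0, tp=0.0, mu=0.0):
    """Square-lattice t-t' dispersion: nearest-neighbor hopping t,
    next-nearest-neighbor hopping tp (= t'), chemical potential mu."""
    return (-2.0 * t * (math.cos(kx) + math.cos(ky))
            - 4.0 * tp * math.cos(kx) * math.cos(ky)
            - mu)

# For t' = 0, mu = 0 the van Hove points (pi, 0) and (0, pi) sit
# exactly at the Fermi level (dispersion vanishes there):
print(dispersion(math.pi, 0.0))
```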
The SU(2)-reduced FRG equation in the conventional truncation scheme, including contributions up to the four-point vertex Γ^(4), is diagrammatically shown in Fig. 1 and can be expanded into Eq. (3) (using $V$ as the SU(2)-symmetrized four-point vertex), where we suppress the spin indices that are unnecessary in the SU(2) Hubbard model. We also use a modified Einstein sum convention to indicate the momentum integration in $l$ over the Brillouin zone (BZ). To treat the continuous momentum dependence of the vertex function, we employ the "grid-FRG" momentum discretization scheme in the BZ with the additional refinement scheme described in Ref. [41]. Eq. (3) will commonly be referred to as the right-hand side (RHS) of the FRG equation, which we need to compute at each flow step iteration.

The $L^\Lambda$ appearing in the RHS is the derivative with respect to the scale $\Lambda$ of the regulated loop. The FRG regulator $\Theta(\Lambda)$ is introduced multiplicatively by defining $G^\Lambda = \Theta(\Lambda) G$, where $G$ is the bare propagator. We use both the particle-particle loop $L^{-,\Lambda}$ and the particle-hole loop $L^{+,\Lambda}$ in our notation. As regulator we choose the Ω-cutoff $\Theta(\Lambda, \omega) = \omega^2 / (\omega^2 + \Lambda^2)$ [25] for its high numerical stability. The Matsubara frequency summations can then be performed analytically.

To obtain the effective low-energy vertex we integrate the differential equation starting at infinite (i.e. large compared to the bandwidth) $\Lambda = \Lambda_\infty$ and approach $\Lambda \to 0$. When a phase transition is encountered, the loop derivatives diverge in conjunction with the associated susceptibilities, and we terminate the flow. This is the integration for which we attempt to find an optimized solver. Due to the divergence we cannot let the solvers run to completion, but instead hard-terminate them once the maximum element of the vertex exceeds a threshold $V_{\max}$.
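The Ω-cutoff and its analytic scale derivative (which enters the loop derivative $L^\Lambda$) can be sketched for a single Matsubara frequency; this assumes the $\omega^2/(\omega^2 + \Lambda^2)$ form of the regulator quoted above and is purely illustrative:

```python
def theta(lam, w):
    """Omega-cutoff regulator Theta(Lambda, omega) = w^2 / (w^2 + Lambda^2);
    approaches 1 for Lambda -> 0 (no regularization) and 0 for Lambda -> inf."""
    return w * w / (w * w + lam * lam)

def dtheta_dlam(lam, w):
    """Analytic scale derivative d Theta / d Lambda, as needed for the
    single-scale part of the regulated loop derivative."""
    return -2.0 * lam * w * w / (w * w + lam * lam) ** 2

# The analytic derivative agrees with a central finite difference:
fd = (theta(1.0 + 1e-6, 2.0) - theta(1.0 - 1e-6, 2.0)) / 2e-6
print(fd, dtheta_dlam(1.0, 2.0))
```

In the full calculation these factors multiply bare propagators inside the Matsubara sum, which the paper performs analytically.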

Integrators
The possible choices of integrators are plentiful, but in the following we highlight some of the most prevalent algorithms as well as some that will prove to excel in our testing. The measurement schemes used to determine quality are provided at the end of this section. To numerically solve the non-linear, non-autonomous differential equation, we utilize both single-step and multi-step methods [42], obtaining an iteration procedure that we terminate at the divergence of $V$. We almost exclusively focus on integrators that are implemented in the excellent DifferentialEquations.jl package [39], written in the Julia programming language. We emphasize integrators that are well known and commonly employed, such as the Runge-Kutta class, and that are also widely available in other programming languages. A full list of the considered (including all disregarded) integrators can be found in Section A.

Single-Step Methods
A single-step method is characterized by utilizing, at each step, a starting value $V_n$ and additional evaluations of the RHS to construct $V_{n+1}$. The most well-known family of methods in this class are the Runge-Kutta type integrators, where the increment function Φ is constructed as a weighted average of evaluations of the RHS within the interval $[\Lambda_n, \Lambda_{n+1}]$, such that it coincides up to the respective order with the Taylor polynomial of the solution:

$V_{n+1} = V_n + \Delta_{\Lambda,n} \sum_{i=1}^{s} b_i k_i$, with $k_i = \mathrm{RHS}\big(\Lambda_n + c_i \Delta_{\Lambda,n},\; V_n + \Delta_{\Lambda,n} \sum_{j=1}^{s} a_{ij} k_j\big)$.

The method is explicit if $a_{ij} = 0$ for $j \ge i$; otherwise it is implicit. Adaptivity can be included through a step-dependent $\Delta_{\Lambda,n}$. The specifics of these methods are covered in tremendous detail in the literature, e.g., [42].
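The general explicit update above can be sketched directly from a Butcher tableau $(a_{ij}, b_i, c_i)$; below a minimal Python sketch using the classic fourth-order tableau, with the RHS a stand-in for the flow equation:

```python
import math

def rk_step(rhs, lam, v, dlam, a, b, c):
    """One explicit Runge-Kutta step for dV/dLambda = rhs(Lambda, V),
    defined by the Butcher tableau (a, b, c). Explicitness means each
    stage k_i only uses the already-computed stages j < i."""
    k = []
    for i in range(len(b)):
        vi = v + dlam * sum(a[i][j] * k[j] for j in range(i))
        k.append(rhs(lam + c[i] * dlam, vi))
    return v + dlam * sum(bi * ki for bi, ki in zip(b, k))

# Classic RK4 tableau (a_ij = 0 for j >= i, hence explicit):
A = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
B = [1 / 6, 1 / 3, 1 / 3, 1 / 6]
C = [0.0, 0.5, 0.5, 1.0]

# Test problem dV/dLambda = V: one step of width 0.1 from V = 1
# reproduces exp(0.1) to fourth order.
v1 = rk_step(lambda lam, v: v, 0.0, 1.0, 0.1, A, B, C)
```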

Example: Adaptive Explicit Euler
The conceptually simplest integrator for FRG, which serves as the baseline for this work, is the adaptive explicit Euler described in Algorithm 1. Note that we have included a function $f(\Lambda, V_{\max}) = \min(\max(a\Lambda/V_{\max}, \Delta_{\min}), \Delta_{\max})$, an adaption scheme specifically designed to reduce the step width in the proximity of the expected flow divergence. Here $a$ is a small number determining the relative speed of the integration.
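A minimal sketch of this adaptive Euler scheme, including the step-width function and the hard termination at the divergence, is given below. The toy RHS (a one-component caricature whose solution diverges at a finite scale) is a hypothetical stand-in for the full vertex flow:

```python
def step_width(lam, v_curr, a=0.05, d_min=1e-4, d_max=1.0):
    """f(Lambda, V_max): shrink the step as the maximal vertex element grows,
    clipped between d_min and d_max."""
    return min(max(a * lam / v_curr, d_min), d_max)

def adaptive_euler(rhs, v0, lam_start=50.0, v_div=50.0):
    """Adaptive explicit Euler flow towards Lambda -> 0, hard-terminated
    once the solution exceeds the divergence threshold v_div."""
    lam, v = lam_start, v0
    while lam > 0.0 and abs(v) < v_div:
        dlam = min(step_width(lam, max(abs(v), 1.0)), lam)
        v -= dlam * rhs(lam, v)  # Euler step towards smaller Lambda
        lam -= dlam
    return lam, v

# Toy RHS dV/dLambda = -V^2 / (Lambda^2 + 1): V grows as Lambda decreases
# and diverges at a finite "critical scale" (purely illustrative):
lam_crit, v_crit = adaptive_euler(lambda lam, v: -v * v / (lam * lam + 1.0), 3.0)
```

The sign convention (`v -= dlam * rhs`) reflects that the flow runs from large to small Λ.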

A note on implicit Runge-Kutta methods
We have attempted to include implicit integration schemes in our analysis but were inhibited by their high memory cost. While for explicit methods the requirements are understood to be $n \times \mathrm{sizeof}(V)$, $n \in \mathbb{N}$, for implicit methods we must instead consider the size of the Jacobian to be inverted. As a linear map between two solutions, it is proportional to the square of the size of a single solution, i.e., $\propto \mathrm{sizeof}(V)^2$. Even for the relatively small-scale systems we employ here, the number of elements is $n_k^6 \approx 10^{13}$. Implicit integrators are therefore not a viable option for FRG calculations.
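The quoted scaling can be checked with a quick back-of-the-envelope estimate; the mesh size below is an illustrative assumption chosen to match the quoted $n_k^6 \approx 10^{13}$, not a parameter from the paper:

```python
def vertex_bytes(n_k):
    """Memory (bytes) of one double-precision vertex V(k1, k2, k3)."""
    return 8 * n_k ** 3

def jacobian_bytes(n_k):
    """A dense Jacobian maps vertices to vertices, so it holds
    (n_k^3)^2 = n_k^6 double-precision entries."""
    return 8 * (n_k ** 3) ** 2

n_k = 150  # illustrative number of momentum-mesh points
print(vertex_bytes(n_k) / 1e9)    # ~0.027 GB for the vertex itself
print(jacobian_bytes(n_k) / 1e12) # ~91 TB for a dense Jacobian
```

The vertex itself is modest, but the dense Jacobian is far beyond any realistic RAM budget, which is why implicit schemes are excluded.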

Methods based on the Non-linear Magnus Series
Another class of one-step methods, specifically suited for homogeneous, non-linear, non-autonomous ODEs, is based on suitable approximations $\Omega^{[m]}$ of the Magnus operator $\Omega$ [43] in the formal exact solution of the flow equation,

$V(\Lambda) = \exp(\Omega(\Lambda))\, V_\infty$, where $V_\infty = V(\Lambda = \infty)$.

For the following approximations we define the matrix $A$ by the relation $\mathrm{RHS}(V, \Lambda) = A(\Lambda, V)\, V$. In this overview of methods we restrict ourselves to the approximations of first and second order and refer the reader to the literature [43] for the higher-order expressions in terms of iterated commutators. An iterative procedure is obtained by applying the approximate time-evolution operator $\exp(\Omega^{[m]})$ for small steps $\Delta_\Lambda$. Approximating the integrals in the Magnus expansion by order-consistent low-order quadratures yields, at first order,

$V_{n+1} = \exp\big(\Delta_{\Lambda,n}\, A(\Lambda_n, V_n)\big)\, V_n$.

Adaptivity is easily included by using steps of differing size $\Delta_{\Lambda,n}$, determined by the same strategy as for the adaptive Euler.
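For a linear toy problem the first-order step above reduces to an exponential Euler update. The sketch below, with a hand-rolled 2x2 matrix exponential (adequate for the small arguments $\Delta_\Lambda A$ occurring per step), is illustrative and not the paper's implementation:

```python
def expm2(a, terms=30):
    """Matrix exponential of a 2x2 matrix via its Taylor series;
    sufficient for small-norm step matrices Delta_Lambda * A."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # running sum, starts at identity
    term = [[1.0, 0.0], [0.0, 1.0]]    # current Taylor term a^n / n!
    for n in range(1, terms):
        term = [[sum(term[i][k] * a[k][j] for k in range(2)) / n
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

def magnus1_step(a_of_lam, lam, v, dlam):
    """First-order Magnus step: V <- exp(dlam * A(lam)) V."""
    a = a_of_lam(lam)
    u = expm2([[dlam * a[i][j] for j in range(2)] for i in range(2)])
    return [sum(u[i][j] * v[j] for j in range(2)) for i in range(2)]

# Rotation generator A = [[0, 1], [-1, 0]]: exp(t A) rotates exactly,
# so the step reproduces [cos(0.1), -sin(0.1)] from [1, 0].
v1 = magnus1_step(lambda lam: [[0.0, 1.0], [-1.0, 0.0]], 0.0, [1.0, 0.0], 0.1)
```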

Multistep/Adams methods
In contrast to one-step methods, multistep methods utilize previous evaluations of the right-hand side, $f_{n+j} = \mathrm{RHS}(V_{n+j}, \Lambda_{n+j})$, to approximate it with a Lagrange polynomial. The general form of these linear multistep methods can be stated as

$\sum_{j=0}^{k} \alpha_j V_{n+j} = \Delta_\Lambda \sum_{j=0}^{k} \beta_j f_{n+j}$.

If $\beta_k = 0$ the method is explicit and does not require the evaluation of the RHS at unknown points.
Commonly this class is termed Adams-Bashforth methods. Otherwise the method is implicit; if the interpolation polynomial is augmented only with the single yet-unknown point $\mathrm{RHS}(\Lambda_{n+k}, V_{n+k})$, the method is termed an Adams-Moulton method. In this case the implicit nature can be treated by employing a predictor-corrector scheme: the prediction $\bar{V}$ is obtained by an explicit Adams method of one order lower than the implicit method, the RHS is evaluated at the prediction, and the new value $V_{n+k}$ is obtained from the corrector. Multistep methods need startup values, but these are obtained with explicit Runge-Kutta or similar methods. Adaptive step-size integrators can be obtained by utilizing more sophisticated interpolation techniques.
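The predictor-corrector idea can be sketched with the two-step Adams-Bashforth predictor and the trapezoidal (second-order Adams-Moulton) corrector; this is a minimal PECE sketch, not the library's VCABM implementation:

```python
import math

def pece_step(rhs, lam, v, f_prev, f_curr, dlam):
    """One predict-evaluate-correct-evaluate (PECE) step for
    dV/dLambda = rhs(Lambda, V). f_prev and f_curr are the stored RHS
    evaluations at lam - dlam and lam (the multistep history)."""
    # Predictor: explicit two-step Adams-Bashforth
    v_pred = v + dlam * (1.5 * f_curr - 0.5 * f_prev)
    # Evaluate at the prediction, then correct (trapezoidal Adams-Moulton)
    f_pred = rhs(lam + dlam, v_pred)
    v_next = v + dlam * 0.5 * (f_curr + f_pred)
    # Final evaluation becomes the stored history value for the next step
    return v_next, rhs(lam + dlam, v_next)

# Test problem dV/dLambda = V with exact startup values (illustrative):
h = 0.1
v1, f1 = pece_step(lambda lam, v: v, 0.0, 1.0, math.exp(-h), 1.0, h)
```

Only one fresh RHS evaluation is needed per step (plus the final one, reused as history), which is why multistep methods are cheap per step compared to Runge-Kutta schemes.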

Measures of Quality
Differentiation of the quality of integrators will be based on the following aspects:
1. Error compared to a converged high-fidelity Euler calculation
2. Minimal number of RHS evaluations
3. RAM requirements

Error compared to high-fidelity Euler
All integrators must converge to the same result in the limit of infinitely many steps or, equivalently, of negligible integration error for adaptive procedures. To obtain this result in the most controlled fashion, we converge an adaptive Euler integrator at a high number of steps ($N_{\mathrm{RHS}} \approx 6000$) as a reference. The measure of error employed in the comparison is the normed sum of differences of the resulting vertex tensors, i.e. the relative deviation $\varepsilon = \lVert V - V_{\mathrm{ref}} \rVert / \lVert V_{\mathrm{ref}} \rVert$. To avoid numerical discrepancies in the vicinity of the flow divergence, we compare the vertices slightly above the critical scale $\Lambda_{\mathrm{crit}}$, at $\Lambda = 0.17$. This scale is sufficiently low for the integrator to have performed a significant number of integration steps, but sufficiently high to avoid the critical scale. Note that this number is arbitrarily chosen to lie within this region.
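The comparison can be sketched as follows; the Euclidean norm over the flattened vertex components is an assumption for illustration, the paper only specifies a normed difference:

```python
import math

def relative_error(v, v_ref):
    """Relative deviation between two flattened vertex tensors:
    ||V - V_ref|| / ||V_ref|| in the Euclidean norm."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, v_ref)))
    norm = math.sqrt(sum(b * b for b in v_ref))
    return diff / norm

# Small worked example: a 5% deviation in one of two components.
eps = relative_error([1.0, 2.1], [1.0, 2.0])
```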
Because this measure does not indicate which error size is tolerable for qualitatively correct FRG results (at the given approximation level), we have additionally included a full phase scan in stage 2 (Section 3.2) to ensure the results are consistent with the expected behavior. We eliminate all integrators that are inconsistent.

Minimal number of RHS evaluations
As we aim to optimize the calculation time, the most important measure of quality for any integration routine is the number of RHS evaluations it requires until divergence. The RHS is the only expensive part of the calculation, making this a trivial but platform-independent estimator for the runtime. While this is no longer true for implicit integrators, where the inversion of the Jacobian would be most expensive, these are ruled out by their RAM requirements and can thus be neglected here.

Memory requirement
For most FRG calculations the bottleneck is not only the calculation time but also the memory consumption [41,44]. As higher-order methods may require storing intermediate results, we track the number of concurrently allocated vertex-sized objects as a measure of RAM usage. Once again, for implicit methods this may not be the most accurate measure due to the creation of high-dimensional Jacobian matrices, but we disregard those entirely. The impact of other objects on memory usage is of second order compared to the vertices: the vertices scale as $N_k^3$, while the loop derivatives scale as $N_k^2$ and the dispersion cache scales as $N_k N_{k_f}$. The measure chosen is the peak memory usage during the calculations. This is closely proportional to the total number of vertex objects, though slight lower-order effects are to be expected.

Results
We have ensured that our implementation falls within the FRG equivalence class for the Hubbard model reported in Ref. [41]; this binary equivalence is sufficient as a proof of correctness. Note that while one of the codes in that publication was written by one of us, for this benchmark we utilize an independent implementation written in the Julia programming language. All simulations were performed using the following parameters: $t = 1$, $U = 3t$, $t' \in [0.0, -0.5]t$, $\mu = -4t'$. We use a momentum meshing of $n_k = 16 \times 16$ with a refined meshing of $n_{\mathrm{fine}} = 9 \times 9$. The FRG flow is started at $\Lambda_\infty = 50$ and tracked until the maximum element of the vertex exceeds $V_{\max} = 50t$, at which point we terminate the integration and analyze the last vertex $V_{\Lambda_{\mathrm{crit}}}$ with respect to its ordering tendencies. While this resolution is insufficient for physical simulations at low scales, and we acknowledge that the stability of all integrators increases at higher resolutions, the fact that most integrators recover the proper results testifies that the resolution is sufficient for this analysis.
We have split the investigation into a two-stage elimination process, iterating over consecutively more involved tests to find the best set of integrators and discard the remainder.

Stage 1
The first approach to determine the feasibility of the integrators is to analyze the square-lattice Hubbard model at $t' = 0$, $\mu = 0$, where we expect antiferromagnetic ordering.
The results of stage 1 are displayed in Fig. 2; the best integrators lie near the origin of the coordinate system, i.e. a low number of steps and negligible error. We now select the subset of integrators that yield a speedup (or comparable cost) in the number of RHS evaluations compared to the adaptive Euler scheme while maintaining similar or higher accuracy. In the following we will only consider integrators within this marked region, the set of which we will refer to as stage 2 integrators:
• Adaptive Euler
• Midpoint
• TanYam7
• RK4
• BS3
• Tsit5
• SSPRK43
• DP5
• VCABM3
A full list of considered integrators, as well as the reasons some are not shown, can be found in Section A.
While we acknowledge that some of the eliminated integrators might have been tweaked into compliance by optimizing their parameters, we insist on the out-of-the-box setup of the integrators being at least sufficient (if perhaps not optimal).

Stage 2
To check the reduced set of integrators for physical consistency, we now evaluate the $t-t'$ square-lattice Hubbard model at van Hove filling by scanning the second-neighbor hopping $t'$ (adapting the chemical potential $\mu = 4t'$ accordingly to remain at van Hove filling), as previously calculated in Refs. [1,28]. We perform these calculations with each of the stage 2 integrators to obtain a more qualified understanding of their accuracy and cost.
To determine accuracy, we first check that the phase transitions occur in a controlled manner for all integrators and that all points are properly found to diverge. We however make an exception near the SC-FM phase transition, where the low integration resolutions used here are known to lead to uncontrolled behavior [28,41]. By this process we eliminate the Midpoint and TanYam7 integration methods, which did not properly reproduce the leading instability for some points in the phase diagram. The remaining results and integrators are shown in Fig. 3, where the subset of stage 2 integrators is superimposed on the comparative data taken from [45] (dots represent antiferromagnetism, crosses superconductivity and diamonds ferromagnetism). We can also see the good agreement of the qualitative results from the different integrators, an expected but satisfying outcome.
To evaluate the performance, we sum the number of RHS evaluations for each of the 20 runs in the interval $t' \in [0, -0.5]$ into a total number $N_{\mathrm{RHS}}$. This is a more accurate measure of performance than before, as the phase diagram contains parameter regions where early divergences are expected as well as points at which low RG scales have to be reached. This is supplemented by an improved error estimate: as before, we calculate the effective vertex at the low scale $\Lambda = 0.17$ and compare it to the results obtained by a converged high-fidelity Euler, but we now consider three additional points, one in each phase, at $t' = 0.0, -0.05, -0.2, -0.45$. As accuracy measure we take the mean of the relative errors for these four parameter combinations. The results of this analysis are presented in Fig. 4. We find that, depending on the exact requirements on the integrator, different algorithms should be employed. For purely optimizing the runtime, i.e. the number of RHS evaluations, the multi-step method VCABM3 is clearly the method of choice, with a mean error comparable to the adaptive Euler but fewer than a quarter of the RHS evaluations. At the other extreme, we find DP5 to be the most accurate, but requiring 40% more runtime. A competitive alternative is RK4, with a slightly larger relative deviation from the reference data but a computational cost comparable to the Adaptive Euler. As a good compromise between numerical complexity and accuracy, we find a cluster of three methods: BS3, Tsit5 and, most notably, SSPRK43 all have an order of magnitude lower relative error at roughly two thirds of the computational cost of the Adaptive Euler.
In Fig. 5 we additionally show the measured peak memory requirement of our implementation relative to the Adaptive Euler. This figure is highly implementation dependent, but should nevertheless serve as a rough guide to the overhead incurred by the different solvers. It shows most clearly that the extreme numerical efficiency of VCABM3 is achieved by trading memory for runtime, with a fivefold memory requirement compared to a simple Euler implementation. The remaining integrators considered all fall in the range of 1.5 to 2.5 times the Euler memory requirement. Of the three compromise algorithms, SSPRK43 clearly has the lowest memory requirements, which, combined with its good accuracy and numerical efficiency, makes it the best-suited ODE integrator in our benchmark, followed closely by BS3.

Conclusions
We have benchmarked a multitude of ODE integration algorithms, both from the DifferentialEquations.jl library and our own implementations, against a reference Adaptive Euler using the square-lattice Hubbard model as a test case. We have identified that the right choice of integrator can significantly reduce the numerical cost of integrating the flow equations while at the same time producing more accurate results. We identify the multi-step algorithm VCABM3 as the most efficient one with acceptable numerical errors, while the highest accuracy is achieved by the numerically expensive DP5 algorithm, or alternatively the slightly less accurate RK4, which however has about the same numerical cost as the Adaptive Euler.
As the best compromise between accuracy and numerical speedup, we identify Tsit5, BS3 and SSPRK43, with the latter slightly outclassing its rivals. At first glance this is surprising, as this specific integrator is designed to handle dissipative differential equations stemming from discretizations of hyperbolic partial differential equations. However, as the FRG equations are believed to flow towards a fixed point for any given phase, we speculate that they exhibit dissipation in the mathematical sense, which is the property SSPRK43 is designed to exploit. This means that solutions of the RG flow corresponding to different initial conditions in the same phase become more similar as they approach $\Lambda = 0$, as the dominant physics will be the same for all of these. A mathematically thorough analysis of the FRG equations regarding this property is, however, beyond the scope of this paper.
As we have only analyzed a subset of the plethora of available ODE integrators in this paper, we cannot claim to have found the "best" integrator for FRG systems, but we have made sure to represent a spectrum of the common choices.
We have also focused on the simplest physically interesting model, the square-lattice Hubbard model. While we believe that the scaling of computational cost can be extrapolated to other problems, more complicated problems with more intricate Fermi surfaces may require denser integration steps; the best algorithms may therefore differ.
Furthermore, when using other approximation schemes of momentum-space FRG, for example self-energy inclusions or multiloop extensions, the RHS equations are subject to structural changes. The inclusion of self-energies might benefit from the use of operator-splitting methods, such as the generalized Strang or Leapfrog splitting [46], where self-energy and vertex are evaluated at alternating half-step intervals. Our results nevertheless remain pertinent: the best integrators found here will be good candidates for these applications.
Another path of algorithmic progress could take into account the intricacies of the FRG equations. Since their integration starts at infinity, we propose to investigate transformations of this semi-infinite domain as in Refs. [16,47,48].
Furthermore, the integration of the FRG equations contains an inherent singularity corresponding to the physical ordering tendency. Implicit methods [49] are well suited for such problems but were out of the scope of our investigation due to their memory requirements; they could be a worthwhile direction of future investigation. More specialized multistep integrators that utilize rational interpolants [49,50] instead of Newton and Lagrange polynomials for the definition of their integration rule could be beneficial in dealing with the singular behavior. Their effectiveness, of course, rests on methods of obtaining the action of the Jacobian of the RHS or suitable approximations to it.

Acknowledgments. We thank the DFG (German Research Foundation) for support through RTG 1995 within the Priority Program SPP 2244 "2DMP". F.G. and T.M. thank the DFG for funding through the SFB 1170 "Tocotronics" under the grant numbers Z03 and B04, respectively. Furthermore, we want to thank Jonas Hauck, Dominik Kiese, Lennart Klebl, Janik Potten, Stephan Rachel and Tilman Schwemmer for helpful discussions.
Authors' contributions. J.B. and T.M. led the development of the code, with F.G. taking charge of the simulation effort. The analysis was conducted jointly, and the manuscript was a joint effort by the authors. Overall, no distinction in the level of contribution can be drawn between the authors.
Data availability. The FRG implementation used in this paper is available at https://doi.org/10.5281/zenodo.6391339. The numerical data shown is available from the corresponding author upon reasonable request.