1 Introduction

The functional renormalization group (FRG) has proven to be a versatile theoretical framework for studying competing ordering tendencies and other many-body phenomena both in itinerant fermionic [1,2,3] and quantum spin systems [4,5,6,7,8]. The underlying flow equations form a large system of coupled, non-linear ordinary differential equations (ODEs), the solution of which has necessitated the development of highly sophisticated numerical codebases [9,10,11]. So far, algorithmic improvements have mainly focused on the efficient representation of vertex functions and on optimizing momentum integrations on the right-hand side (RHS) of the flow equations, with the numerical solution of the ODE itself treated as a necessity rather than an opportunity for improvement. Consequently, a simple Euler step was used in the majority of FRG calculations to date, see e.g. [3, 4, 10], with the adoption of higher-order solvers from the Runge–Kutta family being a fairly recent development in the field [9, 11,12,13,14,15,16,17,18,19,20,21].

The appropriate choice of integrator leads to improved accuracy at the same number of evaluations of the RHS, which constitutes the main computational bottleneck of any FRG calculation. Alternatively, one can significantly reduce this number while maintaining good accuracy. In contrast to many other optimizations, this gain does not introduce any further physical approximations, but rests solely on mathematics. As a prototypical setup to study the influence of different integrators, we choose the momentum space FRG in the simplest truncation scheme, considering only the four-point vertex \(\Gamma ^{(4)}\) while neglecting self-energy effects. Additionally, we limit ourselves to the well-understood case of the square-lattice Hubbard model with hopping up to second-nearest neighbors [9, 14, 15, 22,23,24,25,26,27,28,29,30,31,32,33,34].

We explicitly refrain from implementing more sophisticated approximation schemes [9, 14, 15, 30, 35, 36] or multi-band extensions [3, 37,38,39,40,41] for clarity of analysis, and instead focus on benchmarking a large variety of standard ODE solvers, most of which are available in existing libraries [42, 43]. Nevertheless, we expect our conclusions to provide a starting point for the application of better integrators within more sophisticated physical approximations.

The paper is organized as follows: after introducing our notation for momentum space FRG and the approximations employed in our implementation in Sect. 2.1, we give an overview of various integration schemes in Sect. 2.2 and define tangible metrics to judge the quality of the different schemes in Sect. 2.3. In Sect. 3 we employ a two-stage elimination procedure: in a first step, we sort out all integrators incapable of reproducing the physical ordering instability for the nearest-neighbor-only model; we subsequently analyze the remaining set over a larger parameter space including second-neighbor hopping to distinguish optimal choices.

2 Methods

2.1 FRG in brevity

The following discussion of FRG will be reduced to an absolute minimum and only serves as a means to introduce the notation used. For detailed derivations we refer the interested reader to standard literature [1,2,3].

Fig. 1

Diagrammatic representation of the functional renormalization group equations. We show the \({\text {SU}}(2)\)-symmetric flow equation, where the \({\text {SU}}(2)\) degree of freedom is kept constant along the lines of the vertex. The line through the loop represents the scale derivative \(\mathrm {d}/\mathrm {d}\Lambda \). We have indicated the three channels of the FRG flow, which differ in their bosonic transfer momentum: the particle-particle channel and the two particle-hole channels

In the following, we will focus solely on the square-lattice \(t-t'\) Hubbard model defined by

$$\begin{aligned} H_0= & {} -\! \sum \limits _{\langle i,j \rangle ,\sigma } t \, c_{i,\sigma }^\dagger c_{j,\sigma } \, -\! \sum _{\langle \langle i,j \rangle \rangle ,\sigma } t' \, c_{i,\sigma }^\dagger c_{j,\sigma } \nonumber \\&\quad -\! \sum _{i,\sigma } \mu \, c_{i,\sigma }^\dagger c_{i,\sigma }\,, \end{aligned}$$
$$\begin{aligned} H_\mathrm {I}= & {} U\, \sum _{i,\sigma } \, n_{i,\sigma } n_{i,{\bar{\sigma }}} \,, \end{aligned}$$

where t (\(t'\)) denotes the (next-)nearest-neighbor hopping amplitude, \(\mu \) is the chemical potential and U the strength of the onsite Hubbard interaction. Due to the spin-rotation invariance of Eqs. (1) and (2), we can restrict our derivation to the \({\text {SU}}(2)\)-symmetric set of equations. To simplify the functional dependence of the FRG equations, we invoke the static approximation, i.e., we consider only the zero-Matsubara-frequency component of the vertex functions, and neglect self-energy effects [3, 25]. We furthermore calculate at zero temperature to enable analytic evaluation of all occurring Matsubara frequency summations.

The \({\text {SU}}(2)\)-reduced FRG equation in the conventional truncation scheme, including contributions up to the four-point vertex \(\Gamma ^{(4)}\), is diagrammatically shown in Fig. 1 and can be expanded to (using V as the \({\text {SU}}(2)\)-symmetrized four-point vertex):

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}\Lambda } V_{\varvec{k}_0, \varvec{k}_1, \varvec{k}_2}&= \,\,\, V_{\varvec{k}_0, \varvec{k}_1, \varvec{l}} \dot{L}^{-,\Lambda }_{\varvec{l}, -\varvec{l} + \varvec{k}_0 + \varvec{k}_1}V_{\varvec{l}, -\varvec{l} + \varvec{k}_0 + \varvec{k}_1, \varvec{k}_2} \nonumber \\&\quad + V_{\varvec{k}_0, \varvec{l}-\varvec{k}_0 + \varvec{k}_2, \varvec{k}_2}\dot{L}^{+,\Lambda }_{\varvec{l}, \varvec{l} - \varvec{k}_0 + \varvec{k}_2}V_{\varvec{l}, \varvec{k}_1, \varvec{l} - \varvec{k}_0 + \varvec{k}_2} \nonumber \\&\quad + V_{\varvec{k}_0, \varvec{l} - \varvec{k}_0 + \varvec{k}_2, \varvec{k}_2}\dot{L}^{+,\Lambda }_{\varvec{l}, \varvec{l} - \varvec{k}_0 + \varvec{k}_2}V_{\varvec{k}_1, \varvec{l}, \varvec{l} - \varvec{k}_0 + \varvec{k}_2} \nonumber \\&\quad + V_{\varvec{k}_0, \varvec{l} - \varvec{k}_0 + \varvec{k}_2, \varvec{l}}\dot{L}^{+,\Lambda }_{\varvec{l}, \varvec{l} - \varvec{k}_0 + \varvec{k}_2}V_{\varvec{l}, \varvec{k}_1, \varvec{l} - \varvec{k}_0 + \varvec{k}_2} \nonumber \\&\quad + V_{\varvec{k}_0, \varvec{l}-\varvec{k}_1 + \varvec{k}_2, \varvec{l}}\dot{L}^{+,\Lambda }_{\varvec{l}, \varvec{l} + \varvec{k}_1 - \varvec{k}_2}V_{\varvec{l}, \varvec{k}_1, \varvec{k}_2} \, , \end{aligned}$$

where we suppress the spin indices, which are redundant in the \({\text {SU}}(2)\)-symmetric Hubbard model. We also use a modified Einstein sum convention to indicate the momentum integration over \(\varvec{l}\) in the Brillouin zone (BZ). To treat the continuous momentum dependence of the vertex function, we employ the “grid-FRG” momentum discretization scheme in the BZ with the additional refinement scheme described in Ref. [44]. Eq. (3) will commonly be referred to as the right-hand side (RHS) of the FRG equation, which we need to compute at each step of the flow.

The \(\dot{L}^{\Lambda }\) given in the RHS equation is the derivative w.r.t. the scale \(\Lambda \) of the regulated loop

$$\begin{aligned} L^{\pm ,\Lambda } = \sum \limits _n G^\Lambda (\varvec{k},\mathrm {i}\omega _n) G^\Lambda (\varvec{k}',\pm \mathrm {i}\omega _n)\, . \end{aligned}$$

The FRG regulator \(\Theta (\Lambda )\) is introduced multiplicatively by defining \(G^\Lambda = \Theta (\Lambda )G\), where G is the bare propagator. We use both the particle-particle loop \(L^{-, \Lambda }\) and the particle-hole loop \(L^{+, \Lambda }\) in our notation. As a regulator we choose the \(\Omega \)-cutoff \(\Theta (\Lambda ) = \frac{\omega ^2}{\omega ^2+\Lambda ^2}\) [29], which offers high numerical stability. The analytic Matsubara frequency summations then yield:

$$\begin{aligned} \dot{L}^{\pm , \Lambda }_{\varvec{l}, \varvec{l}'}&= \frac{\mathrm {d}}{\mathrm {d}\Lambda }L^{\pm , \Lambda }_{\varvec{l}, \varvec{l}'}\end{aligned}$$
$$\begin{aligned}&= {\left\{ \begin{array}{ll} \displaystyle \frac{\pm 1/4}{\epsilon \mp \epsilon '} \left( \frac{\epsilon (3|\epsilon |+ \Lambda )}{(|\epsilon |+ \Lambda )^3} \mp \frac{\epsilon '(3|\epsilon '|+\Lambda )}{(|\epsilon '|+\Lambda )^3} \right) &{} \text {if } \epsilon \ne \epsilon '\ , \\ \displaystyle \mp \frac{3\epsilon ^2-4|\epsilon |\Lambda -\Lambda ^2}{4(|\epsilon |+\Lambda )^4} &{} \text {if } \epsilon = \epsilon '\ , \end{array}\right. } \end{aligned}$$

with \(\epsilon = \epsilon (\varvec{l})\) and \(\epsilon ' = \epsilon (\varvec{l}')\) being the dispersion \(\epsilon \) of the non-interacting Hamiltonian \(H_0\) at the momenta \(\varvec{l},\varvec{l}'\). To obtain the effective low-energy vertex, we integrate the differential equation starting at infinite (i.e., large compared to the bandwidth) \(\Lambda =\Lambda _\infty \) and approach \(\Lambda \rightarrow 0\). When a phase transition is encountered, the loop derivatives diverge in conjunction with the associated susceptibilities and we terminate the flow. This is the integration for which we attempt to find an optimized solver. Due to the divergence, the solvers cannot run the calculation to completion; instead, we hard-terminate them once the maximum element of the vertex exceeds a threshold \(V_{\max }\).

2.2 Integrators

The possible choices of integrators are plentiful; in the following, we highlight some of the most prevalent algorithms as well as some which proved to excel during our testing. The measurement schemes used to determine their quality are provided at the end of this section. To numerically solve the non-linear, non-autonomous differential equation

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}\Lambda } V = \mathrm {RHS}(V, \Lambda ) \end{aligned}$$

we utilize both single-step and multi-step methods [45] to obtain an iteration procedure

$$\begin{aligned} V_{n+1} = V_n + \Delta _{\Lambda ,n} \Phi (\Lambda _n, V_n, \Delta _{\Lambda ,n}) \end{aligned}$$

that we terminate at the divergence of V.

We focus almost exclusively on integrators that are implemented in the excellent DifferentialEquations.jl package [42], written in the Julia programming language. We emphasize integrators that are well known, commonly employed (such as the Runge–Kutta class) and widely available in other programming languages. A full list of the considered (including all disregarded) integrators can be found in Appendix A. We note here that non-adaptive integrators were eliminated, as finer step sizes at lower \(\Lambda \) are required for an efficient solution of the flow equations.

2.2.1 Single-step methods

A single-step method is characterized by utilizing at each step a starting value \(V_n\) and additional evaluations of the RHS to construct \(V_{n+1}\). The most well-known family of methods in this class are the Runge–Kutta type integrators where the function \(\Phi \) is constructed as a weighted average of evaluations of RHS within the interval \([\Lambda _n, \Lambda _{n+1}]\) such that it coincides up to the respective order with its Taylor polynomial:

$$\begin{aligned} V_{n+1}= & {} V_n + \Delta _{\Lambda }\sum _{i=1}^s b_i k_i \end{aligned}$$
$$\begin{aligned} k_i= & {} \mathrm {RHS}\left( \Lambda _n + c_i \Delta _{\Lambda }, V_n + \Delta _{\Lambda } \sum _{j=1}^s a_{ij} k_j\right) \end{aligned}$$

The method is explicit if \(a_{ij}=0\) for \(j \ge i\); otherwise it is implicit. Adaptivity can be included via a step-dependent \(\Delta _{\Lambda ,n}\). The specifics of these methods are covered in tremendous detail in the literature, e.g., [45].
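
To make the construction concrete, a generic explicit Runge–Kutta step defined by a Butcher tableau \((a_{ij}, b_i, c_i)\) can be sketched as follows (we use Python purely for illustration; the tableau shown is that of the canonical RK4 method, one of the integrators benchmarked below):

```python
def explicit_rk_step(rhs, lam, v, h, A, b, c):
    """One explicit Runge-Kutta step for dV/dLambda = rhs(lam, V),
    defined by a Butcher tableau (A, b, c) with A strictly lower triangular."""
    s = len(b)
    k = []
    for i in range(s):
        # stage value: only previously computed stages enter (explicit method)
        vi = v + h * sum(A[i][j] * k[j] for j in range(i))
        k.append(rhs(lam + c[i] * h, vi))
    return v + h * sum(bi * ki for bi, ki in zip(b, k))

# Butcher tableau of the canonical RK4 method
A_RK4 = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
b_RK4 = [1/6, 1/3, 1/3, 1/6]
c_RK4 = [0, 0.5, 0.5, 1]
```

The same routine reproduces any of the explicit methods discussed below once their tableau is supplied.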

2.2.2 Example: adaptive explicit Euler

The conceptually simplest integrator in FRG, serving as the baseline for this work, is the adaptive explicit Euler described in Algorithm 1. Note that we have included a function \(f(\Lambda , V_{\max }) = \min (\max (a\Lambda /V_{\max }, \Delta _{\min }), \Delta _{\max })\), an adaptation scheme specifically designed to reduce the step width in the proximity of the expected flow divergence. Here a is a small number determining the relative speed of the integration, while \(\Delta _{\max }, \Delta _{\min }\) are chosen based on previous experience: \(\Delta _{\max }\gtrsim \Lambda \times 10^{-2}\) and \(\Delta _{\min }\lesssim \Lambda \times 10^{-5}\).

Algorithm 1 Adaptive explicit Euler
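
A minimal Python sketch of this scheme, with a scalar v standing in for the vertex tensor; interpreting \(V_{\max }\) in the step function as the current maximum vertex element is our reading of the scheme, and the default parameters are toy values:

```python
def adaptive_euler(rhs, lam_start, v0, a=0.05, d_min=1e-5, d_max=0.5,
                   v_threshold=50.0, lam_min=1e-6):
    """Explicit Euler flow from lam_start towards Lambda -> 0 with the
    step control f(Lambda, Vmax) = min(max(a*Lambda/Vmax, d_min), d_max);
    hard-terminates once |V| exceeds v_threshold (the flow divergence)."""
    lam, v = lam_start, v0
    while lam > lam_min:
        vmax = abs(v)
        if vmax >= v_threshold:           # divergence reached: stop the flow
            return lam, v
        step = min(max(a * lam / vmax, d_min), d_max)
        v = v - step * rhs(lam, v)        # Euler step; Lambda decreases by step
        lam = lam - step
    return lam, v
```

For a toy flow \(\mathrm {d}v/\mathrm {d}\Lambda = -v/\Lambda \), whose solution \(v \propto 1/\Lambda \) diverges towards \(\Lambda \rightarrow 0\), the step width automatically shrinks as the threshold is approached.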

2.2.3 A note on implicit Runge–Kutta methods

We have attempted to include implicit integration schemes in our analysis but were inhibited by their high memory cost. While for explicit methods the memory requirements are \(m\times \texttt {sizeof}(V)\), with m the number of stored evaluations of RHS, for implicit methods we must instead consider the size of the Jacobian to be inverted. This is proportional to the square of the size of a single solution (as it is a linear map between two of them) and thus scales as \(\texttt {sizeof}(V)^2\). Even for the relatively small-scale systems we employ here, the number of elements is \(N_{\varvec{k}}^6 \approx 10^{13}\). Implicit integrators are therefore not a viable option for FRG calculations.

2.2.4 Methods based on the non-linear Magnus series

Another class of one-step methods, specifically suited for homogeneous, non-linear, non-autonomous ODEs, is based on the Magnus form of the exact solution of Eq. (7),

$$\begin{aligned} V(\Lambda ) = \exp \left( \Omega (\Lambda , V_\infty ) \right) V_\infty \end{aligned}$$

where \(V_\infty = V(\Lambda = \infty )\); we now instead require a suitable approximation for \(\Omega \), which fulfills the non-linear differential equation

$$\begin{aligned} \frac{d \Omega }{d\Lambda } = \text {dexp}^{-1}_\Omega \left( A (\Lambda , e^\Omega V_\infty )\right) ,\quad \Omega (\infty ) = 0 \end{aligned}$$

where the inverse of the \(\text {dexp}_\Omega \) operator is defined by the series

$$\begin{aligned} \text {dexp}^{-1}_\Omega (C) = \sum _{k=0}^\infty \frac{B_k}{k!} \, \text {ad}^k_\Omega C, \end{aligned}$$

with the iterated right-nested commutator \(\text {ad}^{k+1}_X (Y) = [X, \text {ad}^k_X(Y)]\), \(B_k\) are the well-known Bernoulli numbers, and the matrix A is defined by the relation \(\mathrm {RHS}(\Lambda , V) = A(\Lambda , V) V\). The first-order approximation can be obtained by truncating Eq. (13) as

$$\begin{aligned} \Omega ^{[1]}(\Lambda , \Lambda _\infty ) = \int _{\Lambda _\infty }^\Lambda A(s,V_\infty ) ds. \end{aligned}$$

Of course, the presence of iterated commutators makes higher-order expressions unwieldy; hence we restrict ourselves to the first-order approximation, Eq. (14), and the second-order approximation

$$\begin{aligned} \Omega ^{[2]}(\Lambda , \Lambda _\infty ) = \int _{\Lambda _\infty }^\Lambda A(s, e^{\Omega ^{[1]}} V_\infty ) ds. \end{aligned}$$

The general higher order approximation can be found in the literature [46]. An iterative time-stepping procedure is obtained by applying the approximate time evolution operator \(\exp (\Omega ^{[m]})\) for small steps \(\Delta _{\Lambda }\) such that

$$\begin{aligned} y_{n+1} = \exp \left( \Omega ^{[m]}\left( \Lambda _n + \Delta _{\Lambda }, \Lambda _n \right) \right) y_n. \end{aligned}$$

The integrals in Eqs. (14) and (15) are approximated by order-consistent low-order expressions, such that e.g., at first order

$$\begin{aligned} \Omega ^{[1]}(\Lambda +\Delta _\Lambda , \Lambda )= & {} \int _{\Lambda }^{\Lambda +\Delta _\Lambda } A(s,V(\Lambda )) ds\end{aligned}$$
$$\begin{aligned}\approx & {} \Delta _\Lambda A(\Lambda , V(\Lambda )). \end{aligned}$$
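
A first-order Magnus step thus amounts to an exponential (Lie–Euler) update, \(V_{n+1} = \exp (\Delta _\Lambda A(\Lambda _n, V_n)) V_n\). A minimal Python sketch for a matrix-valued A follows; the truncated-series matrix exponential is a simple stand-in, adequate only for the small steps used in practice:

```python
import numpy as np

def expm_series(M, terms=30):
    """Truncated Taylor series for the matrix exponential exp(M);
    sufficient here since ||Delta_Lambda * A|| is small."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k          # term = M^k / k!
        out = out + term
    return out

def magnus1_step(A, lam, v, h):
    """First-order Magnus step for V' = A(Lambda, V) V:
    V_{n+1} = exp(h * A(Lambda_n, V_n)) V_n."""
    return expm_series(h * A(lam, v)) @ v
```

For a constant A the step is exact (up to the series truncation), which illustrates why Magnus methods can outperform standard Runge–Kutta schemes when A varies slowly.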

Adaptivity is easily included by using steps of varying size \(\Delta _{\Lambda ,n}\), determined by the same strategy as for the adaptive Euler.

2.2.5 Multistep/Adams methods

In contrast to one-step methods, multistep methods utilize previous evaluations of the right-hand side, \(f_{n+j} = \mathrm {RHS}(V_{n+j}, \Lambda _{n+j})\), to approximate it with a Lagrange polynomial. The general form of these linear multistep methods can be stated as

$$\begin{aligned} \sum _{j=0}^k \alpha _j V_{n+j} = h \sum _{j=0}^k \beta _j \mathrm {RHS}(\Lambda _{n+j}, V_{n+j}). \end{aligned}$$

If \(\beta _k = 0\) the method is explicit and does not require the evaluation of RHS at unknown points; this class is commonly termed Adams–Bashforth (AB) methods. Otherwise the method is implicit. If the interpolation polynomial is augmented with only the single yet unknown point \(\mathrm {RHS}(\Lambda _{n+k}, V_{n+k})\), the method is termed an Adams–Bashforth–Moulton (ABM) method. In this case the implicit nature can be treated by a predictor-corrector scheme: the prediction \({\hat{V}}\) is obtained by an explicit Adams method of one order lower than the implicit method, RHS is evaluated at it, and the new value \(V_{n+k}\) is obtained. Multistep methods require startup values, which are obtained with explicit Runge–Kutta or similar methods. Adaptive step-size variants can be obtained by utilizing more sophisticated interpolation techniques.
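
As a concrete illustration (a Python sketch assuming equidistant steps for simplicity), a second-order predictor-corrector step combines the two-step Adams–Bashforth predictor with the trapezoidal Adams–Moulton corrector:

```python
def abm2_step(rhs, lam, v, f_prev, f_curr, h):
    """One second-order ABM step for dV/dLambda = rhs(lam, V).
    f_prev, f_curr are the stored RHS evaluations at the two previous,
    equidistant nodes; returns the new value and its RHS evaluation."""
    # predictor: explicit two-step Adams-Bashforth
    v_pred = v + h * (1.5 * f_curr - 0.5 * f_prev)
    # corrector: implicit trapezoidal (Adams-Moulton) rule, evaluated at the
    # predicted value instead of solving the implicit equation exactly
    f_pred = rhs(lam + h, v_pred)
    v_next = v + h * 0.5 * (f_curr + f_pred)
    return v_next, f_pred
```

Note that each step costs only one fresh RHS evaluation, which is the source of the efficiency of multistep methods observed below.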

2.3 Measures of quality

Differentiation of the quality of integrators will be based on the following aspects:

  1. Error compared to a converged high-fidelity Euler calculation

  2. Minimal number of RHS-evaluations

  3. RAM requirements

2.3.1 Error compared to high-fidelity Euler

All integrators must converge to the same result in the limit of infinitely many steps or, equivalently, of negligible integration error for adaptive procedures. To obtain this reference result in the most controlled fashion, we converge an adaptive Euler integrator by successively shrinking the maximum allowed step width. We find for all simulated parameter points that a maximum step width of \(\Delta _{\max }=0.001\Lambda \) (cf. Sect. 2.2.2) leads to converged results, implying \(N_{\mathrm {RHS}} = 5700-8000\) for a single simulation, depending on the precise divergence scale. Note that this number is an order of magnitude higher than for all other discussed methods. The measure of error employed in the comparison is the normed sum of differences of the resulting vertex tensors:

$$\begin{aligned} \frac{\sum _i |V(i) - V_{\mathrm {Euler}}(i)|}{\sum _i|V_{\mathrm {Euler}}(i)|} \,. \end{aligned}$$

To avoid numerical discrepancies in the vicinity of the flow divergence, we compare the vertices slightly above the critical scale \(\Lambda _{\mathrm {crit}}\), at \(\Lambda =0.17\). This scale is sufficiently low for the integrator to have performed a significant number of integration steps, but sufficiently high to avoid the critical scale. Note that this value is chosen arbitrarily within this region.
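
In code, the relative error measure defined above is a one-liner (with hypothetical NumPy arrays holding the vertex tensors):

```python
import numpy as np

def vertex_error(V, V_euler):
    """Relative deviation of a vertex tensor V from the converged
    high-fidelity Euler reference V_euler."""
    return np.sum(np.abs(V - V_euler)) / np.sum(np.abs(V_euler))
```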

Because this measure does not reveal which error size is tolerable for qualitatively correct FRG results (at the given approximation level), we additionally include a full phase scan in stage 2 (Sect. 3.2) to ensure the results are consistent with expected behavior. We eliminate all integrators that are inconsistent.

2.3.2 Minimal number of RHS-evaluations

As we aim to optimize the calculation time, the most important measure of quality for any integration routine is the number of RHS-evaluations it requires until divergence. This is the only expensive part of the calculation and thus a simple but platform-independent estimator of the runtime. While this is no longer true for implicit integrators, where the inversion of the Jacobian would be most expensive, these are ruled out by their RAM requirements and can thus be neglected here.

2.3.3 Memory requirement

For most FRG calculations the bottleneck is not only the calculation time but also the memory consumption [44, 47]. As higher-order methods may require saving intermediate results, we track the number of concurrently allocated vertex-sized objects as a measure of RAM usage. Again, for implicit methods this would not be an accurate measure due to the high-dimensional Jacobian matrices, but we disregard those entirely. The impact of other objects on memory usage is of second order compared to the vertices: the vertices scale as \(N_{\varvec{k}}^3\), while the loop derivatives scale as \(N_{\varvec{k}}^2\) and the dispersion cache scales as \(N_{\varvec{k}}N_{\varvec{k}_f}\). The measure chosen is the peak memory usage during the calculations, which is closely proportional to the total number of vertex objects, though slight lower-order effects are to be expected.

3 Results

We have ensured that our implementation falls within the FRG equivalence class of the Hubbard model reported in Ref. [44], a grouping of three distinct FRG implementations which all reproduce numerically equivalent results for a range of tests. Reproduction of all published tests, most notably equality of all elements of the resulting vertex, confirms that this code is correct within the chosen approximations. Note that while one of the codes in that publication was written by one of us, we utilize for this benchmark an independent implementation written in the Julia programming language.

All simulations were performed using the following parameters: \(t=1\), \(U=3t\), \(t'\in [0.0,\ldots , -0.5]\, t\), \(\mu = 4t'\). We use a momentum meshing of \(n_{\varvec{k}} = 16\times 16\) with a refined meshing of \(n_{\mathrm {fine}}=9\times 9\). The FRG flow is started at \(\Lambda _\infty =50\) and tracked until the maximum element of the vertex exceeds \(V_{\max }=50t\), where we terminate the integration and analyze the last vertex \(V_{\Lambda _{\mathrm {crit}}}\) with respect to its ordering tendencies. While this resolution is insufficient for physical simulations at low scales, and the stability of all integrators increases with resolution, the fact that most integrators yield correct phase predictions at the end of the flow confirms that the resolution suffices for this analysis. These predictions are shown in Fig. 3, where the similarities are readily apparent. We note, however, that the limits of the chosen resolution are visible here in the deviation from the established literature result; this gap could be lessened by increasing the resolution.

For error controlled adaptive step-size integrators, we allow for an absolute local error of \(10^{-6}\), a relative local error of \(10^{-3}\) and set a lower bound of \(10^{-5}\) on the absolute step size to prevent excessive runtimes close to divergences of the flow.
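
Our production runs use DifferentialEquations.jl; purely for illustration, the analogous tolerance and termination settings in Python/SciPy look as follows. The scalar rhs is a toy stand-in for the FRG flow, chosen such that its solution \(v = 50/\Lambda \) diverges towards \(\Lambda \rightarrow 0\):

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(lam, v):
    # toy flow whose solution v = C/Lambda diverges as Lambda -> 0
    return -v / lam

def diverged(lam, v):
    # terminal event: stop once the largest element exceeds the threshold
    return np.max(np.abs(v)) - 50.0
diverged.terminal = True

# integrate backwards from Lambda_inf = 50 with the stated local tolerances
sol = solve_ivp(rhs, (50.0, 1e-3), [1.0], method="RK45",
                rtol=1e-3, atol=1e-6, events=diverged)
```

Here the terminal event plays the role of the hard termination at \(V_{\max }\); for the toy flow above it fires near \(\Lambda = 1\). Note that SciPy's adaptive Runge–Kutta steppers do not expose a lower bound on the step size, which we do enforce in our setup.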

We split the investigation into a two-stage elimination process, iterating over successively more involved tests to find the best set of integrators and discard the remainder.

Fig. 2

Integrator Comparisons Quick analysis of the quality of integrators. For this measurement we chose the single point \(t'=0\), \(\mu =0\) in the \(t-t'\) square-lattice Hubbard model. At this point we expect an antiferromagnetic divergence; every method incapable of producing it is omitted. We show the number of steps required by the integrators for the prediction of the divergence as well as the error compared to high-fidelity Euler calculations. The best integrators lie near the origin of the coordinate system: a low number of steps and negligible error. In the following, we will consider only the integrators marked with circles, the set of which we refer to as stage-2 integrators. The crossed-out ones were eliminated from further consideration as they were far from optimal. A full list of considered integrators, as well as the reasons some are not shown, can be found in Appendix A

3.1 Stage 1

The first approach to determine the feasibility of the integrators is to analyze the square lattice Hubbard model at \(t'=0\), \(\mu =0\) where we expect antiferromagnetic ordering.

The results of stage 1 are displayed in Fig. 2. We select the subset of integrators which yield a speedup (or comparable performance) in the number of evaluations relative to the adaptive Euler scheme while maintaining similar or higher accuracy. Other reasons for discarding an integrator are running out of memory, lack of features, or plainly wrong results. A detailed listing of all considered integrators can be found in Appendix A, where we also give references to their descriptions in the literature. This reduces the list of integrators to consider in the following to:

  • Adaptive Euler

  • Bogacki-Shampine 3/2 (BS3)

  • Dormand-Prince’s 5/4 Runge–Kutta (DP5)

  • Second order Heun’s

  • Second order Midpoint

  • canonical Runge–Kutta 4 (RK4)

  • Strong-stability preserving Runge–Kutta (SSPRK43)

  • Tanaka-Yamashita 7 Runge–Kutta (TanYam7)

  • Tsitouras 5/4 Runge–Kutta (Tsit5)

  • Third order Adams-Moulton, BS3 for starting values (VCABM3)

We have thus significantly reduced the number of integrators to consider in the next section. While we acknowledge that some of the eliminated ones might have been tweaked into compliance by optimizing parameters, we insist on the out-of-the-box setup of the integrators being at least sufficient (if perhaps not optimal). Particularly noteworthy here is the behavior of the AB methods in contrast to the ABM methods. The AB methods sampled their startup values of V very close to the initial \(\Lambda _\infty \), where V is very smooth and structureless. From this structurelessness, excessively large steps for the rest of the integration domain were inferred, which led to wrong results. In the ABM methods, such excessively large step sizes were intrinsically prevented by the corrector step, and hence no tweaking of method parameters was required.

Fig. 3

Hubbard Model Calculations of the reference Hubbard model at van Hove filling with \(t'\in [0.0, -0.4t]\). The comparative data is taken from [48], on which we have superimposed the subset of stage-2 integrators. We have removed all integrators that were unable to produce the correct phase at some points. Note that in this analysis we ignore the region around the SC-FM phase transition, as it is notoriously unstable at low scales [32, 44]. Dots represent antiferromagnetism, crosses superconductivity and diamonds ferromagnetism. Note that the difference in scale to the literature results is due only to the chosen resolution

3.2 Stage 2

To check the reduced set of integrators for physical consistency, we now evaluate the \(t-t'\) square-lattice Hubbard model at van Hove filling by scanning the second-neighbor hopping \(t'\) (adapting the chemical potential \(\mu =4t'\) accordingly to remain at van Hove filling), as previously calculated in Refs. [1, 32]. We perform these calculations with each of the stage-2 integrators to obtain a more qualified understanding of their accuracy and cost. The choice to evaluate along the van Hove singularity was made to reduce the cost of the simulations: at the singularity, the high density of states raises the critical scale of the transition, lowering the minimum resolution required to find a phase other than the metal. Quantitative differences in the critical scale compared to literature results nevertheless remain, as can be seen in Fig. 3.

To determine accuracy, we first check that the phase transitions occur in a controlled manner for all integrators and that all points are properly found to diverge. We make an exception near the SC-FM phase transition, where the low integration resolutions used here lead to uncontrolled behavior. By this process we eliminate the Midpoint and TanYam7 integration methods, which did not properly reproduce the leading instability for some points in the phase diagram. The remaining results and integrators are shown in Fig. 3. Here we can also see the good qualitative agreement between the different integrators, an expected but satisfying result.

To evaluate the performance, we sum the number of RHS evaluations for each of the 20 runs in the interval \(t' \in [0,\ldots , -0.5]\) into a total number \(N_\mathrm {RHS}\). This is a more accurate measure of performance than before, as the phase diagram contains parameter regions where early divergences are expected, as well as points at which low RG scales have to be reached. This is supplemented by an improved error estimate: as before, we calculate the effective vertex at the low scale \(\Lambda = 0.17\) and compare it to the results obtained by a converged high-fidelity Euler, but we now consider three additional points, giving one in each phase at \(t' = 0.0, -0.05, -0.2, -0.45\). As accuracy measure we take the mean of the relative errors for these four parameter combinations. The results of this analysis are presented in Fig. 4. We find that, depending on the exact requirements, different algorithms should be employed. For purely optimizing the runtime, i.e., the number of RHS evaluations, the multi-step method VCABM3 is clearly the method of choice, with a mean error comparable to the adaptive Euler but requiring less than a quarter of the RHS evaluations. At the other extreme, we find DP5 to be the most accurate, but requiring 40% more runtime. A competitive alternative is RK4, with a slightly larger relative deviation from the reference data but a computational cost comparable to the adaptive Euler. As a good compromise between numerical cost and accuracy, we find a cluster of three methods: BS3, Tsit5 and, most notably, SSPRK43 all have an order of magnitude lower relative error at roughly two thirds of the computational cost of the adaptive Euler.

Fig. 4

Stage 2 Integrator Comparisons We compare the integrators which passed the primary analysis in a second iteration using an extended testing scheme, consisting of a full \(t-t'\) Hubbard model phase scan. \(N_{\mathrm {RHS}}\) here represents the total number of steps for the evaluation of the 20 points in the phase diagram. Similarly, the error represents the mean of the errors at the four evaluation points \(t'=0, -0.05, -0.2, -0.4\). Once again, proximity to the origin is the measure of quality in the graphic

In Fig. 5 we additionally show the measured peak memory requirement of our implementation relative to the adaptive Euler. This figure is highly implementation dependent, but should nevertheless serve as a rough guide to the overhead incurred by the different solvers. Most strikingly, it shows that the extreme numerical efficiency of VCABM3 is achieved at the price of memory consumption, with a fivefold requirement compared to a simple Euler implementation. The remaining integrators considered all fall in the range of 1.5 to 2.5 times the memory requirement. Of the three general-purpose algorithms, SSPRK43 clearly has the lowest memory requirements, which, combined with its good accuracy and numerical efficiency, makes it the best-suited ODE integrator in our benchmark, followed closely by BS3.

Fig. 5

Stage 2 Memory Comparison Supplementing the analysis presented in Fig. 4, we here provide measurements of the memory usage during the integration procedure. The memory usage is given in units of the most efficient scheme, the adaptive Euler. Note that we have measured only the total memory usage; parts of it may thus correspond to objects of smaller size. Nonetheless, the measurement reflects the RAM needed for the integrations very well

4 Conclusions

We have benchmarked a multitude of ODE integration algorithms, both from the DifferentialEquations.jl library and own implementations, against a reference adaptive Euler, using the square-lattice Hubbard model as a test case. We have found that the right choice of integrator can significantly reduce the runtime of the integration while at the same time producing more accurate results. We identify the multi-step algorithm VCABM3 as the most efficient one with acceptable numerical errors, while the highest accuracy is achieved by the numerically expensive DP5 algorithm, or alternatively the slightly less accurate RK4, which however has about the same numerical cost as the adaptive Euler.

As the best compromise between accuracy and numerical speedup, we identify Tsit5, BS3 and SSPRK43, with the latter slightly outclassing its rivals. At first glance this is surprising, as this specific integrator is designed to handle dissipative differential equations stemming from discretizations of hyperbolic partial differential equations. However, as the FRG equations are believed to flow towards a fixed point for any given phase [49], we speculate that they exhibit dissipation in the mathematical sense, which is the property SSPRK43 is designed to exploit. This means that solutions of the RG flow corresponding to different initial conditions in the same phase become more similar as they approach \(\Lambda =0\), as the dominant physics is the same for all of them. A mathematically thorough analysis of the FRG equations regarding this property is, however, beyond the scope of this paper.

As we have analyzed only a subset of the plethora of available ODE integrators, we cannot claim to have found the “best” integrator for FRG systems, but we have made sure to represent a spectrum of the common choices.

We have also focused on the simplest physically interesting model, the square-lattice Hubbard model. While we believe that the scaling of computational cost can be extrapolated to other problems, more complicated models with more intricate Fermi surfaces may require denser integration steps; the best algorithms may therefore differ.

Furthermore, when using other approximations of momentum space FRG, for example self-energy inclusion [50] or multiloop extensions [9, 16, 51, 52], the RHS equations are subject to structural changes. The inclusion of self-energies might benefit from the use of operator splitting methods, such as the generalized Strang or leapfrog splitting [53], where self-energy and vertex are evaluated at alternating half-step intervals. The combination of operator splittings with the Magnus series could also enhance its competitiveness when certain parts of the equations can be treated analytically exactly.

Our results nevertheless carry over: the best integrators found here will be good candidates for these other applications.

Another path of algorithmic progress could take into account the intricacies of the FRG equations. Since their integration starts at infinity, we propose to investigate transformations of this semi-infinite domain as in Refs. [20, 54, 55].

Furthermore, the integration of the FRG equations contains an inherent singularity corresponding to the physical ordering tendency. Implicit methods [56] are well suited for such problems, but were out of the scope of our investigation due to their memory requirements; they could be a worthwhile direction of future investigation. More specialized multistep integrators that utilize rational interpolants [56, 57] instead of Newton and Lagrange polynomials for the definition of their integration rule could be beneficial in dealing with the singular behavior. Their effectiveness, however, rests on methods of obtaining the action of the Jacobian of RHS, or suitable approximations to it.

Table 1 Full set of considered single-step integrators. For disregarded integrators, the reason for their exclusion is given