TU2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^2$$\end{document}FRG: a scalable approach for truncated unity functional renormalization group in generic fermionic models

Describing the emergence of phases of condensed matter is one of the central challenges in physics. For this purpose many numerical and analytical methods have been developed, each with their own strengths and limitations. The functional renormalization group is one of these methods bridging between efficiency and accuracy. In this paper we derive a new truncated unity (TU) approach unifying real- and momentum space TU, called TU2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^2$$\end{document}FRG. This formalism significantly improves the scaling compared to conventional momentum (TU)FRG when applied to large unit-cell models and models where the translational symmetry is broken.


Introduction
Predicting the phase diagrams of real materials is one of the central goals and challenges of condensed matter research.For this purpose, many numerical techniques have been developed for many different scenarios [1][2][3][4][5].In the weak to intermediate coupling regime, pertubatively motivated approaches have been very successful [6][7][8][9][10]: Starting from simple random-phase approximation (RPA) [11] to the more elaborate selfconsistent fluctuation-exchange (FLEX) formalism [12], these methods have been widely used to study magnetism and superconductivity.The biggest drawback of these methods is that they are biased.In particular they do not account for all diagrammatic contributions up to a certain order.This problem can be remedied by adopting the Parquet approximation or using the functional renormalization group (FRG) [13,14].These methods sum up all diagrammatic contributions up to a certain order, therefore giving a coherent and unbiased picture of the phases of matter.This higher accuracy comes at a higher numerical cost, making calculations of many relevant toy models of real solids hard or even impossible.Thus, it is critical to find suitable approximations to reduce the numerical cost of these methods, while maintaining the advantage of unbiased predictions.An illustrative comparison of the three methods discussed above and the method presented in this paper is given in Fig. 1.There we summarize the capabilities of each of the four methods in five different categories: Few-Band, Many-Band, Non-translational invariant, Extended interactions and frequency dependence.Each of the methods has its strengths and weaknesses.For example, plain FRG can be used to consider the full frequency dependence of the effective interaction, but is hardly applicable to non-translationally invariant models without any further approximations.While this can be solved by using the TU 2 FRG, we lose part of the full frequency dependence to be applicable to a wider set of models as otherwise numerical cost would be too high.RPA is only calculating the spin-fluctuation mediated effective interaction and is thus numerically cheap; but inter-channel feedback is neglected and the concept of renormalization is not included.sFLEX builds on top of RPA and iterates its self-energy till convergence while including all spin and charge fluctuation diagrams, but this comes at the cost of comparable scaling to FRG for many-bands and nontranslational invariant models.
One approach recently put forward for this purpose is the use of truncated unities, the so called truncated unity FRG (TUFRG) [15] or truncated unity Parquet approach [16].The main proposition of these approaches is to reduce the numerical effort by reducing the degrees of freedom under consideration by truncating the dependence of weakly varying variables.This Fig. 1 Illustrative comparison of the capabilities of four different methods concerning different use cases.We compare the simple RPA+FLEX without self consistency, the self-consistent FLEX, standard FRG and the here presented TU 2 FRG.We define few-bands as the range between one and twenty bands and many-band as all systems with more than 20 bands.The corner frequency dependence is measured by the number of frequencies included in the method.The capability of each method to process effects of long range interactions is measured in the number of dependencies allowed for the effective interaction generated by the method.For all corners we used the scaling combined with the accuracy in terms of the number of diagrams included by the method to estimate its capability in this direction.
approach was already successfully applied in the square lattice Hubbard model [16][17][18] and graphene [19][20][21][22].On the other hand, in the real-space representation the so called extended coupled ladder approximation and analogously the real-space TUFRG was put forward [23][24][25][26].Another big advantage of methods like Parquet or FRG is that in principle one can benchmark the numerical accuracy by taking into account higher order terms.This has been recently implemented in one incarnation in the multiloop-formalism [27][28][29], where the authors were able to match results to numerically exact methods in the half-filled Hubbard model up to intermediate interactions strengths.Thus, there are currently two different directions of research in the methodological development of FRG, the first is the increase of numerical accuracy, the second is the adaption to broader classes of models.In this paper we will present a new approach which falls into the second category.It aims to enable simulations of models with broken translational symmetries, such as quasicrystals [26], boundary effects, finite size effects [30] or disorder.Additionally, we fuse this approach with known momentum-space TUFRG in order to enable studies of models with large unit cells or many orbitals, as in the case of twisted materials or Kagomé metals [31,32].For this purpose we derive a full TUFRG-scheme including momentum-, frequency-and real-space unities, in the following called TU 2 FRG.This can be seen as a key stepping-stone to establish FRG as the default method for calculating electronic instabilities for wide classes of electronic models.

Derivation of the flow equations in the unity-space
We start the derivation from the general flow equations in the second order truncation.As the derivation of these flow equations is well documented [13,33] we omit it here for the sake of conciseness, and just derive their representation in full unity-space, inserting a unity in all relevant degrees of freedom except spin.The flow equation for the self-energy reads where we defined the single-scale propagator The flow equation for the effective interaction Γ is separable into the three two-particle-irreducible (2-PI) channels according to Eq. ( 2), each of which is associated with a specific fermionic-bilinear: Γ (1, 2; 3, 4) =U (1, 2; 3, 4) + Φ P (1, 2; 3, 4) +Φ D (1, 2; 3, 4) + Φ C (1, 2; 3, 4). (2) We have the particle-particle channel (P ), whose bilinear is of cooper pair type.We have the direct particle hole channel (D), which has a density-density type bilinear and we have the crossed particle hole channel (C) with a spin-spin bilinear.Each of these bilinears can be linked to a different mean-field decoupling, thus, divergences in a channel indicate a phase transition to a certain ordered state [34].This decomposition allows a separation of the flow equations into the three channels [13]: dΦ D (1, 2; 3, 4) dΛ = −Γ (1, 4 ; 3, 1 )L(1 , 2 ; 3 , 4 ) where in-and out-going indices (of a diagramatic representation of these equations) are separated by a semicolon and we defined L(1 , 2 ; 3 , 4 ) = G(1 ; 3 )S(4 ; 2 ) + S(1 ; 3 )G(4 ; 2 ).
In principle, one just needs to integrate Eq. (1) and Eqs.(3a, 3b, 3c) numerically, which is a computationally heavy task and is not possible for many models of interest.In particular, the flow equations computation time scales ∝ O(N 6  o N 3 k N 3 ω ) with N o the number of orbitals, sites and spins per unit cell, N k the number of momentum points and N ω the number of Matsubara frequencies.In addition, storing the vertex requires ω elements, such that it becomes apparent that a more efficient representation has to be found for the study multi-orbital/multi-site models.
In the following we will derive the flow equations in the full unity-space.We will argue why this representation is advantageous when compared with brute force solutions, and what its limitations are.To keep our derivation as general as possible, we will keep all dependencies of the vertex; site, momentum, frequency and spin (note that we exclude the case of non-energyconserving models).The starting point for our derivation are the Mandelstamm variables, see Eq. ( 4), each of which is associated with one of the channel's momentum transfer.For brevity we condense the momentum and frequency contributions into a four vector denoted as q and collapse the spin and site indices into orbital indices o.
These Mandelstamm variables can now be used to rewrite the flow equations in compact fashion.For brevity we introduce an abbreviation for sets of increasing indices i 1 , i 2 , i 3 , i 4 as i 1 .. 4 .Additionally we use an altered Einstein sum convention, where we sum or integrate out each doubly occurring index.It has to be kept in mind that the momentum summations stem from a Fourier transformation, therefore they always include a normalization factor, which too gets suppressed for brevity.Thereby we arrive at the following set of flow equations.
These equations are the starting point for all subsequent derivations.So far, spin and orbitals or sites are treated equally, this means that o i = (õ i , s i ) is a multiindex consisting of both.For the following derivation we will denote spin and orbitals separately as spins and orbitals need to be treated differently and redefine õ ≡ o as the site and orbital index.
The motivation to transform into the unity-space follows from the behavior of RPA-like resummations of each channel [17]; In the flow equations, the lowest order of diagrams we do not account for is U 3 , which fixes our truncation order in the interaction.Usually the initial interaction has a finite range, or drops off as a function of the distance.Therefore, for the sake of the argument we now assume an interaction between nearest neighbors, with a density-density bilinear, thus U s1,s2,s3,s4 o1,o2,o3,o4 (ω 1 , ω 2 ; ω 3 ) = U 0 δ s1,s3 o1,o3 δ s2,s4 o2,o4 δ <o1,o2> , where δ <o1,o2> is only one if the two indices belong to neighbouring sites.If we now insert this interaction into the P -channel flow equation we obtain dΦ We observe that in the frequency space we do not generate terms depending on ω 1 and ω 2 , whereas in real-space, we will keep on generating terms ∝ δ <o1,o2> δ <o3,o4> if we insert Φ P into the right hand side again.Thus, on a single channel level, we cover all dependencies by only allowing for very specific index combinations, and neglecting all others as they will be always zero.The argument carries over to the other two channels.If we now include the feedback in between the channels by reconstructing Γ we will again generate additional dependencies.These will have a hierarchy in the distance, where larger distances correspond to higher order interaction terms increasing with the distance from the initial index combinations.As a consequence, we can identify leading and subleading dependencies of each channel, which can be exploited to arrive at an efficient description of the problem.
For this purpose we define a set of functions g by (o x , k) which give for each site x the momentum and frequency dependent connections to site y.These functions are required to form an orthonormal basis on the space of the momentum, site and frequency, thus fulfilling b1 o,k The choice of the basis set is not unique and we will discuss two possibilities later.With these unities we can define projections onto the leading dependencies of each channel as Note the spin reordering in each of the projections, which is performed to enable a reformulation of the flow equations in terms of batched matrix multiplications.These projections are designed such that we fully keep all ladder-like contributions of each channel.Depending on how we truncate the basis we take the feedback in between the channels into account only approximately.The inverse projections follow from the completeness relation as To benefit from the interaction hierarchy in the basis function dependence, we truncate the unity to small number of basis functions.Thereby, the projection procedure becomes approximate and the full effective interactions cannot be recovered exactly anymore.While the error between the truncated and the full unity is uniform in the number of basis functions included, the error in the projected channels is non-uniform, see discussion above.This allows for a faithful representation of the effective interaction with only a few basis functions.
For better readability we explicitly expand the summations when we insert a new unity.We begin with the flow equations for P [Φ P ] ≡ P , starting from Eq. (3a) in Eq. ( 15).Here, we defined the particle-particle loop derivative and analogously the particle-hole loop derivative as • G s1,s3 o1;o3 (p 1 )S s2,s4 n2;n4 (q where X ∈ {C, D}.The derivations for the flow equations for Ĉ[Φ C ] and D[Φ D ] follow analogously (as can be seen in App.Appendix A) and result in So far, this is a mere reformulation, but we already note that the flow equations are transformed into matrix products, and that the summation over momenta dP b1,b3 o1,o3 (q P ) s2;s4 s1;s3 is now only required inside the loop derivative calculations.
As it is impossible for many models to store the full reconstructed vertex we instead store the three projected channels P [ s (with N s the number of spins and N o th number of sites and orbitals in the unit cell).Depending on how many basis functions are required to reach convergence of the calculations, this can be a drastic reduction.For the derivation of the unity-space self-energy we recall that in principle we can approximately restore the full vertex as Inserting this into the self-energy flow equation results in The FRG in unity-space has therefore reduced to Eqs. (18,21,22,25) which need to be integrated.So far we did not discuss the specific form of the unity, nor specify how we implement these equations.

Specification of the unity
In this section we will discuss a suitable choice of momentum-and real-space unities.For the frequency unity we will stick to an on-site form factor expansion in the following, corresponding to a single-frequencyper-channel approximation [24,25,35].For a more in depth discussion of the frequency dependence we refer the reader to [36][37][38][39].To differentiate between the full basis functions we discussed before and the momentumand real-space basis, we refer to the latter as formfactor-bonds.The intuitive way to implement the unity in momentum-and real-space is to utilize a two step procedure: We formally write which amounts to using completely separate sets of bonds and form-factors.This is easy to implement as it simply adds another projection into existing momentum-space TUFRG codes.Additionally, the projections can be pulled apart making numerical implementations faster and easier to handle.However, this simple approach comes with a big drawback: For models with multiple sites per unit cell, graphene to name just one example, this approach always breaks the rotational symmetry as soon as we introduce a cutoff distance and neglect corresponding form-factor-bonds, as visualized in Fig. 2. The problem arises, as the definition of a unit cell introduces a preferred direction for each site, and momentum form-factors are generated in shells around the origin.Thus, the real space distance between sites is not covered correctly in the form-factor shells.Via this inconsistency we take some length-scales, and thereby interaction orders, only partially into account destroying the unbiased nature of the method.Even though this is not a problem if we converge in form-factor-bonds, it is still a bias and could possibly lead to unexpected behavior and renders observations of nematic phases doubtful.
Fig. 2 Visualisation of the first form factor shell in a honeycomb lattice (graphene), the reference unit cell is marked in light blue, the six first form factor shells are marked in pink.We observe, that if we include all bonds within the unit cell and the first form factor shell, we still do not include all connections within a distance of 2 √ 3 , but instead we lack the connection to the site marked in red.
Luckily, these issues can be resolved.For this, we start with a full real-space description of the lattice, where we set up bonds on the full lattice as real-space Kronecker deltas, as defined in Eq. (27).
Here r as opposed to r indicates that the objects are full lattice vectors.Therefore, the bond bi connects the site with index õi to another site, such that only if õj is equivalent to this site, the Kronecker delta is non-zero.This basis can now be truncated according to the length of the corresponding bonds ensuring the conservation of rotational and inversion symmetries.To return to a mixed space form-factor-bond basis, we need to Fourier transform these bonds according to: where we defined ri = R i + r i , with R i the existing lattice vectors and r i the vectors within the unit cell.We also introduce bi = B i + b i for the bonds.Again b i is equivalent to the connection between site i and the image of another site j within the same unit cell and B i gives the connection between the unit cells.It is important to notice, that these two length scales are fully separate, which means that two vectors from the two different sets, within the unit cell and beyond the unit cell, can never add to zero.
From the translational invariance it follows that the sum of two lattice vectors must be a lattice vector, i.e.R j = R l + R i and using this we can write Within the Kronecker delta we have two different and separate length scales, one within the unit cell, and one beyond.Due to the fully separate nature of the two length scales, the Kronecker delta can only be non-zero if both δ ri+bi,rj and δ R l ,Bi hold, which allows us to split the Kronecker delta into two parts = e −ikBi δ ri+bi,rj .
As one would naively expect we obtain as form-factorbonds the plain wave form factor multiplied by a suitable real-space Kronecker-delta, which is also obtained when applying two separate unities for real-and momentum-space after applying the so called filtering process of Ref. [21], in which the rotational symmetry gets explicitly enforced.
For each model, it has to be ensured that the results are converged in form-factor-bonds.These runs are costly in computation time and complex as the convergence is different at each point in parameter space.Thus it is important to understand the physical implications of a certain bond cutoff.For this purpose the pure real space representation of the form-factor-bonds, see Eq. ( 27), is most suitable.We observe, that removing certain bond vectors bi amounts to neglecting certain orbital combinations in the inter-channel projections.Their importance can be deduced by inspecting single-channel flows, which amount to RPA-like resummations.
The TU 2 FRG code consists of three numerically challenging steps; the calculation of the loop derivatives (Eq.( 19) and Eq. ( 20)), the inter-channel projections (Eq.( 23)) and the flow equations (Eq.(18,21,22,25)).We will now discuss each implementation in detail.The flow equations in the representation presented here already have the form of batched matrix products, which are implemented in BLAS-libraries and are numerically very efficient.Thus we only discuss the projections and the loop derivative implementation.
As the integral over p 1 is decoupled from the flow equation, we can refine the momentum integration of the loop [17,40].This can either be done by choosing a separate grid for the integration or by implementing an adaptive integration scheme.There are multiple options to implement this refinement; The simplest option is to use a finer mesh for the p 1 integration.This has the advantage of conserving all symmetries but is numerically more demanding then the second strategy.Alternatively we add an additional sum around each kpoint which sums up a mini Wigner-Seitz cell around this momentum point for each single-scale propagatorpropagator product, analogously to the refinment used in N -patch FRG schemes [40].So far we did not specify the cutoff-function we chose for the calculations.Here, many different cutoffs are possible each with specific advantages and disadvantages.The Ω-cutoff [41] for example is a smooth cutoff which simplifies the integration of the flow equation in the case of self-energy feedback.The temperature cutoff [33] allows for a physical interpretation of the critical scale as a critical temperature and the interaction cutoff [42] allows for scanning for the critical interaction strength in a single FRG run.Each of these cutoffs comes with a more or less severe drawback, for example the interaction cutoff does not regularize infrared divergences.The biggest complications arise if one is interested in large unit cell models, as for example twisted materials [31,43,44].Here the analytic Matsubara summation scales ∝ N 4  o , making it the numerical most costly part of the whole calculation by a factor of N o .On the other hand, the numerical Matsubara summation requires summing up many frequencies for convergence.If the calculation of the Greens-function is non-negligible, this step can also become the bottleneck of the calculation as for a reasonable resolution we need many frequencies, for which in each step the Greenfunctions have to be recalculated.A solution to these numerical issues is the sharp cutoff R(Λ) = θ(|ω| − Λ), which reduces the number of Greens-functions required to be calculated in each step to 2N ω − 1, but introduces discontinuities of the loop derivatives on the frequency axis.Therefore, the integration of the flow equations has to be performed more carefully, i.e. the integrator must never integrate over one such discontinuity.The integration procedure has to be adopted such that we end with an integration step right before the discontinuity and the next step is then performed starting from a point right behind it.Thereby we minimize the numerical error introduced by the discontinuity.

Projection
As we truncate the unity we are unable to exactly recover the three channels Φ or the effective vertex Γ exactly.Additionally due to memory constraints it is often impossible to even store the full vertex.Therefore, we need to derive efficient formulas for these inter-channel projections.The naive form of these projections, see Eq. ( 36) requires three Brillouin-zone integrations in addition to four form-factor-bond sums, making it numerically demanding.We will instead derive a form which has superior scaling and requires as few operations on the projected vertex channels as possible, as these are the largest objects in our calculation.Note that in pure momentum space, such derivations have been performed similarly [17,20] and are dubbed the real-space trick.Using the above defined form-factorbonds we can rewrite the projections explicitly starting with the C to P projection using q C = k 1 + k 3 − q P , see Eq. ( 36) till Eq. ( 41).
The idea is to perform a Fourier transformation of C in order to decouple the integration over momenta from the channel.In a second step, we then revert this transformation obtaining a more compact description of the formulas.The projections can now be implemented as follows: For each combination of incoming indices and bonds (o 1 , b 1 , o 3 , b 3 ) (TO) we store the allowed index combinations (o 1 , b 1 , o 3 , b 3 ) (FROM).For each element TO we need to store the offset to the first corresponding element in the FROM list, as well as the number of elements corresponding to this.Additionally, we cache all occurring e iq C (B3−B 3 ) , where (B 3 − B 3 ) has to be mapped back to the minimal representation within the o1,o3 (q P ) s2;s4 s1;s3 = dk 1 dk 3 o2,o4 = dk 1 dk 3 e −ik1B1 δ r1+b1,r2 e ik3B3 δ r3+b3,r4 e ik1B 1 δ r1+b 1 ,r4 e −ik3B 3 = extended unit cell, we will call this the form-factor map.Lastly, we need to store for each of the FROM and TO elements, which element of the form-factor map corresponds to it.In total we thus have five distinct arrays for the projection, which is reduced to a highly sparse reordering plus multiplication with a prefactor.The other projections can be treated analogously and the derivations can be found in App.Appendix B. An advantage of this rewriting is not only that we removed one momentum integration, but also do not sum over four independent form factors-bonds.The Kroneckerdeltas remove one of the summations, thus effectively reducing the scaling to ∝ N 3 b .In total, each of the interchannel projections scales

Advantages and Limitations
Here we will shortly discuss the main advantages and limitations of this approach.The additional calculations required for the TU reduce the scaling, but increase the time constant of the calculation.Therefore, it can be numerically more efficient to use a grid FRG implementation for models with small unit cells, the square lattice Hubbard model to name just one.Another drawback is the reliance on short ranged interactions, as the method is based on only developing long ranged behavior at high orders in U .This is however not as drastic as naively expected [15].Furthermore, this restriction to short ranged interaction introduces a slight bias towards short ranged fluctuations as inter-channel contributions are restricted to certain length scales.In form-factor-bond converged calculations this bias is however negligible.The advantages are superior scaling in all degrees of freedom, allowing for computations in many models previously unaccessible to FRG.
The main scalings of a flow step are shown in Fig. 3.
Using hybrid architecture and MPI, the time constant can be brought down further.Especially the usage of GPUs can reduce the the computation time a lot.To verify the validity of the implementation we ensured the reproduction of FRG benchmark results [45].
In stark contrast to brute force FRG, this implementation is not memory bound anymore, instead we are bound for most use cases by the computation time due to the linear scaling of the memory in the number of momentum points.The exception to this rule are systems with very large unit cells.Another big advantage of the method presented here, is that it can be readily employed in models without translational symmetry.We simply have to leave the momentum variables out of the equations and arrive at a real-space TUFRG formalism [26].

Application
Many body localization [46][47][48][49] has been a central topic in condensed matter theory in recent years.Its observation in optical gases [50,51] lead to a surge in theoretical works trying to understand this phenomenon.However, the methods applicable in this case are limited as the thermalization hypothesis does not hold and the systems are non-translationally invariant.So far there mostly DMRG [52,53] and ED [47] have been applied to investigate these phenomena.Both methods do not scale favorable in 2D.Here we aim to show, that the presented method could be used to study this phenomenon.For this purpose we consider a 8 site Hubbard chain with open boundary conditions, a random on site potential and on-site interactions where δ <i,j> = 1 if i and j are neighbouring sites, t = 1 is chosen as unit of energy and r i is a random number ∈ [−0.5, 0.5].As we include all terms up to order U 3 , we expect the error to scale accordingly, which is verified in Fig. 4.Even though we do not incorporate the full frequency content we stay below a relative error of about 1% up to U = 2.This offers a route to further exploring this fascinating phenomenon in higher dimensions with FRG.

Conclusion and Outlook
We presented a new full unity-space derivation of the level-two truncated FRG flow equations up to first loop order.The real-space variant of this approach has already been successfully applied to quasicrystals [26] and finite sized models [30].The TU 2 FRG we derived further extends the grasp of FRG significantly and enables calculations for many interesting systems.Additionally, we presented possible implementation strategies for the key operations of the flow of Γ 4 and have shown that the optimal scalings can be reached.Furthermore, we showed that the present implementation could be used to investigate disorder effects and possible many-body localization on a qualitative level.
The next step towards establishing FRG as the go to method for electronic instability calculations at weak to intermediate couplings is to prove its applicability in systems of interest, such as multi-layer graphene [54][55][56], twisted materials [57,58] and Kagome metals [32].Furthermore, it is desirable to implement a more sophisticated frequency unity which then enables us to study phonon and photon mediated superconductivity.Recent works in this direction [39] show promising results.The combination with the recently developed single boson exchange formulation of the FRG [59] also is an interesting route for further developments.The derived formalism is also directly applicable in Parquet approaches [16], extending the applicability of these method to larger unit cell models.
b1,b 1 o1,o 1 Here we already reduced the number of vertex-loopvertex contractions for the D-channel from three to one by performing a completion of the square.This has proven to be crucial in the case of large unit cells as we basically reduce the computational effort by 2  5 .

Fig. 3
Fig.3Scaling of the time t needed to perform a single flow step with respect to the number of momentum points (left), number of sites (center) and number of bonds, right.The scaling expected from the analytical expressions is shown in orange.

Fig. 4
Fig.4 Relative error of the occupation number predicted by the single frequency TU 2 FRG compared to exact diagonalization.Calculations were perfomed at T = 0 an 8 site open boundary conditions Hubbard chain with random on-site potentials.Even at intermediate interactions, here U = 2, the relative the error does not exceed 1%