Sampling the lattice Nambu-Goto string using Continuous Normalizing Flows

Effective String Theory (EST) represents a powerful non-perturbative approach to describe confinement in Yang-Mills theory that models the confining flux tube as a thin vibrating string. EST calculations are usually performed using the zeta-function regularization: however there are situations (for instance the study of the shape of the flux tube or of the higher order corrections beyond the Nambu-Goto EST) which involve observables that are too complex to be addressed in this way. In this paper we propose a numerical approach based on recent advances in machine learning methods to circumvent this problem. Using as a laboratory the Nambu-Goto string, we show that by using a new class of deep generative models called Continuous Normalizing Flows it is possible to obtain reliable numerical estimates of EST predictions.


Introduction
In the last few years Effective String Theory (EST) has emerged as a highly promising approach for the understanding and modeling of the non-perturbative behavior of confining Yang-Mills theories.In this framework, the confining flux tube that connects a quark-antiquark pair is represented as a thin vibrating string [1,2].
In D ̸ = 26, where D are the space-time dimensions of the target Lattice Gauge Theory (LGT), the EST (at least in its simplest formulation) is anomalous at the quantum level and thus must be considered only as an effective, large-distance description of Yang-Mills theories.Notwithstanding this, precise Monte Carlo simulations of several different LGTs proved that it is indeed a highly predictive effective model (for recent reviews see for instance [3,4]).
The reason of this success is in the so called "low energy universality" [3,[5][6][7][8][9] which states that, due to the symmetry constraints imposed by the Poincaré invariance in the target space, the first few terms of the EST large-distance expansion are universal and coincide with those of the Nambu-Goto action [1].As a consequence, the EST turns out to be much more predictive than typical effective models and its predictions depend only on one free parameter: the string tension σ of the Nambu-Goto model.
It is thus clear that a central role in this game is played by the Nambu-Goto action.Its predictions can be immediately compared with the results of Monte Carlo simulations of essentially any confining LGT (we shall comment below on a few relevant exceptions to this statement).Thus, a precise understanding of its physical properties may lead to a better understanding of the confining regime of Yang-Mills theories.
A major progress in this context was the recent observation that the Nambu-Goto model is actually an exactly integrable, irrelevant, perturbation of the two-dimensional free Conformal Field Theory (CFT) [8] of D − 2 free bosons which represent the transverse degrees of freedom of the string.This irrelevant perturbation is driven by the T T operator of the D − 2 free bosons [10,11].
We shall see below that several results are analytically known for the partition function of the Nambu-Goto action, which can be explicitly calculated for essentially all the geometries which are relevant for the comparison with LGT results.On the contrary, much less is known on the correlation functions, and in particular on the one that measures the density of the chromoelectric flux in presence of quark sources: this quantity has an important meaning in LGT, since it allows to study the shape of the flux tube.
In principle one could address this problem, by treating the Nambu-Goto action as a spin model and regularizing it on a two-dimensional lattice, where one could perform Monte Carlo simulations.However, due to the strong non-linearity of the model, it turns out that performing such simulations using standard algorithms is highly inefficient.The problem is instead perfectly suited for a completely different numerical approach, based on deep learning architectures.
In this respect it is important to stress that the spin model that we obtain discretizing the Nambu-Goto action is critical for the whole set of values of the string tension.Its continuum limit is in fact, as mentioned above, the theory of a free bosonic massless degree of freedom perturbed by the T T operator.This makes this model a perfect laboratory to test different algorithms in a particularly challenging setting.
In recent years, the exponential growth of machine learning has stimulated the introduction of novel methods for numerical studies of quantum field theory [12].A very promising class of algorithms are Normalizing Flows (NF) [13], deep generative models used to efficiently sample from complex statistical distributions.NFs provide highly flexible, invertible and differentiable maps between suitable base distributions and the target.One possible application of such models is to sample configurations from Boltzmann distributions: this approach found successful application in toy models of lattice field theory [14][15][16][17][18][19][20][21].While standard NFs still suffer from poor scaling [22], a generalization of these algorithms called Continuous Normalizing Flows (CNFs) [23] has been used to obtain interesting results in lattice scalar field theory [19,20,24].
In this paper we used a CNF architecture to evaluate the partition function and the shape of the flux tube of a lattice-regularized version of the Nambu-Goto model.Thanks to the exact knowledge of the partition function we can benchmark the correctness of this novel numerical approach and then use the results to obtain information on the shape of the flux tube.This rather unusual approach to EST calculations is interesting for several reasons that we discuss in the following.
• It allows to study large values of σ which are not accessible in LGT simulations; they are of great interest since they correspond to the perturbative regime of the T T perturbation which is controlled by 1/σ [11].
• On the LGT side, it allows for a precise study of the flux-tube shape predicted by the Nambu-Goto model, which can be used as a benchmark to compare results from LGT simulations.In this context, for instance, we will confirm the predictions of [25].
• It is the first step toward an analytic study of the corrections beyond Nambu-Goto, which have been recently the subject of great interest in the EST community [26].
• With a suitable modification of the Lagrangian of our model it would be possible to study, with this same methods, the so-called "rigid string", for which very few analytic results are known.This would provide the essential missing ingredient to test if the rigid string is the correct model to describe theories, such as the three-dimensional U (1) model [27], in which confinement is due to monopole condensation [28].
• It allows to study in detail the role of lattice artifacts in the Nambu-Goto predictions for LGTs.
• Last, but not least, this is the first numerical study of the lattice realization of a T T perturbed model and could provide a tool to test in a non-trivial setting various predictions obtained in the last few years on this class of models.
While some of these goals are for the moment beyond the scope of this contribution, this work represents a proof of concept for the feasibility of this approach.As a test of the efficiency of this approach we numerically evaluate the next-to-leading order of the flux tube width, for which no Monte Carlo result in LGT had been obtained up to now.This paper is organized as follows.In section 2 we will briefly recall a few basic results on the Nambu-Goto model, then in section 3 we will describe the technical aspects of Continuous Normalizing Flows and how we obtained our numerical results, which we describe in detail in section 4. The last section will be devoted to some concluding remarks.

Nambu-Goto string
We shall recall here only a few basic results on EST and in particular on the Nambu-Goto string.We refer the interested reader to the reviews [3,4] for a more complete discussion.
In EST, the quark-antiquark potential is modelled in terms of a vibrating string.In particular, the correlator between two Polyakov loops is related to the sum over all the surfaces bordered by the two Polyakov loops weighted by the EST action: where R denotes the distance between the two Polyakov loops.
The simplest choice for S EST fulfilling the constraints imposed by the Lorentz invariance in the target space is the Nambu-Goto action [1] that is defined as follows: where g ≡ det g αβ and is the metric induced on the reference world-sheet surface Σ by the mapping X µ (ξ) of the worldsheet in the target space, and ξ ≡ (ξ 0 , ξ 1 ) denote the worldsheet coordinates.This term has a simple geometric interpretation: it measures the area of the surface spanned by the string in the target space and has only one free parameter: the string tension σ. S NG is manifestly reparametrization-invariant and the first step is to fix this invariance.The standard choice is the so-called "physical gauge".In this gauge the two worldsheet coordinates are identified with the longitudinal degrees of freedom of the string: ξ 0 = X 0 , ξ 1 = X 1 , so that the string action can be expressed as a function only of the (D − 2) degrees of freedom corresponding to the transverse displacements, X i , with i = 2, . . ., (D − 1) which are assumed to be single-valued functions of the worldsheet coordinates.In the following, for simplicity, we assume D = 3 so as to have only one transverse direction.We thus eliminate the index i and simply denote as X(ξ) the remaining degrees of freedom.
It is well known that this gauge fixing is anomalous at the quantum level and the resulting action should only be considered as a large distance approximation of the true string action.This is the reason why we define this approach as an "effective" string description of confinement.
With a suitable redefinition of the fields, X = ϕ/ √ σ, we can write explicitly this large distance expansion as follows where R denotes, as above, the distance between the two Polyakov loops and L their length.The first term of this expansion is exactly the gaussian action, i.e. a two dimensional Conformal Field Theory (CFT) of a free bosonic field which represents the only remaining transverse degree of freedom of the string in D = 3. Remarkably enough, all the remaining terms of the expansion combine themselves to give an exactly integrable, irrelevant perturbation of this CFT [8], driven by the T T operator of the D − 2 free bosons [10,11].Using the zeta function regularization it is possible to evaluate exactly the partition function of this CFT to all orders in this irrelevant perturbation.The result is [6,29] ⟨P (x) where K 0 is the modified Bessel function of order 0, w n is the multiplicity of the closed string states which propagate from one Polyakov loop to the other, and E n their energies: From the Polyakov loop correlator it is possible to obtain the interquark potential, which is defined as follows and in the large-R limit we have, as expected, a linearly rising potential whose slope is controlled by the ground state energy of the EST In the following for our comparisons we will perform an expansion in powers of 1 σRL of the action.The first few terms of this expansion were actually obtained by Dietz and Filk [30] much earlier than eq.( 4), by treating the next-to-leading terms which appear in eq. ( 3) as small perturbations of the gaussian free action.In particular the first term is the well-known partition function of the free bosonic c = 1 CFT where η(τ ) denotes the Dedekind η function To understand the meaning of this result it is useful to expand it in the two limits R ≪ L and R ≫ L, in which we can neglect the q n terms in the infinite product of the Dedekind function.These limits correspond in the LGT language to the low-and high-temperature limits of the confining regime1 while from a string point of view they correspond to the open and closed string channels respectively: The − π 24R term in the first limit is the well known "Lüscher term" which has been widely studied in the LGT context.The next-to-leading term (i.e. the 1/σRL correction) can be recast in the following combination of the Eisenstein functions E 2 and E 4 [30]: where E 2 and E 4 can be expressed in power series and σ(n) and σ 3 (n) are, respectively, the sum of all divisors of n (including 1 and n) and the sum of their cubes.
For the range of values of σ, R and L that we shall study in this paper the terms proportional to q can be systematically neglected and we end up with the following expressions in the two limits: It is important to stress the role in these calculations of the zeta function regularization, which eliminates all the "bulk", divergent terms of the above partition functions.We shall further comment on this point below.

Width of the flux tube according to the Nambu-Goto action
In the EST framework the width of the flux tube in the position (ξ 0 , ξ 1 ) is given by where R, L denote the distance between the Polyakov loops and their length.This correlator is singular and must be regularized: the most natural choice is a point-splitting regularization: Due to the periodic boundary condition in the time direction the result must be translationinvariant in that direction and hence we do not expect any dependence on ξ 0 .Moreover, following the usual convention in LGT we fix ξ 1 to be exactly the midpoint between the two Polyakov loops ξ 1 = R/2, so that the EST width is a function only of R and L.
As we mentioned in the introduction much less is known analytically on w 2 with respect to the partition function discussed previously.In particular, no result is known for the width using the whole Nambu-Goto action and obtaining a numerical estimate of this result is one the goals of our line of research.An explicit expression can be obtained for the free Gaussian action [31] (see also [32] for a detailed calculation of the width in the cylindric geometry in which we are presently interested) where θ i are the Jacobi θ functions defined as: (−1) n q n(n+1) sin(2n + 1)z and This expression simplifies in the limits L ≫ R and R ≫ L in which we are interested: In this case, setting R c = π|ϵ|/2 we find the well-known logarithmic increase of the flux tube width with the interquark distance R • L ≪ R In this case, setting L c = π|ϵ| we have instead a linear increase of the width as a function of R σw Remarkably enough it is also possible to obtain the next to leading contribution to w2 [25].Also in this case the expression simplifies in the two limits and one finds: These are the results that we shall compare with our numerical simulations.

Lattice regularization of the Nambu-Goto action
As we mentioned in the introduction, our strategy is to treat the Nambu-Goto model as a twodimensional spin model and then study it numerically.To this end the first step is to discretize the model on a two-dimensional lattice representing the worldsheet of the string, which is not to be confused with the D−dimensional lattice in which the original LGT lives.Following the usual convention we subtract from the Nambu-Goto action the bulk term proportional to the area and rewrite it as: where Λ is a square lattice of size L × R with index x = (x 0 , x 1 ) representing the worldsheet of the string 2 and lattice step a = 1; ϕ(x) ∈ R is a real scalar field representing the transverse degrees of freedom of the string.Along the temporal extension of length L we fix periodic boundary conditions ϕ(x 0 , x 1 ) = ϕ(x 0 + L, x 1 ); thus, we can identify the physical temperature as the inverse of L. Along the spatial extension of length R we fix Dirichlet boundary conditions ϕ(x 0 , 0) = ϕ(x 0 , R) = 0, which represent the Polyakov loops P and P † .The only coupling constant of the model is the string tension σ.
As in the previous section we can expand eq. ( 24) in powers of 1/σ: where is the lattice discretization of the free boson action.
The lattice discretization has two main effects: the first is the appearance of a set of "bulk" constants, which diverge in the continuum limit3 , the second is the appearance of finite size corrections which are proportional to negative powers of the lattice sizes R and L and vanish in the continuum limit 4 .Both sets of terms are not universal, but depend on the details of the lattice regularization and are automatically eliminated by the zeta function regularization which selects only the adimensional terms in the expansion.These adimensional terms are the only ones which are "universal" in the renormalization group sense and survive in the continuum limit.Our main goal in the following will be the comparison of our numerical results with the zeta function predictions for these universal terms.
While the non-universal terms are in principle undetermined, with a few simple choices for the lattice regularization, the contribution to the partition function of S FB can be evaluated exactly also on a finite lattice, thus allowing to use them too as benchmarks of our calculations.
We sketch here the main steps of the solution (see also [33] for further details).The lattice action can be diagonalized with eigenfunctions and eigenvalues Performing the Gaussian integrations we obtain or equivalently This sum is divergent and is usually regularized with the zeta function.However, as mentioned above, in the present case we are interested also in these non-universal terms, because they will appear in the simulation of the whole Nambu-Goto action, and we can use the exact solution of the free bosonic part to fix them.
Indeed the sum in eq. ( 30) can be evaluated numerically with any precision and besides the Dedekind function one finds two divergent contributions, which are proportional respectively to the area RL and to the length of the boundary L: with being the constants which we shall use in section 4 to benchmark our numerical results.The width of the regularized Nambu-Goto string can be expressed as: where the expectation value ⟨...⟩ x 0 is computed also by averaging over the temporal dimension x 0 (due to the translation invariance imposed by the periodic boundary conditions).In fig. 1, a schematic representation of the configurations ϕ is reported.2.3 Implications of our approach for EST modelling.
The increase in precision of simulations in Lattice Gauge Theories is posing new challenges to EST models.In particular it is by now clear that the Nambu-Goto action should be considered only as a first approximation of the actual EST describing the confining regime of LGTs [26].In order to address more complex models one should first have a perfect control of the Nambu-Goto EST.This has been achieved in the past years for the partition function of the model but it is far to be reached for the correlation functions and in particular for the width of the flux tube.This width, and more generally the shape of the flux tube is one of the main observables of interest in LGT simulations and has been shown to discriminate among different confining mechanisms [27].For instance, we expect a Gaussian-like shape for the Nambu-Goto action while in presence of a term (beyond the Nambu-Goto action) proportional to the extrinsic curvature of the string we expect an exponentially decreasing shape, with the appearance of a new scale similar to the London penetration length of the Abriksov vortices.The main analytical tool we have to address these problems is the zeta function regularization, but, as we mentioned above, there are situations in which it cannot be used due to the complexity of the observable or to the non-perturbative nature of the corrections.Our approach can be used to overcome these problems and in fact, as we shall see below, we obtain for the first time a precise determination of the next to leading term of the flux tube width for the Nambu-Goto EST.This is a preliminary step toward a similar analysis in more complex models and, possibly, will give us a tool to discriminate among different confining mechanisms.

Continuous Normalizing Flows
Normalizing Flows [13,34] are a deep learning architecture based on invertible and differentiable transformations f : R V → R V which interpolate between a prior distribution q 0 (z), z ∈ R V and a target distribution, which in our case is set to be the Boltzmann distribution Continuous Normalizing Flows represent a particular implementation of such architectures: they can be constructed setting f to be the solution of a Neural Ordinary Differential Equation (NODE) [19,20,23,24] where x is a site of the lattice Λ.The probability density q of the generated samples ϕ can be readily computed solving the ODE In this work, we consider an architecture inspired by [20].The vector field g θ that we use is defined as: where the time kernel K(t) ∈ R F is given by the first F coefficients of a Fourier expansion, while W ∈ R F ×V ×V is a tensor of learnable weights with V = L × (R − 1) being the volume of the active variables of the configurations ϕ(t).The divergence of the g θ defined in eq. ( 36) is trivial and is found to be This model corresponds to a continuous in-time extension of the classical linear neuron y = W ϕ.
CNFs can be trained so that the learned distribution q well approximates the target p by minimizing the Kullback-Leibler divergence [35]: Since the partition function Z of the target theory is a constant, the problem can be solved by minimizing the first term of eq. ( 38) (also called variational free-energy), which provides also an upper bound on log Z.After training, the partition function of the target p can be computed using the following estimator where we introduced the weight Moreover, an estimator for the expectation value of a generic observable O can be computed with a reweighting procedure, also referred to as Importance Sampling in the machine learning field [15,36]:

Results
The focus of our numerical study is the computation of both the partition function and the string thickness in two different regimes of the Nambu-Goto theory: high-temperature (HT) (R ≫ L) and low-temperature (LT) (L ≪ R).These observables are computed through eq. ( 41) using 10 6 configurations sampled with trained CNFs at fixed σ, L, and R. For all the models, we use F = 3 for the temporal kernel of eq. ( 36) and integrate the NODE over 13 steps with T = 1.We initialize the elements of the vector field to 0 (identity initialization) and train the CNFs for 1000 Adam [37] iterations with 10000 samples, 0.0005 initial learning rate, β 1 = 0.8, β 2 = 0.9, and a cosine annealing scheduler.
The performances of the CNFs have been tested using as a metric the so-called Effective Sample Size (ESS) [17,38]: and serves as a measure of the overlap between the generated probability density q and the target p.It takes values between 0 and 1, with the latter representing a perfect training (q = p).We generally use models with ESS ≥ 0.1.Due to the limits of the architectures used in this work, we report results for σ > 5.0 in the HT regime and for σ > 10.0 in the LT regime.To reduce the computational cost of the simulations, we trained each combination of L and R for σ = 5.0, 10.0, 25.0, 100.0 (σ = 5.0 only for the HT regime); we then used these models as the initialization for CNFs with different string tensions.When performing this transfer learning procedure between models, we train the new CNFs with 100 Adam iterations with 0.00001 as the initial learning rate.The error is estimated using a jackknife procedure.The code is based on the PyTorch library [39] and a simple implementation can be found in [40].We ran the computation on Tesla V100 GPUs, with a total cost of approximately 2600 CPU hours (1 GPU hour corresponds to 3 CPU hours).

Simulation setup
For the high-temperature regime we considered 1804 combinations of L, R and the string tension σ.More precisely, we performed simulations for 11 even values of R between 50 and 100, 9 values of L between 4 and 12, 21 values of σ between 5 and 300 for L < 8 and 16 values between 10 and 300 for L ≥ 8.In fig.2, the ESS is reported as a function of R, for different σ and L = 10.This metric indicates that the overlap between the learned and target distribution is very good, even if the scaling with the volume and with σ can be a limiting factor for the architectures under consideration in this study.
In the low-temperature regime, we performed simulations with 1008 different combinations of L, R and the string tension σ.We extracted results for eight values of R between 8 and 22, six values of L between 84 and 94 and 21 values of σ between 10.0 and 300.0.In fig. 3 we report the ESS as a function of R for different σ and L = 90.Due to the larger volumes under consideration in this regime the quality of the flows is generally lower than in the HT regime, but it guarantees an efficient sampling of the target distribution nonetheless.LT (R) a (2)  1: Results for the coefficients of the fit of log Z of eq. ( 43) in the LT regime.

Partition Function
As mentioned above we benchmark our approach using results for the partition function.Since we focus on the study of the large-σ region, we used only data for σ ≥ 40.The most direct check is in the low-temperature regime.

Low-temperature regime (L ≫ R)
Following the discussion of section 2, we fitted our data for the partition function for each value of R, assuming a linear dependence on L which keeps into account both the area and the perimeter terms; then we looked at the first few terms of the 1/σ expansion of this linear behaviour Results of these fits are reported in table 1.
Then we used the a (0) LT (R) coefficient to further validate our results, since its behaviour should coincide with that of the lattice discretization of the free bosonic action (i.e. the σ → ∞ limit of the Nambu-Goto action).Fitting these values according to led to values for the constants A (0) LT and C (0) LT which are compatible with those reported in eq. ( 32) (see table 2).
Moreover, the B (0 LT coefficient is compatible with the coefficient of the Lüscher term − π 24 = −0, 13089969..., which is the only remnant in this limit of the Dedekind function.The agreement with the expectation is clearly visible in fig.4, where we plotted the quantity a (0) as a function of R and compared it to the Lüscher term − π 24R .A similar analysis can be performed also for the higher order terms of the 1/σ expansion.We report for completeness in the second line of tab.2 the results for a LT (R) of eq. ( 44) (upper table) and of a LT (R) of eq. ( 45) (lower table) in the LT regime.expression: LT of eq. ( 44) as a function of R compared to the Lüscher term − π 24R (solid line).

High-temperature regime (R ≫ L)
A non trivial aspect of this regime is that, due to the resummation of the infinite set of exponentials in the Dedekind function in eq. ( 9), a logarithmic term appears in the partition function: thus, the analogous of the Lüscher term has a different coefficient moving from − π 24R to − π 6L 2 (see eq. ( 11)).We first performed a set of preliminary fits keeping the coefficient of the logarithmic term as a free parameter: since the result was compatible with very high precision to 1/2, we L a (0) HT (L) a HT (L) a (3) HT (L) χ 2 /d.o.f. 4 -1.477442(2)-1.0654(5) 0.37( 4 3: Results for the coefficients of the fit of log Z of eq. ( 46) in the HT regime.
then fixed the coefficient to this value, subtracted the log term from the data and fitted again with the following function: HT (L) HT (L) Results are listed in table 3.Then, in analogy to the low-temperature regime, we fitted the a HT (L) coefficient with a (0) and the corresponding results are reported in table 4. The value of A HT is again compatible within two standard deviations with A FB and B (0) HT is compatible with the expected value − π 6 = −0.523598....The quality of this agreement is visible in fig. 5 where a   In a similar way we also studied the L dependence of the "perimeter" term c HT (L) by fitting: Results are listed in table 4, where we see that the dominant term C HT is in excellent agreement with the expected value C FB .We report for completeness also the results of the fit to the 1/σ HT (L).Using a HT (L) = A (1) we found the values reported in table 4.

Width of the flux tube
The main goal of this contribution is to show that with a CNF-based simulation it is possible to compute numerically the next-to-leading correction to the flux tube width, which has never been measured before in LGT simulations.At the same time we shall use the agreement of our results with the leading order predictions for the flux tube width as a further sanity check of our approach.For the analysis of the flux tube width we used all the values of σ, R and L at our disposal.In the low-temperature regime the next-to-leading corrections are too small and will be below our resolution; however, in the high-temperature regime they are larger and within the precision of our numerical approach.

Low-temperature regime (L ≫ R)
Following the discussion of section 2.1 we fitted our results with where we expect (in agreement with eq.22) f LT = 1 2π = 0.159155... for the leading correction, e LT = 5 96 = 0.05208... for the next-to-leading ones.The results for the various coefficients are reported in table 5.For the leading term we found an excellent agreement with the expected analytical values while, as anticipated, for the next-to-leading ones the statistical uncertainty of the fits is of the same order of the corrections we aim to detect.In fig.6 we report values of σw 2 as a function of R for different L at fixed σ = 100.0,with the corresponding curve from the fit.The situation is more interesting in the high-temperature regime.In agreement with eq. ( 23) we fitted our data with where we expect for the leading term j HT = 1 4 , l HT = 1 2π = 0.159155..., and for the next-toleading correction i (1) HT = π 6 = 0.523598...; results are reported in table 6.We see an excellent agreement with the expected values for all the constants, with a relative error on the next-toleading correction which is less than 10%.As we mentioned in section 2.1, a non-trivial behaviour of the flux tube width in the HT regime is its linear increase with the interquark distance R.This feature is clearly visible in fig.7, where we plotted the results of the simulations, together with the best fit curves for a few selected values of L as a function of R. In the same figure it is also possible to appreciate the fact that, as L increases, the angular coefficient of the linear term decreases with 1/L.
Then, in order to isolate the next-to-leading contribution, we looked at the following quantity: where we used the best-fit values for the coefficients j HT , k HT , l HT and i    In order to further validate the efficiency of our approach, we compared the results obtained with flows with simulations performed using the Hybrid Monte Carlo (HMC) algorithm [41].We performed several simulations for various volumes L × R and σ = 10, 25, 50, 75, 100 in order to measure the width σw 2 .We trained and evaluated the CNFs using the same procedure described at the beginning of this section, whereas for the HMC we generated 100000 thermalized configurations for each value of σ.For each molecular dynamics trajectory we fixed ϵ = 0.1 and integrated over 10 steps.We compute errors and autocorrelations for results generated with the HMC algorithm using the Γ−method [42] implementation by [43].
In fig. 10, we report the bias between the CNFs and the HMC simulations: the results are fully compatible well within the statistical error both when varying the system size or the string tension.In order to fully appreciate the problematic scaling of the HMC when simulating this model, in fig.11 we plot the integrated autocorrelation time τ int of the width for HMC simulations performed with increasing system sizes.The behaviour is the one typical of a model affected by critical slowing down: when the volume grows, the autocorrelations grow very rapidly.The samples generated with CNFs are independent by construction (and thus τ int = 0.5), but the error on the string width is still affected as the sampling is more difficult.Thus, in fig.12 we show the error of both methods multiplied by the square root of the sampling time for a transparent comparison between the cost for the measurements in the two cases.The trained flows appear to be more precise than the HMC simulations by as much as a factor 4 when 1000 replicas are used to perform HMC simulations, and by about two factors of magnitude when just one replica is used, always keeping the overall statistics fixed.The role of replicas is clearly understood, as they represent independent HMC simulations and are advantageous when large autocorrelations are present in each replica.However, if the number of replicas is increased further thermalization times should be then taken into account in the cost of the sampling.Overall, these results strongly indicate that in the numerical simulation of the Nambu-Goto string CNFs possess a clear advantage with respect to the more traditional Monte Carlo approach.Furthermore, we also mention the fact that in traditional Monte Carlo simulations only ratios of partition functions are directly accessible and their computations via integration methods are notoriously cumbersome.On the other hand, Normalizing Flows offer the possibility (at least in relatively simple models) to compute directly the partition function itself.In light of the results for the flux tube, where the partition function itself has the important physical meaning of the interquark potential, CNFs clearly represent the more efficient option.However, since a careful analysis of the efficiency of the two approaches in computing log Z is beyond the scope of this work, we refer to [15] for a rigorous comparison (albeit with a different NF architecture).

Concluding Remarks
The main goal of this contribution was to show the feasibility of a high precision numerical study of the Nambu-Goto action with a deep generative architecture based on Normalizing Flows.All the terms that we found in the behaviour of the observables under study (the partition function and the width of the flux tube) were already known using the zeta function regularization, but were never observed before in the framework of a lattice regularization of the Nambu-Goto action, due to the intrinsic complexity of performing simulations with traditional methods with such a non-linear action.As a matter of fact, one of the terms that we studied, i.e. the next-to-leading correction to the flux tube width that we discussed in section 4.3.2,had never been observed before even in high-precision lattice gauge theory simulations.
On the numerical side, we sampled thousands of different combinations of σ, R and L. This wide exploration of the phase space of the system was mandatory to reach the needed precision in our analyses.This sampling could be achieved only thanks to the specific properties of NF algorithms: transfer learning, sampling without autocorrelations, easy GPU parallelization.We would like to stress the fact that this class of algorithms proved itself to be much more efficient than the traditional Monte Carlo approach in a setting which is not a simple toy model.
Furthermore, we see a few natural applications of Normalizing Flow-like methods, that we list below.
• They could be used to study the Nambu-Goto action in situations in which the observable of interest is too complex to apply the zeta function regularization.This is the case for instance of the higher order corrections of the flux tube width.In this case, in particular, they could be used to test a few existing conjectures on the behaviour of this quantity to all orders in 1/σ (see for instance [44]).
• They could be used to evaluate the contribution of next-to-leading boundary contributions to the effective string action.
• They could be used to study corrections to the effective string action beyond Nambu-Goto.
In particular a numerical approach of the type discussed in this paper would be mandatory in regimes in which these corrections are large (as we expect to be the case for, say, the U (1) model in three dimensions) and cannot be treated as small perturbations of the Nambu-Goto action.
• They could be used to study numerically the T T deformation of the compactified boson, an issue which could be of interest for the description of the spatial string tension in hightemperature LGTs [45].
To conclude, we showed how NFs (and their various generalizations) can already play a crucial role in the understanding of non-perturbative behaviour of Yang-Mills theories through reliable and precise simulations of EST.In particular, the feasibility of numerical simulations with flows sets the stage for more refined computations in the context of effective string theory actions.

Figure 1 :
Figure 1: Schematic representation of the lattice configurations ϕ with volume L × R = 10 × 10.The green sites represent the Dirichlet boundary and they are fixed to 0, blue and red sites represent the "active" volume seen by the CNFs.The width σw 2 of one configuration ⟨ϕ 2 (x 0 , R/2)⟩ x 0 is computed by averaging over x 0 the square of the red sites.

Figure 2 :
Figure 2: Effective Sample Size as a function of R for fixed L = 10 and various σ.Error bars are not visible due to the very small statistical errors.

Figure 3 :
Figure 3: Effective Sample Size as a function of R for L = 90 and different σ.

( 1 )Table 2 :
LT (R), fitted with theA Results for the fit of a (0) HT L is plotted as a function of L and compared to the Dedekind function prediction − π 6L .

Figure 6 :
Figure 6: σw 2 in LT regime as a function of R with σ = 100.0and different L. Dotted line represents the best fit of the measures.The error bars are narrower than the points.

4. 3 . 2
High-temperature regime (R ≫ L) HT and we took the average over different values of R (as represented by the ⟨...⟩ R parentheses).Namely, we are

Figure 8 :
Figure 8: Plot of ⟨σw 2 N LO ⟩ as a function of σ for three values of L compared to the expected analytical solution π 6σL 2 .

Figure 11 :
Figure 11: Plot of the integrated auto-correlation time (τ int ) of the HMC algorithm for L × L lattices with a fixed σ = 10.The solid line denotes the value for uncorrelated configurations.

Figure 12 :
Figure 12: Error on the thickness ∆σw 2 multiplied by the square root of sampling time in a 20 × 20 volume and varying σ (left plot) and on L × L lattices with σ = 10 (right plot).

Table 4 :
Results for the coefficients of the fit of eq.(47) (upper table), for the coefficients of eq.(48) (middle table) and for the coefficients of eq.(49) (lower table).

Table 5 :
Results for the coefficients of the fit of the string thickness of eq.(50) in the lowtemperature regime.