Characterizing 4-string contact interaction using machine learning



Introduction
Despite possibly being one of the most intricate quantum field theories, our practical knowledge of closed string field theory (CSFT) and its possible solutions, let alone our understanding of its quantum effects, is still limited almost three decades after its initial formulation (for reviews see [1][2][3][4]). Although there have been some attempts to understand the classical tachyon potential of bosonic CSFT in the past [5][6][7][8][9][10][11][12][13][14][15][16][17], the closed string tachyon vacuum (or lack thereof) has not yet fully revealed itself. This is primarily because CSFT is a non-polynomial theory. For lack of a better formulation, and in the absence of analytical techniques that can overcome these difficulties,¹ the most straightforward way to progress appears to be to truncate the theory to some order/level and perform numerical computations. Such an approach may yield results sufficiently close to those of the full theory, just as it does in open string field theory [25][26][27][28][29][30][31][32][33][34].
In the past, Nicolas Moeller took this approach to CSFT, first by truncating classical bosonic CSFT (in the minimal-area parametrization) up to quartic order [8] and then up to quintic order [13][14][15], numerically calculating the tachyon potential and its minimum by level truncation. Even though there has been some progress this way, the fate of closed string tachyon condensation in bosonic CSFT remains unclear, indicating that including terms from higher orders/levels (and understanding their interplay) may be necessary to produce precise results.
In order to truncate classical bosonic CSFT up to sextic and higher orders, it is necessary to solve the geometry of string contact interactions on six- and higher-punctured spheres. This involves finding Strebel quadratic differentials,² obtaining their associated local coordinates to calculate off-shell string amplitudes, and finally finding the relevant subregion (the so-called vertex region) of the moduli space of punctured spheres over which the moduli integration has to be performed. Performing all of these steps numerically was feasible for four- and five-punctured spheres using classical numerical methods, such as Newton's method. However, these methods seem to fall short and become unfeasible when there are six or more punctures. The basic roadblock is that the equations one needs to solve start to depend on the shape of the so-called critical graph of the Strebel differential, which is impossible to obtain without knowing the Strebel differential itself. This tells us that the numerical methods have to be modified so that the algorithm produces Strebel differentials without referring to their critical graphs a priori.
In this paper we do precisely this by representing Strebel differentials on four-punctured spheres as an artificial neural network. We obtain such a network by performing unsupervised learning with a custom-built loss function that is minimized when a quadratic differential is Strebel. This loss function is, by construction, agnostic of the critical graph, which is why it overcomes the hurdle mentioned above. Machine learning algorithms, such as the one we use, have already found their place in string theory, from exploring the string landscape [36][37][38][39] to obtaining information about Calabi-Yau manifolds [40][41][42][43][44][45][46][47][48]; the relation of machine learning to quantum field theory has been explored as well [49][50][51]. In particular, our algorithm has been partially inspired by the methods described in [46,47]. For a recent review of the applications of data science to string theory see [52].
We note that the four-punctured sphere is the first case leading to a non-trivial string contact interaction and re-emphasize that it has already been solved by Moeller [8]. Here we simply use it as a testing ground for our ideas. Even so, since we obtain Strebel differentials as a neural network, our approach is an improvement: once the network is properly trained, it can be used to find the Strebel differential for any four-punctured sphere practically instantly. It is also philosophically different from Moeller's approach, as we solve for the function itself instead of just finding solutions at specific moduli. While one can use a (polynomial) fit to approximate the function, neural networks are more flexible and expressive. In particular, they may be used for non-parametric regression [53,54] and they can extrapolate outside the training region [55,56]. Let us further note that Moeller stressed that he did not succeed in finding a simple fit in the case of five-punctured spheres [13].
We make sure every step of our algorithm is manifestly independent of the number of punctures. In subsequent work [57], we plan to characterize the string contact interactions for higher-punctured spheres, where the benefits of constructing the algorithm in this fashion will become apparent. Once the Strebel differential is obtained, we can find the local coordinates by expanding it around the punctures. This alone does not specify the so-called mapping radii, but one can easily solve for them by numerically evaluating a specific integral [5,8]. In this step we make an observation that renders this calculation independent of the critical graph as well.
On top of the local coordinates, we also need to solve for the region $\mathcal{V}_{0,n}\subset\mathcal{M}_{0,n}$ over which the moduli integration has to be performed. Here $\mathcal{M}_{0,n}$ is the moduli space of $n$-punctured spheres, while $\mathcal{V}_{0,n}$ is the so-called vertex region, implicitly determined by requiring the lengths of all non-contractible curves in the metric associated with the Strebel differential to be greater than or equal to a certain value [1,[58][59][60][61]], which we take to be $2\pi$ by convention.³ These lengths can be computed from the Strebel differential, and once computed we can generate a data set to train a neural network to distinguish punctured spheres in $\mathcal{V}_{0,n}$ from those outside. That is, we can train a network for the indicator function that outputs 1 if the surface is part of the vertex region and 0 otherwise. In this work we obtain the indicator function $\Theta_{0,4}:\mathcal{M}_{0,4}\simeq\mathbb{C}\setminus\{0,1\}\to\{0,1\}$, defined by
$$\Theta_{0,4}(\xi,\xi^*)=\begin{cases}1, & \xi\in\mathcal{V}_{0,4},\\ 0, & \xi\notin\mathcal{V}_{0,4},\end{cases}\tag{1.1}$$
as a neural network. Here $\xi$ denotes the modulus. This allows us to replace
$$\int_{\mathcal{V}_{0,4}}d^2\xi\,(\cdots)\;\longrightarrow\;\int_{\mathcal{M}_{0,4}}d^2\xi\,\Theta_{0,4}(\xi,\xi^*)\,(\cdots),$$
which simplifies the moduli integration in practical terms by eliminating the need to describe the region $\mathcal{V}_{0,4}$ explicitly. We argue an analogous construction for the indicator function would […] differentials on 4-punctured spheres when all punctures are real, and introduce a loss function for Strebel differentials, which forms the central part of our algorithm. In section 3, we describe the specifics of our neural networks and their training. In particular, we show that the trained network has learned the relevant symmetries of the Strebel differential and reproduces the analytic results correctly. Moreover, we show that our results are consistent with the fits provided by Moeller in [8].
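The practical gain of the indicator function can be illustrated with a toy sketch: an integral over a region is computed as a sum over a grid covering a larger box, and points outside the region are simply switched off by Θ. The unit disk, the box and the function `theta_toy` are hypothetical stand-ins for $\mathcal{V}_{0,4}$, $\mathcal{M}_{0,4}$ and a trained network. Since Θ vanishes outside the vertex region, any box containing that region would do.

```python
import numpy as np

def theta_toy(x, y):
    """Hypothetical indicator: 1 inside the unit disk, 0 outside.

    Stands in for a trained network approximating Theta_{0,4}."""
    return (x**2 + y**2 <= 1.0).astype(float)

def integrate_with_indicator(f, theta, half_width=2.0, n=1000):
    """Midpoint rule for the integral of theta * f over the box [-w, w]^2.

    The region itself is never described explicitly: points outside it
    are switched off by the indicator."""
    h = 2.0 * half_width / n
    centers = -half_width + h * (np.arange(n) + 0.5)
    x, y = np.meshgrid(centers, centers, indexing="ij")
    return float(np.sum(theta(x, y) * f(x, y)) * h * h)

area = integrate_with_indicator(lambda x, y: np.ones_like(x), theta_toy)
second_moment = integrate_with_indicator(lambda x, y: x**2, theta_toy)
print(area, second_moment)  # close to pi and pi/4
```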
In the last section we conclude the paper and discuss possible future directions; in particular, we argue that scaling the algorithm to higher-punctured spheres is expected to be feasible. In appendices A and B, we provide some details on the numerical evaluations in our work.

The geometry of string contact interaction
In this section we review Strebel differentials, local coordinates, mapping radii, and the indicator function. For more details, the reader can refer to [5,6,8,35]. The novel features of this section are a complete analytic characterization of Strebel differentials on 4-punctured spheres when all punctures lie on a great circle, the introduction of a loss function for Strebel differentials, and a few simplifying observations on the calculation of mapping radii.

Strebel quadratic differential
Consider an $n$-punctured sphere $\Sigma_{0,n}$ with punctures placed at $P=\{\xi_1,\dots,\xi_n\}$, assuming none of the punctures is at infinity. We always fix the last three punctures to predetermined positions by appropriate $PSL(2,\mathbb{C})$ transformations. We are interested in quadratic differentials that have a double pole at each puncture with residue equal to $-1$.⁴ In general they can be written as
$$\phi(z)=\sum_{i=1}^{n}\left[-\frac{1}{(z-\xi_i)^2}+\frac{c_i}{z-\xi_i}\right],\tag{2.1}$$
where $c_i\in\mathbb{C}$, $i=1,\dots,n$, are a priori undetermined variables which we call accessory parameters. They are fixed by regularity at infinity, discussed next, and by demanding that (2.1) be a special type of quadratic differential, a Strebel differential. The double-pole structure with residues equal to $-1$ can be argued for by demanding that the metric associated with the quadratic differential, $ds=\sqrt{|\phi(z)|}\,|dz|$ (the so-called $\phi$-metric), be that of a flat cylinder of circumference $2\pi$ sufficiently close to a puncture [35]. The flat cylinders here correspond to external strings.
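Given punctures at $P=\{\xi_1,\dots,\xi_n\}$, the point at infinity $z=\infty$ has to be a regular point for string contact interactions in general. Inverting the coordinate by $w=1/z$, it is easy to see that the quadratic differential $\phi$ takes the following form around $z=\infty$ (or $w=0$):
$$\phi(z)\,dz^2=\left[\frac{\sum_i c_i}{w^3}+\frac{\sum_i c_i\xi_i-n}{w^2}+\frac{\sum_i\left(c_i\xi_i^2-2\xi_i\right)}{w}+O(1)\right]dw^2.\tag{2.2}$$
This leads to the following three linear conditions among the accessory parameters $\{c_i\}_{i=1}^{n}$:
$$\sum_{i=1}^{n}c_i=0,\qquad\sum_{i=1}^{n}c_i\xi_i=n,\qquad\sum_{i=1}^{n}c_i\xi_i^2=2\sum_{i=1}^{n}\xi_i.\tag{2.3}$$
Notice that this still leaves us with $n-3$ undetermined accessory parameters $c_i$. Also notice that this explains why we have not included regular terms in (2.1): they would only lead to more singular terms around $z=\infty$.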
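Because the conditions (2.3) are linear, one can pick the $n-3$ free accessory parameters and solve a small linear system for the remaining three. The sketch below does this and checks numerically that $\phi$ decays like $z^{-4}$ at infinity and keeps double poles with coefficient $-1$. It assumes the sign convention $\phi(z)=\sum_i[-(z-\xi_i)^{-2}+c_i(z-\xi_i)^{-1}]$ for (2.1); the function names and puncture positions are illustrative.

```python
import numpy as np

def phi_general(z, xis, cs):
    """Assumed form of (2.1): sum_i [ -1/(z - xi_i)^2 + c_i/(z - xi_i) ]."""
    return sum(-1.0 / (z - xi) ** 2 + c / (z - xi) for xi, c in zip(xis, cs))

def complete_accessory_parameters(xis, free_cs):
    """Solve the three linear conditions at z = infinity for c_1, c_2, c_3,
    given the remaining n - 3 accessory parameters c_4, ..., c_n."""
    xis = np.asarray(xis, dtype=complex)
    free_cs = np.asarray(free_cs, dtype=complex)
    n, rest = len(xis), xis[3:]
    # Rows: sum c_i = 0, sum c_i xi_i = n, sum c_i xi_i^2 = 2 sum xi_i.
    A = np.array([np.ones(3), xis[:3], xis[:3] ** 2])
    rhs = np.array([
        -np.sum(free_cs),
        n - np.sum(free_cs * rest),
        2 * np.sum(xis) - np.sum(free_cs * rest**2),
    ])
    return np.concatenate([np.linalg.solve(A, rhs), free_cs])

xis = [0.3 + 0.2j, -0.7 + 0.1j, 1.1 - 0.5j, 0.4 + 1.3j]  # hypothetical punctures
cs = complete_accessory_parameters(xis, [0.5 - 0.25j])   # n = 4: one free parameter
z_far = 1e5 + 0j
decay = abs(z_far**3 * phi_general(z_far, xis, cs))       # small, since phi = O(z^-4)
eps = 1e-7
pole = eps**2 * phi_general(xis[0] + eps, xis, cs)        # close to -1
print(decay, pole)
```

Without the solved $c_1,c_2,c_3$, the product $z^3\phi(z)$ would instead grow like $z^2\sum_i c_i$.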
Notice that there are no undetermined accessory parameters when $n=3$. If we place the punctures at $P=\{0,1,\infty\}$, we find the quadratic differential (2.1) to be
$$\phi(z)=-\frac{z^2-z+1}{z^2(z-1)^2}.\tag{2.4}$$
It is well known that this differential leads to Witten's vertex for closed strings [62,63].
Before we introduce Strebel differentials, let us introduce some nomenclature. A horizontal trajectory is a path along which $\phi(z)\,dz^2>0$. A critical trajectory is a horizontal trajectory that begins and ends on a zero of $\phi$. It is easy to argue that $m+2$ critical trajectories emanate from a zero of order $m$, and that the orders of the zeros add up to $2n-4$ [35]. The union of the critical trajectories of $\phi$, together with their endpoints, is called the critical graph. A Strebel differential is then defined as a quadratic differential with double poles of residue $-1$ whose critical graph is connected and forms a non-empty set of measure zero. The horizontal trajectories of such a differential foliate the entire surface [35]. An example of the trajectory structure of a Strebel differential is shown in figure 1.

Figure 1: The trajectory structure of the Strebel differential when the punctures are at $P=\{0,1,0.8734-0.6242i,\infty\}$ (left). We mark the positions of punctures and zeros by crosses and pluses, respectively. The inaccuracies around the zeros are due to evaluating the trajectories as an expansion after (2.18). The critical graph is a tetrahedron, whose sketch on $\mathbb{CP}^1$ is shown on the right.
A Strebel differential exists and is unique for every punctured sphere (see Theorem 23.5 in [35]). From this, and the fact that double poles with negative residues give closed horizontal trajectories sufficiently close to the punctures, there exists a set of accessory parameters, unique once the relations (2.3) are imposed, such that the quadratic differential (2.1) is Strebel for punctures at $P$. Our goal is to find these accessory parameters $c_i$ as functions of the positions of the punctures.
We remark that the $\phi$-metric associated with a Strebel differential $\phi$ is a minimal-area metric, and it looks like flat cylinders grafted to each other as dictated by the critical graph of $\phi$ [1,35,58,59]. However, for consistency, CSFT in the minimal-area parametrization actually requires minimal-area metrics for which the lengths of all non-contractible curves are greater than or equal to $2\pi$.
While the latter condition certainly fails for Strebel differentials when surfaces are sufficiently close to degeneration, for surfaces that are part of the classical elementary interaction (that is, those in the vertex region $\mathcal{V}_{0,n}$) this condition is satisfied by definition. So after finding the Strebel differential everywhere on the moduli space $\mathcal{M}_{0,n}$ (or in any region containing $\mathcal{V}_{0,n}$), it is possible to map out the vertex region $\mathcal{V}_{0,n}$ by checking the lengths of non-contractible curves. Furthermore, once the Strebel differential is known, it is possible to find the local coordinates characterizing the geometry of the $n$-string contact interaction. We explain these steps in the subsequent subsections.
We finally note that for surfaces outside the vertex region, that is, those in the so-called Feynman region $\mathcal{F}_{0,n}=\mathcal{M}_{0,n}\setminus\mathcal{V}_{0,n}$, the Strebel differential is not the right type of quadratic differential from the perspective of CSFT: we have to use a quadratic differential of the form (2.1) whose associated metric is of minimal area under the condition that the lengths of all of its non-contractible curves are greater than or equal to $2\pi$.⁵ We call such differentials Zwiebach differentials (see Theorem 3.2 in [59]). In the vertex region $\mathcal{V}_{0,n}$ the definition of Zwiebach differentials coincides with that of Strebel differentials, but in the Feynman region $\mathcal{F}_{0,n}$ they differ: the critical graph of a Zwiebach differential becomes disconnected [59]. Geometrically this corresponds to having internal cylinders, which represent string propagators. In fact, Zwiebach differentials are examples of a more general type of differential, called Jenkins-Strebel (JS) differentials, for which the critical graph forms a non-empty set of measure zero that is not necessarily connected. Zwiebach differentials can be shown to exist and be unique (see Theorem 5.1 in [59]). We emphasize that the accessory parameters of Strebel and Zwiebach differentials are distinct functions of the moduli in the Feynman region $\mathcal{F}_{0,n}$. In this study, we only consider Strebel differentials.⁶

Complex length and loss function
It is hard to work with the definition of Strebel differentials given in the previous subsection. Here we provide an equivalent characterization that is more amenable to analytical and numerical investigations. We begin by defining the complex length between zeros $z_i,z_j$ of $\phi$ as
$$\ell(z_i,z_j)=\int_{z_i}^{z_j}\sqrt{\phi(z)}\,dz.\tag{2.5}$$
The path of integration here is chosen so that it avoids any branch cuts. Since the branch structure of $\sqrt{\phi(z)}$ is hard to keep track of numerically, we replace the square root with the continuous square root $\pm\sqrt{\cdot}$, just as in [8]. That is, we define the square root on the double cover of the complex plane minus the origin, where it is holomorphic. This makes the overall sign of the continuous square root, and hence of the complex length, ambiguous. But, as we shall see, problems of this sort are of a technical nature and can be easily overcome. More details on the numerical evaluation of the continuous square root are given in appendix A.
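A minimal sketch of a continuous square root, assuming the path is sampled finely enough that consecutive roots differ by much less than a right angle: take the principal root at each sample and flip its sign whenever that keeps it closer to the previous one. This is only one simple way to do it, not necessarily the procedure of appendix A; the function name is ours.

```python
import numpy as np

def continuous_sqrt(values):
    """Square roots of complex samples taken along a path, with the sign of
    each root chosen to be continuous with the previous one.

    The overall sign is fixed by the principal root at the first sample,
    which reflects the sign ambiguity discussed in the text."""
    roots = np.sqrt(np.asarray(values, dtype=complex))  # principal branch
    for k in range(1, len(roots)):
        # Flip the sign if that brings the root closer to its predecessor.
        if abs(roots[k] + roots[k - 1]) < abs(roots[k] - roots[k - 1]):
            roots[k] = -roots[k]
    return roots

# Example: sqrt(z^2) along the unit circle. The principal root jumps between
# z and -z whenever z^2 crosses the negative real axis; the continuous root
# follows z all the way around.
theta = np.linspace(0.0, 2.0 * np.pi, 1001)
z = np.exp(1j * theta)
r = continuous_sqrt(z**2)
principal = np.sqrt(z**2)
```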
With this replacement the complex length is now taken to be
$$\ell(z_i,z_j)=\int_{z_i}^{z_j}\pm\sqrt{\phi(z)}\,dz.\tag{2.6}$$
⁵ Even though Strebel differentials in the Feynman region do not seem to be relevant for CSFT, we would like to point out that they have found applications in worldsheet approaches to the AdS/CFT correspondence, see [64][65][66][67].
⁶ Although see section 4 for a discussion of how the ideas here can be extended to obtain Zwiebach differentials as a neural network.
Assuming the integrand does not vanish in the interior of the path of integration and the path is not self-intersecting, the integrand is holomorphic in some neighborhood of the path and the integral equals (2.5) up to an overall sign (with an appropriate choice of branch cut for $\sqrt{\cdot}$). Hence we can deform the path of integration in (2.6) freely without changing the value of the integral, as long as the endpoints are fixed and the path neither crosses any puncture nor intersects itself. Therefore, for convenience, we always evaluate the integral in (2.6) on the straight line
$$z(t)=z_i+t\,(z_j-z_i),\qquad t\in[0,1].\tag{2.7}$$
The details on the numerical evaluation of the complex length can be found in appendix A.
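Combining a continuous square root with the straight path (2.7) gives a short sketch of (2.6). The substitution $t=(1-\cos\pi s)/2$ followed by Gauss-Legendre quadrature in $s$ is our choice for taming the square-root behaviour at simple zeros, not necessarily the scheme of appendix A. As a check we use the three-punctured sphere with punctures at $\{0,1,\infty\}$, $\phi(z)=-(z^2-z+1)/(z^2(z-1)^2)$, whose zeros are $e^{\pm i\pi/3}$: the straight segment between them is the critical trajectory crossing $(0,1)$, and its length is $\pi$ because each puncture is surrounded by two of the three equal edges, with total length $2\pi$.

```python
import numpy as np

def continuous_sqrt(values):
    """Principal square roots with signs flipped to be continuous along the samples."""
    roots = np.sqrt(np.asarray(values, dtype=complex))
    for k in range(1, len(roots)):
        if abs(roots[k] + roots[k - 1]) < abs(roots[k] - roots[k - 1]):
            roots[k] = -roots[k]
    return roots

def complex_length(phi, z_i, z_j, n_nodes=400):
    """Integral of the continuous sqrt(phi) along the straight line from z_i to z_j.

    t = (1 - cos(pi s)) / 2 clusters nodes near both endpoints, which removes
    the square-root behaviour at simple zeros; Gauss-Legendre is used in s."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    order = np.argsort(x)                      # nodes must be ordered along the path
    s, ws = 0.5 * (x[order] + 1.0), 0.5 * w[order]
    t = 0.5 * (1.0 - np.cos(np.pi * s))
    z = z_i + t * (z_j - z_i)
    dz_ds = (z_j - z_i) * 0.5 * np.pi * np.sin(np.pi * s)
    return complex(np.sum(ws * continuous_sqrt(phi(z)) * dz_ds))

# Three-punctured sphere with punctures at {0, 1, infinity}.
phi3 = lambda z: -(z**2 - z + 1) / (z**2 * (z - 1) ** 2)
z_plus, z_minus = np.exp(1j * np.pi / 3), np.exp(-1j * np.pi / 3)
ell = complex_length(phi3, z_plus, z_minus)
print(ell)  # real up to rounding, with |ell| = pi
```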
But notice that, regardless of where the path lies relative to the punctures, the (absolute value of the) imaginary part of the complex length is the same. To see this, note that deforming the path of integration over a puncture picks up the residue of $\pm\sqrt{\phi(z)}$, which is always purely imaginary because the double poles in (2.1) have residue $-1$: near a puncture, $\pm\sqrt{\phi(z)}\simeq\pm i/(z-\xi_i)$. This makes the shift real and equal to $\pm2\pi$, leading to no change in the imaginary part of the complex length. This reasoning implies that the complex length is real for Strebel differentials, as we can deform the path of integration (2.7) to a critical trajectory between $z_i,z_j$, along which $\phi(z)\,dz^2\ge0$. This makes the integrand equal to the line element $ds=\sqrt{|\phi(z)|}\,|dz|$ up to sign, and manifestly real. We remark that the absolute value of the complex length may not always give the geodesic distance between the zeros $z_i,z_j$ in the $\phi$-metric of a Strebel differential, due to the sign ambiguity and the placement of the punctures relative to the path of integration.
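The statement that moving the path across a puncture shifts the complex length by a real amount can be checked directly: integrating a continuous branch of $\sqrt{\phi}$ around a small circle enclosing a single puncture should give $\pm2\pi$. The sketch uses the trapezoidal rule on the three-punctured sphere $\phi(z)=-(z^2-z+1)/(z^2(z-1)^2)$; the radius and the number of points are arbitrary choices.

```python
import numpy as np

def loop_integral(phi, center, radius=0.1, n=256):
    """Integral of a continuous branch of sqrt(phi) around |z - center| = radius.

    Trapezoidal rule, which converges very fast for smooth periodic
    integrands. Assumes no zero of phi lies inside the circle."""
    theta = 2.0 * np.pi * np.arange(n) / n
    z = center + radius * np.exp(1j * theta)
    dz_dtheta = 1j * (z - center)
    roots = np.sqrt(np.asarray(phi(z), dtype=complex))   # principal branch
    for k in range(1, n):
        # Keep the branch continuous around the loop.
        if abs(roots[k] + roots[k - 1]) < abs(roots[k] - roots[k - 1]):
            roots[k] = -roots[k]
    return complex(np.sum(roots * dz_dtheta) * 2.0 * np.pi / n)

phi3 = lambda z: -(z**2 - z + 1) / (z**2 * (z - 1) ** 2)
shift_at_0 = loop_integral(phi3, 0.0)
shift_at_1 = loop_integral(phi3, 1.0)
print(shift_at_0, shift_at_1)   # both real, with absolute value 2*pi
```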
Above we essentially provided a necessary condition for a quadratic differential (2.1) to be Strebel: if $\phi$ is Strebel, then the complex length between any two of its zeros is real. In fact, the converse is true as well. That is, if $z_i,z_j$ denote zeros of a quadratic differential of the form (2.1), $\phi(z_i)=\phi(z_j)=0$, then
$$\phi\ \text{is Strebel}\iff\operatorname{Im}\ell(z_i,z_j)=0\quad\text{for all } i,j.\tag{2.8}$$
In order to argue for sufficiency, we just have to show that the integrand of the complex length, $\pm\sqrt{\phi(z)}\,dz$, is real along some path connecting all the zeros; its square $\big(\sqrt{\phi(z)}\,dz\big)^2$ then defines a Strebel differential, as it is a single-valued quadratic differential of the form (2.1) whose critical graph is of measure zero and connected. This is easy to accomplish: we can deform the path between each pair of zeros into one along which the integrand is real, by starting from one zero and moving in the direction that keeps the imaginary part equal to zero. Since $\operatorname{Im}\ell(z_i,z_j)=0$ for all $i,j$, we are guaranteed to hit another zero after this procedure, and this makes $\big(\sqrt{\phi(z)}\,dz\big)^2$ a Strebel differential.
The condition (2.8) gives an alternative formulation of Strebel differentials. In fact, this is the condition solved by Moeller using Newton's method [8,13,15]. Observe that the existence and uniqueness of Strebel differentials translates into the existence and uniqueness of the solution of the equations on the right-hand side of (2.8) in terms of the accessory parameters, subject to the relations (2.3). Note that there are $\binom{2n-4}{2}$ distinct equations on the right-hand side of (2.8). However, by dimensional counting it is actually sufficient to demand the vanishing of $\dim_{\mathbb{R}}\mathcal{M}_{0,n}=2n-6$ imaginary parts of complex lengths. This shows that the set of equations on the right-hand side of (2.8) is in fact over-determined. Now, motivated by the conditions on the right-hand side of (2.8), define the following function of quadratic differentials of the form (2.1):
$$\mathcal{L}_n(\phi)\propto\sum_{i<j}\left[\operatorname{Im}\ell(z_i,z_j)\right]^2.\tag{2.9}$$
Here $i,j=1,\dots,2n-4$ run over the zeros of $\phi$ (counted with multiplicity), and the overall factor is for normalization. We call this function the loss function, for reasons that will become apparent in section 3. Observe that this function can be evaluated unambiguously using the integral (2.6) with the straight line (2.7) as the path of integration: since each imaginary part is squared, its sign ambiguity becomes irrelevant. By construction, $\mathcal{L}_n\ge0$.
By the existence and uniqueness of Strebel differentials, the loss function (2.9), viewed as a function of the accessory parameters $c_i$ (subject to (2.3)) at fixed positions of the punctures $\xi_i$, has a unique global minimum, where its value is zero. So it is in principle possible to obtain Strebel differentials by minimizing the loss function (2.9) in the space of accessory parameters for given positions of the punctures. This optimization problem is perfectly suited to machine learning, and it is how we approach finding Strebel differentials in section 3. In particular, the advantage of this approach is that the loss function constructed out of (2.1) is totally agnostic of the shape of the critical graph, whose role made the previous approaches to finding Strebel differentials slightly convoluted, as we mentioned.
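The loss (2.9) for the four-punctured sphere can be sketched by summing the squared imaginary parts of the straight-line complex lengths over all pairs of zeros. The sketch assumes the parametrization in which the single accessory parameter $a$ is the sum of the zeros, with numerator $N(z)=z^4-az^3+[a(1+\xi)-2\xi]z^2-a\xi z+\xi^2$ over $z^2(z-1)^2(z-\xi)^2$, and it drops the normalization factor. As a test that does not require knowing the Strebel point: $z\mapsto1-z$ maps straight lines to straight lines, so the loss should be unchanged under $(\xi,a)\mapsto(1-\xi,4-a)$ up to rounding.

```python
import numpy as np

def continuous_sqrt(values):
    """Principal square roots with signs flipped to be continuous along the samples."""
    roots = np.sqrt(np.asarray(values, dtype=complex))
    for k in range(1, len(roots)):
        if abs(roots[k] + roots[k - 1]) < abs(roots[k] - roots[k - 1]):
            roots[k] = -roots[k]
    return roots

def complex_length(phi, z_i, z_j, n_nodes=200):
    """Straight-line complex length, with nodes clustered at both endpoints."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    order = np.argsort(x)
    s, ws = 0.5 * (x[order] + 1.0), 0.5 * w[order]
    t = 0.5 * (1.0 - np.cos(np.pi * s))
    z = z_i + t * (z_j - z_i)
    dz_ds = (z_j - z_i) * 0.5 * np.pi * np.sin(np.pi * s)
    return complex(np.sum(ws * continuous_sqrt(phi(z)) * dz_ds))

def four_punctured(xi, a):
    """phi and its zeros for punctures {0, 1, xi, inf}, in the assumed
    parametrization where a is the sum of the zeros."""
    coeffs = [1.0, -a, a * (1 + xi) - 2 * xi, -a * xi, xi**2]
    phi = lambda z: -np.polyval(coeffs, z) / (z**2 * (z - 1) ** 2 * (z - xi) ** 2)
    return phi, np.roots(coeffs)

def loss4(xi, a):
    """Sum over pairs of zeros of (Im ell)^2, i.e. the loss without normalization."""
    phi, zeros = four_punctured(xi, a)
    return sum(
        complex_length(phi, zeros[i], zeros[j]).imag ** 2
        for i in range(len(zeros))
        for j in range(i + 1, len(zeros))
    )

xi0, a0 = 0.3 + 0.7j, 1.5 + 1.0j   # arbitrary values, not a Strebel point
L1 = loss4(xi0, a0)
L2 = loss4(1 - xi0, 4 - a0)         # image under z -> 1 - z
print(L1, L2)
```

In practice one would hand `loss4` to an optimizer over $(\operatorname{Re}a,\operatorname{Im}a)$ at each $\xi$ or, as in section 3, train a network for $a(\xi,\xi^*)$ that minimizes it over many moduli at once.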
It is an interesting question whether the loss function (2.9) has other extrema. Our numerical experiments in the case of the 4-punctured sphere indicate that, even if there are any, they have not made an appearance in our algorithm. So we assume, for all intents and purposes, that the loss function (2.9) has no other extremum. It would be interesting to establish rigorously that this is the case.

Strebel differential on 4-punctured sphere
Since we are going to test our algorithm on the 4-string contact interaction, let us look at Strebel differentials on 4-punctured spheres in more detail. We begin by placing the punctures at $P=\{0,1,\xi,\infty\}$ with a $PSL(2,\mathbb{C})$ transformation, so that $\xi$ is the modulus. Since a single accessory parameter is left after solving the analogue of the conditions (2.3) when one of the punctures is at $z=\infty$, it can be shown that the quadratic differential (2.1) can be put into the following form:
$$\phi(z)=-\frac{z^4-a\,z^3+\left[a(1+\xi)-2\xi\right]z^2-a\,\xi\,z+\xi^2}{z^2(z-1)^2(z-\xi)^2}=-\frac{(z-z_1)(z-z_2)(z-z_3)(z-z_4)}{z^2(z-1)^2(z-\xi)^2},\tag{2.10}$$
where $a=a(\xi,\xi^*)=z_1+z_2+z_3+z_4$ is the single accessory parameter and $z_i$, $i=1,2,3,4$, are the zeros of the quadratic differential.⁷ As we have emphasized earlier, finding the Strebel differential is equivalent to finding the function $a=a(\xi,\xi^*)$.
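One way to sanity-check a form like (2.10) is to verify that it satisfies the residue conditions for any value of $a$, and that $a$ is the sum of the zeros. The sketch below assumes the expanded numerator $N(z)=z^4-az^3+[a(1+\xi)-2\xi]z^2-a\xi z+\xi^2$; the test values of $\xi$ and $a$ are arbitrary. The double pole at infinity has coefficient $-1$ automatically because $N$ is monic of degree four.

```python
import numpy as np

def numerator_coeffs(xi, a):
    """Coefficients (highest degree first) of the quartic numerator N, assuming
    phi(z) = -N(z) / (z^2 (z-1)^2 (z-xi)^2) with a = sum of the zeros of N."""
    return [1.0, -a, a * (1 + xi) - 2 * xi, -a * xi, xi**2]

def double_pole_coefficient(xi, a, p):
    """Coefficient of 1/(z - p)^2 in phi at a finite puncture p in {0, 1, xi}."""
    others = [q for q in (0.0, 1.0, xi) if q != p]
    return -np.polyval(numerator_coeffs(xi, a), p) / np.prod([(p - q) ** 2 for q in others])

xi, a = 0.37 + 0.81j, 1.9 + 0.6j   # arbitrary test values
residues = [double_pole_coefficient(xi, a, p) for p in (0.0, 1.0, xi)]
zero_sum = np.sum(np.roots(numerator_coeffs(xi, a)))
print(residues, zero_sum)   # each residue is -1, and the zeros sum to a
```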
There are certain symmetries that the accessory parameter $a=a(\xi,\xi^*)$ enjoys. These are:
• The involution symmetry. The complex conjugate of the Strebel differential on the surface $\Sigma_{0,4}$ is the Strebel differential on the conjugate Riemann surface $\Sigma^*_{0,4}$. That is, if the accessory parameter corresponding to the modulus $\xi$ is $a$, then the accessory parameter corresponding to the modulus $\xi^*$ is $a^*$. In particular, this shows that $a\in\mathbb{R}$ for $\xi\in\mathbb{R}$. Clearly, a similar symmetry holds for $n$-punctured spheres.
• $PSL(2,\mathbb{C})$ symmetries permuting $\{0,1,\infty\}$. There are 6 such transformations, but they are generated by the following two:
$$z\mapsto1-z,\qquad z\mapsto\frac{1}{z},\tag{2.11}$$
and the position of the puncture at $z=\xi$, as well as the quadratic differential, changes accordingly. These transformations map Strebel differentials to Strebel differentials, so it can be shown that performing (2.11) induces the following transformations of the modulus and the accessory parameter of a Strebel differential:
$$a(1-\xi,1-\xi^*)=4-a(\xi,\xi^*),\qquad a\!\left(\tfrac{1}{\xi},\tfrac{1}{\xi^*}\right)=\frac{a(\xi,\xi^*)}{\xi}.\tag{2.12}$$
These symmetries can be generalized to higher-punctured spheres in an obvious fashion.
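At $\xi=e^{i\pi/3}$ we have $1/\xi=1-\xi$, so the two rules in (2.12) combine into $4-a=a/\xi$, i.e. $a=4\xi/(1+\xi)=2+2i/\sqrt{3}$, assuming $a$ is the sum of the zeros. Then $a/4$ is the centroid $(1+\xi)/3$ of the three finite punctures, so one zero sits at that centroid, and by the reflection symmetries the straight segments from it to the other three zeros are edges of the critical graph. The sketch checks that their complex lengths are real and equal to $2\pi/3$, the edge length of the regular tetrahedron; the quadrature helper is our own choice.

```python
import numpy as np

def continuous_sqrt(values):
    """Principal square roots with signs flipped to be continuous along the samples."""
    roots = np.sqrt(np.asarray(values, dtype=complex))
    for k in range(1, len(roots)):
        if abs(roots[k] + roots[k - 1]) < abs(roots[k] - roots[k - 1]):
            roots[k] = -roots[k]
    return roots

def complex_length(phi, z_i, z_j, n_nodes=400):
    """Straight-line complex length, with nodes clustered at both endpoints."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    order = np.argsort(x)
    s, ws = 0.5 * (x[order] + 1.0), 0.5 * w[order]
    t = 0.5 * (1.0 - np.cos(np.pi * s))
    z = z_i + t * (z_j - z_i)
    dz_ds = (z_j - z_i) * 0.5 * np.pi * np.sin(np.pi * s)
    return complex(np.sum(ws * continuous_sqrt(phi(z)) * dz_ds))

xi = np.exp(1j * np.pi / 3)
a = 2 + 2j / np.sqrt(3)   # from the symmetry rules, assuming a = sum of zeros
coeffs = [1.0, -a, a * (1 + xi) - 2 * xi, -a * xi, xi**2]
phi = lambda z: -np.polyval(coeffs, z) / (z**2 * (z - 1) ** 2 * (z - xi) ** 2)

zeros = np.roots(coeffs)
centroid = (1 + xi) / 3                            # equals a / 4
k0 = int(np.argmin(np.abs(zeros - centroid)))
outer = [z for k, z in enumerate(zeros) if k != k0]
lengths = [complex_length(phi, zeros[k0], z) for z in outer]
print(lengths)   # real, each with absolute value 2*pi/3
```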
We can use these symmetries to solve for the accessory parameter at certain values of $\xi$. For example, for $\xi=1/2+(\sqrt{3}/2)i=e^{i\pi/3}$ we have $1/\xi=1-\xi$, which, using (2.12), gives
$$a\!\left(e^{i\pi/3},e^{-i\pi/3}\right)=\frac{4\xi}{1+\xi}\bigg|_{\xi=e^{i\pi/3}}=2+\frac{2i}{\sqrt{3}}.\tag{2.13}$$
In fact, in this case the symmetries fix the critical graph to be a regular tetrahedron whose sides have length $2\pi/3$ [6]. We can also find the Strebel differential when $\xi=1/2$. This modulus satisfies $\xi=1-\xi$, so we have $a=-a+4$, which shows $a=2$. This subsequently shows $a=4$ and $a=0$ for $\xi=2$ and $\xi=-1$, respectively. Furthermore, it is actually possible to find the Strebel differentials for moduli in the interval $0<\xi<1$. Our claim is that $a=4\xi$ for $0<\xi<1$.
In order to argue for this, recall that the critical graph of a Strebel differential on a 4-punctured sphere is in general a topologically planar tetrahedron in the $z$-plane [6,60]; one example has already been shown in figure 1. Now recall that, by the involution symmetry, the critical graph on the complex-conjugate surface has to be the mirror image of the original graph across the real axis. Combining these two facts and taking $\xi\in\mathbb{R}$, we see that the critical graph has to coincide with its own mirror image. However, this is impossible for a planar tetrahedral graph: the graph has to degenerate. That means at least two of the zeros coincide, and, by the involution symmetry, the other two zeros coincide as well. Each zero of the resulting pair of double zeros now emits 4 critical trajectories.
In the language of quadratic differentials, this means we now have a pair of double zeros that are complex conjugates of each other. That is, the Strebel differential has to take the form
$$\phi(z)=-\frac{(z-z_0)^2(z-z_0^*)^2}{z^2(z-1)^2(z-\xi)^2}\tag{2.14}$$
when $\xi\in\mathbb{R}$. Comparing with the form (2.10), solving for $a$, and using the fact that $a=2$ when $\xi=1/2$, it can be shown after some algebra that $a=4\xi$ for $0<\xi<1$. Using the symmetries (2.12) we can further show that $a=4$ for $\xi>1$ and $a=0$ for $\xi<0$.
The situation when punctures collide is more subtle. Demanding continuity of $a$ suggests $a=0$ and $a=4$ for $\xi=0$ and $\xi=1$, respectively. Indeed, if this is the case, the quadratic differentials become
$$\phi(z)=-\frac{1}{(z-1)^2}\quad(\xi=0),\qquad\phi(z)=-\frac{1}{z^2}\quad(\xi=1).\tag{2.15}$$
But notice that these are the quadratic differentials describing infinite flat cylinders whose punctures are at $z=1,\infty$ and $z=0,\infty$, respectively. This is exactly what should be expected from the degeneration of Strebel differentials as $\xi\to0,1$: the residue condition forces a single cylinder. The situation at $\xi=\infty$ is similar to $\xi=0$, the only difference being that we have to perform the calculation after the inversion $z\to1/z$. In particular, it can be shown that the limit does not depend on which value of $a$ we use (0 or 4). Summarizing, we find the following expression for the accessory parameter $a$ when the modulus is real:
$$a(\xi)=\begin{cases}0, & \xi\le0,\\ 4\xi, & 0\le\xi\le1,\\ 4, & \xi\ge1.\end{cases}\tag{2.16}$$
The plot of this function is shown in figure 8.
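The piecewise formula (2.16) can be checked against the symmetry rules (2.12) restricted to the real line, and against the requirement that the numerator be the square of a real quadratic with complex-conjugate roots. The explicit quadratics $z^2-\xi$, $z^2-2\xi z+\xi$ and $z^2-2z+\xi$ (for $\xi<0$, $0<\xi<1$ and $\xi>1$) follow from our assumed parametrization of (2.10), in which $a$ is the sum of the zeros; the sample moduli are arbitrary.

```python
import numpy as np

def a_real(xi):
    """Accessory parameter (2.16) for a real modulus xi."""
    if xi <= 0:
        return 0.0
    if xi <= 1:
        return 4.0 * xi
    return 4.0

def numerator(xi, a):
    """Quartic numerator, assuming a is the sum of the zeros (highest degree first)."""
    return np.array([1.0, -a, a * (1 + xi) - 2 * xi, -a * xi, xi**2])

def square_root_quadratic(xi):
    """Real quadratic whose square should equal the numerator (xi not 0 or 1)."""
    if xi < 0:
        return np.array([1.0, 0.0, -xi])
    if xi < 1:
        return np.array([1.0, -2.0 * xi, xi])
    return np.array([1.0, -2.0, xi])

samples = [-3.0, -0.4, 0.25, 0.5, 0.9, 1.7, 5.0]
checks = []
for x in samples:
    q = square_root_quadratic(x)
    checks += [
        np.isclose(a_real(1 - x), 4 - a_real(x)),                 # z -> 1 - z
        np.isclose(a_real(1 / x), a_real(x) / x),                 # z -> 1/z
        np.allclose(np.polymul(q, q), numerator(x, a_real(x))),   # double zeros
        q[1] ** 2 - 4 * q[2] < 0,                                  # conjugate pair
    ]
print(all(checks))
```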
With the accessory parameter available for Strebel differentials with $\xi\in\mathbb{R}$, we can find the lengths of the sides of the critical graph as functions of the modulus. To do so, recall that the length of a geodesic homotopic to a puncture is always equal to $2\pi$. Then, since the critical graph degenerates in the way described above, we only need to calculate the length $\ell$ of a single side: the lengths of the other sides are either equal to $\ell$ or to $2\pi-\ell$. We can find $\ell$ by carefully evaluating (2.6) using (2.10) with the accessory parameter $a$ given in (2.16) for $\xi\in\mathbb{R}$. The result is […] It is clear from this expression that the limits make sense. For example, for $\xi=1/2$ we have $\ell=\pi$, which can alternatively be argued from the symmetry of this case. In fact, notice that $2\ell$ and $4\pi-2\ell$ are the lengths of the non-contractible geodesics that are not homotopic to punctures; they are both equal to $2\pi$ only when $\xi=-1,1/2,2$, and otherwise one of them is less than $2\pi$. In other words, these are the only real moduli in the vertex region $\mathcal{V}_{0,4}$. This result is consistent with [60].
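For $0<\xi<1$ the degenerate numerator is $(z^2-2\xi z+\xi)^2$, so $\sqrt{\phi}=i(z^2-2\xi z+\xi)/[z(z-1)(z-\xi)]$ is single valued; by partial fractions it equals $i[1/z+1/(z-1)-1/(z-\xi)]$. A short computation of ours with this form gives $4\arcsin\sqrt{\xi}$ for the edges crossing $(-\infty,0)$ and $(\xi,1)$, and $2\pi-4\arcsin\sqrt{\xi}$ for the two others, which reproduces $\ell=\pi$ at $\xi=1/2$. We do not claim this is how (2.17) is written. The sketch verifies these values numerically along a two-segment path per edge, which is homotopic to the edge because it crosses the real axis in the same gap between punctures.

```python
import numpy as np

def edge_length(xi, x_cross, n_nodes=200):
    """Complex length between the double zeros w and conj(w) of the degenerate
    differential (0 < xi < 1), along the path w -> x_cross -> conj(w).

    sqrt(phi) = i (z^2 - 2 xi z + xi) / (z (z - 1) (z - xi)) is single valued
    here, so any path crossing the real axis once, in the same gap between
    punctures, gives the same value."""
    w = xi + 1j * np.sqrt(xi * (1 - xi))                 # upper double zero
    g = lambda z: 1j * (z**2 - 2 * xi * z + xi) / (z * (z - 1) * (z - xi))
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    total = 0j
    for start, end in [(w, x_cross), (x_cross, np.conj(w))]:
        z = start + 0.5 * (nodes + 1.0) * (end - start)  # map [-1, 1] onto the segment
        total += 0.5 * (end - start) * np.sum(weights * g(z))
    return total

xi = 0.3
gaps = {"(-inf,0)": -1.0, "(0,xi)": xi / 2, "(xi,1)": (1 + xi) / 2, "(1,inf)": 2.0}
edges = {name: edge_length(xi, x) for name, x in gaps.items()}
ell = 4 * np.arcsin(np.sqrt(xi))   # closed form for the edges crossing (-inf,0) and (xi,1)
print(edges, ell)
```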
Before closing this subsection, we note that our argument for degenerate Strebel differentials on four-punctured spheres may not generalize to every type of degeneration of higher-punctured spheres. This is mostly because the symmetry of the critical graph when all (or some) moduli are taken to be real may not be as restrictive as it is for four-punctured spheres. Still, it may be possible to find exact solutions for specific types of degenerations. Since we do not need them immediately, we plan to investigate the degeneration behavior in more detail in our upcoming work [57].

Local coordinates, mapping radii, and vertex region
Calculating off-shell string amplitudes on any Riemann surface requires a choice of local coordinates around the punctures, up to an overall phase [1]. In our case of interest, the local coordinates for the $n$-string contact interaction can be obtained from the Strebel differential on $n$-punctured spheres, since they are described by how $n$ flat semi-infinite cylinders are grafted along the critical graph of the Strebel differential [5,8,58]. Following the conventions of [8], this means one can find $n$ analytic maps $z=h_i(w_i)$ of the form […] to the $n$-punctured sphere, for which the Strebel differential takes the form it has for flat cylinders in the $w_i$ coordinates and the unit circles $|w_i|=1$ are mapped to its critical graph. Here $w_i$ are the local coordinates for the string contact interaction (sometimes called natural coordinates) in which the vertex operators are inserted. It can be shown that such coordinates always exist [35].
Notice how we have organized the expansion (2.18). This was for convenience: as the overall phase of the local coordinates is irrelevant for CSFT, we chose $\rho_i\in\mathbb{R}$ without loss of generality and defined the rest of the coefficients accordingly. Here […]

Our primary goal is to find the maps (2.18), i.e., to find $d_{i,\alpha}$ and $\rho_i$. Except for the mapping radius, the coefficients in the expansion (2.18) can be found by expanding the Strebel differential around the puncture $z=\xi_i$, setting it equal to (2.19), and using the expansion of $z=h_i(w_i)$ in (2.18). Comparing term by term in $w$, we can solve for the $d$ coefficients in terms of the $b$ coefficients. The first few terms are […] Note that the $b$ coefficients, and therefore the $d$ coefficients, are determined by the accessory parameters $c_i$, so knowing the latter is sufficient to construct the maps $z=h_i(w_i)$ up to the mapping radius. For example, we see $b$[…]

Finding the mapping radii associated with the punctures takes more effort. To that end, we begin by writing the local coordinate around $z=\xi_i$, obtained by equating (2.1) and (2.19), as [5] […] where $z_c$ is some point on the critical trajectory surrounding the puncture $z=\xi_i$ and the path of integration is the straight line from the puncture to $z=z_c$. The lower bound of the integral ensures that $|w_i|=1$ when $z$ lies on the critical trajectory surrounding the puncture, since the integral then evaluates to a real number by the arguments made below (2.7). We implicitly adjust the choice of sign of the exponent to guarantee […]

Here we make a small observation that has apparently gone unnoticed in the literature. Above we demanded that $z_c$ lie on the critical trajectory surrounding the puncture $z=\xi_i$, but this can actually be relaxed: one can choose $z_c$ to lie anywhere on the critical graph. To argue for this, let $z_c$ be any point on the critical graph and notice that the integral in the exponent of (2.22) can be deformed as shown in figure 2.
But then the contribution to the integral from this "outside" part becomes real, by an argument similar to the one made below (2.7), and this just results in an irrelevant phase for the local coordinate. In particular, notice that we can take $z_c$ to be any zero of the quadratic differential. With this choice, the dependence of the local coordinates (2.22) (and, by extension, of the mapping radii (2.23)) on the shape of the critical graph drops out. Using (2.22) we can obtain an integral expression for the mapping radii. It is [5] […] Note that this limit exists with our choice of sign in the exponent of (2.22), assuming $\xi_i+\epsilon$ lies on the straight path from the puncture at $z=\xi_i$ to $z=z_c$. Details on the numerical evaluation of this integral are relegated to appendix A. In passing we note that it is possible to obtain the local coordinates for 4-punctured spheres analytically when $\xi\in\mathbb{R}$. However, as we have stated earlier, these surfaces are not relevant from the perspective of CSFT, so we refrain from reporting them here.
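As an illustration of the observation that $z_c$ may be any zero, the sketch below computes the mapping radius at $z=0$ for the three-punctured sphere $\phi(z)=-(z^2-z+1)/(z^2(z-1)^2)$, taking $z_c=e^{i\pi/3}$. It assumes the normalization $z-\xi_i\simeq\rho_i w_i$ near the puncture and rewrites the limit by subtracting the pole, as spelled out in the docstring; this regularization is ours and not necessarily the form of (2.23). Evaluating the same integral in closed form gives $\rho=4/(3\sqrt{3})\approx0.770$, the familiar mapping radius of the closed-string three-vertex, which the numbers should reproduce.

```python
import numpy as np

def continuous_sqrt(values):
    """Principal square roots with signs flipped to be continuous along the samples."""
    roots = np.sqrt(np.asarray(values, dtype=complex))
    for k in range(1, len(roots)):
        if abs(roots[k] + roots[k - 1]) < abs(roots[k] - roots[k - 1]):
            roots[k] = -roots[k]
    return roots

def mapping_radius(phi, puncture, z_c, n_nodes=400):
    """Mapping radius at `puncture`, using the straight path z_c -> puncture.

    Assumes log|w(z)| = Re int_{z_c}^z h with h = (+-i) sqrt(phi) ~ 1/(z - puncture)
    and z - puncture ~ rho * w. Subtracting the pole gives the finite expression
        log rho = log|z_c - puncture| - Re int_{z_c}^{puncture} [h - 1/(z - puncture)] dz.
    """
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    order = np.argsort(x)
    s, ws = 0.5 * (x[order] + 1.0), 0.5 * w[order]
    t = 0.5 * (1.0 - np.cos(np.pi * s))          # clusters nodes at both ends
    z = z_c + t * (puncture - z_c)
    dz_ds = (puncture - z_c) * 0.5 * np.pi * np.sin(np.pi * s)
    root = continuous_sqrt(phi(z))
    # Choose the factor +-i so that h behaves like +1/(z - puncture) at the end.
    v = root[-1] * (z[-1] - puncture)             # close to +-i
    sigma = 1j if (1j * v).real > 0 else -1j
    integrand = sigma * root - 1.0 / (z - puncture)
    integral = np.sum(ws * integrand * dz_ds)
    return float(np.exp(np.log(abs(z_c - puncture)) - integral.real))

phi3 = lambda z: -(z**2 - z + 1) / (z**2 * (z - 1) ** 2)
rho = mapping_radius(phi3, 0.0, np.exp(1j * np.pi / 3))   # z_c is a zero of phi3
print(rho, 4 / (3 * np.sqrt(3)))   # the two numbers should agree
```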
Calculating off-shell string amplitudes on Riemann surfaces requires not only a choice of local coordinates around the punctures but also a suitable choice of vertex region in the associated moduli space [1]. Instead of trying to describe the vertex region explicitly, we can consider its indicator function. This function was already defined in the introduction in (1.1) for the case of 4-punctured spheres. Here we give the general definition for $n$-punctured spheres:
$$\Theta_{0,n}(\xi,\xi^*)=\begin{cases}1, & \xi\in\mathcal{V}_{0,n},\\ 0, & \xi\notin\mathcal{V}_{0,n},\end{cases}\tag{2.24}$$
where $\xi$ collectively denotes the moduli. The criterion for $\xi\in\mathcal{V}_{0,n}$ is that all non-contractible curves have length greater than or equal to $2\pi$ in the $\phi$-metric [1,[58][59][60]]. For the $\phi$-metric, it is sufficient to check the lengths of the closed curves formed by critical trajectories that separate 2 or more punctures from the rest, i.e., the geodesics that are not homotopic to a puncture.
These lengths can be computed by finding the geodesic lengths between the zeros of the Strebel differential and then combining them suitably. Since the critical graph of a Strebel differential is an undirected graph and we can assign a geodesic length to each edge, it is useful to arrange the associated data into a (weighted) adjacency matrix $M$ as follows:
$$M_{ij}=\begin{cases}\text{length of the critical trajectory between } z_i \text{ and } z_j, & \text{if } z_i \text{ and } z_j \text{ are connected},\\ 0, & \text{otherwise}.\end{cases}$$
Here $M$ is a symmetric $(2n-4)\times(2n-4)$ matrix, as there are $2n-4$ zeros of $\phi$ (counted with multiplicity).
The diagonal elements are zero (hence $M$ is traceless), and each row and column contains at most 3 non-zero elements, as each zero of $\phi$ emits 3 critical trajectories for generic moduli. Notice that there are certain loci of nonzero codimension in the moduli space where some zeros of $\phi$ may coincide. In this case we set the elements of $M$ corresponding to their connections to zero.
Once such a matrix is constructed, it is a simple matter to extract the lengths of all non-contractible curves; this is what we have done in the case n = 4. Notice, however, that for the purposes of the indicator function (2.24) we only need the length of the shortest non-contractible geodesic, which can be found with relative ease given M, for example using Dijkstra's shortest path algorithm [68].
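As a minimal sketch of this bookkeeping (our illustration, not the authors' code), the snippet below builds the adjacency matrix M of (2.25) from a hypothetical edge list and runs Dijkstra's algorithm with a binary heap to obtain shortest ϕ-metric distances between zeros. The helper names are hypothetical. Deciding which closed curves are non-contractible additionally requires the cyclic ordering of edges around each zero (the face structure of the critical graph), which this sketch does not attempt.

```python
import heapq
import math


def adjacency_matrix(num_zeros, edges):
    """Hypothetical helper: symmetric weighted adjacency matrix M, as in (2.25).

    edges: iterable of (i, j, length) with 0-based labels of the zeros.
    An entry 0 means "no edge". If two edges join the same pair of zeros,
    only the shorter one is kept, which is all a shortest-path search needs.
    """
    M = [[0.0] * num_zeros for _ in range(num_zeros)]
    for i, j, length in edges:
        if i == j or length <= 0:
            raise ValueError("expected an edge of positive length between distinct zeros")
        if M[i][j] == 0.0 or length < M[i][j]:
            M[i][j] = M[j][i] = length
    return M


def dijkstra(M, source):
    """Hypothetical helper: shortest phi-metric distances from zero `source`."""
    n = len(M)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already found
        for v in range(n):
            w = M[u][v]
            if w > 0.0 and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist
```

For example, a tetrahedral critical graph with four zeros and six edges can be passed as `adjacency_matrix(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 1.0), (0, 2, 3.0), (1, 3, 0.5)])`; the resulting M is traceless with at most 3 non-zero entries per row, as described above.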
So the only real difficulty here is to find the lengths associated with each edge. This can be done by solving for the critical trajectories and calculating their lengths. Recall that a critical trajectory is a horizontal trajectory that begins and ends on a zero of ϕ. So, given a zero, it is possible to construct a critical trajectory emanating from it by taking small steps dz such that ϕ = φ(z)dz² > 0 along each step, until we hit another zero. While doing this, we add up the line elements ds = |φ(z)|^{1/2}|dz|, which generates the edge lengths, and hence the adjacency matrix M, we are looking for.
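The edge-length computation can be sketched as follows, assuming the coefficient φ(z) is available as a Python callable. Writing dz = h/√φ(z) makes φ(z)dz² = h² > 0, so each step follows a horizontal trajectory with ϕ-length close to h. A midpoint (second-order Runge-Kutta) step is used, and the sign of the square root is kept continuous across its branch cut. The function name and the `stop` callback are hypothetical; in practice one would start slightly displaced from a zero, where φ vanishes, and let `stop` detect arrival near another zero.

```python
import cmath


def trace_horizontal_trajectory(phi, z0, h, max_steps, stop=None):
    """Hypothetical helper: follow a horizontal trajectory phi(z) dz^2 > 0 from z0.

    Returns (end_point, phi_length, steps_taken). A negative h traverses the
    trajectory in the opposite direction.
    """

    def direction(z, prev):
        d = 1 / cmath.sqrt(phi(z))  # then phi(z) * d**2 == 1 > 0
        if prev is not None and (d * prev.conjugate()).real < 0:
            d = -d  # keep the orientation continuous across the sqrt branch cut
        return d

    z, prev, length = complex(z0), None, 0.0
    for step in range(max_steps):
        k1 = direction(z, prev)
        zm = z + 0.5 * h * k1  # midpoint of the step
        k2 = direction(zm, k1)
        z_new = z + h * k2
        # line element ds = |phi|^{1/2} |dz|, evaluated at the midpoint
        length += abs(cmath.sqrt(phi(zm))) * abs(z_new - z)
        z, prev = z_new, k2
        if stop is not None and stop(z):
            return z, length, step + 1
    return z, length, max_steps
```

A convenient sanity check is φ(z) = −1/z², whose horizontal trajectories are the circles |z| = const, each of ϕ-length 2π: tracing from z = 1 for about 2π/h steps should return close to the starting point with accumulated length close to 2π.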
In passing, we note that for 4- and 5-punctured spheres the condition that all non-contractible curves have length greater than or equal to 2π is equivalent to every edge of the critical graph having length smaller than π [60]. This fact has been exploited in previous works by Moeller [8,13]. However, this condition is not sufficient for higher-punctured spheres, so we opt to use the generic method described above, which makes the algorithm manifestly independent of the number of punctures.

Neural networks for accessory parameter and indicator function
In this section, we describe the neural networks for the accessory parameter a = a(ξ, ξ*) and the indicator function Θ(ξ, ξ*) in the case of the 4-punctured sphere. We show that the accessory parameter neural network has successfully learned the analytic behavior for real moduli described in (2.16) and the symmetry properties described in (2.12). We emphasize that these behaviors have not been explicitly programmed into our neural network; they appear as a consequence of the training process. We additionally compare our result for the accessory parameter with the polynomial fit provided by Moeller [8] and observe good agreement.
Similarly, we test the indicator function neural network by plotting the vertex region V_{0,4} in the moduli space M_{0,4} and show that our results are consistent with those in the literature [6,8]. In particular, even though the indicator function neural network outputs values between 0 and 1, we observe a sharp transition from the Feynman region to the vertex region, and the output is almost always 1 or 0, as should be the case for the actual indicator function in (1.1).
Lastly, we compute the 4-tachyon contact term in the closed string tachyon potential by performing the moduli integration over the vertex region using both the trapezoid and Monte Carlo methods. As mentioned in the introduction, we obtain good agreement with the results in the literature, providing extra support for our machine-learning-based method.

Accessory parameter neural network
Artificial neural networks are computing systems inspired by biological neural networks. They consist of a number of layers, and each layer consists of a number of nodes. A node in a given layer is connected to the nodes in the previous and subsequent layers. An example of a neural network we consider in this work is shown in figure 3. At each node, a mathematical operation is performed based on the input received from the nodes in the previous layer. More specifically, denote the collection of inputs received by the i-th layer (containing n_i nodes) from the (i − 1)-th layer (containing n_{i−1} nodes) as a column vector a^{(i−1)} of length n_{i−1}. The nodes in the i-th layer then perform the non-linear transformation

$$a^{(i)} = \sigma\big(W^{(i)} a^{(i-1)} + b^{(i)}\big)$$

and transmit the result to the nodes in the (i + 1)-th layer. Here W^{(i)} is an n_i × n_{i−1} matrix, b^{(i)} is a column vector of length n_i, and σ is a non-linear function called the activation function, which acts on column vectors element-wise. The collections of all W^{(i)} and b^{(i)} over all layers are called the weights and bias, respectively, and we collectively denote them by W and b. Figure 4 summarizes this procedure. It can be shown that artificial neural networks can approximate arbitrary continuous functions on compact sets to any desired accuracy [69][70][71][72][73], and the accessory parameters, as functions of the moduli, are expected to belong to this class.
Figure 4: The summary of mathematical operations performed by artificial neural networks.
We are interested in approximating the collection of accessory parameters c_1, …, c_{n−3}, which uniquely specify the Strebel differentials on n-punctured spheres, as functions of the moduli ξ_1, …, ξ_{n−3}, using neural networks such as the one shown in figure 3. Since this problem is inherently about complex numbers, we take the weights and bias to be complex and use complex neural networks [74][75][76]. For similar reasons, we take the activation function σ to be the complex exponential linear unit (CELU), which is defined for u ∈ C by

$$\mathrm{CELU}(u) = \mathrm{ELU}(\mathrm{Re}\,u) + i\,\mathrm{ELU}(\mathrm{Im}\,u),$$

where ELU is the usual exponential linear unit activation function, defined for x ∈ R as

$$\mathrm{ELU}(x) = \begin{cases} x, & x \geq 0, \\ \alpha\,(e^{x} - 1), & x < 0. \end{cases}$$

Here α is a hyperparameter of the network, and the activation function becomes ReLU for α = 0. See appendix B for more details on the architecture of the network.
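A minimal NumPy sketch of the complex forward pass a^{(i)} = σ(W^{(i)} a^{(i−1)} + b^{(i)}), assuming the split-type CELU written above (ELU applied separately to the real and imaginary parts). The authors trained in JAX; the function names, layer sizes, and initialization scale below are illustrative assumptions, not the values of appendix B.

```python
import numpy as np


def elu(x, alpha=1.0):
    # ELU(x) = x for x >= 0 and alpha * (exp(x) - 1) for x < 0;
    # np.minimum avoids overflow warnings in the unused branch
    return np.where(x >= 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def celu(u, alpha=1.0):
    # split-type complex activation: ELU on real and imaginary parts separately
    return elu(u.real, alpha) + 1j * elu(u.imag, alpha)


def init_params(layer_sizes, seed=0):
    """Hypothetical initialization: complex Gaussian weights, zero complex biases."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / np.sqrt(2 * n_in)
        W = scale * (rng.standard_normal((n_out, n_in)) + 1j * rng.standard_normal((n_out, n_in)))
        b = np.zeros(n_out, dtype=complex)
        params.append((W, b))
    return params


def forward(params, xi, alpha=1.0):
    """Map a batch of complex moduli xi, shape (N,), to accessory parameters a, shape (N,)."""
    a = np.asarray(xi, dtype=complex)[None, :]  # one input node, N samples as columns
    for W, b in params[:-1]:
        a = celu(W @ a + b[:, None], alpha)  # hidden layer: sigma(W a + b)
    W, b = params[-1]
    return (W @ a + b[:, None])[0]  # linear output layer with one complex node
```

For instance, `forward(init_params([1, 16, 8, 1]), np.array([0.5 + 0.8j]))` returns one (untrained) complex value of a; training then adjusts W and b as described below.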
We have already implicitly fixed the positions of 3 punctures (ξ_{n−2}, ξ_{n−1}, ξ_n) using a PSL(2, C) transformation. We pick these fixed punctures to be at ξ_{n−2} = 0, ξ_{n−1} = 1, and ξ_n = ∞. Moreover, we solved for three accessory parameters in terms of the other parameters and the moduli using (2.3). We did this for convenience in the numerical calculations and in order to have a unique answer after training. For the 4-punctured sphere, this means we can use the parametrization given in (2.10). In this case, the network takes the position of the unfixed puncture (the modulus) ξ as input and outputs the accessory parameter a = a(ξ, ξ*).
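For concreteness, fixing three punctures for n = 4 amounts to taking a cross-ratio: the Möbius map f(z) = (z − z₂)(z₃ − z₄)/((z − z₄)(z₃ − z₂)) sends (z₂, z₃, z₄) to (0, 1, ∞), and the image ξ = f(z₁) of the remaining puncture is the modulus. This is a standard sketch; the function name and the assignment of which puncture goes where are our assumptions, and a different assignment yields a related value such as 1 − ξ or 1/ξ.

```python
def modulus_from_punctures(z1, z2, z3, z4):
    """Hypothetical helper: position of puncture z1 after the PSL(2, C) map
    sending (z2, z3, z4) -> (0, 1, infinity), i.e. a cross-ratio.

    The map is f(z) = (z - z2)(z3 - z4) / ((z - z4)(z3 - z2)).
    """
    return (z1 - z2) * (z3 - z4) / ((z1 - z4) * (z3 - z2))
```

Because the cross-ratio is invariant under all Möbius transformations, applying any PSL(2, C) map to the four punctures first leaves ξ unchanged, which is why ξ is a good coordinate on M_{0,4}.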
In order to approximate the accessory parameters using neural networks, we need to adjust the weights W and bias b of the network appropriately. As usual in machine learning, this can be done using iterative (stochastic) gradient descent in the space of weights and bias based on an appropriate averaged loss function. The averaging is performed over a finite number of points in the moduli space, called the training set S, an example of which is shown in figure 5. At the end of the gradient descent, we end up in some local minimum of the averaged loss function in the space of weights and bias. The hope is that the resulting network is generic enough to approximate the behavior of the accessory parameters not only at the points in S, but everywhere on a subset of M_{0,n} containing S. If this is the case, we say the neural network has learned the accessory parameters. Before we delve into the specifics of the averaged loss function, let us describe the training set S. For us, S consists of a random collection of points uniformly sampled over the moduli space M_{0,n}, excluding the regions where punctures are about to collide. We call this region the training region. The exclusion ensures that the learned behavior of the accessory parameters is not affected by the degeneration behavior of the surfaces, since our numerical evaluations become unreliable there. Recall that, from the point of view of CSFT, we are not interested in Strebel differentials on surfaces arbitrarily close to degeneration, as it is sufficient to obtain Strebel differentials on the vertex region V_{0,n}, which does not contain such surfaces by construction. Hence, as long as the training region covers the vertex region V_{0,n} and the training succeeds, we should be able to obtain all the geometric data relevant to CSFT.
As mentioned earlier, gradient descent should be performed based on an appropriate averaged loss function. In the case at hand, this is constructed by averaging the function (2.9) over S:

$$\mathcal{L}_{0,n}(W, b) = \frac{1}{|S|} \sum_{\xi \in S} L_{0,n}\big(\phi(c(W,b), \xi)\big). \qquad (3.4)$$

It is useful to comment on the dependence on the various arguments here. As indicated in (2.9), the loss function depends on the quadratic differential ϕ, which in turn is determined by the choice of accessory parameters (collectively denoted by c) and moduli (collectively denoted by ξ), that is, ϕ = ϕ(c, ξ).
But notice that the accessory parameters c are determined by the parameters of the network (weights and bias), hence c = c(W, b). When we average over the points in S, we see that the averaged loss function indeed has the dependence shown in (3.4). Note that we have not provided any labels for the points in the training set S, so in essence we perform unsupervised learning of the accessory parameters using the loss (3.4).
Now we have all the ingredients to train an accessory parameter neural network for 4-punctured spheres. Before giving an example of a training run, let us summarize our strategy for confirming the results. We can test how well the network performs by investigating the loss function L_{0,4} over the training region. This involves sampling two new sets of points over the training region: a validation set, used during training or hyperparameter optimization, and a test set, used for subsequent calculations. If we observe that the loss is small not only for the training set but also over the validation and test sets, we conclude that the network interpolates and declare that it has learned the accessory parameter successfully over the training region. For the exact solution at ξ = e^{iπ/3} given in (2.13), we evaluate the loss function to be 8.5 × 10^{−14}; this sets a scale for the loss function, and its smallness indeed characterizes how close we are to a Strebel differential.
An example of a training run based on (3.4) is shown in figure 6, and some of its statistics are shown in tables 2 and 3. This particular network has 3 hidden layers with [512, 128, 1028] nodes, respectively. Training was performed in Python using Google JAX [77] by sampling 10^5 points in the training region shown in figure 5. We confirm our results with the test set, but we also evaluated the loss on the training and validation sets for comparison, and find that the losses are small and of the same order of magnitude, indicating that the network does not overfit and interpolates to other points in the training region. Expanded details on the training and the architecture of the network can be found in appendix B.
"best NN" produced. The last point is solved using Newton's method and is taken from [8].
Observe from figure 6 that almost all points in the training region have relatively small loss, except for a few outliers. In fact, by plotting the loss function over the training region (figure 7), we see that these outliers lie primarily close to the real line, a region we do not need for CSFT. The reason for this behavior is clear: when the modulus is real, some of the terms in the sum (2.9) vanish and some of them impose equivalent conditions, as the critical graph degenerates. So, relative to other points in the moduli space, the loss function close to the real line is less constrained, which leads to a relatively larger loss. Even though the network underperforms for real moduli, it correctly reproduces the analytic behavior described in (2.16). This, along with the behavior of the accessory parameter over the training region, is shown in figure 8.
Already from figure 8 it is apparent that the network has learned the involution symmetry of a. We can quantify this, along with the shift and inversion symmetries given in (2.12), by comparing the network's results for pairs of moduli related by a symmetry. To that end, we define the error by comparing a(g(ξ), g(ξ)*) with g(a(ξ, ξ*)), where g represents the symmetry transformation. For example, g(ξ) = ξ* and g(a) = a* for the involution symmetry. Figure 9 shows the distribution of this error for points sampled over the training region for three distinct symmetries. As one can see, the errors are quite small, and we conclude that the network has learned the symmetries of the accessory parameter without them being explicitly programmed.
Finally, we compare our network's result for the accessory parameter with the polynomial fit for a = a(ξ, ξ*) provided by Moeller [8]. Again, we define the error by comparing our result with a_Moeller, which is given by equation (6.9) in [8]. Since the fit in [8] is provided for a subset of the vertex region, we only consider the errors for points sampled in this subset. Again, we see from figure 9 that our results and Moeller's fit are consistent with each other.
We have listed various pieces of evidence for our approach above; they show that using machine learning to solve for the accessory parameters is sound, and that the results obtained this way are consistent with the exact solutions as well as with the literature. We have worked with the 4-punctured sphere, but we emphasize that everything in this subsection admits, in principle, a straightforward generalization to higher-punctured spheres. Thus the results here should be viewed as a proof of principle.
Before closing this subsection, we note that the trained network always interpolates in the training region, but one can ask whether it extrapolates outside of the training region. We observed that the extrapolation of our networks is not as good as their interpolation, as is already somewhat evident from figure 8. However, we also observed that the better a network extrapolates, the better our results become. So, if we would like to select among the trained networks, it seems reasonable to do so based on how well each network extrapolates for real moduli, discarding the rest of the runs. This further defines the "best NN" for us: it is the network that extrapolates farthest along the positive real line, and it is the one we chose to report here. For higher-punctured spheres, an analogous procedure can be carried out by investigating specific degeneration limits.

Indicator function neural network
As described in the previous subsection, it is possible to obtain the accessory parameters as a neural network. Once such a representation is in our possession, we can solve for the local coordinates and mapping radii as described in section 2. So all that remains for constructing the classical CSFT action is an explicit description of the vertex region V_{0,n}, over which the moduli integration has to be performed. In this subsection, we train a neural network for the indicator function of the vertex region V_{0,4}, which has already been defined in (1.1). This provides the explicit characterization mentioned after (1.2). Again, we emphasize that the methods here can be straightforwardly extended to higher-punctured spheres.
We train the indicator function neural network using supervised learning. To do so, we begin by uniformly sampling points over the training region. Unlike before, however, we label these points according to whether they are in the vertex region or not. Recall that a point in the moduli space is in the vertex region if and only if all non-contractible curves in its associated ϕ-metric have length greater than or equal to 2π. Since the accessory parameter is known, we can compute these lengths using the method described in section 2 and use them to label the points: 1 if all such lengths are greater than or equal to 2π, and 0 otherwise. The randomly sampled points in the training region, together with their labels, form the training set

$$S = \big\{ \big(\xi, \Theta_{0,n}(\xi, \xi^*)\big) \;:\; \xi \text{ sampled in the training region} \big\}.$$

The problem of solving for the indicator function (2.24) thus becomes a binary classification problem. In view of this, let us call the indicator function neural network Θ^{(NN)}_{0,n}. It takes the moduli as input and outputs a value between 0 and 1, i.e. Θ^{(NN)}_{0,n} can be interpreted as the probability for a given point in M_{0,n} to be an element of the vertex region V_{0,n}. In any case, we are going to observe that the transition from 0 to 1 is relatively sharp when n = 4. We comment more on this below.

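The network Θ^{(NN)}_{0,n} can be trained to learn the indicator function of V_{0,n} by performing gradient descent in the space of its weights and bias using the cross-entropy loss

$$\mathcal{L}_{\mathrm{CE}}(W, b) = -\frac{1}{|S|}\sum_{(\xi, y) \in S}\Big[\, y \log \Theta^{(NN)}_{0,n}(\xi,\xi^*) + (1 - y)\log\big(1 - \Theta^{(NN)}_{0,n}(\xi,\xi^*)\big)\Big],$$

where y ∈ {0, 1} is the label of the point ξ. As for the accessory parameter neural network, we focus only on four-punctured spheres as a testing ground.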
The training curve for Θ^{(NN)}_{0,4}, along with the progression of accuracy during training, is shown in figure 10. For this particular network, we used the training set S constructed using the best NN. Again, the behavior we obtained was generic, and this choice was made purely for presentation purposes. We used 10^5 points for training. The weights and bias of this network were chosen to be real, and we input the complex modulus as a two-dimensional real vector. We confirmed our results by checking the loss, as well as the accuracy, for both the training and validation sets. We achieved 99.34% accuracy on the training set, 99.27% on the validation set, and 99.68% on the test set. We observed no overfitting, as is evident from figure 11. These results show that the training was successful. Expanded details on the training and the architecture can be found in appendix B.
Figure 11 shows the probabilities produced by Θ^{(NN)}_{0,4}. Note that the shape of the region V_{0,4} shown in figure 11 is consistent with the literature [6,8]. Moreover, the transition from 0 to 1 is quite sharp, so Θ^{(NN)}_{0,4} provides a good approximation to the indicator function. Since this is the case, we declare a point in M_{0,4} to be an element of V_{0,4} when the generated probability is greater than or equal to 1/2. That is, we define our indicator function to be

$$\Theta_{0,4}(\xi, \xi^*) = H\Big(\Theta^{(NN)}_{0,4}(\xi, \xi^*) - \tfrac{1}{2}\Big), \qquad (3.9)$$

where H(x) is the Heaviside step function with H(0) = 1. More quantitatively, we can compare Moeller's fit for the boundary of the vertex region ∂V_{0,4} restricted to Re(ξ) ≤ 1/2, Im(ξ) ≥ 0, and |ξ| ≥ 1 (equation (6.5) in [8]) with the corresponding curve we obtain, i.e. Θ^{(NN)}_{0,4}(ξ, ξ*) ≈ 1/2, restricted in the same way. This is shown in figure 12. Again, we see excellent agreement between our results.
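The classification step can be sketched with NumPy: the binary cross-entropy above, the sharpened indicator (3.9) with the convention H(0) = 1, and the resulting accuracy. The function names are hypothetical, and the clipping constant `eps` is our addition to avoid evaluating log 0; this is an illustration, not the authors' training code.

```python
import numpy as np


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and 0/1 labels."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)  # eps: our addition
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def sharpened_indicator(probs):
    """H(Theta_NN - 1/2) with H(0) = 1: 1 where the probability is >= 1/2."""
    return (np.asarray(probs) >= 0.5).astype(int)


def accuracy(probs, labels):
    """Fraction of points whose sharpened prediction matches the label."""
    return float(np.mean(sharpened_indicator(probs) == np.asarray(labels)))
```

For example, probabilities [0.9, 0.1, 0.5, 0.3] against labels [1, 0, 1, 1] give sharpened predictions [1, 0, 1, 0] and an accuracy of 0.75; the point exactly at 1/2 is counted as inside the vertex region.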

Off-shell 4-tachyon contact term
With an explicit description of the vertex region in terms of (3.9), we now have all the ingredients to find the terms in the classical closed string tachyon potential. The potential is given in [6,8], where the dots represent the terms involving fields other than the tachyon t. We are only interested in v_4, whose expression can be read from (3.11), where Θ_{0,n} is the indicator function for V_{0,n} and the ρ_i are the mapping radii associated with the punctures.
The convention for the measure is d²ξ = d(Re ξ) d(Im ξ). A derivation of (3.11) can be found in [6]. Note that everything in the integrand of (3.11) can be expressed in terms of the two neural networks of the previous subsections, so all we need to do at this point is to perform the integration over the moduli space. The integrand of (3.11) for n = 4 is shown in figure 12. Our results are already presented in table 1.
The integral is first computed using the trapezoid method along the real and imaginary directions, with a grid of 700 × 700 points for Re(ξ) ∈ [−1.1, 2.1] and Im(ξ) ∈ [−2.1, 2.1]. In order to assess stochastic deviations, we ran the full pipeline (training of the accessory parameter and indicator function networks, and computation of v_4) 10 times. As mentioned earlier, we observed that the networks which perform best are those which generalize well outside the training region on the real axis (as in figure 8). Consequently, we kept only the 4 networks whose mean loss for ξ ∈ [2.5, 4] is below 0.1. This allows us to determine how v_4 varies when we change the random seed, which amounts to using different training points, network initializations, and stochastic gradient descent trajectories. From the scale of the uncertainty of our result (v_4 = 72.320 ± 0.146), we see that our algorithm is sufficiently stable and produces results consistent with the literature. For the best NN, we report v_4 = 72.396 using the trapezoid method.
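A minimal sketch of such a grid integration, assuming the integrand and the sharpened indicator are available as vectorized callables of complex ξ (hypothetical placeholders for the network-based expressions in (3.11)): composite trapezoid weights on a 700 × 700 grid over the rectangle quoted above. Since the indicator jumps across ∂V_{0,4}, convergence is only first order in the grid spacing rather than the usual second order.

```python
import numpy as np


def trapezoid_weights(n, a, b):
    """1D composite trapezoid weights for n equally spaced nodes on [a, b]."""
    w = np.full(n, (b - a) / (n - 1))
    w[0] *= 0.5
    w[-1] *= 0.5
    return w


def integrate_over_region(f, indicator, re_range=(-1.1, 2.1), im_range=(-2.1, 2.1), n=700):
    """Trapezoid estimate of the integral of f(xi) * indicator(xi) d(Re xi) d(Im xi).

    f and indicator are hypothetical vectorized callables of complex xi.
    """
    x = np.linspace(*re_range, n)
    y = np.linspace(*im_range, n)
    X, Y = np.meshgrid(x, y, indexing="ij")  # X[i, j] = x[i], Y[i, j] = y[j]
    values = f(X + 1j * Y) * indicator(X + 1j * Y)
    wx = trapezoid_weights(n, *re_range)
    wy = trapezoid_weights(n, *im_range)
    return float(wx @ values.real @ wy)  # sum_ij wx[i] * values[i, j] * wy[j]
```

As a check, integrating f = 1 against the indicator of the disk |ξ − 1/2| < 1 should give approximately π, and integrating f = 1 with no indicator reproduces the rectangle area 3.2 × 4.2 exactly.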
For the best NN, we additionally perform a Monte Carlo integration with 2 × 10^6 points in the vertex region. Evaluating it 5 times, we report its mean and standard deviation to be v_4 = 72.366 ± 0.096. We stress that we use the sharpened indicator function (3.9) in both methods. Of course, the trapezoid method provides a deterministic result for v_4, so one may question the point of using Monte Carlo integration here. However, the moduli integration becomes higher-dimensional for higher-punctured spheres, where Monte Carlo integration would be superior to deterministic grid-based methods. Here we would like to imitate that case and see how large the errors due to integration are, while still having a baseline for the expected result. We observe that our result for v_4 is still sufficiently precise even with Monte Carlo integration. The convergence of this integral can be further improved by using more points or by employing importance sampling.
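The Monte Carlo estimate can be sketched in the same setting. As our simplification (not necessarily the authors' procedure), points are drawn uniformly in the bounding rectangle and the indicator zeroes out those outside V_{0,4}; the integrand and indicator are again hypothetical vectorized callables. The returned one-sigma statistical error decreases as N^{−1/2} independently of the dimension of the integration domain, which is what makes the method attractive for higher-punctured spheres.

```python
import numpy as np


def monte_carlo_integral(f, indicator, re_range=(-1.1, 2.1), im_range=(-2.1, 2.1),
                         n_points=2_000_000, seed=0):
    """Plain Monte Carlo estimate of the integral of f * indicator over a rectangle.

    Returns (estimate, one-sigma statistical error).
    """
    rng = np.random.default_rng(seed)
    xi = rng.uniform(*re_range, n_points) + 1j * rng.uniform(*im_range, n_points)
    area = (re_range[1] - re_range[0]) * (im_range[1] - im_range[0])
    samples = area * f(xi) * indicator(xi)  # points outside the region contribute 0
    return float(samples.mean()), float(samples.std(ddof=1) / np.sqrt(n_points))
```

Repeating the estimate with different seeds, as done above with 5 evaluations, gives a direct empirical check of the quoted statistical error.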

Discussion and future directions
In this paper, we have characterized the 4-string contact interaction using machine learning by constructing neural networks for the accessory parameter of Strebel differentials and for the indicator function of the vertex region in the moduli space. Doing so allowed us to construct the local coordinates associated with the 4-string contact interaction. We tested our pipeline by computing the off-shell 4-tachyon contact term in the tachyon potential and obtained good agreement with the results in the literature.
We would like to emphasize a few advantages of using machine learning over traditional numerical methods to characterize the local coordinates in CSFT:
1. The algorithm presented here is manifestly independent of the number of punctures. So it is in principle possible to repeat a similar construction for n-punctured spheres.
2. Having a neural network representation for the accessory parameters would help to explore properties of Strebel differentials and uncover hidden patterns as neural networks are just approximations to the actual function.
3. Building string interactions using machine learning would eventually simplify the technical aspects of CSFT calculations, as all the geometric data needed for this is encoded in the neural networks. The neural network weights and architectures will be made public in the future. Of course, providing fits for the relevant functions, as Moeller did in [8], achieves the same goal. However, even for the 5-punctured sphere there are no polynomial fits for all of these functions [13], and even if one were able to construct them, it is general wisdom that neural network representations of functions are superior to fits.
We want to briefly comment on the precision of our results. Even though we get quite close to the results in the literature [6,8], they differ in the third significant digit, and there is still room for improvement. We think this is primarily because the training of the accessory parameter is not sufficiently precise. Even though we have reached quite low losses during the training, as is evident in figure 6, they are not as low as one would get using Newton's method. This indicates that the training precision has to be improved, at least until the loss reaches the same order as with Newton's method [8]. Indeed, the accessory parameters are used in the rest of the computations, so it is crucial to minimize their error as much as possible.
This type of problem is unconventional from the perspective of machine learning, since high precision is rarely needed (although see the recent work [78]); the tendency is rather to decrease the float precision. Having a loss plateau around 10^{−7} in the training shown in figure 8 suggests that we need to use double precision (float64) instead of single precision (float32). Still, this is not sufficient, as the usual optimization techniques have not been designed to handle such scenarios. For instance, both the gradients and the learning rate are around 10^{−7} at the end of the training, which implies that the weight updates are effectively frozen. Another instance is that regularization (such as L_2) can dominate the loss (2.9) in later epochs. One may try to circumvent these problems by turning off or decaying the regularization and/or by using learning rate restarts, that is, increasing the learning rate to counterbalance the vanishing gradients in later epochs. We have made a preliminary study of some of these techniques, which resulted in smaller losses. We note that hyperparameter optimization is difficult for these techniques, as one needs to train networks over extended periods to find the long-time effects.
In any case, we think that such a level of precision won't be needed, at least for the purpose of establishing the existence of the closed string tachyon vacuum by level/order truncation: if the tachyon vacuum happens to be finely tuned and requires the terms in the tachyon potential to be evaluated very precisely, the whole procedure of searching for the vacuum by truncating the theory won't work, as it would presumably require all orders of CSFT to be considered.
There are numerous natural directions one can take in the future. Here we list some of them that seem to us of utmost importance:
1. An obvious direction is to generalize this approach to higher-order string contact interactions, which we are currently working on [57]. As we have emphasized throughout the paper, there is no conceptual obstruction to doing so, as long as the algorithm, especially the training of the accessory parameters, scales favorably.
Observe that the number of distinct integrals in (2.9) scales as O(n²), and evaluating all of them may slow down the training for higher-punctured spheres. To remedy this problem, consider the following modification (4.1) of the loss function, where the sum over (ij) means the following: we first construct an ordered list of the zeros and only compute the complex lengths between zeros adjacent to each other in this list. It is easy to argue that this new function preserves the properties of (2.9). However, compared to (2.9), it uses just enough conditions to specify the Strebel differential at the function's global minimum while remaining agnostic of the shape of the critical graph. The number of integrals in (4.1) scales as O(n), which may speed up the training, possibly at the expense of precision due to imposing fewer conditions on the differential.
Scaling the algorithm to higher-punctured spheres may also require more advanced architectures, for example, including equivariant layers for the unitary group U(n) and complex translations C^n [79][80][81]. Furthermore, we may also want to represent the local coordinates themselves by a new neural network (more generally, it would be interesting to understand how to compute conformal maps as neural networks) or use graph neural networks [82] to extract properties of the critical graphs as they become more complicated with an increasing number of punctures.
2. We have trained a neural network for the indicator function distinguishing the vertex region from the Feynman region. It is also possible to train a network that distinguishes the distinct types of degeneration of punctured spheres from each other. For 4-punctured spheres, this means we can train a network that takes different values for s-, t-, and u-type degenerations.
Such networks would allow us to sample points from just the relevant parts of the Feynman region, and based on these points it may be possible to train a network for the Zwiebach differentials using a suitably modified loss function and variations thereof (i.e. z_2 ↔ z_3 and z_2 ↔ z_4). Remember that Zwiebach differentials have a disconnected critical graph in the Feynman region (that is, the zeros (z_1, z_2) and (z_3, z_4) are no longer connected to each other by a critical trajectory), and this is reflected by eliminating terms such as the imaginary part of the complex length between z_1 and z_3 and replacing them with terms involving the real parts of complex lengths. Assuming such a network can be trained, we can find the local coordinates associated with Feynman diagrams as well. This is interesting for a few reasons. First, all off-shell string amplitudes would be characterized using neural networks. More interestingly, this gives an alternative to plumbing fixture for obtaining such local coordinates. It might be interesting to cross-compare these two methods.
3. Since we approximated functions relevant to the geometry of the 4-string contact interaction as neural networks, we can try performing symbolic regression to gain analytic insight into the nature of these functions. In particular, it may be interesting to search for closed-form expressions for the accessory parameter and ∂V_{0,4} shown in figure 12.
4. Generalizing the ideas in this paper to higher genera, especially using minimal-area vertices, would likely take non-trivial effort: it is known that not all minimal-area metrics in higher genera arise from Strebel differentials, while our loss function is intrinsically about the latter. Still, it may be possible to exploit the convex optimization approach to the minimal-area problem [83][84][85] to construct a suitable loss function.
Better yet, one may try solving a version of the accessory parameter problem appearing in the case of hyperbolic string vertices [22] using machine learning to construct quantum CSFT. In this case, the loss function ought to impose, at its minimum, that the solutions of the Fuchsian equation have real monodromy around each non-contractible cycle of the Riemann surface. We expect the natural loss function for this problem to be independent of the number of punctures and of the genus. This approach may even provide novel insights into Fuchsian uniformization, considering that these two problems are continuously connected.
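The O(n²) versus O(n) count behind the modified loss (4.1) in the first direction above can be made concrete with a few lines: an n-punctured sphere has 2n − 4 zeros, all unordered pairs give (2n − 4)(2n − 5)/2 complex lengths, while consecutive pairs in an ordered list give only 2n − 5. The lexicographic ordering by real and then imaginary part is a hypothetical choice of ours; the text does not specify which ordering is used.

```python
from itertools import combinations


def all_pairs(zeros):
    """Every unordered pair of zeros: O(n^2) complex lengths, as in (2.9)."""
    return list(combinations(zeros, 2))


def adjacent_pairs(zeros, key=lambda z: (z.real, z.imag)):
    """Consecutive pairs in an ordered list of zeros: O(n) complex lengths.

    The default lexicographic ordering is a hypothetical illustration.
    """
    ordered = sorted(zeros, key=key)
    return list(zip(ordered, ordered[1:]))
```

For n = 6 punctures (8 zeros), this gives 28 pairs in the first case and 7 in the second.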
the puncture at the origin. Since the mapping radius associated with the origin is invariant under inversion, we obtain the mapping radius associated with the puncture at z = ∞.
The integral (A.11) is evaluated numerically using the Chebyshev-Gauss method [86] with 500 grid points, after the change of variables t = 1 − x². Notice that, given ρ_k, different choices of z_i must give the same result, as explained below (2.22). We observed that this is indeed the case. For example, the mapping radii and their uncertainties are reported in table 4 when the punctures are placed at {0, 1, 0.8734 − 0.6242i, ∞}. The behavior of the (mean) mapping radii is shown in figure 13.
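Since (A.11) itself is not reproduced here, the following is only an illustration of the technique under an assumption of ours: that the integrand behaves like h(t)/√t near the endpoint t = 0. The substitution t = 1 − x² turns ∫₀¹ h(t)/√t dt into ∫₀¹ 2x h(1 − x²)/√(1 − x²) dx, which carries the first-kind Chebyshev weight. The function name is hypothetical.

```python
import math


def integrate_inverse_sqrt_endpoint(h, n=500):
    """Hypothetical helper: approximate I = int_0^1 h(t) / sqrt(t) dt.

    With t = 1 - x**2, I = int_0^1 2 x h(1 - x**2) / sqrt(1 - x**2) dx.
    The Chebyshev-Gauss nodes x_k = cos((2k - 1) pi / (2n)) are symmetric
    about 0, so for even n the half-range integral is (pi / n) times the sum
    over the positive nodes.
    """
    if n % 2:
        raise ValueError("use an even number of nodes")
    total = 0.0
    for k in range(1, n // 2 + 1):  # the positive nodes
        x = math.cos((2 * k - 1) * math.pi / (2 * n))
        total += 2.0 * x * h(1.0 - x * x)
    return math.pi / n * total
```

Restricted to the positive nodes, this rule is the midpoint rule in θ = arccos x on [0, π/2], so for smooth h it converges like n^{−2} rather than spectrally. With 500 nodes it reproduces ∫₀¹ t^{−1/2} dt = 2 and ∫₀¹ t^{1/2} dt = 2/3 to better than 10^{−4}.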
Cancellation errors could in principle arise in the numerical evaluation of the integral (A.11), but the small standard deviations of the results above indicate that they do not actually pose a threat to the accuracy of our computation. We indeed observed that they do not cause any issues during our evaluations. In fact, we used the mean values of the mapping radii in our evaluations, and since their associated uncertainties are small, we opt not to include them in our final result for v_4.

B Machine learning details
In this appendix, we present more details on our neural networks and their training. We build the training, validation and test sets by randomly sampling 10^5 points on the complex plane, restricted to the disk of radius 2 centered at 1/2, with two disks of radius 0.2 centered at 0 and 1 excised. We use different sets for the two networks, and for each run we sample new sets. The test sets are set aside until the end of the training to evaluate performance. The integrals for v_4 are computed using yet another set: a grid for the trapezoid method and random points in the vertex region for Monte Carlo.
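The training region described above can be sampled uniformly by rejection, as in the following NumPy sketch: propose points uniformly in the bounding square of the large disk and keep those inside the disk and outside both excised disks. The function name and batching strategy are our own illustrative choices.

```python
import numpy as np


def sample_training_region(n_points, seed=0, center=0.5, radius=2.0,
                           holes=((0.0, 0.2), (1.0, 0.2))):
    """Hypothetical helper: uniform samples in |xi - center| < radius with the
    disks |xi - c| < r removed for each (c, r) in holes (rejection sampling)."""
    rng = np.random.default_rng(seed)
    accepted, count = [], 0
    while count < n_points:
        # propose uniformly in the bounding square of the large disk
        batch = center + radius * (rng.uniform(-1, 1, 2 * n_points)
                                   + 1j * rng.uniform(-1, 1, 2 * n_points))
        keep = np.abs(batch - center) < radius
        for c, r in holes:
            keep &= np.abs(batch - c) >= r
        accepted.append(batch[keep])
        count += int(keep.sum())
    return np.concatenate(accepted)[:n_points]
```

About 77% of proposals are accepted for the default parameters, so a single batch of 2n proposals almost always suffices. Calling the function with different seeds produces the independent training, validation and test sets.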
Both neural networks are fully connected.Training is performed using the eponymous set employing AdamW and we use early stopping: metrics are evaluated on the validation set at the end of each epoch, and training stops when there is no improvement of the loss (resp.accuracy) using the accessory neural network (resp.indicator function neural network) after 100 epochs.Gradients are clipped by the global norm.We employ the following learning rate scheduler: a warm-up period increases the learning rate linearly from 0 to the base learning rate during a given number of epochs; then, the learning rate is decayed exponentially with some period.The hyperparameters are presented in table 5 and were found using BOHB hyperparameter optimization [87].The statistics for the loss of our runs are shown in table 6

Figure 2: Part of the path of integration in (2.22) can be deformed to the dotted path. The integral over the dotted path is a real number, so it only produces an irrelevant phase in the local coordinates (2.22). By extension, this part does not contribute to the mapping radii (2.23) either.
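$$
\begin{cases}
\ell_{ij} & \text{if there is an edge of length } \ell_{ij} \text{ that begins at the } i\text{th zero and ends at the } j\text{th zero},\\
0 & \text{otherwise}.
\end{cases}
\tag{2.25}
$$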
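Equation (2.25) assigns to each ordered pair of zeros of the Strebel differential the length of the edge of the critical graph that begins at the first zero and ends at the second, and 0 otherwise. A minimal sketch of building this matrix from an edge list follows; the input format (a list of `(i, j, length)` triples with 0-based indices) and the function name are hypothetical. Because the definition assigns a single length per ordered pair, the sketch rejects repeated pairs rather than silently overwriting them.

```python
import numpy as np


def edge_length_matrix(num_zeros, edges):
    """Matrix of (2.25): entry (i, j) is the length of the edge that begins at
    zero i and ends at zero j, and 0 if there is no such edge.

    `edges` is a hypothetical input format: (i, j, length) triples, 0-based.
    """
    lengths = np.zeros((num_zeros, num_zeros))
    seen = set()
    for i, j, length in edges:
        if (i, j) in seen:
            # (2.25) holds one length per ordered pair of zeros
            raise ValueError(f"more than one edge from zero {i} to zero {j}")
        seen.add((i, j))
        lengths[i, j] = length
    return lengths
```

For instance, `edge_length_matrix(3, [(0, 1, 0.5), (1, 2, 0.25)])` has nonzero entries only at positions (0, 1) and (1, 2).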

Figure 3: An example of an artificial neural network with 3 hidden layers, the i-th of which contains n_i nodes. It takes the positions of the unfixed punctures (the moduli) as input and outputs the (independent) accessory parameters.
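A forward pass through a fully connected network of the kind shown in Figure 3 can be sketched in NumPy for the 4-punctured sphere, where a single complex modulus ξ is mapped to a single complex accessory parameter a. This is a minimal sketch, not the trained model: splitting complex numbers into real and imaginary features, the ReLU activation, the He-style initialization and the layer widths are all assumptions made for illustration.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """Random weights for a fully connected network with the given layer widths.

    layer_sizes = [n_in, n_1, n_2, n_3, n_out]; He-style initialization is an
    assumption, not taken from the source.
    """
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def mlp(params, moduli):
    """Map complex moduli (shape [batch]) to complex accessory parameters.

    Assumes one complex output encoded as two real outputs (n_out = 2);
    the ReLU activation is a hypothetical choice.
    """
    h = np.stack([moduli.real, moduli.imag], axis=-1)  # complex -> (Re, Im)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)                 # hidden layers
    W, b = params[-1]
    out = h @ W + b                                    # linear output layer
    return out[..., 0] + 1j * out[..., 1]


rng = np.random.default_rng(0)
params = init_mlp([2, 64, 64, 64, 2], rng)  # n_1 = n_2 = n_3 = 64 is hypothetical
a = mlp(params, np.array([0.5 + 0.5j, 0.3 - 0.2j]))
```

Untrained, `a` is of course meaningless; training would fit `params` so that the output reproduces the accessory parameter a(ξ, ξ*) on the training region.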

Figure 5: An example of a training set S for 4-punctured spheres with |S| = 10⁵. We excluded small disks centered at 0, 1 and ∞, where the 4-punctured sphere is close to degeneration, and sampled points only from the remaining triply-connected region (the training region).

Figure 6: Training curve (left) and the distribution of the loss over the test points in the training region (right) for the "best NN". The curves on the left show that training was successful, and the right panel shows that most points have relatively small losses, apart from a few outliers.

Figure 7: The distribution of the loss over the training region. The network underperforms close to the real line.

Figure 8 :Figure 9 :
Figure 8: The behavior of the accessory parameter compared with the exact solution (top left) and its loss (top right) for real moduli. The exact and trained curves are almost indistinguishable; they differ slightly only when the modulus is close to 0 or 1, or lies outside the training region. Even there, the network is still able to extrapolate beyond the training region. The overall behavior of the accessory parameter a = a(ξ, ξ*) is plotted below.

Figure 10: The training curve (left) and the progression of the accuracy during training (right). A high accuracy is achieved on both the training and validation sets.

Table 2: The mean, median, minimum, and maximum values of the training loss for the "best NN".

Table 3: Comparison of the losses and accessory parameters for previously known solutions with the results

Table 6: Means and standard deviations of the metrics for the different data sets, averaged over 4 runs. The full pipeline takes 294 ± 29 minutes to run on Google Colab.