Reconstructing $S$-matrix Phases with Machine Learning

An important element of the $S$-matrix bootstrap program is the relationship between the modulus of an $S$-matrix element and its phase. Unitarity relates them by an integral equation. Even in the simplest case of elastic scattering, this integral equation cannot be solved analytically and numerical approaches are required. We apply modern machine learning techniques to studying the unitarity constraint. We find that for a given modulus, when a phase exists it can generally be reconstructed to good accuracy with machine learning. Moreover, the loss of the reconstruction algorithm provides a good proxy for whether a given modulus can be consistent with unitarity at all. In addition, we study the question of whether multiple phases can be consistent with a single modulus, finding novel phase-ambiguous solutions. In particular, we find a new phase-ambiguous solution which pushes the known limit on such solutions significantly beyond the previous bound.


Introduction
A crucial ingredient in the S-matrix bootstrap program is the relation between the modulus of an amplitude F (z) and its phase, as constrained by unitarity.An important question is, is it always possible to find a phase given the magnitude of a scattering amplitude, and, if so, is that phase unique?If an algorithm were known to find the phase, then it could be applied to physical data, from the differential cross section, to reconstruct the underlying quantum mechanical amplitude.It is generally believed that the amplitude is uniquely fixed, up to the trivial ambiguity, F (z) → −F (z) ⋆ , provided one has access to the differential cross section across all energies and all angles [1,2]. 1 In a more realistic setting such information is never known, so we can ask how much information about the phase can be deduced from scattering data in a limited range of energies, or even at fixed energy.Even at fixed energy, and even in the elastic scattering regime where only 2 → 2 scattering is possible, the problem of finding the amplitude from the differential cross section is a hard one [3,4,5].For inelastic scattering with an infinite number of partial waves, there can be continuous families of phases with the same modulus [6,7,8]; for inelastic scattering with L partial waves there are at most 2 L phases for a given modulus [9]; for elastic scattering, the number of phase-ambiguous solutions is expected to be at most two [10].The question of how to constrain the phase given the cross section has been around since the 1960s, but not much progress has been made since the 1970s.Given the revitalization of the S-matrix bootstrap program [11], partly inspired by modern computational techniques, we propose to revisit some of the questions using modern tools such as machine learning.We focus here on a clear well-defined problem: given a differential cross-section of a scalar 2 → 2 scattering process in the elastic region (energy below the first inelastic threshold), under what circumstances does an underlying complex amplitude producing it exists and under what circumstances is the amplitude unique?
We focus on the elastic scattering regime at fixed energy.Since energy is fixed, a 2 → 2 amplitude is a function only of the scattering angle θ and we use z ≡ cos θ throughout.We write F (z) for the amplitude, B(z) ≡ |F (z)| for its modulus, and ϕ(z) for its phase.The partial wave decomposition of the amplitude is where P ℓ (z) are the standard Legendre polynomials of spin ℓ.In this notation, unitarity requires Imf ℓ = |f ℓ | 2 for all ℓ. 2 Writing the partial waves as f ℓ = sin δ ℓ e iδ ℓ , unitarity is equivalent to all the phase shifts δ ℓ being real.The differential cross section depends only on |F (z)| 2 = B(z) 2 , so up to trivial kinematic factors the differential cross section and modulus are equivalent.
Although the partial-wave decomposition is general, if there are an infinite number of partial waves it may not be so useful.One can instead phrase the unitarity condition as an integral equation for the modulus B(z) and phase ϕ(z) [14,15]: 1 The proof [1] assumes that the external particles are scalars, that the amplitude is symmetric under crossing, and finally that there are no bound states. 2 This unitarity relation holds for elastic scattering of non-identical scalar particles AB → AB.The relationship between F (z) and the standard amplitude ⟨p3, p4| T |p2, p1⟩ = (2π) 4 δ 4 (p1 + p2 − p3 − p4)T (s, t) at fixed s, see e.g.[12,13], is (1 − z) , where in writing this formula we assumed for simplicity that all particles have equal mass m.For scattering of identical particles AA → AA only even spin partial waves appear in the sum (1), so that F (z) = F (−z), and the relationship to the standard scattering amplitude becomes F (z) ≡ where z 2 (z, z 1 , ϕ 1 ) ≡ zz 1 + 1 − z 2 1 − z 2  1 cos ϕ 1 .
(3) By evaluating Eq. ( 2) at z = 1 so that z 2 = z 1 we immediately get a necessary condition on B(z) to be a valid modulus, namely This "dual" bound already severely restricts the space of allowable B(z).
Given B(z) it is in general very difficult to solve Eq. ( 2) to find ϕ(z).Indeed, there are two closely related, but unanswered, questions we can ask 1.For which B(z) is there a solution to Eq. (2)?That is, which elastic-scattering cross sections can conceivably be realized in a unitary quantum field theory?
2. For which B(z) can there be more than one solution to Eq. ( 2)?That is, when is the phase unique?
Both of these questions were studied some time ago and only partially answered, as we now review.The sharpest statements so far have been made by applying the contraction mapping principle to the unitarity equation, where the search for a phase solution can be recast as a problem of finding the mapping's associated fixed point [14,15,16].The current bounds on both existence and uniqueness have been derived based on the integrated form of the kernel in Eq. ( 2) The maximum of this function was denoted by Martin as To motivate this, we note that since | cos[ϕ(z 1 ) − ϕ(z 2 )]| ≤ 1 Eq.( 2) implies If the phase ϕ(z) is constant then by Eq. ( 2) B(z) must be constant as well and sin ϕ = sin µ = B, so this bound is saturated.Furthermore, since ϕ must be real we have that for constant phases, sin µ ≤ 1 and B ≤ 1.It has also been proven that as long as sin µ ≤ 1 for any given B(z), a corresponding phase always exists.The proof treats the Eq. ( 2) as a non-linear operation ϕ n+1 = O(ϕ n ), and applies the Leray-Schauder principle to argue for the existence of a fixed point [3].We discuss this approach more in Section 3.1.Conversely, there exist differential cross sections with sin µ > 1 for which no phase exists, the simplest example being constant B > 1.So one cannot hope to push this sufficient criterion for the existence of a phase further.If sin µ > 1 there is no known test to determine whether or not a phase exists for a given B(z) beyond (4).
Regarding the question of uniqueness, the contraction mapping principle was applied to demonstrate that any solution with sin µ < √ 5−1 2 ≈ 0.79 is unique [3,14], while further refinements [17] pushed the bound up to sin µ < 0.86.For polynomial amplitudes (finite number of partial waves) sin µ ≤ 1 is enough to ensure both existence and uniqueness [3].For polynomial amplitudes, it has also been shown that if the average modulus σ T = 1 2 1 −1 dzB 2 (z) satisfies σ T < 1.38 then uniqueness is guaranteed.For general amplitudes with an infinite number of partial waves, it has been conjectured [3,15] but not proven or disproven that uniqueness should still hold if sin µ < 1.In the elastic scattering region that we consider, any nontrivial (i.e.excluding F (z) → −F (z) ⋆ ) phase ambiguity is expected to be twofold at most, as has been proven for genuine entire functions (i.e not a polynomial) [18], see also [10].
A modulus with two corresponding phases was found by Crichton in 1966 [19].Crichton's solution has only the L = 2 partial waves and sin µ = 3.2.Shortly after, the complete set of L = 2 phase ambiguous solutions was characterized [20].The lowest value of sin µ among these was 2.6.Solutions with L = 3 and L = 4 have also been studied [21,22].These solutions are discussed in Section 4.
Phase-ambiguous solutions have also been found with an infinite number of partial waves in [23].The lowest published value of sin µ among these, to our knowledge, is sin µ ≈ 2.15.These results are reviewed in more detail in Section 5. Applying modern numerical methods we are able to find a phase-ambiguous solution with sin µ ≈ 1.67.
In this paper, we revisit some of these old questions about phase determination in the elastic regime using modern numerical methods and machine learning.Recent advances in machine learning have given rise to a multitude of applications in physics, from jet tagging algorithms [24], to fast detector simulators [25] or AI-driven symbolic regression [26,27] and give us the perfect tool for tackling hard numerical problems for which classical algorithms are challenging to design.Is it well known that neural networks are universal function approximators [28] and as such are ideal candidates for solving integro-differential equations.In particular Physics-informed neural networks have been shown to be able to resolve multi-dimensional differential equations [29], where they act as a functional ansatz and have a loss function given by the differential equation of interest.The extension to integral equations usually involves a discretization scheme for the actual integral and has been studied and implemented in various libraries [30,31,32,33].In the following, we will explore how similar techniques can be applied to study and solve the unitarity integral equation Eq. ( 2), demonstrating how to numerically recover the phase corresponding to a given input differential cross section.We will highlight the interesting duality between the convergence properties of the machine learning algorithm and the kernel function, making the link with bounds derived in the literature.Finally, we will deploy various neural networks and impose a repulsive loss in order to probe the uniqueness of the recovered solution.
We begin in Section 2 by describing the machine learning setup and approach we take to establishing consistency between a modulus and a phase.The unitarity constraint is encoded in Eq. (2).There are different ways to solve this equation.Given a known modulus B(z), for example from experimental cross-section data or some S-matrix-bootstrap computation, one can then search for a phase ϕ(z) consistent with unitarity.To find moduli with phase ambiguities, one can alternatively search for 3 functions B(z), ϕ 1 (z) and ϕ 2 (z) all consistent with unitarity.To guarantee that ϕ 1 (z) and ϕ 2 (z) are not trivially equivalent (e.g. by ϕ 1 (z) = π − ϕ 2 (z)), we will also need to add a repulsive loss to keep the solutions apart.Section 3 discusses how to use our machine learning set-up to find ϕ(z) given B(z).Section 4 discusses the case of phase-ambiguous solutions for amplitudes with a finite number of partial waves and Section 5 the infinite partial-wave case.By exploring this space through a combination of a machine learning search and a refinement using a classical algorithm capable of high precision, we find a large number of new phase-ambiguous solutions.The lowest sin µ among these is sin µ ≈ 1.67, a value significantly closer to sin µ = 1 than the best previously known example sin µ ≈ 2.15.A summary and conclusions are in Section 6.

Machine Learning implementation
Finding a unitary amplitude for a given input differential cross section boils down to solving the Eq. ( 2).Solving differential or integral equations with machine learning is a problem that has already been tackled efficiently in the literature, through the use of Physics-Informed Neural Networks (PINNs) [29].In a PINN the idea is to use a neural network u σ (⃗ x) as a surrogate of the solution u(⃗ x).Here u σ is a neural network that takes as input the data ⃗ x and has parameters (weights and biases) described by σ.The precise architecture of u σ can be fine-tuned given the problem at hand but typical setups consider simple feed-forward neural networks.Having the neural network ansatz allows one to take derivatives efficiently with respect to the inputs, a desirable property for solving differential equations.Indeed the loss function for PINNs is usually taken to be the differential equation itself, evaluated at a set of collocation points.
In our problem, we have a few notable particularities that depart from the typical PINN use case: 1. We are solving an integral equation as opposed to a differential equation.In practice, this is done by approximating the integral, for instance with Gaussian quadrature or a trapezoidal rule, which will inevitably lead to some numerical errors.
2. We will be interested in understanding whether the phase solution is unique.Probing this property could be done by training different neural networks that are initialized with different random seeds.Another approach, which is one that we will prefer, is to add a repulsion term in the loss function.This allows us to simultaneously train different neural networks, each corresponding to a distinct solution of the integral equation.
4. We are interested in parsing through the space of differential cross sections to probe existence and uniqueness criteria.When explicitly looking for ambiguous solutions we will see that after parameterizing the amplitude in some simple way we are able to learn both the phase ϕ(z) and the modulus B(z).
Similarly to PINNs however we will parametrize the phase ϕ(z) by a neural network ϕ σ and ask for it to solve the Eq. ( 2).The parameterized phase ϕ σ (z) shown in Fig. ( 1) is a network that takes in a single input and depends on a set of neural network parameters σ.These parameters are to be updated and optimized to satisfy a given objective or loss function, typically given by the unitarity integral equation.Contrarily to PINNs, we will not have any specific boundary condition to satisfy, rather we will force the output of ϕ σ (z) to lie within the range [−π, π].This is done by adding a scaled sigmoid or tanh activation function at the end of the network.Since we are interested in solving a formal equation we are free to take any z point as part of our training data, provided z ∈ [−1, 1].

Implementation details
Following the discussion of the previous section we parametrize ϕ(z) by a network with a simple feed-forward architecture, which we implement with PyTorch [34].We will restrict ourselves to small architectures, typically 4 layers with 64 nodes each using Rectified Linear Unit (ReLU) activation functions.The outputs are constrained in the [−π, π] range by adding a scaled hyperbolic tangent function after the final layer 3 .The loss function is taken to be the Mean Squared Error (MSE) of the integral equation, averaged over a set of N c randomly sampled collocation points, namely: In practice, the expectation value E z appearing in the loss is estimated by averaging over the set of collocation points {z c } as The two-dimensional integral is also estimated, discretizing the integrand over a grid in (z 1 , ϕ 1 ) space.For training, we will also consider a scaled version of this loss defined as which has the desirable feature of having terms of order 1.When learning B(z) this loss will also discourage the network from learning arbitrarily small moduli.One could be tempted to further normalize and divide the loss function by sin ϕ(z), but this leads to numerical instabilities if the expected phase value nears 0.
During training the expectation value in the loss function is approximated by averaging over a batch of 64 randomly sampled {z c } collocation points.Random sampling ensures that the entire angle range is properly resolved and not overfitted.The network parameters are updated at the end of each epoch, defined here by the complete processing of a single batch.The parameter update is done via the Adam optimizer [35] where we set the coefficients β 1 = 0.9 and β 2 = 0.999 to their default values.These coefficients correspond to the exponential decay rates of respectively the first and second moment estimates of the gradient.In order to calculate the loss function and the relevant two-dimensional integrals we have to resort to a numerical approximation.We use the trapezoidal rule 4 , where at each fixed z value we pick out 25×25 reference points, linearly spaced out in the z 1 and ϕ 1 directions, giving us an evaluation grid for approximating the two-dimensional integral.In general the trapezoidal rule for a function f (x) will give a numerical error scaling as KN −2 , where N is the number of points picked along a single direction and |f ′′ (x)| < K.In our problem of interest, with our choice of points, we expect the trapezoidal rule to give errors in the range of 10 −4 − 10 −6 at different z values.This implies that as L E ∼ 10 −8 the loss becomes of the order of the numerical precision that we operate at.Our default model choice and hyperparameters are summarized in Table 1 and any deviation from those in the numerical experiments will be mentioned explicitly.Different choices of hyperparameters have been considered but those listed ended up giving the best performance after a brief optimization search.

Single phase determination
We start our analysis by probing the question of existence of ϕ(z) given B(z).That is, we will be interested in training a single neural network to recover a phase ϕ(z), which is a solution to Eq. ( 2), assuming that the modulus B(z) is a known function.We will start by verifying that the network can be trained to recover solutions in the regime where sin µ < 1, where we have guarantees on the existence of the function.Focusing on simple polynomial differential cross sections, we will illustrate that the loss landscape is sensitive to the value of sin µ and that the existence bounds are respected.We will also demonstrate that our method is able to recover solutions when sin µ > 1, taking examples where the amplitudes are parameterized by an either finite or infinite partial wave decomposition.

Warmup: simple examples
To get started, we first consider simple polynomial forms for the modulus.We consider a linear function B(z) = (z + 4)/10 and a quadratic function B(z) = (z 2 + 1)/2.Both moduli are positive across the z range and have sin µ values that are respectively sin µ 1 = 47 90 ≈ 0.522 and sin µ 2 = 13 15 ≈ 0.867, guarantying the existence of a solution.Their integrated kernels K(z) (cf.Eq. ( 5)) whose maximum gives sin µ are shown in Fig 2 .We implement different neural networks following the setup described in Section 2 and let them run for 5000 epochs.The final performance is evaluated on a test set of 100 linearly spaced out z points.We show at the bottom of Fig. 2 the predicted phases for both cases considered.The final evaluation losses are both of the order of L S E ∼ 10 −8 , thus at the order of the numerical precision which is supported by the numerical integration scheme.For moduli satisfying sin µ < 1 such as these, the numerical fixed point iteration [15] applies and we have verified with a classical algorithm that the solutions agree with the ones found by our framework.
In Fig. 2, we can observe that for both the linear and quadratic cases, the phases ϕ(z) look a lot like the integrated kernel K(z).This is straightforward to understand.When ϕ(z) ≪ 1, one can expand the unitarity equation Eq. (2) using sin ϕ(z) ≈ ϕ(z) and cos[ϕ(z 1 ) − ϕ(z 2 )] ≈ 1 to see that ϕ(z) = K(z) to first order in ϕ(z).Indeed, one can then expand to second order in ϕ(z) giving and so on.This is actually just a version of the fixed-point iteration scheme starting with ϕ(z) = 0.
One can in principle use this procedure even if ϕ(z) is not small.There the iterative scheme takes and aims at finding the fixed point ϕ ⋆ = Φ(ϕ ⋆ ).However, if sin µ > 1 then K(z) > 1 for some z and sin ϕ(z) = K(z) has no solution for real phases ϕ(z).This is one reason we only expect phase ambiguities for sin µ > 1.Generally, we find that for sin µ < 1 the iteration tends to converge fairly quickly (although we cannot prove it is independent of the initial condition for the iteration), but when sin µ > 1 it often does not converge at all.The unitarity equation also implies the bound sin ϕ(z) ≤ K(z), which is respected in our results, giving an important cross-check.For these simple polynomial amplitudes, we can see from Fig. 2 that K(z) ≈ B(z) −1 .This is once again expected in the regime where the integral appearing in Eq. ( 5) is slowly varying.In particular, for the linear modulus, we have Since −1 < z < 1 this function is essentially constant and K(z) ≈ c B(z) for some c results.As a side note, when ϕ(z) is a solution then so is π − ϕ(z).These solutions are said to be trivially related and either could have been expected.Changing the initialization seed of the networks is a way to recover such alternative solutions.

Scanning the loss landscape
Having validated our method on two simple polynomial examples, we can now look into the performance of our implementation on families of B(z).We consider two families: a linear one, B(z) = az + b and a quadratic one B(z) = cz 2 + d.For each value of a and b or c and d we can search for a phase.Although the network cannot tell us for sure whether unitarity is exactly satisfied, the loss of the neural network provides a good proxy for satisfaction.We thus explore the loss landscape and compare it to other indicators of whether unitarity can be satisfied.

Linear functions
We consider the family B(z) = az + b with a and b real and b > |a|, which ensures positivity of B(z) for all z values.Although B(z) is a polynomial, the amplitude F (z) = B(z)e iϕ(z) will generally not be.Indeed, this parameterization for B(z) is not compatible with any unitary polynomial F (z) (see Appendix A).As such, any numerical solution for ϕ(z) has to be understood as possessing an infinite partial wave decomposition.We conduct a scan over this family of B(z) by taking a 75 × 60 grid over different a, b values, with a ∈ [−0.5, 2.0] and b ∈ [0, 2.0].For every parameter pair, we train a new neural network using the scaled loss function of Eq. ( 9) for 2000 epochs.We then evaluate the base and scaled losses, Eqs.(8) and (9), on the resulting solutions.In Fig. 3 we show the complete scaled loss landscape, along with the sin µ = 1 and 1 −1 B(z 1 ) 2 = 2B(1) contours.Those are contours for respectively guaranteeing and excluding solutions.A zoomed-in perspective on the regions of low losses is shown in Fig. 4. sin µ < 1 seems to give a good indication that a solution exists or not.This is non-trivial -the ML algorithm knows nothing about sin µ and there could equally well have been an entirely different functional of B(z) which characterized the existence of a solution.The correspondence of sin µ = 1 with the boundary of the allowed region is further explored in the bottom panels.There we also show the values of sin µ across this linear B(z) family.
On the bottom right we show that the boundaries of sin µ ∼ 1 and L S E ∼ 10 −5 almost perfectly overlap.As the loss crosses this threshold it rapidly grows by many orders of magnitude, up to 10 −4 − 10 −2 , indicative of a regime where the network is unable to find a solution outside of sin µ > 1 for linear moduli.In particular, the bottom boundary of sin µ = 1 overlaps with the dual exclusion bound, across which the network has L S E > 10 −5 and does not find any solutions.However, near the top sin µ = 1 boundary, we have a very thin region of potential solutions with sin µ > 1.All of these solutions fall within the grey region and are not forbidden by the dual exclusion bound.Additional dual bounds constraints are explored in Appendix D but do not go beyond the simple exclusion bound we have considered.For each a and b we find a phase ϕ(z).Red regions indicate that no solution is likely.The black curve is sin µ = 1 and delimits the region within which we are guaranteed the existence of a solution.The grey curve delimits the bound of Eq. ( 4) and solutions cannot be found outside of its enclosed region.

Quadratic functions
Next, we consider the family of symmetric quadratic moduli with B(z) = c + dz 2 .To keep B(z) positive we restrict to c > |d|.We proceed in a similar fashion as in the previous section, constructing a grid of 45 × 180 points for c ∈ [0, 1.5] and d ∈ [−0.5, 5.5], where we train a new neural network at each point for 2000 epochs.
In Figs.5-6 we repeat the same plots as for the linear function: the loss landscapes for the base and scaled losses along with the sin µ values and the L S E cut.Within the sin µ < 1 region, where a solution is guaranteed, the loss is generally small and L S E < 10 −5 is always satisfied.This is a strong sign that our network is able to properly find the expected solutions in this region.We notice however that the network still finds approximate numerical solutions with L S E ∼ 10 −5 up to sin µ ∼ 1.1 − 1.2.In Fig. 6 we can see two regions of low loss away from sin µ < 1: two small islands circled in panel d and an additional one-dimensional curve along which the loss is around L S E ∼ 10 −6 − 10 −5 .Both regions are significantly outside of the sin µ = 1 boundary.
As shown in Appendix A the two islands correspond to the genuine finite partial wave solutions of order L = 2, which have quadratic differential cross sections.Although genuine solutions, their associated loss is around L S E ∼ 10 −5 as their corresponding sin µ values are quite high, around 2.95 and 3.67 respectively.Indeed, as sin µ grows the networks become harder to train as can be understood from the structure of the loss function of Eq. ( 9).For sin µ > 1 the cosine term in the integral needs to precisely modulate the kernel function in order to have a value that can be matched with | sin ϕ(z)| < 1. Resolving these high sin µ solutions to better accuracy will require using a higher number of training epochs and some additional fine-tuning of the neural network parameters.Alternatively, provided one knows that a finite partial wave solution is expected, it is also possible to first parametrize the unitarity amplitude as in Eq.( 1).One can then fit the phase shifts δ ℓ by ensuring that modulus |F (z)| matches  the B(z) given as input.In that context, one can go back and forth between the machine learning scans and the classical fitting algorithm in order to fully characterize the low loss landscape.
Similarly, as described in Appendix A, the 1D curve of sin µ > 1 solutions is associated with finite partial wave solutions of order L ̸ = 2. Their corresponding moduli are numerically well approximated by a quadratic B(z) in the z ∈ [−1, 1] region.Thus, even though their moduli are not quadratic per see, these spurious solutions show up in our scans and are associated with low loss values.

Extremal amplitudes
Up until now, we have considered only toy amplitudes where the modulus is a low order polynomial.
Here we demonstrate that the same technique can also be applied when using differential cross sections obtained in the context of the nonperturbative S-matrix bootstrap program, see e.g.[36,37,38].One class of amplitudes of interest are "extremal" amplitudes that maximize or minimize the value of the amplitude at the crossing-symmetric point λ 3 ), which intuitively measures the strength of the interaction between particles.By combining the so-called primal and dual methods, see [39], the maximal value of the coupling was obtained to be Red regions indicate that no solution is likely.The black curve is sin µ = 1 and delimits the region within which we are guaranteed the existence of a solution.The grey curve delimits the bound of Eq. ( 4) and solutions cannot be found outside of its enclosed region.
whereas for the minimal coupling [38], the current bound is It was also found that when maximizing/minimizing various couplings, while elastic unitarity was not imposed it effectively emerged with a good precision in the extremization process.This structure was further explored in [40].In the context of the present work, it is therefore interesting to consider the differential cross-sections produced by the extremal amplitudes and check that elastic unitarity can indeed be satisfied by finding the appropriate ϕ(z).
To analyze this case in more detail, we take the numerical results from [37] and use them to compute B(z) that we then use as input.The functions B(z) computed in this way are more physical than our toy functions in that they come from amplitudes that satisfy analyticity, unitarity and crossing at all energies.Due to the fact that they describe the scattering of identical particles, they obey B(z) = B(−z).This symmetry is also respected by the expected phase, and we force our networks to output a symmetric result by considering [ϕ σ (z) + ϕ σ (−z)]/2 as the final output.In order to expedite the numerical evaluations of the input B(z) functions, we will fit them by symmetric polynomials of finite order.
We find that more interesting results arise from the amplitudes that are closer to minimizing the coupling.Using the primal bootstrap results of [37] for N max = 26, we get λ ≈ −6.48.We choose energies to be s = 4.5m 2 and s = 8m 2 and extract the appropriate B(z) functions.Those are fitted by symmetric polynomials of order 14 and 40 respectively, which are then fed as inputs to our networks.Training is done over 5000 epochs, minimizing the scaled unitarity loss and using the default architecture and hyperparameters.The input B(z) and associated phases are plotted on the top panels  of Fig. 7, where we also include the phase predictions from the bootstrap amplitudes.At s = 4.5m 2 the modulus has sin µ = 0.64, while at s = 8m 2 the modulus is associated with sin µ = 1.64.
To verify the accuracy of our results we also plot the evaluation of the unitarity loss of Eq. ( 9) on the bottom panels of Fig 7.This is done for both the trained networks and the bootstrap predictions and allows us to verify to which extent unitarity is broken in both cases.We can immediately notice that at s = 4.5m 2 the bootstrap and the neural network phases agree with one another and that in both cases unitarity is well respected.The unitarity loss is lower for the phase coming from the neural network as it has been specifically trained to minimize this quantity, while in the case of the bootstrap program elastic unitarity is only an emergent property.At s = 8m 2 the difference between the neural network and bootstrap phases is more pronounced.Unitarity is not respected with very good accuracy, in particular for the bootstrap phase.At the z = ±1 edges the loss reaches the order of 10 −2 and indicates that numerically the emergent elastic unitarity property does not hold accurately.[37] for the amplitude with λ ≈ −6.48 and N max = 26.We extract B(z) at different s energies and use a neural network to predict the associated phase.(Bottom panels) Evaluation of the unitarity loss of Eq. ( 9) on the predicted amplitudes coming from the bootstrap program and the trained neural networks.

Phase ambiguities: finite partial waves
So far we considered the existence question: given a modulus B(z) when does there exist a phase ϕ(z) so that the amplitude F (z) = B(z)e iϕ(z) is unitary?A related question is: when are there two possible phases for the same B(z)?These phases should not be related by the redefinition, ϕ(z) → π − ϕ(z) (such redefinition is called a trivial ambiguity).The value of sin µ is often considered as an indicator of whether ambiguous solutions may exist.The best bound in the literature [17] guarantees uniqueness for sin µ < 0.89, and it has been conjectured [3] that uniqueness should hold up to sin µ = 1.In practice, however, most known examples have sin µ values much higher (above 2.0) and require a dedicated construction.In the following sections, we will see how machine learning can be used in conjunction with classical algorithms in order to study these ambiguous solutions.In this section, we focus on finite partial wave phase ambiguities and consider the infinite partial wave case in Section 5.
In the finite L case, we both review known results and then discuss how machine learning can help.For finite L, the first phase-ambiguous solution was found by Crichton in 1966 and has L = 2, so the amplitude is quadratic in z.As we will see, for finite L there are an infinite number of phase-ambiguous solutions which decompose into 1d curves in the space of phase-shifts.The low dimensionality of the solution space makes the machine learning approach challenging.In the infinite L case, the solution space is higher-dimensional and easier to explore with gradient descent.

Classical solutions
We first review what is known about the finite L phase-ambiguous solutions classically, i.e. without machine learning.When there are a finite number of partial waves in an amplitude, the question of whether there are multiple phases for the same amplitude reduces to whether a finite set of equations can be solved simultaneously.For finite L we write the amplitude as where P ℓ (z) are Legendre polynomials and δ ℓ ∈ R are the phase shifts.This parameterization in terms of real phases guarantees that the amplitude is unitary.We are looking for another amplitude with the same norm as F (z).We are interested in non-trivial ambiguities.
To find non-trivial ambiguities we need two sets of phases δ ℓ and δ ℓ not all equal (and not all opposite) for which a polynomial of degree 2L.So setting the coefficients of z j from |F (z)| 2 equal to those of | F (z)| 2 gives 2L + 1 equations for the 2L + 2 real phase shifts δ ℓ and δ ℓ .This generically leads to a 1-dimensional solution space.Indeed, the finite L solutions for every L correspond to a set of 1D curves in 2L + 1 dimensions.
The term associated with For this to be the same with δ ℓ and δ ℓ requires δ L = δ L so that the highest phase shift for the two solutions must be equal.There may or may not be one of the 1D curves which has a given value of δ L .However, if we find a point on one of these curves with a given δ L we can then move along the curve unambiguously to determine all the other connected solutions.
For example, with L = 1 equating the expression for |F (z)| 2 using the two sets of phase shifts gives Matching the coefficients of z 0 , z 1 and z 2 gives 2L + 1 = 3 equations for the 4 phase shifts δ 0 , δ 1 , δ 0 and δ 1 .The z 2 term forces δ 1 = δ 1 and the z 0 forces δ 0 = δ 0 so that there are no nontrivial solutions with L = 1.The L = 2 case is already fairly complicated.A non-trivial solution with L = 2 was found by Crichton [19]: and Set 2 : that give rise to two amplitudes F (z) and F (z) which share the same differential cross section.
Shortly after Crichton's paper, Atkinson, Johnson, Mehta and de Roo found the complete set of L = 2 solutions [20].These form a 1D curve in the space of phase shifts, as expected.The value of sin µ for these solutions is shown in Fig 8 with Crichton's point indicated.Crichton's solution has sin µ = 3.2.For L = 3 the complete space of solutions is also known.For L = 4 only a handful of solutions are known.
Rather than attempting to improve these analytic results, we simply take a brute-force approach to finding solutions.To do so we first pick a random δ L = δ L .Then we search for a solution to the remaining 2L equations and 2L unknows close to random seed points for the other δ ℓ and δ ℓ .Once the equations are solved, we then confirm that the solutions are not trivially related.By sampling enough points one can see the emergence of a set of curves (one can also move along the curves to find connected solutions if desired).Results for L = 2, 3, 4 and 5 are shown in Fig. 8. Interestingly we observe that the minimum value of sin µ with δ L ̸ = 0 does not seem to decrease with L. 6 For L = 2, the lowest value has sin µ ≈ 2.63.For L = 3 it is sin µ ≈ 3.41.The lowest value for L = 3 is when δ 3 = 0 which reduces it to L = 2.Such points do not show up in our search since they must have δ L = 0 exactly which will never occur in a random scan.For L = 4 and L = 5, the lowest values we found with δ L ̸ = 0 are sin µ ≈ 3.58 and sin µ ≈ 3.83 respectively.Based on these observations, we do not believe that going to higher finite L will reveal ambiguous solutions with sin µ values smaller than those from L = 2.

Machine learning with repulsive loss
One might hope to use machine learning to find lower values of sin µ for finite L. This becomes difficult because the solution space is one-dimensional.Thus, if one starts on a particular curve and does gradient descent in sin µ, one will only find the local minimum of that curve and never be able to jump to other curves.Fortunately, this is only a problem for finite L. Ambiguous solutions with infinite L fill higher dimensional regions which are easier to explore.The finite L case is still useful for exploring the machine learning approach as we have some exact phase ambiguous solutions, such as the Crichton one.So we will use finite L as a testbed for constructing a neural network capable of finding phase ambiguities.The lessons learned from these examples can then be applied to the more promising infinite L case.
We first attempt to recover both of Crichton's L = 2 solutions.To begin, we simply try to find the phase multiple times and hope to get different answers based on different initialization seeds.We fix B(z) = |F (z)| from Crichton's solution, as shown in Fig 9a .We first simply define two independent neural networks (ϕ 1 (z) ϕ 2 (z)) and train them according to the principles of Section 3 with different random initializations.We let the networks run for 5000 epochs using the loss of Eq. ( 8).We do find two phases this way, as displayed in Fig. 9b alongside the theoretical solution coming from using the second set of phase shifts δ.Unfortunately, the phases are trivially related: ϕ 1 (z) = π − ϕ 2 (z).As we randomly initialize new neural networks we can end up recovering either solution (or the second Crichton solution).What is apparent from this simple experiment is that we need a way to avoid trivial ambiguities.
In order to study the uniqueness property associated with a given differential cross section we devise a methodology for consistently recovering different solutions with our neural networks.The setup is almost identical to the one used in Section 3, where we start by independently initializing various neural networks that aim to solve the unitarity equation, minimizing the losses of either Eq. ( 8) or Eq. ( 9).The main deviation from this simple setup follows previous work in the literature [41] and consists in introducing a new repulsive term in the loss function.The role of this term will be to push apart the various neural network solutions and ensure that they do not overlap.
To introduce the repulsion term we must first define a measure for the closeness of two solutions ϕ 1 (z) and ϕ 2 (z).This measure should account for the periodicity properties of the phases to avoid boundary effects.One choice is to first define as a distance between two phase solutions.
For the repulsive loss itself, we will consider two alternatives.The first one, the kick repulsion, follows [41] and consists in introducing the pairwise loss where p is some hyperparameter to be fixed.The first term in this loss ensures that we do not get exactly the same solutions, while the second term pushes us away from the trivial ambiguity.Since this repulsive term is always non-null, it will only be included in the loss function at intermediate epochs.
More precisely, we let the networks first train for e i epochs, then activate the repulsion and then turn it off after a total of e f epochs.In that way, this repulsive term in the loss function acts like a kick that pushes the two solutions apart but doesn't prevent the network from achieving arbitrary low loss even when the solutions are similar.At a given epoch t the full loss function for N networks now reads where λ R is another hyperparameter defining the relative repulsion strength and L E is the unitarity loss.At the end of the training run, we can then evaluate the repulsion term to check if the two solutions found are indeed not identical or correspond to the trivial ambiguity.
The second loss that we considered is a decaying repulsion.It consists of an interaction term that is not singular even when the solutions overlap.This is done by adding the loss term where c(t) is some hyperparameter.In order to precisely fit the different solutions we want the repulsion to be inconsequential as the training nears the end.This is achieved by making the parameter c(t) epoch dependent, increasing throughout the training.In particular, we will take where the parameters are chosen such that c(t) starts at c(0) = c 0 and reaches 99% of its maximal final value c f after e f epochs, with c(e f ) = 0.99c 0 s f where s f is the total scale factor 7 .At a given epoch t the full loss function for N networks now reads where the repulsive loss term is always included 8 .We summarize the hyperparameters for the two types of repulsive losses in Table 2.
Adding the repulsive loss L R1 , we train a neural network for 10000 epochs using the unitarity loss of Eq. ( 8).After a quick search of the hyperparameter space, we use p = 2, e i = 200, e f = 300 and λ R = 1.0, although other values could be considered with additional fine-tuning.The phases recovered by the networks are shown in Fig. 10a and are plotted against Crichton's prediction of the finite partial wave parameterization.The final losses for each phase are around L S E ∼ 10 −5 , in good agreement with the expected answer.We note that for the solid blue curve in Fig. 10a we represent the phase coming from using Crichton's phase shifts −δ l , which corresponds to representing the trivial ambiguity associated with Crichton's first solution.
We also train another neural network with the L R2 repulsive loss using c 0 = 2, s f = 16, e f = 1000 and λ R = 50.0.The resulting phases also have a final loss around L S E ∼ 10 −5 .We compare the loss functions using the kick and decaying repulsion in Fig. 10b, where we use a moving average window of 10 epochs to smooth the plot.Whereas the decaying repulsion has a smoothly decreasing loss we can indeed verify that the kick repulsion sharply boosts the loss across a given window.

Infinite partial wave ambiguities
Looking for amplitudes with ambiguous phases that have an infinite number of partial waves has also been explored classically.We first review some results in this direction, then apply machine learning to the infinite L case. 7Additionally we ask for c ′ (t) to be maximum at e f /2.These constraints fix the parameters of c(t) to be a = e f /(2 arctanh [(100 − 99s f )/(100 − 101s f )]), b = e f /(2a), ∆1 = c0(100 + 99s f )/200 and ∆2 = c0(−100 + 101s f )/200. 8In practice when the repulsive loss starts to be smaller than a tenth of the unitarity loss we will discard it to facilitate training.This check is to be performed throughout training and the repulsive loss can be reactivated as soon as we breach that threshold.On the left panel, we show the two distinct ambiguous phases recovered by our networks.On the right panel, we compare the evolution of the two repulsive losses introduced in Eq. ( 22) and Eq. ( 25).

Classical solutions
One approach to finding phase ambiguities with infinite L is based on partially factorizing the amplitude and conjugating some of its zeros.The following discussion follows [23].Any amplitude can be written (non-uniquely) as for some L and some g(z).When g(z) = 1 the amplitude is a polynomial with a finite number of phase shifts.For L = 0, this is just F (z) = g(z) an arbitrary function.This form is still useful, since if we conjugate any number of z ℓ the amplitude will have the same norm.In particular, the amplitude Constructing amplitudes in this way guarantees they have the same norm but does not guarantee unitarity.And moreover, even if F (z) is unitary, F (z) generally will not be.
In order to make progress, one needs to restrict the class of functions searched.Ref. [23] focused exclusively on the simplest case where two amplitudes differ by a single zero.Doing a partial wave decomposition of g(z) we can write This is similar to a normal partial wave decomposition where S ℓ = e 2iδ ℓ so that the unitarity condition δ ℓ ∈ R is equivalent to prefactor the unitarity condition is not |γ ℓ | = 1 but rather |S ℓ | = 1.Solving for the S ℓ in terms of the γ ℓ gives The condition that |S ℓ | = 1 then gives a recursion relation among the γ ℓ .Writing this relation can be written in descending form: or equivalently in ascending form: Note the sign ambiguity: there are generally two solutions at each step leading to 2 n sign choices.Generally, only one of them will give finite amplitudes, with ε ℓ → 0 as ℓ → ∞.There is nevertheless no clear criterion for deciding which sign to choose at each recursion step.As discussed by [23] an additional necessary condition for the unitarity of F (z) following from |S ℓ | = 1 and finiteness is that This means that each pair of successive γ ℓ must have the same phase.The phase can only change if γ L = 0 for some L, in which case the γ L−1 and γ L+1 can have different phases.So overall the series of γ ℓ comprises sequences with the same phase separated by zeros.Moreover, for the amplitude to be finite γ ℓ → 1 as ℓ → ∞, which means that there has to be some L beyond which all of the γ ℓ are real.Since some γ ℓ has to be complex (or else F (z) and F (z) are complex conjugates), we conclude that γ L−1 = 0 for some L and γ ℓ ∈ R for ℓ ⩾ L. Atkinson et al. call such solutions class L. Class 2 amplitudes have γ 0 ∈ C, γ 1 = 0 and γ ℓ ∈ R for ℓ > 1.One way to find such solutions for a given z 1 is by guessing a γ 2 ∈ R and recursing upwards.Given z 1 and γ 2 we can solve for γ 0 using the downward recursion relations.This gives From here, one can iterate upwards from γ 2 demanding that γ ℓ be real for ℓ > 2. It is only possible for γ ℓ to be real if the upward-iterating discriminant is real, which implies This must hold for all ℓ so, in particular, it gives an allowed range for ε 2 = 1 − γ 2 .For a given value of γ 2 and z 1 as one iterates upwards one may find that some higher γ is imaginary.If this happens for all choices of signs in the recursion relation then that value of γ 2 is disallowed.This approach is called the ascending iteration.Although Ref. [23] found the ascending iteration inefficient, we find it can actually work quite well.An alternative search procedure is the descending iteration.There one starts at large ℓ where |ε ℓ | ≪ 1 and one can linearize the recursion relation.Solving the linearized version exactly gives the Legendre polynomial of the second kind and C a constant.One can then take this form for ε L and ε L+1 for some large L and iterate downwards.Only for some value of C will γ 1 = 0. Thus one can search for γ 1 = 0 via the shooting method, as one might search for eigenfunctions of a differential equation.
We find that both the ascending and descending solutions are computationally extremely intense, sometimes requiring 60+ digits of precision to converge to a trustworthy solution.The choice of which sign to pick on each step of the recursion is problematic as the search tree grows exponentially.The descending solution is inferior in the sense that it can never produce an exact solution since the asymptotic form is assumed to be reached at some finite L; the ascending solution can give an exact answer if the seed at low γ ℓ is exact.In addition, we find that in some cases, the asymptotic behaviour at large ℓ is approached very slowly: even at ℓ = 100, the partial wave coefficients are not exponentially small.In particular, this happens for solutions that are strongly peaked at z = ±1 requiring large numbers of modes in their partial wave decomposition.Such solutions happen to be the ones which we found with low values of sin µ (see Fig. 15 below).
In Ref.A challenge with the iterative approach is that there is no clear prescription for which sign to choose at each step in the iteration.When using the descending iteration Ref [23] defined Region I to have all positive signs in the recursion relation in Eq. (32).Region II is defined by choosing − for ℓ = 2 and + for all other ℓ in Eq. (32).Region III has a minus sign for ℓ = 2, 3 only, while region IV has minus signs at ℓ = 2, 3, 4. We also define a new region V which has a minus sign for ℓ = 3 only.The points belonging to these respective regions are coloured differently in Fig. 11.We extend this study by searching for solutions in a 80×40 grid in z 1 space.Our results are shown on the right side of Fig. 11.Although not clear from the figure, we find that the regions overlap: some points have solutions for the same z 1 but different sign choices in the iteration.Note that such pairs of solutions have different moduli so there are still only at most two phases for a given modulus (it is only z 1 which the solutions share).
Although Atkinson and collaborators found a number of phase-ambiguities at infinite L, they could only consider a small class of functions.They restricted to a single zero conjugated, as in Eq. ( 28), and moreover looked exclusively at class 2 amplitudes where γ 1 = 0 in their numerical work.Even with these assumptions, searching through the different regions is tedious, and trying to generalize to other classes or multiple conjugated zeros would be a herculean task.We next explore how machine learning can search more efficiently for solutions.

Machine learning complex functions
For the machine learning approach, we start with the class of functions in Eq. ( 28) with a single conjugated zero.However, we do not need to do a partial wave decomposition.So we write simply where f (z) is the function to be learned and z 1 will be treated as a fixed input.The function f (z) is parametrized by a neural network, closely following the implementation of Section 2, however for this application we need f (z) to be a complex function instead of a real phase.In order to avoid complex numbers we have two obvious choices: have the network learn the modulus and phase of f (z) or have it learn the real and imaginary parts of f (z).We tried both approaches but found that learning the real and imaginary parts was the most promising so we restrict to that choice in the following discussion.
For our neural network implementation, we modify the neural network depicted in Fig. 1 by removing the final Tanh layer and having two final outputs instead of one.Removing the Tanh layer is justified since we are now predicting the real and the imaginary part of the amplitude directly and thus do not need to constrain them within a given finite range.The two amplitudes are then reconstructed following the Eq. ( 37) and training is done using the unitarity loss of Eq. ( 9) and the decaying repulsion of Eq. (23).
Although the iterative algorithm described in Section 5.1 and the machine learning implementation try to find similar solutions, the approaches differ in several key points: 1.The iterative algorithm requires the choice of a particular coefficient γ L that is to be set to zero, yielding a class L amplitude.In the machine learning implementation, the same neural network is used to recover any type of solution.For the same z 1 there can be phase-ambiguous solutions of different classes.Whereas these different class solutions can be individually picked out by the classical algorithm, the machine learning implementation will naturally tend to yield the one that is easiest to train, typically the one that has the lowest sin µ value.
2. The classical algorithm requires the choice of signs for the discriminant at each step in the iteration.So for L steps, there are 2 L choices, leading to 2 L "regions".Most of these choices will not yield solutions, but there is no clear way to restrict the search.In the machine learning approach, the regions play no role: the algorithm will find solutions in all regions automatically.
3. In the iterative algorithm unitarity is automatically enforced by the partial wave decomposition.However, the iteration may yield divergent results.A shooting method is used to find a sensible boundary condition for the iteration.In the machine learning implementation unitarity is enforced by minimizing a loss function.In practice one has to impose a cutoff on the loss L E in order to assess whether the learned amplitudes do indeed respect unitarity.
4. In the iterative algorithm, it is easy to see if the iterative solution yields a trivial ambiguity since the complex coefficients are known.In the machine learning approach, a repulsive loss has to be used to avoid trivial ambiguities.This makes the resolution of ambiguous solutions that are naturally close to one another more difficult.
5. Finding a solution with the classical algorithm can require high numerical precision.For example, we found that to confirm solutions close to the boundary of allowed z 1 values (where sin µ is minimized) one can require 60 digits of precision or more.On the machine learning side, the precision is limited by the numerical integration scheme required when calculating the unitarity constraint of Eq. ( 8).One generally cannot expect to have more than 5-10 digits of precision at best.However, to explore the space of solutions, high numerical precision is not required, as we could see in previous examples in Section 3 or Section 4.2.
In summary, the ML algorithm has the advantage of not needing a bunch of discrete choices and special cases to search.Thus it has the potential to search for a much broader class of solutions than the classical algorithm.On the other hand, its numerical precision is limited: you can never know if it actually finds a solution or not.An optimal approach may be to combine the two approaches: exploring the landscape of solutions with machine learning and then using a classical algorithm to refine particular solutions we find.

Resolving the z 1 landscape with ML
As a warm-up, we take one of the points and solutions found in [23] which belongs to region I: z 1 = 6 5 + 3 5 i.With this z 1 value, we train a network for 5000 epochs with the unitarity loss of Eq. ( 9) and using the decaying repulsion with c 0 = 2, s f = 16, e f = 1000 and λ R = 2.0.We extract the phases corresponding to the amplitudes of Eq. ( 37) along with their respective moduli (which are identical by construction).We show these in Fig. 12, along with the solutions recovered by the iterative algorithm.The final loss values are around L S E ∼ 10 −6 for both solutions, and we observe good agreement with the answer derived from the classical approach.
We note that the amplitudes and phases for this solution are similar to the ones that we obtained when solving for the Crichton ambiguity in Section 4.1.This observation will hold for a major part of the z 1 plane (where solutions are expected to be found) and prompts us to implement a better initialization for our neural network.When searching for another solution along the z 1 plane, it will be advantageous to initialize the network with a solution from a neighbouring point.The main training run will then be done over a smaller number of epochs and will not consider any repulsion term for the loss function.Since the phase solutions will be seeded at initialization as being properly distinct, we do not expect further training to modify them drastically and, instead, we will recover the two genuine ambiguous solutions. 9ext, we explore the space of z 1 values with phase ambiguities.To do so, we create a grid of 80 × 40 points in the complex z 1 plane with Re z 1 ∈ [0, 2] and Im z 1 ∈ [0, 1].This region is motivated by the known bound on the allowed range of z 1 (see Fig. 11), but as we will see, that region will be

B(z)
Network B(z) Atkinson B A (z) Figure 12: Phase-ambiguous solutions rediscovered with machine learning compared to previous results from [23].This solution has the form of Eq. ( 37) with z 1 = 6 5 + 3 5 i. rediscovered independently by the network.At each point, we train for 500 epochs where the new neural networks are initialized with the trained networks of a nearest neighbor. 10he loss landscape from our scan is displayed on the left in Fig. 13.The black curves here are an analytical bound on possible solutions: all possible infinite-L single-conjugated-root solutions must lie within those curves.That does not mean that the entire region is allowed.The bounds also do not apply to the finite-L polynomial solutions which can (and do) have |z 1 | < 1, as discussed in Appendix B. From the figure, we can clearly see a region of low loss within the allowed region.In addition we recover a small domain with |z 1 | < 1 that has low loss, which is associated with polynomial solutions.We also note that not all of the allowed region has low loss.Indeed, for Re(z 1 ) ∈ [1, 1.2] and Im(z 1 ) ∈ [0, 0.3] the networks all have high evaluation losses with L S E > 10 −3 , indicating a failure to recover ambiguous solutions in that area.However, the iterative classical algorithm shows that some of the points should correspond to genuine solutions (see regions IV, V in Fig. 11).Upon inspection we see that the solutions in this region are highly oscillatory -they do not resemble the functions obtained in the region I. Since these solutions are sufficiently dissimilar they cannot hope to be resolved by neural networks that have been initialized following another class of solutions and would require independent training and optimization trials in order to be recovered.This is certainly possible.But these solutions  Looking at this solution it appears that B(z) is peaked near z = ±1 and the phases are roughly linear going between π 2 at z = 1 to either π or 2π at z = −1.By exploring this structure, we can consider a toy amplitude of the form with a, b, ϕ a and ϕ b real numbers.Unitarity then implies Recall that the expected lower bound on sin µ for which there are ambiguous solutions is sin µ = 1.
If a < 1 then the condition sin µ = 1 automatically leads to the value ϕ a = π 2 and then the second constraint in Eq. ( 39) becomes sin ϕ b = a sin ϕ b .Since we assumed a < 1 we must have ϕ b = π, 2π.Remarkably, these are exactly the z = ±1 endpoints of the phases we found, see Fig. 15.That is, we find two solutions with a < 1.Unfortunately, these two solutions are related by F + = −F ⋆ − which is a trivial ambiguity.Based on this feature, it is interesting to contemplate a scenario in which any family of phase-ambiguous solutions will approach the trivial ambiguity as sin µ → 1.It would be very interesting to explore this further.

Extensions beyond one zero
So far we have concentrated our efforts on finding solutions where a single zero z 1 is complex conjugated.This is only a small subset of all possible ambiguous solutions.In general, one could consider probing ambiguous solutions constructed from complex conjugating multiple different zeros.With the classical approach, one could try to construct an iterative algorithm in the style of [23].With more zeros, there is a larger space to search and many more discrete choices to make.While progress is possible, it seems extraordinarily tedious to search this way.The complexity of the classical approach is to be contrasted with the flexibility of the machine learning implementation, where one simply needs to modify the  For z = 1 both phases approach π 2 , whereas for z = −1 they approach π and 2π correspondingly.Curiously, the same features are exhibited by the simple toy model (39).Right panel shows the integrated kernel whose maximum is sin µ.This is the lowest known value of sin µ for which a phase ambiguity exists.δ 0 -0.7435785523  3: Here we present first 25 phase shifts for the sin µ ≈ 1.67 solution with two possible phases.Only the first two phase shifts differ between the two amplitudes.To find the results we ran the ascending algorithm with ∼ 100 − 200 modes and checked that the significant figures quoted above are convergent, in that they are not sensitive to how many modes are included.parametrization of Eq. ( 37) by taking out the appropriate number of zeros z 1 , . . ., z n .Training can then proceed in the exact same way with no additional conceptual work.
As a warm-up with multiple zeros, we show that a known ambiguous L = 3 solution can be reproduced with the same ML construction.Ref. [21] considered cases where one root z 3 is held fixed and the other two roots z 1 and z 2 are conjugated: For the machine learning setup, we parameterize we can then take z 1 and z 2 as inputs and learn a single complex function f (z) as we did in the case of one zero.
To be concrete, one example solution found in [21] had a ≈ 2.80 + 4.67i, z 1 ≈ −0.74 − 0.06i, z 2 ≈ 0.19 + 0.03i and z 3 ≈ 1.32 + 0.95i.We take only z 1 and z 2 from this known solution and then train to find f (z).We train the network for 5000 epochs using the decaying repulsion with the parameters of Section 5.3.To speed up training we also seed the neural network for f (z) by the one that has been trained on the z 1 = 6 5 + 3 5 i point.The result is shown in Fig. 16.We find that the phases and moduli of the two solutions agree with those found in [21].The agreement is robust, with an evaluation loss around L S E ∼ 10 −5 indicating that we indeed recovered the expected solution.We note that although the phases in the left panel of Fig. 16 appear to be discontinuous, the functions predicted by the network, Ref (z) and Imf (z), are themselves continuous.This translates into the real and imaginary parts of the amplitude F (z) being continuous, as displayed on the right panel of Fig. 16.The modulus and integrated kernel K(z) are also shown.The sin µ value comes from the maximum of K(z), which gives sin µ ≈ 18 for this example.
Following the procedure outlined in Section 5.3 one can also proceed to do gradient descent in the z 1 , z 2 space in order to minimize sin µ.An example of a gradient descent trajectory is displayed in Fig. 17 where both roots are simultaneously updated at each point of the descent.The main difficulty we encounter is ensuring that the gradient descent follows a trajectory of low loss.As can be observed in the right panel of Fig. 17, the evaluation loss associated with our trained networks starts to blow up at a given point along the trajectory, after which we cannot trust that viable ambiguous solutions are being recovered.The last value that can be trusted yields an ambiguous solution with sin µ ≈ 2.57.In order to reach a trustworthy lower value of sin µ one could envision modifying the gradient descent update of Eq. ( 38) by forbidding updates that increase L S E substantially.While finding low sin µ phase-ambiguous solutions in this way is conceivable, our initial assessment suggests that the amount of oversight required for this approach outweighs its probability for success, so have not pursued this direction further.

Conclusions
In this paper, we have explored the problem of determining the phase of an amplitude from its modulus using modern machine learning.In the elastic scattering regime, the modulus and phase of an amplitude are constrained by a non-linear integral equation which enforces unitarity.Although the equation is difficult to solve analytically or with traditional numerical methods, it is easily solved with machine learning.Given a modulus B(z) with z the cosine of the scattering angle, a phase ϕ(z) for the amplitude can be parameterized as a neural network, then determined from unitarity through gradient descent.Using this technique we were able to reproduce known results for finite partial wave amplitudes, infinite partial wave amplitudes, and amplitudes determined by other S-matrix bootstrap principles.A few obvious extensions of our work include focusing on the scattering of identical particles, considering elastic scattering of spinning and/or flavored particles which would lead to a coupled system of unitarity equations, as well as doing the computation in the general number of spacetime dimensions d.
More generally, it would be very interesting to apply machine learning to explore the full amplitude in both energy and angle, as in the classic analysis of pion scattering in [42], or more recent explorations of the space of nonperturbative amplitudes starting from [36].This would require imposing in addition to unitarity the constraints of analyticity and crossing.In fact, methods very similar to the ones used to analyse elastic scattering at fixed energy were developed for the full amplitude by Atkinson, see e.g.[43,44,45,46], and were recently successfully implemented numerically [40].They are based on iterations of unitarity and are expected to converge only for a small subset of admissible amplitudes.More powerful gradient-descent type methods to construct the full amplitude have not been developed yet and it is to be seen if machine learning could be useful to tackle this problem.
Coming back to the present paper, there are two important open questions in S-matrix theory which we have shown machine learning approaches can address.The first is whether a phase ϕ(z) exists at all for a given differential cross section, or equivalently, a given modulus B(z) of the scattering amplitude.This problem is solvable in a straightforward manner with machine learning.Code to find ϕ(z) from B(z) is available here https://github.com/aureliendersy/S-Matrix-Bootstrap.It has been proposed that a functional sin µ which involves a non-linear integral over B(z) (see Eq. ( 6)) is a good criterion for whether a solution exists.It has been shown that for sin µ < 1 a solution always exists.If sin µ > 1 there are three possibilities 1) no phase may exist 2) a unique phase may exist 3) two non-trivially related phases may exist.Examples are known in all three cases.No clear criterion is known however to determine which case applies for a given B(z) in general.Only options 2) and 3) are possible for sin µ < 1 but no criterion is known to decide which.The analytical bound is that uniqueness must hold if sin µ < 0.86 or the average modulus 1 2 1 −1 dzB 2 (z) is less than 1.38.In the literature, ambiguous solutions are known with at best sin µ ≈ 2.15.Using machine learning, we have found B(z) with ambiguous phases with sin µ ≈ 1.67.This is the first improvement on this bound in 50 years.
The machine learning approach offers several distinct advantages over classical approaches.The framework we have developed for solving the unitarity integral equation is general and recovering a phase can be attempted for any input modulus.This is to be contrasted with classical fixed point iteration schemes, which only converge if sin µ < 1.This straightforwardness has enabled us in Section 3.1 to extensively explore various polynomial moduli and determine which ones are consistent with unitarity.We confirmed the existence bounds set by sin µ < 1 and identified the region in moduli space where sin µ > 1 solutions could be expected.Extending the setup to probe the uniqueness of the solution space was equally conceptually simple and only required the addition of a repulsive term to the loss function.Classical approaches have instead focused only on analytically solvable cases, such as finite partial waves of low order, or on specific parametrizations for the amplitudes.One such parametrization proposed in [23] was reviewed in Section 5.1 and contrasted with a machine-learning solution of the same problem.Whereas the classical algorithm required multiple discrete choices of parameters, carving out separate solution regions, the machine learning algorithm was able to smoothly interpolate across the whole solution landscape.This flexibility comes at a cost, lack of numerical precision, but allows a complementarity approach to traditional numerical methods.It is in that spirit that we have demonstrated in Section 5.3 that one can utilize the smoothness of the machine learning loss landscape to perform gradient descent and find ambiguous solutions with low sin µ.There the inflexible, but powerful, classical algorithms allowed further refinement in order to obtain the lowest possible solution precisely.Extensions to other amplitude parameterizations are immediate with our machine learning framework whereas developing the corresponding classical iterative schemes (if possible) would require considerable amounts of effort.
Although here we focused on the narrow problem of the relationship between the modulus of an amplitude and its phase in the elastic scattering regime, a similar methodology can be used for much broader questions.The S-matrix bootstrap approach attempts to apply a set of general constraints such as unitarity, analyticity, and crossing to constrain the form of amplitudes.Implementing these constraints directly 12 is a nontrivial task and the subject of many ongoing works, see e.g.[36,37,38,40].Machine learning offers the potential to search through a broad class of functions and perform the gradient descent efficiently, as we have seen here for phase-ambiguities in the elastic regime.In addition, a similar methodology could help with the analytic S-matrix bootstrap which applies constraints such as collinear limits or possible locations of singularities to perturbative scattering amplitudes.The classical approach has already been very successful, bootstrapping the 6-point amplitude in N = 4 super-Yang-Mills theory to 6 loops this way [47].Additional constraints are known, such as those on sequential discontinuities [48], but have not been incorporated.Machine learning could make it easier to apply additional constraints and it would be very interesting to explore the potential of machine learning for the S-matrix bootstrap further.This paper represents just a small first step into a field with enormous possibilities.polynomial B(z).We start our analysis by looking for solutions that could have a corresponding linear B(z) = az + b.Since B(z) 2 is a polynomial of order 2, our finite partial wave solution must be of order 1 and parameterized as F 1 (z) = e iδ 0 sin δ 0 + 3e iδ 1 sin δ 1 z where we used the explicit representation for the first two Legendre polynomials.Equating |F 1 (z)| 2 = (az + b) 2 we find the system which can only be realized for a = 3b with δ 0 = δ 1 .Then we have F 1 (z) = ±e iδ 0 az + a 3 that satisfies the unitarity constraint and where the absolute value is necessary to ensure positivity for −1 < z < 1.Thus we do not have a simple finite partial wave amplitude that is associated with B(z) = az + b where b > |a| as was considered in Section 3.2.1.
For the quadratic modulus B(z) = az 2 + c we can proceed in a similar fashion, parameterizing and deriving a similar system of equations We have 3 different solution sets.The first one with δ 1 = 0 does not lead to a valid solution with both a > 0 and c > 0. The second solution set has δ 0 = 0 and δ 1 = δ 2 ± π 2 .For δ 1 = δ 2 − π 2 we can have a = 15 4 3 7 with a = 3c that leads to a valid quadratic differential cross section.The third solution set has 2 with a = 5c that leads to a valid solution.Summarizing, for B(z) = az 2 + c we have two valid solutions with a > 0 and c > 0: which are respectively associated with the moduli As discussed in Section 3.2.2,scans over quadratic moduli revealed a 1D curve of low loss, whose corresponding B(z) do not match any of the ones of Eq. (52-53).Upon closer inspection, the low loss values are explained by the numerical closeness of the input moduli with ones corresponding to finite partial wave solutions with L ̸ = 2.For instance the modulus B c 2 (z) = 22 29 z 2 + 26 29 has a loss L E S < 10 −6 and is numerically within 0.3% of another modulus corresponding to the L = 3 solution For the best fit values of δ 0 = 2.051, δ 1 = 0.4578, δ 2 = −3.131,δ 3 = −3.128,we have an associated B 3 (z) that is numerically close to B c 2 (z) which we display on the Fig. 18a.We refine the phase learned during our quadratic scans by letting the network run for 5000 epochs with B c 2 (z) as input.We then compare the resulting phase to the exact L = 3 solution as shown in Fig. 18b.The agreement between the learned phase and the finite L = 3 solution explains why the associated loss value is so low for this spurious solution.Along the 1D curve of low loss, all of the factious solutions found have the same property in that their quadratic modulus is well approximated by a finite partial wave solution with L ̸ = 2.It is to be noted that the finite partial wave solutions of Fig. 18b look qualitatively different from the infinite partial wave solutions found in the sin µ < 1 region, with one example displayed in the bottom right panel of Fig.

B Phase shift ambiguities with finite partial waves
Whereas the main convergence region of the algorithm by Atkinson et al. is limited to |z 1 | > 1 for solutions with a large number of non-zero partial waves, it can also resolve a select few solutions within the unit circle, corresponding to true finite partial wave amplitudes.Finite partial wave ambiguities can be resolved exactly when the number of partial waves is small, as has been described in the literature for L = 2, 3, 4 [20,21,22].In particular, the L = 2 ambiguous solutions are all of the type described by Ref. [23].At L = 2 the two ambiguous amplitudes F (z) and F (z) are polynomials of order 2 and possess one common root z 2 .The other root is distinct and is respectively z 1 and z ⋆ 1 .Since the difference F (z) − (− F ⋆ (z)) is a linear polynomial in z, the amplitudes correspond to genuine class 2 solutions.We represent in Fig. 19a the roots associated with these ambiguous amplitudes, noticing that we have the real part of z 1 that precisely equates 4  5 .As expected we have recovered solutions both outside the z 1 unit circle but also inside of it.All of these solutions can be recovered by both the classical descending algorithm and our machine learning implementation.
At L = 3 we have two different families of ambiguous amplitude solutions which share two roots z 2 , z 3 and differ only by a single root, z 1 and z ⋆ 1 .One family has Im(z 2 + z 3 ) = 0 and the other family has Im(z 2 + z 3 ) ̸ = 0.In the first case the difference F (z) − (− F ⋆ (z)) is a linear polynomial in z, a class 2 solution, while in the second case, the same difference is a quadratic polynomial, hence a class 3 solution.The first family is resolved by the original implementation of the descending algorithm while the second family would require a shooting method that aims at finding γ 2 (C) = 0 for γ 2 appearing in Eq. (28).Both family and their respective roots are displayed in Fig. 19b, where notably we have a plethora of points lying within |z 1 | < 1.

C Scaled and non-scaled losses
The difference between using the non-scaled or scaled losses of respectively Eq. ( 8) and Eq. ( 9) becomes apparent when studying edge cases.One example of interest concerns differential cross sections that are almost vanishing at a particular z value, making sin µ blow up.The B(z) term in the denominator of Eq. ( 9) makes the whole loss expression close to singular in that case.A simple example to probe this is to take B(z) = z 2 /2 + ϵ.The differential cross section is positive but almost vanishes at z = 0, such that sin µ = (60ϵ 2 + 20ϵ + 1)/(60ϵ).For ϵ < (10 − √ 85)/30 we have sin µ > 1 and the existence of a solution is not guaranteed.In particular, the classical fixed point iterative scheme of [15] does not converge.
We can compare how the choice of the loss function plays a role in this edge case scenario, by training different neural networks using either Eq. ( 8) or Eq. ( 9) as a loss function.We create a series of different ϵ values, distributed in ϵ ∈ [10 −5 , 10 −1 ], and train a neural network for 2000 epochs at each point, using either loss function.We show in Fig. 20a the scaled loss L S E at evaluation and in Fig. 20b the non-scaled loss L E at evaluation.In the sin µ < 1 region the networks trained using the scaled loss (orange curves) perform better on both evaluation metrics and accurately recover the phase solutions.In the sin µ > 1 region the networks trained with a scaled loss also perform better overall.In particular, we observe a dip in the evaluation losses around ϵ ∼ 10 −3 , where L S E < 10 −4 and L E < 10 −7 .For the networks trained with the non-scaled loss (blue curves), no such dip is observed and instead, the scaled loss at evaluation blows up when ϵ < 10 −2 , as shown in Fig. 20a.To understand this point better we can plot the learned phases at a few specific values of ϵ.In Fig. 21a and Fig. 21b we plot the phases learned at respectively ϵ = 0.04715 and ϵ = 0.00091 where the former corresponds to a point in the sin µ < 1 region and the latter to the dip in the sin µ > 1 region.For sin µ < 1, both phases are identical and both networks properly resolve the phase solution.However, as sin µ > 1, the phases learned by the two networks become drastically different.The networks trained on the base loss L E will learn a simple deformation of the phase solution while the networks trained on L S E will learn a brand new phase shape.The dip in the evaluation losses that we observed can be explained by the numerical proximity of the phase to a genuine finite partial wave solution, as discussed in Appendix A. This feature will only be able to be captured by the networks trained using the scaled loss.It is important to mention however that in most circumstances (including for physical differential cross sections) we do not expect B(z) to be close to vanishing and thus both training losses will perform similarly.

D Simple dual bounds
Let us consider the following problem: can a given function B(z) be an elastic differential cross-section?One obvious requirement is that B(z) ≥ 0 but there are more constraints.Let us show that not any B(z) can arise as a differential cross-section.We consider the following integral (60) We then notice that using elastic unitarity This bound is correct but it is trivially satisfied given (57).
To get better bounds we need to put more constraints.Imagine that we know that B(z) 2 is a polynomial of a maximal degree N .We can then consider zero projections (63) We can try to add these zero projections to the argument above.Consider for example the spin three zero projection where in the second line we rewrote it in terms of partial waves.
We can now derive bounds by considering the following positive semi-definite problem, see e.g.[49] for a similar analysis in the case of dispersion relations, where N i are zero projection matrices f * N i f = 0, T is the target quantity that we want to bound t = f * T f , and I is the diagonal matrix with elements being 2ℓ + 1.
If we have found c ± and n i such that the matrix above is positive semi-definite, we get the bound Numerically, this can be done first by truncating in spin, and then extrapolating the cut-off to infinity.Implementing the spin-3 zero projection constraint we get that Spin 3: which is an improvement of the previous bound.Similarly, we can consider the spin-four zero projection For polynomial cross-sections we have infinitely many null constraints which we could try to use to derive the dual bounds.These bounds must be satisfied and they do not depend on the existence of the actual solution.

Figure 1 :
Figure 1: We utilize a neural network ansatz for parametrizing the phase solution.The feedforward neural network has a series of layers with learnable parameters σ ending with a final Tanh activation function, constraining the outputs to lie within the range [−π, π].

Figure 2 : 2 (
Figure 2: We warm up by determining the phase ϕ(z) for a linear modulus B(z) = z+4 10 (left) and a quadratic modulus B(z) = z 2 +1 2 (right).Top panels show the integrated kernel K(z) and its maximum sin µ.Bottom panels show B(z) and the phase ϕ(z) found with machine learning.

Figure 3 :
Figure 3: Scaled loss landscape for the two-parameter family of moduli B(z) = az + b.For each a and b we find a phase ϕ(z).Red regions indicate that no solution is likely.The black curve is sin µ = 1 and delimits the region within which we are guaranteed the existence of a solution.The grey curve delimits the bound of Eq. (4) and solutions cannot be found outside of its enclosed region.
Cut on the scaled log loss landscape

Figure 4 :
Figure 4: Zoom on the loss landscapes for the two-parameter family of moduli B(z) = az + b.Black and grey curves follow Fig. 3. Top panels show the base loss and scaled loss.Panel (c) shows a heat map of sin µ values over the family B(z) = az + b (no phase is determined or needed).Right shows the scaled loss landscape with a hard cut of L S E ∼ 10 −5 .This loss boundary agrees very well with the sin µ = 1 boundary.

Figure 5 :
Figure 5: Scaled loss landscape for the two-parameter family of moduli B(z) = cz 2 + d.For each c and d we find a phase ϕ(z).Red regions indicate that no solution is likely.The black curve is sin µ = 1 and delimits the region within which we are guaranteed the existence of a solution.The grey curve delimits the bound of Eq. (4) and solutions cannot be found outside of its enclosed region.
Cut on the scaled log loss landscape

Figure 6 :
Figure 6: Loss landscapes for the two-parameter family of moduli B(z) = cz 2 + d.Black and grey curves follow Fig. 3.For each c and d we find a phase ϕ(z).Red regions indicate that no solution is likely.Top panels show the base loss and scaled loss.Panel (c) shows a heat map of sin µ values over the family B(z) = cz 2 + d.Right shows the scaled loss landscape with a hard cut of L S E ∼ 10 −4.5 , along with the L = 2 finite partial wave solutions circled in black.The 1D curve corresponds to finite L > 2 solutions.

Figure 7 :
Figure7: (Top pannels) B(z) moduli and associated ϕ(z) phases coming from the bootstrap program of[37] for the amplitude with λ ≈ −6.48 and N max = 26.We extract B(z) at different s energies and use a neural network to predict the associated phase.(Bottom panels) Evaluation of the unitarity loss of Eq. (9) on the predicted amplitudes coming from the bootstrap program and the trained neural networks.

Figure 8 :
Figure 8: Finite partial-wave amplitudes with phase ambiguities separate into non-intersecting 1D curves in the L + 1 dimensional space of phase shifts.For L = 2 there is only one curve.These plots show a projection of these curves into the sin µ, δ L plane for L = 2, 3, 4, 5.The point on the L = 2 plot is Crichton's original ambiguity with δ 2 = π 9 .The solid curves are analytic solutions when known.The blue dots are a random scan.The red line gives the minimum sin µ in each case.

Figure 9 :
Figure 9: Crichton ambiguity: naively training using two independently initialized neural networks.On the right panel, we notice that the two phases are trivially ambiguous.
Loss functions for the different repulsion types

Figure 10 :
Figure10: Recovering the Crichton ambiguity with the addition of a repulsive loss.On the left panel, we show the two distinct ambiguous phases recovered by our networks.On the right panel, we compare the evolution of the two repulsive losses introduced in Eq. (22) and Eq.(25).

Figure 11 :
Figure11:The left panel shows the region in z 1 space where non-polynomial amplitude pairs with a single conjugated zero are possibly allowed.The smaller green region is the possibly-allowed region for class 2 amplitudes where γ 1 = 0. Points are solutions found in[23] labelled by their values of sin µ.Different colours correspond to different choices of signs in the descending iteration.The right panel shows solutions we found using the descending iterative scheme of Ref.[23].Different colours are different regions corresponding to different sign choices at various steps in the iteration.Class 2, L = 2, 3 partial waves polynomial amplitudes are also represented.
[23], Atkinson et al. focused exclusively on parameterizations of the form of Eq. (28) with one zero conjugated.They showed that non-polynomial solutions are only possible if|z 1 | > 1, |Im z 1 | < 1 and if in the region where |Re z 1 | > 1 the additional constraint |z 1 | 2 − 2|Re z 1 | + 1 ≤ 1 holds.The majority of the examples were class 2 (which has γ 1 = 0) for which one can impose the stronger constraints regions and the explicit points Atkinson et al. found are shown in Fig. 11.The lowest value of sin µ listed was sin µ = 2.15.

Figure 13 :
Figure 13: Loss landscape and sin µ values in the search for phase ambiguities in amplitude pairs with a single conjugated root.The solid black lines delimit the region reachable with the descending algorithm of [23].Class 2 amplitudes (those with γ 1 = 0) are only allowed between the grey dashed line and the lower black line |z 1 | = 1.The low-loss points outside of the lower black curve (|z 1 | = 1) correspond to finite-L solutions.Right panel shows the sin µ landscape for low losses L S E < 10 −4.5 and non degenerate solutions L R2 < 0.99.

Figure 14 :
Figure 14: Gradient descent in the z 1 plane for minimizing sin µ.The left panel shows the value of sin µ along the trajectory, along with the scaled loss L S E at each point.The right panel displays the trajectory in the z 1 plane.The point with the minimal sin µ is found with the classical ascending algorithm by extending the gradient descent trajectory.

Figure 15 :
Figure 15: Left panel shows the modulus and unitary phases for the amplitude with sin µ ≈ 1.67.Notice that it is peaked both in the forward, z = 1, and backwards, z = −1, directions with B(1) ≈ 3.53 and B(−1) ≈ 44.2.For z = 1 both phases approach π 2 , whereas for z = −1 they approach π and 2π correspondingly.Curiously, the same features are exhibited by the simple toy model(39).Right panel shows the integrated kernel whose maximum is sin µ.This is the lowest known value of sin µ for which a phase ambiguity exists.

Figure 16 :
Figure16: Resolving the ambiguous solutions associated with a finite L = 3 partial wave differential cross section.The two solutions differ by complex conjugation of two of their zeros.We display the phase outputs of the neural network compared to the exact solutions (top left panel) and the prediction for the real and imaginary parts of the first solution's amplitude F (z) (top right panel).The bottom panels show the learned modulus B(z) and the corresponding kernel K(z).

Figure 17 :
Figure 17: Gradient descent in the z 1 , z 2 space for minimizing sin µ.The left panel displays the trajectories of both the z 1 and z 2 roots for the points along the descent where L S E remains small.The right panel shows the value of sin µ along the trajectory, along with the scaled loss L S E at each point.The gradient descent trajectory cannot be trusted once the evaluation loss blows up.

Figure 18 :
Figure 18: Comparison between the exact L = 3 finite partial wave solution and the machine-learned one associated with the quadratic input modulus B c 2 (z).On the left panel, we display the numerical closeness of the two moduli and on the right panel we compare the exact L = 3 phase and the machine learned one associated with the input B c 2 (z).

Figure 19 :
Figure 19: Roots of the ambiguous polynomial ambiguities at L = 2, 3. We only represent solutions where the two ambiguous amplitudes have a single distinct root, respectively z 1 and z ⋆ 1 .At L = 2 the solutions are all of Ref's.[23] class 2, as described in Section 5.1.At L = 3 the solutions can be class 2 or class 3.
Trained with S E (b) Non-scaled loss of Eq. (8) at evaluation

Figure 20 :
Figure 20: Training on the input modulus B(z) = az 2 + ϵ using either a scaled (orange) or a non-scaled (blue) loss function.The panels compare the different loss metrics at evaluation time.The dashed black line indicates the transition between moduli with sin µ < 1 and moduli with sin µ > 1.

Figure 21 :
Figure 21: Learned phases for the input modulus B(z) = z 2 /2 + ϵ where networks are trained using either a scaled (orange) or non-scaled (blue) loss function.On the left panel we use ϵ ∼ 0.047 where sin µ < 1 and on the right panel we use ϵ ∼ 9.1 × 10 −4 where sin µ > 1.On the right panel, the network trained with the non-scaled loss does not lead to a physical phase.

Table 1 :
Model architecture and hyperparameters used for the parameterization of the phase ϕ σ (z) with a single neural network.

Table 2 :
Hyperparameters to be tuned for the different repulsive losses considered